Big Data Hadoop Developer Training Chennai

(Hadoop, Spark, NoSQL, Cloud)

Module 1

Introduction to Big Data


The Why, How, and What of Big Data


Module 2

Introduction to Hadoop Ecosystem


Sharding, Distribution, and Replication factor (SDR)


MapReduce (MRv1) and YARN

Hadoop v1 and v2

Hadoop Data Federation

Module 3

Prerequisites for Installation

Single-node, pseudo-distributed, and multi-node clusters

Virtual machine using Linux (Ubuntu/CentOS)

Installation of Hadoop in the cloud (Azure/AWS)

Installation of Java, SSH, and Eclipse

Installation and configuration of Hadoop, HDFS daemons, and YARN daemons

High Availability (Active and Standby)

Automatic and manual failover

Hadoop FS shell commands

Writing Data to HDFS

Reading Data from HDFS

Module 4

Rack awareness policy and Replica placement Strategy

Failure Handling



Block-Safe mode

Rebalancing and load optimization

Troubleshooting and error rectification

Hadoop FS shell commands, Unix and Java basics

Assessment 1

Module 5

Introduction to MapReduce

Architecture of MapReduce

Execution of MapReduce in YARN

Application Master, Resource Manager, and Node Manager

Input format, input split, and key-value pairs

Classes and methods of the MapReduce paradigm




Custom and default partitioners

Shuffle and Sort


Application Master/Manager

Container and Node Manager

Module 6

MapReduce hands-on

Word count program / log analytics

Hadoop streaming in R/Python

Data processing Transformations

Map-only jobs and uber jobs

Inverted index and searches

Module 7

MR Programs 2

Structured and Unstructured Data handling

Optimizing using a Combiner


Single and multiple columns

Inverted Index

XML: semi-structured data

Map-side joins

Reduce-side joins

Module 8

Introduction to the Hive data warehouse

Installation of Hive and the metastore database

Configuring the metastore with MySQL

HiveQL commands

Module 9

Manipulation and analytical functions in Hive

Managed table and external tables

Partitioning and Bucketing

Complex data types and Unstructured data

Advanced HQL commands


Integration with HBase

SerDe / Regular Expression

File formats: Parquet, SequenceFile, RCFile, ORC

Assessment 2

Module 10

Introduction to Pig

Installation, bags and collections

Commands and Scripts


Module 11

Introduction to NoSQL


Key-value pair

MapReduce

Column family

HBase Document


Graph DB


Module 12

Introduction to HBase and installation

The HBase Data Model

The HBase Shell

HBase Architecture

Schema Design

The HBase API

HBase Configuration and Tuning

Module 13

Ingest data from RDB

Introduction to Sqoop and installation

Import and export data from and to RDB

Bulk loading, incremental load, split-by, conditional query

Sqoop validation and jobs

Module 14

Ingest streaming data

Flume Architecture

Agent, source, sink, and channel

Ingesting log files

Collecting data from Twitter for sentiment analysis

Assessment 3

Module 15

Integrate With ETL

Talend Big Data edition – big data components

Module 16

Big data Analytics

Dimensional modelling

Data Visualization

Tableau – Hive and Spark SQL connectors

Module 17

Spark core and Components

Spark Shell

Creating an RDD from HDFS / local files

Creating new RDDs: transformations on RDDs

Lineage Graph – DAG

Actions on RDD

RDD concepts: persist and cache, lazy evaluation of RDDs

Hands on and core concepts of map() transformation

Hands on and core concepts of filter() transformation

Hands on and core concepts of flatMap() transformation

Compare map and flatMap transformation

Hands on and core concepts of reduce() action

Hands on and core concepts of fold() action

Hands on and core concepts of aggregate() action

Basics of Accumulator

Hands on and core concepts of collect() action

Hands on and core concepts of take() action

Apache Spark Execution Model

How Spark executes a program

Concepts of RDD partitioning

RDD data shuffling and performance issues

Module 18

DataFrames and Datasets

Spark SQL


Module 19

Spark jobs

Building a Scala program using SBT/Maven

spark-submit and Spark applications

Module 20

Kafka: publisher/subscriber

Consumer and producer

Module 21


Monitoring and scheduling

Module 22

Oozie: workflow and coordinator

Module 23

Distribution Installation or Sandbox

Cloudera – Cloudera Manager

Hortonworks – Ambari Server

MapR – MCS

Module 24

Introduction to Data Science, Machine Learning, Statistical Analysis, and Sentiment Analysis

Module 25

Multi-node cluster setup, High Availability, Hadoop data federation, commissioning and decommissioning, automatic and manual failover, ZooKeeper failover controller

Module 26

Use cases, case studies, and proof of concept; working on different distributions

Module 27 (Certification guidance)

CCA Spark and Hadoop Developer Exam (CCA175)

CCP Data Engineer (DE575)