Big Data Hadoop Developer Training Chennai

Module 1

Introduction to Big Data

Characteristics

The Why, How and What of Big Data

Existing OLTP, ETL, DWH, OLAP

Module 2

Introduction to Hadoop Ecosystem

HDFS Architecture

MapReduce (MRv1)

Hadoop v1 and v2, Hadoop data federation

Module 3

Prerequisites for Installation

VM, Linux (Ubuntu/CentOS), JDK, SSH, Eclipse

Installation and configuration of Hadoop, HDFS daemons, YARN daemons

High Availability

Automatic and manual failover

Writing Data to HDFS

Reading Data from HDFS

Module 4

Replica placement Strategy

Failure Handling

Namenode

Datanode

Block, Safe mode

Rebalancing and load optimization

Troubleshooting and error rectification

Hadoop fs shell commands, Unix and Java basics

Module 5

Introduction to MapReduce

Architecture of MapReduce

Execution of MapReduce in YARN

App Master, Resource Manager and Node Manager, InputFormat and key-value pairs

Mapper

Reducer

Partitioner

Custom and Default

Shuffle and Sort

Combiner, Scheduler

App Master/Manager

Container, Node Manager
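
As a rough illustration of the default and custom partitioner topics in this module, the sketch below shows the idea in plain Python rather than the Hadoop Java API. Both function names are hypothetical, `zlib.crc32` is an assumption standing in for Java's `hashCode()`, and keys are assumed to be non-empty strings starting with a letter.

```python
import zlib


def default_partition(key, num_reducers):
    """Hash-style partitioner: the same key always lands on the same reducer.

    crc32 is an assumption standing in for Java's hashCode().
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def custom_partition(key, num_reducers):
    """Hypothetical custom rule: route keys by their first letter."""
    return (ord(key[0].lower()) - ord("a")) % num_reducers
```

A custom rule like this is typically chosen when related keys must reach the same reducer; the hash-style default aims only at an even spread.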

Module 6

MapReduce hands-on

Word count program / log analytics

Hadoop streaming in R and Python

Data processing Transformations

Map-only jobs and uber jobs

Inverted index and searches
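
As a rough illustration of the word count and Hadoop streaming topics in this module, the sketch below shows a mapper and a reducer in Python that exchange tab-separated key/value lines, the convention Hadoop Streaming uses. The function names are hypothetical, and the in-process `sorted()` call merely stands in for the framework's shuffle and sort.

```python
from itertools import groupby


def map_words(lines):
    """Mapper: emit a 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_counts(sorted_lines):
    """Reducer: sum the counts of each word; input must be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process.

    Assumption: in a real job the mapper and reducer would run as
    separate streaming scripts reading stdin and writing stdout.
    """
    return list(reduce_counts(sorted(map_words(lines))))
```

Calling `run_local` on a few lines of text is a quick way to check the logic locally before submitting a job to a cluster.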

Module 7

MR Programs 2

Structured and Unstructured Data handling

Combiner

Partitioner

Single and multiple column

Inverted Index

XML (semi-structured data)

Map-side joins

Reduce-side joins
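
As a rough illustration of the reduce-side join topic, the sketch below simulates the three phases in plain Python: records are tagged by source in the map phase, grouped by join key in the shuffle, and paired in the reduce phase. The function name, the `"C"`/`"O"` tags, and the customer/order record shapes are hypothetical.

```python
from collections import defaultdict


def reduce_side_join(customers, orders):
    """Inner-join (customer_id, name) with (customer_id, item) records."""
    # Map phase: tag every record with its source so the reducer can tell them apart.
    tagged = [(cid, ("C", name)) for cid, name in customers]
    tagged += [(cid, ("O", item)) for cid, item in orders]

    # Shuffle phase (simulated): group all tagged values by join key.
    groups = defaultdict(list)
    for key, value in tagged:
        groups[key].append(value)

    # Reduce phase: pair every customer record with every order for the key.
    joined = []
    for key in sorted(groups):
        names = [v for tag, v in groups[key] if tag == "C"]
        items = [v for tag, v in groups[key] if tag == "O"]
        joined.extend((key, name, item) for name in names for item in items)
    return joined
```

Keys that appear in only one input produce no output, which makes this an inner join; a map-side join would instead load the smaller input into memory on each mapper and skip the shuffle.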

Module 8

Introduction to Hive Data Warehouse

Installation

Configuring the metastore with MySQL, HiveQL commands

Module 9

Manipulation and analytical functions in Hive

Managed table and external tables

Partitioning and Bucketing

Complex data types and Unstructured data

Advanced HQL commands

UDF and UDAF

Integration with HBase

SerDe / Regular Expression

Module 10

Introduction to Pig

Installation, bags and collections

Commands and Scripts

Pig UDF

Module 11

Introduction to NoSQL

ACID/CAP/BASE

Key-value pair

MapReduce

Column family

HBase

Document

MongoDB

Graph DB

Neo4j

Module 12

Introduction to HBase and installation

The HBase Data Model

The HBase Shell

HBase Architecture

Schema Design

The HBase API

HBase Configuration and Tuning

Module 13

Introduction to Sqoop and installation

Bulk loading

Hadoop Streaming

Module 14

Flume Architecture

Agent, source, sink, channel

Ingest log file

Collecting data from Twitter for sentiment analysis

Module 15

Integration with ETL: Talend Open Studio for Big Data

Module 16

Big Data Analytics

Visualization, dimensional modelling, Tableau

Module 17

Spark

Spark Shell Hands On Using HDFS

Create RDD from HDFS file

Creating new RDDs, transformations on RDDs

Lineage Graph

Actions on RDD

RDD concepts: persist and cache, lazy evaluation of RDDs

Hands on and core concepts of map() transformation

Hands on and core concepts of filter() transformation

Hands on and core concepts of flatMap() transformation

Compare map and flatMap transformation

Hands on and core concepts of reduce() action

Hands on and core concepts of fold() action

Hands on and core concepts of aggregate() action

Basics of Accumulator

Hands on and core concepts of collect() action

Hands on and core concepts of take() action

Apache Spark Execution Model

How Spark executes a program

Concepts of RDD partitioning

RDD data shuffling and performance issues
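
As a rough illustration of how the map(), flatMap(), filter(), reduce() and fold() topics in this module differ, the sketch below mimics their semantics with plain Python built-ins. It deliberately does not use the Spark API, so no cluster or Spark installation is assumed; the sample data and variable names are hypothetical.

```python
from functools import reduce
from itertools import chain

lines = ["big data", "hadoop spark hive"]

# map(): exactly one output element per input element (here, a list per line).
mapped = list(map(str.split, lines))

# flatMap(): each input may yield zero or more elements, flattened into one sequence.
flat_mapped = list(chain.from_iterable(map(str.split, lines)))

# filter(): keep only the elements that satisfy the predicate.
long_words = list(filter(lambda w: len(w) > 4, flat_mapped))

# reduce(): combine elements pairwise; fold is the same idea with a zero value.
total_chars = reduce(lambda a, b: a + b, map(len, flat_mapped))
folded_chars = reduce(lambda a, b: a + b, map(len, flat_mapped), 0)
```

The practical difference is that map() leaves one nested list per line, while flatMap() yields a single flat sequence of words. Unlike these eager Python lists, Spark transformations are evaluated lazily and run only when an action is called.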

Module 18

Spark SQL

Module 19

spark-submit and Spark applications

Module 20

Kafka: Publisher/Subscriber

Consumer and producer

Module 22

Cloudera Manager and VM, Hue

Module 23

Oozie: Workflow and Coordinator

Module 24

Introduction to data science, machine learning, statistical analysis, sentiment analysis, Cloudera/Hortonworks/Greenplum

Module 25

Multinode cluster setup

High Availability

Hadoop data federation

Commissioning and decommissioning

Automatic and manual failover

ZooKeeper failover controller

Use cases, case studies and proof of concept

Working on different distributions

Module 26 (Optional)

Cloudera and Hortonworks certification questions discussion
