Big Data Analytics Training Chennai

Module 1

Introduction to Big Data

Existing OLTP, ETL, DWH, OLAP

Analysis and Analytics

Distributed systems

Module 2

Introduction to Hadoop V1 Architecture

Daemons

Read and write in DFS

Rack awareness policy

Safemode

Rebalancing and optimization

Failure handling

Module 3

Single node cluster

Pseudo Distributed

Multinode cluster

Prerequisites for installation

Virtual machine

VMware

Ubuntu/CentOS

Installation and configuration of Hadoop V-

HDFS Namenode

Datanode

Secondary Namenode

Troubleshooting and error rectification

Block size and replication factor in conf files

Module 4

Unix and Java Basics

HDFS shell commands

Customized HDFS operations in Java

Module 5

Introduction to MapReduce

Architecture of MR V1

Key Value Pairs

InputFormat/OutputFormat

Mapper

Partitioner

Default and custom

Shuffle and Sort

Combiner

Reducer

0, 1 and multiple

mapred-site.xml properties

Failure handling

Module 6

MapReduce V2 (YARN)

YARN Architecture

Installation and configuration of YARN (220)

Resource manager-Scheduler

FIFO, Capacity and Fair schedulers

Application Master/Manager

Node manager

Container

Yarn child

MR-Uber jobs

Module 7 and 8

Hands-on MapReduce

MapReduce sample

Word count program

Structured and Unstructured Data

Single and multiple column

Combiner and Partitioner

Inverted Index

Map-side and reduce-side joins

UDF and UDAF

Analysis in MapReduce

Hadoop Streaming

Module 9

Introduction to Hive data warehouse

Architecture and Installation

Basic HQL Commands

Load, join, external table

Metastore

Advanced HQL commands

regexp, date type

UDF in Hive

Log file Analysis

Performance tuning

Module 10

Introduction to Pig

Installation and configuration

Pig Latin scripts

Analysis in Pig

UDF in Pig

Log file, XML, JSON handling

Performance tuning

Module 11

Introduction to NoSQL

ACID/CAP/BASE

Key-value pair

Column family

HBase

Document

Introduction to MongoDB

Graph DB

Neo4j

Module 12

Introduction to HBase and installation

The HBase Data Model

The HBase Shell

HBase Architecture

Schema Design

The HBase API

HBase Configuration and Tuning

Module 13

Introduction to Sqoop and installation

Bulk loading, append and incremental

OLTP

RDB to HDFS, Hive, HBase

Module 14

Introduction to other ecosystem tools

Flume

Log collector

ZooKeeper

Coordination service

Oozie

Workflow engine

Presto SQL on Hadoop

Splunk

Monitoring and reporting

Ganglia

Cluster and job monitoring

Module 15

Integrate With ETL

Talend Data studio

Module 16

Big data Analytics

Visualization with Tableau

Connection with R

Module 17

Introduction to Data science

Machine learning

NLP

Statistical Analysis

Sentiment Analysis

Module 18

Hadoop Distributions overview

Cloudera

Hortonworks

Pivotal (Greenplum)

MapR

Module 19

Use cases, Case studies and Proof of Concepts

Module 20

CCD-410

Cloudera Certification Questions Discussion
