Hadoop Administrator Training Chennai

Module 1

What is Big Data?

Big Data Facts

The Three V’s of Big Data

Understanding Hadoop

What is Hadoop? Why Learn Hadoop?

Relational Databases vs. Hadoop

Motivation for Hadoop

6 Key Hadoop Data Types

Module 2

The Hadoop Distributed File System (HDFS)

What is HDFS?

HDFS Components

Understanding Block Storage

The Name Node

The Data Nodes

Data Node Failures

HDFS Commands (see the sketch below)

HDFS File Permissions
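
A minimal sketch of how the HDFS commands and file permissions above map onto Hadoop's Java FileSystem API. The paths and the 750 permission bits are made-up examples; the calls themselves (mkdirs, copyFromLocalFile, setPermission, listStatus) are the standard org.apache.hadoop.fs ones:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class HdfsBasics {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Equivalent of: hadoop fs -mkdir -p /user/demo/input
        Path dir = new Path("/user/demo/input");
        fs.mkdirs(dir);

        // Equivalent of: hadoop fs -put data.txt /user/demo/input
        fs.copyFromLocalFile(new Path("data.txt"), dir);

        // Equivalent of: hadoop fs -chmod 750 /user/demo/input
        fs.setPermission(dir, new FsPermission((short) 0750));

        // Equivalent of: hadoop fs -ls /user/demo/input
        for (FileStatus status : fs.listStatus(dir)) {
            System.out.printf("%s %s%n", status.getPermission(), status.getPath());
        }

        fs.close();
    }
}
```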

Module 3

The MapReduce Framework

Overview of MapReduce

Understanding MapReduce

The Map Phase

The Reduce Phase

WordCount in MapReduce

Running a MapReduce Job
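
The classic WordCount job from this module, essentially as it appears in the Hadoop documentation, written against the org.apache.hadoop.mapreduce API; the map phase emits (word, 1) pairs and the reduce phase sums them:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every token in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: sum the counts collected for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // the combiner reuses the reducer
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, it would be launched with something like hadoop jar wordcount.jar WordCount /user/demo/input /user/demo/output, where the jar name and paths are placeholders.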

Module 4

Planning Your Hadoop Cluster

Single-Node Cluster Configuration

Multi-Node Cluster Configuration

Checking HDFS Status (see the sketch after this module's topics)

Breaking the Cluster

Copying Data Between Clusters

Adding and Removing Cluster Nodes

Rebalancing the Cluster

Name Node Metadata Backup

Cluster Upgrading
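
As a small illustration of the "Checking HDFS Status" topic, HDFS status can also be read from code rather than from hdfs dfsadmin -report. This sketch reads the aggregate FsStatus counters and the cluster defaults that capacity planning revolves around; the GB/MB conversions are only for display:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FsStatus;
import org.apache.hadoop.fs.Path;

public class HdfsStatus {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Aggregate capacity counters, comparable to the summary
        // lines of `hdfs dfsadmin -report`.
        FsStatus status = fs.getStatus();
        System.out.printf("Capacity : %d GB%n", status.getCapacity() >> 30);
        System.out.printf("Used     : %d GB%n", status.getUsed() >> 30);
        System.out.printf("Remaining: %d GB%n", status.getRemaining() >> 30);

        // Cluster-wide defaults relevant to planning.
        System.out.printf("Default block size : %d MB%n",
                fs.getDefaultBlockSize(new Path("/")) >> 20);
        System.out.printf("Default replication: %d%n",
                fs.getDefaultReplication(new Path("/")));

        fs.close();
    }
}
```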

Module 5

Installing and Managing Hadoop Ecosystem Projects

Sqoop

Flume

Hive

Pig

HBase

Oozie
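
Of these projects, HBase is the one administrators most often verify with a quick Java client round trip. A minimal put-and-get sketch, assuming a pre-created table "demo" with column family "cf" (both names are placeholders) and an hbase-site.xml on the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseHello {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath for the ZooKeeper quorum.
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("demo"))) {

            // Write one cell: row "row1", column cf:greeting.
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("greeting"),
                    Bytes.toBytes("hello hbase"));
            table.put(put);

            // Read it back.
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("greeting"));
            System.out.println(Bytes.toString(value));
        }
    }
}
```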

Module 6

Managing and Scheduling Jobs

Managing Jobs

The FIFO Scheduler

The Fair Scheduler

How to stop and start jobs running on the cluster
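
A sketch of the job-control side of this module: routing a job to a named scheduler queue and stopping it from code. The queue name "research" is an assumption; mapreduce.job.queuename is the standard property the Fair and Capacity schedulers consult, and the same kill can be issued from the shell with mapred job -kill <job-id>:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class QueuedJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Route the job to a specific scheduler queue/pool ("research" is made up).
        conf.set("mapreduce.job.queuename", "research");

        // An identity, map-only pass-through is enough for a scheduling smoke test.
        Job job = Job.getInstance(conf, "queued smoke test");
        job.setJarByClass(QueuedJob.class);
        job.setNumReduceTasks(0);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.submit(); // asynchronous, unlike waitForCompletion()
        System.out.println("Submitted " + job.getJobID() + " to queue 'research'");
        System.out.println("Tracking URL: " + job.getTrackingURL());

        // To stop the running job from code, the equivalent of
        // `mapred job -kill <job-id>` would be:
        // job.killJob();

        while (!job.isComplete()) {
            Thread.sleep(2000);
        }
        System.out.println(job.isSuccessful() ? "Job succeeded" : "Job failed or was killed");
    }
}
```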

Module 7

Cluster Monitoring, Troubleshooting, and Optimizing

General System Conditions to Monitor

Name Node and Job Tracker Web UIs

Viewing and Managing Hadoop's Log Files

Ganglia Monitoring Tool

Common Cluster Issues and Their Resolutions

Benchmarking Your Cluster’s Performance
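
Serious benchmarking uses the stock TestDFSIO and TeraSort jobs; purely to illustrate the idea, here is a crude single-client write-throughput probe over the FileSystem API (the 256 MB volume and the /tmp path are arbitrary choices):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteProbe {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path probe = new Path("/tmp/write-probe.bin");

        byte[] buffer = new byte[1 << 20];  // 1 MB of zeros
        int megabytes = 256;                // total volume to write

        long start = System.nanoTime();
        try (FSDataOutputStream out = fs.create(probe, true)) {
            for (int i = 0; i < megabytes; i++) {
                out.write(buffer);
            }
            out.hsync(); // flush through the DataNode pipeline before timing stops
        }
        double seconds = (System.nanoTime() - start) / 1e9;

        System.out.printf("Wrote %d MB in %.2f s (%.1f MB/s)%n",
                megabytes, seconds, megabytes / seconds);
        fs.delete(probe, false);
        fs.close();
    }
}
```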

Populating HDFS from External Sources

How to use Sqoop to import data from RDBMSs to HDFS

How to gather logs from multiple systems using Flume

Features of Hive, HBase, and Pig

How to populate HDFS from external sources
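
Sqoop and Flume are driven by command lines and configuration files rather than a Java-first API, so as a generic illustration of this last item, the sketch below streams an external source into HDFS with IOUtils.copyBytes; the local file is a stand-in for whatever the real source is:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class IngestToHdfs {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Any InputStream works here: a socket, an HTTP response,
        // or - as in this placeholder - a local file.
        try (InputStream in = new FileInputStream("export.csv");
             FSDataOutputStream out = fs.create(new Path("/user/demo/ingest/export.csv"))) {
            // 4 KB copy buffer; try-with-resources closes both streams.
            IOUtils.copyBytes(in, out, 4096, false);
        }
        fs.close();
    }
}
```

In practice the Sqoop route is a one-line import command and the Flume route is a source/channel/sink agent configuration; both are covered in the sessions above.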


About Big Data

Big data is a broad term for data sets so large or complex that traditional data-processing applications are inadequate. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy.

Apache Hadoop is 100% open source and pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and separate systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and it can scale without practical limit. With Hadoop, no data is too big.

Introduction to the Hadoop ecosystem – architecture – HDFS – MapReduce (MRv1) – Hadoop v1 and v2 – HDFS federation – Linux VM setup (Ubuntu/CentOS) – JDK, SSH, and Eclipse – installation and configuration of Hadoop – HDFS daemons – YARN daemons – high availability – automatic and manual failover – writing data to HDFS – reading data from HDFS – replica placement strategy – failure handling – the Name Node, Data Nodes, and blocks – safe mode, rebalancing, and load optimization – troubleshooting and error rectification – Hadoop fs shell commands – Unix and Java basics.
