Apache Spark Training

Training in Chennai

Module 1

  • Big Data Landscape
  • Why Big Data
  • The 3 Vs
  • Hadoop Ecosystem
  • Introduction to Apache Spark
  • Features of Apache Spark
  • Apache Spark Stack
  • Introduction to RDDs
  • RDD Transformations
  • What is good and bad in MapReduce?
  • Why use Apache Spark

Module 2

  • Installation
  • Single node
  • Include Hadoop
  • Include Apache Spark
  • Include Hive
  • Include Sqoop
  • Include Hue

Module 3

  • Deep Dive in HDFS
  • HDFS Design
  • Fundamentals of HDFS
  • Rack Awareness
  • Read/Write from HDFS
  • HDFS Federation and High Availability (Hadoop 2.x)
  • HDFS Command Line Interface

Module 4

  • Spark Shell Hands On Using HDFS
  • Spark Shell Introduction
  • Create file using Hue
  • Spark Shell extracting file from HDFS
  • Create RDD from HDFS file

Module 5

  • Programming with RDD Part-1
  • Creating new RDD
  • Transformations on RDD
  • Lineage Graph
  • Actions on RDD
  • RDD Concepts on Persist and Cache
  • Lazy evaluation of RDD

Module 6

  • Scala/Spark Functional Programming
  • Using Function Literals
  • Anonymous Functions
  • Define a function which accepts another function

Module 7

  • RDD Transformation Programming in Depth
  • Hands on and core concepts of map() transformation
  • Hands on and core concepts of filter() transformation
  • Hands on and core concepts of flatMap() transformation
  • Compare map and flatMap transformation
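
The map versus flatMap comparison above can be sketched without a cluster. The snippet below is plain Python rather than Spark code, and the sample `lines` data is made up for illustration; it only mirrors the one-to-one behaviour of map() and the one-to-many, flattened behaviour of flatMap().

```python
from itertools import chain

# Hypothetical sample input standing in for the lines of a text file.
lines = ["hello spark", "hello hdfs"]

# map-style: exactly one output element per input element
# (here each line becomes one list of words).
mapped = [line.split(" ") for line in lines]

# flatMap-style: each input element yields zero or more outputs,
# and the results are flattened into a single sequence.
flat_mapped = list(chain.from_iterable(line.split(" ") for line in lines))

print(mapped)
print(flat_mapped)
```

The nested result from the map-style step is why a word count normally starts from flatMap(): the words need to be in one flat collection before they can be paired and counted.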

Module 8

  • Apache Spark in Action
  • Hands on and core concepts of reduce() action
  • Hands on and core concepts of fold() action
  • Hands on and core concepts of aggregate() action
  • Basics of Accumulator
  • Hands on and core concepts of collect() action
  • Hands on and core concepts of take() action
  • Ordered access of RDD
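
As a rough illustration of how the aggregate() action listed above differs from reduce() and fold(), here is a minimal plain-Python sketch. It is not Spark code: the `aggregate` helper and the hand-made `partitions` list are assumptions used to imitate per-partition folding followed by a merge of the partial results.

```python
from functools import reduce

def aggregate(partitions, zero, seq_op, comb_op):
    """Imitate an aggregate-style action on pre-split data (hypothetical helper)."""
    # Step 1: fold every partition independently, starting from the zero value.
    partials = [reduce(seq_op, part, zero) for part in partitions]
    # Step 2: merge the per-partition results into one final value.
    return reduce(comb_op, partials, zero)

# Made-up data, already divided into two "partitions".
partitions = [[1, 2, 3], [4, 5]]

# The accumulator is a (sum, count) pair, so its type differs from the elements.
total, count = aggregate(
    partitions,
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),   # add one element to an accumulator
    lambda a, b: (a[0] + b[0], a[1] + b[1]),   # merge two accumulators
)
mean = total / count
print(total, count, mean)
```

The point of the two separate functions is that the result type (a pair) can differ from the element type (an integer), which reduce() and fold() do not allow.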

Module 9

  • Apache Spark Execution Model
  • How Spark executes a program
  • Concepts of RDD partitioning
  • RDD data shuffling and performance issues

Module 10

  • Apache Spark PairRDD
  • Core concepts of PairRDD
  • Creation of PairRDD
  • Aggregation in PairRDD
  • Aggregation functions understanding in depth
  • How does reduceByKey() work conceptually?
  • How does foldByKey() work conceptually?
  • How does combineByKey() work conceptually?

Module 11

  • Spark PairRDD Hands-On Lab
  • reduceByKey
  • foldByKey
  • combineByKey
  • groupByKey

Module 12

  • Spark PairRDD Joining, Zipping and
  • reduceByKey versus groupByKey performance issue
  • cogroup
  • zip
  • joins (left, right, inner, etc.)

Module 13

  • Understanding Hadoop SequenceFile
  • Creating a SequenceFile and Processing It Using Spark
  • Creating SequenceFile using TSV file
  • Loading Data in Apache Hive
  • Processing SequenceFile as an RDD

Module 14

  • Spark Shared Variables
  • Shared Variables: Broadcast Variables
  • Shared Variables: Accumulators

Module 15

  • Spark Accumulator
  • Word count and Character Count
  • Counting Bad records in a file

Module 16

  • Spark Broadcast Variable
  • Joining two CSV files, one as a broadcast lookup table

Module 17

  • Spark API
  • Broadcast Variable, Filter Functions and Saving File

Module 18

  • Spark API
  • Spark Join, GroupBy and Swap function

Module 19

  • Spark API
  • Remove Header from CSV file and Map Each column to Row Data

Module 20

  • Spark SQL
  • HiveContext
  • SchemaRDD replaced by the DataFrame API
  • History of SparkSQL
  • Catalyst Optimizer

Module 21

  • SparkSQL Hands-On Sessions
  • Hive Configuration
  • Create Hive table using Spark
  • Load Data in Hive table using Spark
  • Create another table using DataFrame

Module 22

  • Implementing Business Logic using SparkSQL
  • Loading CSV file
  • Spark case classes (to create a schema for a CSV file)
  • Convert RDD to DataFrame using the DataFrame API to query data
  • Using SQL query on DataFrame

Module 23

  • Spark Loading and Saving Your Data
  • TextFiles
  • CSV and TSV files
  • JSON Files

Module 24

  • Spark Loading and Saving Your Data: SQL and NoSQL
  • JDBC (MySQL)
  • HBase (NoSQL)

Module 25

  • Writing Spark Applications
  • Spark Applications vs Spark Shell
  • Creating the SparkContext
  • Configuring Spark Properties
  • Building and Running a Spark Application
  • Logging

Module 26

  • Spark Streaming in Depth Part-1
  • Spark Streaming Overview
  • Example: Streaming Word Count

Module 27

  • Spark Streaming in Depth Part-2
  • Other Streaming Operations
  • Sliding Window Operation
  • Developing Spark Streaming Applications

Module 28

  • Spark Algorithms Part-1
  • Iterative Algorithm
  • Graph Analysis
  • Machine Learning

Module 29

  • Case studies