This intensive, fast-paced 4-day course delivers a technical overview of the Hadoop landscape. No prior knowledge of databases is assumed. However, basic Java programming experience and Linux command-line experience will be very useful for this course.
The course is targeted towards technical people who want to understand the emerging world of Big Data, with a specific focus on Hadoop.
Audience: Data Analysts, Business Analysts, Developers, Data Managers, Business Intelligence Analysts, IT Administrators, Data Architects
Course Syllabus:
Day 1
Introduction to Hadoop and its Ecosystem, Map Reduce and HDFS
- Big Data, Factors constituting Big Data
- Hadoop and Hadoop Ecosystem
- Map Reduce – Concepts of Map, Reduce, Ordering, Concurrency, Shuffle
- Hadoop Distributed File System (HDFS) Concepts and its Importance
- Deep Dive in Map Reduce – Execution Framework, Partitioner, Combiner, Data Types, Key-Value Pairs
- HDFS Deep Dive – Architecture, Data Replication, Name Node, Data Node, Data Flow
- Parallel Copying with DistCp, Hadoop Archives
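As a rough illustration of the Map, Shuffle and Reduce concepts listed above, the sketch below counts words in plain Python. It does not use the Hadoop API and runs on a single machine; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group values by key and order the keys before reducing."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, which is what lets a real
    # cluster run many map tasks concurrently.
    mapped = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values) for key, values in shuffle_phase(mapped)]


if __name__ == "__main__":
    print(word_count(["big data big", "data hadoop"]))
```

In a real Hadoop job the map and reduce steps would run as separate tasks across the cluster, and the framework itself would perform the shuffle and ordering between them.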
Hands on Exercises
- Installing Hadoop in Pseudo-Distributed Mode, Understanding Important Configuration Files, their Properties and Daemon Threads
- Accessing HDFS from Command Line
- Map Reduce – Basic Exercises
- Understanding Hadoop Ecosystem
- Introduction to Sqoop, use cases and Installation
- Introduction to Hive, use cases and Installation
- Introduction to Pig, use cases and Installation
- Introduction to Oozie, use cases and Installation
- Introduction to Flume, use cases and Installation
- Introduction to YARN
Day 2
Deep Dive in Map Reduce and YARN
- How to Develop a Map Reduce Application, Writing Unit Tests
- Best Practices for Developing, Writing and Debugging Map Reduce Applications
- Joining Data sets in Map Reduce
- Algorithms – Graph Traversal, etc.
- Hadoop APIs
Deep Dive in Pig
- Grunt, Script Mode, Data Model
- Advanced Pig Latin, Evaluation and Filter Functions, Pig and Ecosystem
- Real time use cases – Gaming Industry, Oil and Gas Sector
Day 3
Deep Dive in Hive
- Understanding Hive, Architecture, Physical Model, Data Model, Data Types
- HiveQL – DDL, DML, Other Operations
- Understanding Tables in Hive, Partitioning, Indexes, Bucketing, Subqueries, Joining Tables, Data Load and Appending Data to an Existing Table
- Hands on Exercises – Working with Large Data Sets and Querying Extensively
- User defined Functions, Optimizing Queries, Tips and Tricks for performance tuning
Introduction to HBase Architecture
- Introduction to HBase, Architecture, Map Reduce Integration, Client APIs – Features and Administration
Day 4
Deep Dive into Oozie
- Understanding Oozie
- Designing and Implementing Workflow
- Implementing an Oozie Coordinator Application
Hadoop Cluster Setup and Running Map Reduce Jobs
- Hadoop Multi-Node Cluster Setup Using Amazon EC2 – Creating a 4-Node Cluster
- Running Map Reduce Jobs on Cluster
Major Project – Putting It All Together and Connecting the Dots
- Working with Large data sets, Steps involved in analysing large data
Advanced Map Reduce
- Delving Deeper Into The Hadoop API
- More Advanced Map Reduce Programming, Joining Data Sets in Map Reduce
- Graph Manipulation in Hadoop