Apache Hadoop

    Module 1: Introduction to Hadoop

    • High Availability
    • Scaling
    • Advantages and Challenges

    Module 2: Introduction to Big Data

    • What is Big Data
    • Big Data Opportunities and Challenges
    • Characteristics of Big Data

    Module 3: Introduction to Hadoop

    • Hadoop Distributed File System
    • Comparing Hadoop & SQL
    • Industries using Hadoop
    • Data Locality
    • Hadoop Architecture
    • MapReduce & HDFS
    • Using the Hadoop single node image (Clone)

    Module 4: Hadoop Distributed File System (HDFS)

    • HDFS Design & Concepts
    • Blocks, NameNodes and DataNodes
    • HDFS High-Availability and HDFS Federation
    • Hadoop DFS: The Command-Line Interface
    • Basic File System Operations
    • Anatomy of a File Read and a File Write
    • Block Placement Policy and Modes
    • Detailed Explanation of Configuration Files
    • Metadata, FsImage, Edit Log, Secondary NameNode and Safe Mode
    • How to Add a New DataNode and Decommission a DataNode Dynamically (Without Stopping the Cluster)
    • FSCK Utility (Block Report)
    • How to Override Default Configuration at the System Level and the Programming Level
    • ZooKeeper Leader Election Algorithm
    • Exercise and Small Use Case on HDFS

    Module 5: MapReduce

    • MapReduce Functional Programming Basics
    • Map and Reduce Basics
    • How MapReduce Works
    • Anatomy of a MapReduce Job Run
    • Legacy Architecture -> Job Submission, Job Initialization, Task Assignment, Task Execution, Progress and Status Updates
    • Job Completion and Failures
    • Shuffling and Sorting
    • Splits, Record Reader, Partition, Types of Partitions & Combiner
    • Optimization Techniques -> Speculative Execution, JVM Reuse and Number of Slots
    • Types of Schedulers and Count