IOE SYLLABUS: BIG DATA TECHNOLOGIES


Subject code: CT 765 07

Course Objectives:


The course "Big Data Technologies" aims to give students a comprehensive understanding of the key concepts and technologies of big data analytics. The introductory segment covers the background of data analytics, the role of distributed systems, and current trends in big data analytics. The course then examines the Google File System, explaining its architecture, availability, fault tolerance, and optimization strategies for large-scale data handling. The Map-Reduce framework is explored in depth, covering the basics of functional programming, modeling real-world problems in a functional style, scalability goals, fault tolerance, and parallel efficiency. The NoSQL module introduces structured and unstructured data and surveys the taxonomy of NoSQL implementations, focusing on the architectures of HBase, Cassandra, and MongoDB. Searching and indexing big data is addressed next, with emphasis on full-text indexing and searching using tools such as Lucene and Elasticsearch for distributed search. The course concludes with a case study on Hadoop, introducing the Hadoop environment, data flow, Hadoop I/O, query languages for Hadoop, and integration with the Amazon Cloud. Overall, the course equips students with the knowledge and practical skills needed to harness the potential of big data technologies in various applications.


 

  1. Introduction to Big Data [7 hours]
    1. Big Data Overview
    2. Background of Data Analytics
    3. Role of Distributed System in Big Data
    4. Role of Data Scientist
    5. Current Trend in Big Data Analytics
  2. Google File System [7 hours]
    1. Architecture
    2. Availability
    3. Fault tolerance
    4. Optimization for large scale data
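The architecture topics above can be previewed with a small sketch of how a GFS-style file system breaks a file into fixed-size chunks and replicates each chunk across chunkservers. The 64 MB chunk size and replication factor of 3 match the published GFS design; the server names and the simple round-robin placement policy are invented here for illustration only.

```python
# Sketch of GFS-style chunking and replica placement (illustrative only).

CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses 64 MB chunks
REPLICATION = 3                # each chunk is stored on 3 chunkservers

def split_into_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return the number of chunks needed for a file of `file_size` bytes."""
    return (file_size + chunk_size - 1) // chunk_size

def place_replicas(num_chunks, servers, replication=REPLICATION):
    """Assign each chunk to `replication` distinct servers (round-robin;
    the real system also considers disk usage and rack placement)."""
    placement = {}
    for chunk_id in range(num_chunks):
        placement[chunk_id] = [
            servers[(chunk_id + r) % len(servers)] for r in range(replication)
        ]
    return placement

servers = ["cs1", "cs2", "cs3", "cs4", "cs5"]      # hypothetical chunkservers
num_chunks = split_into_chunks(200 * 1024 * 1024)  # a 200 MB file
print(num_chunks)                                  # 4
print(place_replicas(num_chunks, servers)[0])      # ['cs1', 'cs2', 'cs3']
```

Replicating every chunk on multiple servers is what gives the system its availability and fault tolerance: losing one chunkserver never loses data.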
  3. Map-Reduce Framework [10 hours]
    1. Basics of functional programming
    2. Fundamentals of functional programming
    3. Real world problems modeling in functional style
    4. Map reduce fundamentals
    5. Data flow (Architecture)
    6. Real world problems
    7. Scalability goal
    8. Fault tolerance
    9. Optimization and data locality
    10. Parallel Efficiency of Map-Reduce
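The functional-style modeling above can be sketched with word count, the canonical map-reduce example, using only the Python standard library. The real framework distributes the map and reduce phases across machines and handles the shuffle for you; this single-process version shows only the programming model.

```python
# Word count expressed in map-reduce style (single-process sketch).
from itertools import groupby

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    """Shuffle + reduce: group the pairs by key and sum the counts."""
    counts = {}
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        counts[key] = sum(count for _, count in group)
    return counts

documents = ["big data big ideas", "data flows in big systems"]
all_pairs = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(all_pairs))  # {'big': 3, 'data': 2, ...}
```

Because `map_phase` is a pure function of its input, independent documents can be mapped on different machines in any order, which is exactly the property that gives the framework its scalability and its simple fault-tolerance story (failed tasks are just re-run).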
  4. NoSQL [6 hours]
    1. Structured and Unstructured Data
    2. Taxonomy of NoSQL Implementation
    3. Discussion of basic architecture of HBase, Cassandra, and MongoDB
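The structured/unstructured distinction above can be made concrete by comparing the same "user" record as a fixed-schema row versus a schemaless document of the kind stored by document databases such as MongoDB. All field names here are invented for illustration.

```python
# Structured vs. document data (illustrative sketch).

# Structured: every row has the same columns, enforced up front.
structured_rows = [
    ("u1", "Asha", 21),
    ("u2", "Bibek", 24),
]

# Document model: each document may carry different, nested fields.
documents = [
    {"_id": "u1", "name": "Asha", "age": 21},
    {"_id": "u2", "name": "Bibek", "interests": ["hadoop", "nosql"],
     "address": {"city": "Kathmandu"}},
]

# A query over documents must tolerate missing fields
# instead of relying on a fixed schema.
with_interests = [d["_id"] for d in documents if "interests" in d]
print(with_interests)  # ['u2']
```

This schema flexibility is the common thread across the taxonomy of NoSQL implementations, even though HBase (wide-column), Cassandra (wide-column), and MongoDB (document) realize it with very different architectures.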
  5. Searching and Indexing Big Data [7 hours]
    1. Full text Indexing and Searching
    2. Indexing with Lucene
    3. Distributed Searching with Elasticsearch
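Full-text search rests on the inverted index, the core data structure inside engines such as Lucene: each term maps to the list of document ids containing it, and a multi-term query intersects those lists. The minimal sketch below omits the tokenization, stemming, and relevance scoring that real engines layer on top.

```python
# Minimal inverted index with AND-search (illustrative sketch).
from collections import defaultdict

def build_index(docs):
    """docs: {doc_id: text}. Returns {term: sorted list of doc_ids}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """AND-search: ids of documents containing every query term."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

docs = {1: "big data needs big indexes",
        2: "full text search with Lucene",
        3: "distributed search over big data"}
index = build_index(docs)
print(search(index, "big data"))  # [1, 3]
print(search(index, "search"))    # [2, 3]
```

Distributed search, as in Elasticsearch, partitions this same structure into shards: each shard answers the query over its slice of the documents and the results are merged.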
  6. Case Study: Hadoop [8 hours]
    1. Introduction to Hadoop Environment
    2. Data Flow
    3. Hadoop I/O
    4. Query languages for Hadoop
    5. Hadoop and Amazon Cloud
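The Hadoop data flow above can be sketched in the style of Hadoop Streaming, where the mapper and reducer are ordinary programs that read and write tab-separated lines of text. On a real cluster these would be two separate scripts wired together through the `hadoop-streaming` jar; here a single process pipes one into the other.

```python
# Word count in Hadoop Streaming style (single-process sketch).

def mapper(lines):
    """Emit one 'word<TAB>1' line per word, like a streaming mapper."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum counts per key; assumes input sorted by key, which is
    what the Hadoop shuffle/sort phase guarantees the reducer."""
    current, total = None, 0
    for line in sorted_lines:
        key, value = line.split("\t")
        if key != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = key, 0
        total += int(value)
    if current is not None:
        yield f"{current}\t{total}"

input_lines = ["hadoop streams data", "data flows through hadoop"]
output = list(reducer(sorted(mapper(input_lines))))
print(output)  # ['data\t2', 'flows\t1', 'hadoop\t2', ...]
```

Higher-level query languages for Hadoop (such as Pig and Hive) compile queries down to exactly this kind of map-sort-reduce pipeline, which is also what runs when Hadoop jobs are submitted to managed services on the Amazon Cloud.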