Hadoop

Hadoop is an open-source Big Data framework that supports the processing of large data sets in a distributed computing environment. It is designed to scale from a single server to thousands of machines, each contributing local computation and storage. Unlike traditional single-machine tools, Hadoop can store and process volumes of data that would overwhelm one server, and tools built on top of it add advanced features such as data visualization and predictive analytics that present useful insights graphically.

Hadoop is affordable for enterprises and small businesses alike, which makes it an attractive solution with broad potential. Over time, more and more companies have moved toward Hadoop, implementing big data to support marketing and other business efforts.

Main Features

  • Design distributed systems that manage “big data” using Hadoop and related technologies.
  • Use Pig and Spark to create scripts that process data on a Hadoop cluster in more complex ways.
  • Analyze relational data using Hive and MySQL.
  • Understand other Hadoop-based technologies, including Hive, Pig, and Spark.
  • Write programs using MapReduce.
  • Become proficient in administering and managing a Hadoop cluster.
  • Query and manage large datasets that reside in distributed storage.
  • Learn Apache Hive inside and out, from basic to advanced level.

Introduction to Big Data

1. Understanding Big Data
2. Types of Big Data – black box data, social media data
3. Stock exchange data, power grid data; structured, semi-structured and unstructured data
4. Big Data technologies – operational Big Data, analytical Big Data, operational versus analytical
5. Big Data challenges (limitations and solutions)
6. Industry example in Big Data – government agencies (US)
7. Getting started with Hadoop – introduction and history of Hadoop

Hadoop Introduction and Basic Concepts

1. Hadoop basics – why and what made Hadoop come into existence
2. The amount of data Hadoop can handle – concepts of the petabyte and exabyte
3. The Apache framework – basic components of the framework
4. HDFS – concepts of the name node, data node and metadata
5. Hadoop MapReduce and YARN
6. Concepts of the job tracker and task tracker
7. Understanding the Hadoop zoo – the various applications stacked in the Hadoop ecosystem
8. Major Hadoop ecosystem components – Apache Sqoop, HBase, Apache Pig, Apache Hive, Apache ZooKeeper, Apache Flume and Apache Spark

Understanding HDFS

1. Introduction to the Hadoop stack
2. HDFS design goals
3. How the name node and data node work together – the difference between the name node and the data node
4. The MapReduce framework and YARN – understanding each framework, their similarities and their differences (a word-count sketch follows this list)
5. Hadoop resource scheduling – understanding the two types of resource scheduling in Hadoop
6. The Fair Scheduler and the Capacity Scheduler
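
To make the MapReduce topic above concrete, here is a minimal word-count sketch in Java against the Hadoop MapReduce API. It is an illustrative example only, not part of the course material: the input and output paths arrive as command-line arguments, and names such as WordCount and TokenizerMapper are our own.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in an input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // the reducer doubles as a combiner
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory; must not exist yet
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

YARN then schedules the map and reduce tasks across the cluster; which scheduler decides where they run is exactly the Fair Scheduler versus Capacity Scheduler topic above.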

Hadoop Configuration and Settings

1. Software and hardware requirements for running Hadoop
2. Working with Hadoop on a Windows platform through a virtual machine
3. Setting up VMware or VirtualBox and a Linux-based Ubuntu OS
4. Configuration of the system
5. Setting up the Java path files and certain Hadoop configurations (see the sketch after this list)
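
As a small illustration of the configuration topics above, the sketch below uses Hadoop's Configuration class in Java to read back a core setting. It assumes core-site.xml is on the classpath (for example under $HADOOP_CONF_DIR); the address hdfs://localhost:9000 mentioned in the comment is only what single-node tutorials commonly use, not a fixed rule.

import org.apache.hadoop.conf.Configuration;

public class ConfigCheck {
  public static void main(String[] args) {
    // new Configuration() loads core-default.xml plus any core-site.xml on the classpath.
    Configuration conf = new Configuration();
    // fs.defaultFS is the filesystem address HDFS clients connect to,
    // e.g. hdfs://localhost:9000 on a typical single-node setup.
    System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
  }
}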

Formation of Clusters: Single and Multi-Node

1. Understanding pseudo-distributed mode, single-node and multi-node clustering
2. Application-based setup of a multi-node cluster
3. Configuration required for multi-node clustering
4. Executing HDFS and MapReduce functions through commands
5. Understanding HDFS in detail – concepts of HDFS configuration and architecture
6. Verifying that the name node and data node are working

HDFS Performance and Tuning

1. HDFS access – commands, applications and APIs
2. The HDFS performance envelope
3. HDFS tuning parameters
4. Native Java APIs for HDFS and others (see the sketch after this list)
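
To illustrate the native Java API item above, here is a short sketch that writes a file into HDFS and lists its directory. It assumes a reachable cluster whose address comes from the configuration on the classpath; the path /user/demo/hello.txt is a hypothetical example.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // picks up core-site.xml/hdfs-site.xml
    try (FileSystem fs = FileSystem.get(conf)) {
      Path file = new Path("/user/demo/hello.txt"); // hypothetical path

      // Write a small file into HDFS, overwriting if it exists.
      try (FSDataOutputStream out = fs.create(file, true)) {
        out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
      }

      // List the parent directory, much like `hdfs dfs -ls` on the command line.
      for (FileStatus status : fs.listStatus(file.getParent())) {
        System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
      }
    }
  }
}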

Apache Hive

1. Downloading the Hive application – features of Hive
2. Architecture of Hive and how Hive works
3. Hive installation – verifying the Java installation, verifying the Hadoop installation
4. Installing Hive and configuring Hive
5. Types of Hive data – columns, literals, null and complex values
6. Creating a Hive database, dropping a database, creating a table (see the sketch after this list)
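
As a taste of those last Hive topics, the following sketch drives HiveServer2 from Java over JDBC. The connection URL, the empty credentials, and the demo database and employees table are assumptions for illustration; a real deployment's host, port and authentication will differ.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveExample {
  public static void main(String[] args) throws Exception {
    // HiveServer2 conventionally listens on port 10000; adjust for your cluster.
    String url = "jdbc:hive2://localhost:10000/default";
    try (Connection conn = DriverManager.getConnection(url, "hive", "");
         Statement stmt = conn.createStatement()) {

      // Create a database and a simple table (the types echo the Hive data types topic).
      stmt.execute("CREATE DATABASE IF NOT EXISTS demo");
      stmt.execute("CREATE TABLE IF NOT EXISTS demo.employees "
          + "(id INT, name STRING, salary DOUBLE)");

      // List the tables we just created.
      try (ResultSet rs = stmt.executeQuery("SHOW TABLES IN demo")) {
        while (rs.next()) {
          System.out.println(rs.getString(1));
        }
      }

      // Dropping works the same way, e.g. stmt.execute("DROP DATABASE demo CASCADE");
    }
  }
}

The Hive JDBC driver must be on the classpath for DriverManager to resolve the jdbc:hive2 URL scheme.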

Apache Pig

1. An overview of Apache Pig and its architecture
2. Apache Pig versus Hive
3. Applications and history of Apache Pig
4. Prerequisites for downloading Apache Pig
5. Installation and execution of Apache Pig – the Grunt shell and its shell commands
6. Understanding Pig Latin
7. Reading and storing data through Apache Pig (see the sketch after this list)
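
To show what reading and filtering data in Pig Latin looks like, here is a minimal sketch that runs two Pig Latin statements through Pig's embedded Java API (PigServer) in local mode. The file input.txt and its (name, age) schema are hypothetical.

import java.util.Iterator;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class PigExample {
  public static void main(String[] args) throws Exception {
    // Local mode runs against the local filesystem; use MAPREDUCE for a cluster.
    PigServer pig = new PigServer(ExecType.LOCAL);

    // LOAD ... AS reads tab-separated input and names the columns;
    // FILTER keeps only the rows whose predicate holds.
    pig.registerQuery("records = LOAD 'input.txt' AS (name:chararray, age:int);");
    pig.registerQuery("adults = FILTER records BY age >= 18;");

    // Iterate over the result, as DUMP would do in the Grunt shell.
    Iterator<Tuple> it = pig.openIterator("adults");
    while (it.hasNext()) {
      System.out.println(it.next());
    }
    pig.shutdown();
  }
}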

Introduction to R Programming

1. Links to the necessary software; the R GUI
2. History of R programming
3. Basic operations in R – constant values
4. Numeric values
5. Arithmetic operations
6. Conditions – equality, greater than, less than
7. Function calls – introduction to R functions
8. Symbols
9. NA, Inf, NaN, NULL, TRUE, FALSE
10. Data types and data structures in R
11. Subsetting in R – the c() function, and use of the rep(), factor(), data.frame() and array() functions
12. Additional topics on data structures
13. Metadata access – dimnames(), rownames(), colnames()

Additional Topics on Data Structures

1. The recycling rule
2. Type coercion
3. Coercing factors – using as.factor()
4. Attributes – attributes(), attr()
5. Importing datasets from different sources into R
6. Control structures and user-defined functions
7. if-else, iteration and looping
8. lapply(), sapply(), apply()

Integrating R with Hadoop

1. Association mining and text mining using R programming
2. Forming a word cloud from real data
3. Sentiment analysis using R and its integration with Hadoop

Duration: 40 hours
Lectures: 73
Level: Intermediate
