Hadoop is an open-source framework for the distributed storage and processing of large datasets.

Its core components include HDFS for storage, YARN for cluster-resource management, and MapReduce or Spark for processing.

The Hadoop ecosystem includes multiple components that support each stage of big data processing:
- Flume and Sqoop ingest data
- HDFS and HBase store data
- Spark and MapReduce process data
- Pig, Hive, and Impala analyze data
- Hue and Search help to explore data
- Oozie manages the workflow of Hadoop tasks
The following practicals were run on a single-node Hadoop setup:
- Word Count Practical (a minimal MapReduce sketch is shown after this list)
- Trending Word Count
- Database Join
- Vector Multiplication
- Matrix Multiplication (a one-step MapReduce sketch also follows the list)
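
For reference, here is the canonical word-count job in the Hadoop MapReduce Java API, as a minimal sketch (the class name `WordCount` and the input/output paths are illustrative, not necessarily what this repository uses):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // combiner cuts shuffle traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

On a single-node setup the job can be submitted with something like `hadoop jar wordcount.jar WordCount /input /output`, where `/input` holds text files already in HDFS and `/output` must not yet exist.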
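
A matrix-multiplication job can follow the same pattern in a single MapReduce pass. The sketch below is an assumption about the approach, not this repository's exact code: it expects sparse matrix entries as lines of the form `M,i,j,value` (with `M` either `A` or `B`) and reads the matrix dimensions from the job configuration.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Computes C = A x B for matrices stored as lines "M,i,j,value".
public class MatrixMultiply {

  public static class MatMapper extends Mapper<Object, Text, Text, Text> {
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      Configuration conf = context.getConfiguration();
      int m = conf.getInt("rows.A", 0); // number of rows of A
      int p = conf.getInt("cols.B", 0); // number of columns of B
      String[] t = value.toString().split(",");
      if (t[0].equals("A")) {
        // A[i][k] contributes to every C[i][j], j = 0..p-1
        for (int j = 0; j < p; j++) {
          context.write(new Text(t[1] + "," + j),
                        new Text("A," + t[2] + "," + t[3]));
        }
      } else {
        // B[k][j] contributes to every C[i][j], i = 0..m-1
        for (int i = 0; i < m; i++) {
          context.write(new Text(i + "," + t[2]),
                        new Text("B," + t[1] + "," + t[3]));
        }
      }
    }
  }

  public static class MatReducer extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      // Pair up A and B entries that share the same inner index k.
      Map<Integer, Double> a = new HashMap<>();
      Map<Integer, Double> b = new HashMap<>();
      for (Text v : values) {
        String[] t = v.toString().split(",");
        int k = Integer.parseInt(t[1]);
        double x = Double.parseDouble(t[2]);
        if (t[0].equals("A")) a.put(k, x); else b.put(k, x);
      }
      double sum = 0;
      for (Map.Entry<Integer, Double> e : a.entrySet()) {
        Double bv = b.get(e.getKey());
        if (bv != null) sum += e.getValue() * bv;
      }
      context.write(key, new Text(Double.toString(sum)));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setInt("rows.A", Integer.parseInt(args[2]));
    conf.setInt("cols.B", Integer.parseInt(args[3]));
    Job job = Job.getInstance(conf, "matrix multiply");
    job.setJarByClass(MatrixMultiply.class);
    job.setMapperClass(MatMapper.class);
    job.setReducerClass(MatReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The mapper replicates each entry of A across every column of B and each entry of B across every row of A, so the reducer for key `(i,j)` receives exactly the pairs needed to compute the dot product of row i of A with column j of B. This is simple but shuffle-heavy; a two-pass join-then-aggregate variant scales better for large matrices.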