
Big Data Cluster with Spark, HDFS, Kafka and Airflow

How to run the task

The project is dockerized. To start all services, run:

make start-all

This will start four docker-compose stacks, all sharing the same Docker network:

  1. Airflow: runs a standalone Airflow instance. You can access the UI at http://localhost:8090 and log in with user=admin and password=admin.
  2. HDFS: runs the HDFS cluster. You can access the NameNode UI at http://localhost:9870.
  3. Spark: runs the Spark cluster. You can access the Spark UI at http://localhost:8080.
  4. Kafka: runs the Kafka cluster. You can send events to Kafka at kafka:9092 (see the example below).
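
For a quick smoke test, you can publish a message to the broker from any container attached to the shared network. This is a minimal sketch using the kafka-python client; the topic name "events" is a placeholder, and kafka:9092 is only resolvable from inside the Docker network:

```python
# Minimal sketch (assumption: the kafka-python package is installed and this
# runs in a container on the shared Docker network, where kafka:9092 resolves).
# The topic name "events" is a placeholder.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("events", {"id": 1, "message": "hello"})
producer.flush()
producer.close()
```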

Once all services are up and running, you should configure the Spark connection in Airflow. To do so, go to Admin -> Connections -> spark_default and fill in the form with the following values:

  • Conn Id: spark_default
  • Conn Type: Spark
  • Host: spark://spark-spark-1
  • Port: 7077
  • Extra: {"deploy-mode": "client"}
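
If you prefer not to use the web form, the same connection can be created or updated from a script run inside the Airflow container. This is a hedged sketch using Airflow's metadata session; it assumes an initialized Airflow environment and that overwriting any pre-seeded spark_default entry is acceptable:

```python
# Sketch: create or update the spark_default connection programmatically.
# Assumes it runs inside the Airflow container with the metadata DB initialized.
from airflow import settings
from airflow.models import Connection

session = settings.Session()
# spark_default may already exist (Airflow can seed default connections),
# so update it in place if present, otherwise create it.
conn = (
    session.query(Connection)
    .filter(Connection.conn_id == "spark_default")
    .one_or_none()
)
if conn is None:
    conn = Connection(conn_id="spark_default")
    session.add(conn)
conn.conn_type = "spark"
conn.host = "spark://spark-spark-1"
conn.port = 7077
conn.extra = '{"deploy-mode": "client"}'
session.commit()
```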

Then you can trigger the etl_task DAG from the Airflow UI.
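
For reference, this is roughly how a DAG submits work through that connection with the Spark provider's SparkSubmitOperator. It is an illustrative sketch, not the repository's actual etl_task DAG; the dag_id, task_id and application path are placeholders:

```python
# Illustrative sketch of a DAG wired to the spark_default connection.
# The dag_id, task_id and application path are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="etl_task_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    run_etl = SparkSubmitOperator(
        task_id="run_spark_etl",
        conn_id="spark_default",
        application="/opt/airflow/dags/spark/etl.py",  # placeholder path
    )
```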

To stop all services run:

make stop-all

Tests

To test the Spark ETL, run:

make test
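
For context, Spark ETL logic can typically be unit-tested against a local SparkSession without the cluster. The sketch below is a generic example with a placeholder transformation, not the repository's actual test suite:

```python
# test_etl.py - generic example of testing a Spark transformation locally.
# The transformation and column names are placeholders.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


@pytest.fixture(scope="session")
def spark():
    # A local SparkSession is enough for unit tests; no cluster is required.
    spark = SparkSession.builder.master("local[2]").appName("etl-tests").getOrCreate()
    yield spark
    spark.stop()


def add_total(df):
    # Placeholder transformation standing in for the project's ETL logic.
    return df.withColumn("total", F.col("quantity") * F.col("price"))


def test_add_total(spark):
    df = spark.createDataFrame([(2, 3.0), (5, 1.5)], ["quantity", "price"])
    result = {row["quantity"]: row["total"] for row in add_total(df).collect()}
    assert result == {2: 6.0, 5: 7.5}
```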

Important notes

  • The project is intended to run on an arm64 architecture, although modifying the Airflow Dockerfile as follows should do the trick for amd64:
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk-amd64
