GitHub

This repositoy includes what is necessary in Kubernetes AKA k8s for the deployment of Spark, which is an open-source distributed computing framework. One can utilize code developed in Python, Scala, SQL and others to create applications that can process large amount of data. Spark uses parallel processing to distribute the work-load across a cluster of physical hosts, for example, nodes of Kubernetes cluster. I will explain the practical implementation of Spark on Kubernetes (Spark-on-k8s) and will leverage Google Kubernetes Engine (GKE) (Google was first to introduce and then open source Kubernetes) and show steps in implementing Spark on GKE cluster-manager by creating a simple yet effective example that uses PySpark to generate random test data in Spark, take that data and post it to Google BigQuery Data Warehouse table.

The directories are structured as below

src - RandomDataBigQuery.py, the main Python script that runs the code. configure.py, the Python scipt that reads the yaml file config.yml and creates a disctionary of variables
sparkutils - sparkstuff.py is used to initialise Spark and set configuration parameters for Spark to access Google BigQuery database. It does so through JDBC connection
design - gke2.png, Includes the high level design document
othermisc - Includes usedFunctions.py that has multiple functions for creating random data
deployment - Has src/scripts, shell scripts that are used to grant the necessary roles "grant-roles.sh" for the Kubernetes and bigQuery plus the main bash shell "gke.sh" that runs spark-submit code. In this case the code is only shown for gcp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
deployment		deployment
design		design
othermisc		othermisc
sparkutils		sparkutils
src		src
.gitignore		.gitignore
README.md		README.md

michTalebzadeh/spark_on_gke

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages