Will my Flight be Delayed?

The goal of this project is to predict whether a given flight will be delayed.

Approach

I decided to use PySpark for this project. PySpark is the Python API for Apache Spark, a distributed computing framework for large-scale data analysis. It is well suited to performing exploratory data analysis at scale, building machine learning pipelines, and creating ETL jobs for a data platform.

Dataset

This is the dataset used for this project. It contains information about various airlines and has 61 columns and more than 10 million rows. The source datasets were combined using Python, then cleaned and preprocessed using PySpark.
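The combining step can be sketched with the standard library alone. The function below is an illustration, not the script used in this project; it assumes the source files are CSVs that all share the same header row, and the function name is hypothetical.

```python
import csv


def combine_csv_files(input_paths, output_path):
    """Concatenate CSV files that share a header into one file.

    Returns the number of data rows written (the header is not counted).
    """
    rows_written = 0
    header = None
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in input_paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file, nothing to copy
                if header is None:
                    # The first non-empty file defines the header, written once.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                # Stream rows one at a time so large files never sit in memory.
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, `combine_csv_files(["flights_part1.csv", "flights_part2.csv"], "flights_all.csv")` would merge two hypothetical monthly files into one. Streaming row by row matters at this scale, since loading 10 million rows into a list first would be wasteful.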

Deployment

The app is deployed using Docker and Streamlit. Visit the website: https://delaydecoded.azurewebsites.net

Or run the app locally with the following commands:

Pull the docker image:

docker pull dockeramaan/pysparkproj

Run the docker image:

docker run -p 8501:8501 dockeramaan/pysparkproj

