
This project uses PySpark to build a predictive model for estimating flight delays. It covers data collection, exploratory data analysis (EDA), preprocessing, and the development of a machine learning model that gives airlines and passengers insight into potential delays to support better travel decisions.


Amaaan09/DelayDecoded-pyspark


Will My Flight Be Delayed?

The goal of this project is to predict whether a given flight will be delayed.

Approach

I chose PySpark for this project. PySpark is the Python API for Apache Spark, released by the Apache Spark community, and Apache Spark is a distributed framework for processing and analyzing Big Data. PySpark is well suited to performing exploratory data analysis at scale, building machine learning pipelines, and creating ETL jobs for a data platform.

Dataset

The dataset used for this project contains information about various airlines. It has 61 columns and more than 10 million rows. The source datasets were combined using Python, and the result was cleaned and preprocessed using PySpark.

Deployment

The app is deployed with Docker and Streamlit. Visit the website: https://delaydecoded.azurewebsites.net

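Alternatively, run the app locally with the following commands.

Pull the Docker image:

docker pull dockeramaan/pysparkproj

Run the Docker image:

docker run -p 8501:8501 dockeramaan/pysparkproj

The -p 8501:8501 flag maps the container's port 8501 (Streamlit's default port) to the same port on the host, so the app should then be reachable at http://localhost:8501.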
