What

Real-Time Flight Data Pipeline and ML ETA System

This project demonstrates the use of machine learning models within a real-time data processing pipeline. It is intended to accelerate future work of this kind by establishing software design choices and patterns.

The end-to-end implementation includes multiple independent components:

Data ingestion
- Acquire data from an API service and push it to storage
- Highly scalable
- Supports both historical and real-time ingestion (rather tedious)
Data preprocessing
- Pipeline step to transform and de-duplicate data
- Make the data compatible with ML models
Data analysis
- Visualize the preprocessed data to gain insight
Model training
- Execute a script to create training data
- Train ML models and log metrics on versioned data
Inference
- Pipeline step to make predictions using the best ML model
- Borrows processing logic from the model training repo
Live model KPIs
- Execute a script to measure the accuracy of inference predictions
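
As a rough illustration of two of the steps listed above (de-duplication during preprocessing and measuring the accuracy of predictions), here is a minimal pure-Python sketch. It is not the project's actual code: the record fields `ident`, `scheduled_out`, and `ingested_at`, and both function names, are hypothetical and chosen only for this example. The project itself is described as running these steps on Apache Spark and Delta Lake.

```python
from datetime import datetime, timedelta, timezone


def deduplicate(records, key_fields=("ident", "scheduled_out")):
    """Keep only the most recently ingested record for each flight key.

    `records` is an iterable of dicts. The field names are hypothetical.
    """
    latest = {}
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        # Replace the stored record only if this one was ingested later.
        if key not in latest or rec["ingested_at"] > latest[key]["ingested_at"]:
            latest[key] = rec
    return list(latest.values())


def mean_absolute_error_minutes(pairs):
    """Mean absolute ETA error in minutes.

    `pairs` is an iterable of (predicted_arrival, actual_arrival) datetimes.
    """
    errors = [abs((pred - actual).total_seconds()) / 60 for pred, actual in pairs]
    if not errors:
        raise ValueError("at least one (predicted, actual) pair is required")
    return sum(errors) / len(errors)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    rows = [
        {"ident": "UAL123", "scheduled_out": t0, "ingested_at": t0},
        # Same flight polled again five minutes later: a duplicate key.
        {"ident": "UAL123", "scheduled_out": t0, "ingested_at": t0 + timedelta(minutes=5)},
        {"ident": "DAL456", "scheduled_out": t0, "ingested_at": t0},
    ]
    print(len(deduplicate(rows)))  # two distinct flights remain

    actual = t0 + timedelta(hours=3)
    pairs = [
        (actual + timedelta(minutes=10), actual),  # predicted 10 minutes late
        (actual - timedelta(minutes=20), actual),  # predicted 20 minutes early
    ]
    print(mean_absolute_error_minutes(pairs))
```

Keeping the latest record per key means repeated polls of the same flight collapse into one row, and the error metric treats early and late predictions symmetrically.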

The primary source of data is the FlightAware API /flights/ident endpoint, from which flight status information is retrieved and processed through a series of cloud-based services and tools.
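
A request to that endpoint could be built roughly as follows. This is a sketch under assumptions, not the ingestion service's actual code: the base URL and the `x-apikey` header reflect FlightAware's AeroAPI conventions as best understood here and should be checked against the current AeroAPI documentation. The function name `build_flight_request` is hypothetical. The request is only constructed, never sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed AeroAPI base URL; verify against FlightAware's documentation.
AEROAPI_BASE_URL = "https://aeroapi.flightaware.com/aeroapi"


def build_flight_request(ident: str, api_key: str) -> Request:
    """Build (but do not send) a GET request for /flights/{ident}."""
    # Percent-encode the identifier so it is safe to place in the URL path.
    url = f"{AEROAPI_BASE_URL}/flights/{quote(ident, safe='')}"
    headers = {
        "x-apikey": api_key,  # assumed authentication header name
        "Accept": "application/json",
    }
    return Request(url, headers=headers)


if __name__ == "__main__":
    req = build_flight_request("UAL123", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the call. A serverless function could then write the JSON response to storage for the preprocessing step to pick up.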

Why

The goal is to predict estimated arrival times for flights in a manner that is scalable and near-real-time, and that supports automatic model retraining on tabular data. The use of serverless cloud functions, Apache Spark, and Delta Lake significantly simplifies this goal by unifying batch and stream processing.


jcguidry/flight-ml-eta

Folders and files

Latest commit

History

Repository files navigation

What

Real-Time Flight Data Pipeline and ML ETA System

Why

Architecture (for now)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages