Datasets featuring global, high-level flight schedules per aircraft, extracted from ADS-B position reports.
Published quarterly, from 2024 onwards. Covers all flights within the reception coverage of the ADSBlol initiative.
- This project uses the ADS-B data from the ADSBlol initiative. Consider supporting their great project.
- This project uses validation data from vradarserver/Andrew Whewell to cross-check extracted routes against callsign-based route data. Again, consider supporting this initiative.
See the Releases section of this repository for a parquet file with the flights per aircraft, per quarter of a year.
The parquet file type was chosen to keep the flight data manageable in size and in processing/loading time. Each quarter contains approx. 10-12+ million flights and ~500,000 aircraft, which would total approx. 3 GB in CSV format, whereas the parquet file stays far below 1 GB. Loading a parquet file with Python is straightforward (pandas needs a parquet engine such as pyarrow or fastparquet installed):
import pandas
df = pandas.read_parquet('2024_Q1.parquet')
To inspect the parquet dataset without Python, you can use a GUI tool such as ParquetViewer, which can be installed on Windows as an executable.
The data is published per quarter. Consecutive quarters overlap slightly so that no flight is cut in half at a quarter boundary.
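Because consecutive quarters overlap, a flight close to a quarter boundary can appear in two files. The following is a minimal sketch of removing such duplicates when combining quarters. It assumes that a flight can be identified by an aircraft identifier plus a departure time; the field names `icao24` and `departure_time` are hypothetical and may not match the columns of the published files.

```python
from typing import Iterable


def combine_quarters(*quarters: Iterable[dict]) -> list[dict]:
    """Concatenate flight records, keeping only the first copy of each flight."""
    seen: set[tuple] = set()
    combined: list[dict] = []
    for quarter in quarters:
        for flight in quarter:
            # Hypothetical key: one aircraft cannot depart twice at the same instant.
            key = (flight["icao24"], flight["departure_time"])
            if key not in seen:
                seen.add(key)
                combined.append(flight)
    return combined
```

With pandas DataFrames, the same idea can be expressed with `pandas.concat` followed by `DataFrame.drop_duplicates(subset=[...])` on whichever columns identify a flight in the actual files.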
Because ADS-B reception coverage of the ADSBlol initiative is limited on certain continents, some aircraft tracks start after the airport of origin or end before the airport of destination. For those cases, the flight data has been enriched by looking up the flight's callsign in the open-source callsign vs route dataset of vradarserver/Andrew Whewell.
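The lookup described above can be sketched as follows. This is an illustration only, not the project's actual extraction code: the record fields (`callsign`, `origin`, `destination`) and the shape of the route table (callsign mapped to an origin/destination pair) are assumptions.

```python
def fill_from_callsign(flight: dict, routes: dict[str, tuple[str, str]]) -> dict:
    """Return a copy of `flight` with missing endpoints filled from the route table."""
    filled = dict(flight)
    route = routes.get(flight.get("callsign"))
    if route is None:
        # Unknown callsign: leave the gaps as they are.
        return filled
    origin, destination = route
    # Only fill what the ADS-B track could not provide.
    if filled.get("origin") is None:
        filled["origin"] = origin
    if filled.get("destination") is None:
        filled["destination"] = destination
    return filled
```

Endpoints already derived from the track are left untouched here, so that the track data and the lookup can still be compared afterwards.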
ADS-B transmissions simply broadcast the position reported by the aircraft, so wrong position data resulting from GPS spoofing is transmitted as well. Once more, the added column with the callsign vs route lookup makes it possible to filter out flights where the aircraft emitted wrong position data.
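One way such a filter could look is sketched below: a flight is kept only if the endpoints derived from the track agree with the callsign-based route wherever both are known. The field names (`origin`, `destination`, `origin_lookup`, `destination_lookup`) are hypothetical. Note that a mismatch can also have other causes, such as a diversion or outdated route data, so dropping these flights is a conservative choice rather than proof of spoofing.

```python
def is_consistent(flight: dict) -> bool:
    """True when track-derived endpoints match the callsign route, where both are known."""
    pairs = (("origin", "origin_lookup"), ("destination", "destination_lookup"))
    for track_key, lookup_key in pairs:
        observed = flight.get(track_key)
        expected = flight.get(lookup_key)
        if observed is not None and expected is not None and observed != expected:
            return False
    return True


def drop_suspect_flights(flights: list[dict]) -> list[dict]:
    """Keep only flights whose track and callsign route do not contradict each other."""
    return [flight for flight in flights if is_consistent(flight)]
```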
Status Q2 2024: number of receivers/antennas of the ADSBlol initiative (image above).
Aircraft coverage of the ADSBlol initiative (image above). The snapshot is taken at ~13:00 UTC so that there are reasonable operations on all continents and no major market is at midnight.
Because ADSBlol coverage improves regularly, validating the extracted flights is a never-finished task, especially given the global scope.
At present, each quarter of extracted flights features approx. 10-12+ million flights and ~500,000 aircraft.
Validation Case Study - AMS/EHAM Reference Day 2024-06-14
- Number of (commercial!) flights extracted from ADS-B data vs number of flights in the AMS schedule --> close match, within a 5% error margin
- Airline representation --> close match, within a 5% error margin
- Destination/origin of a flight is accurate 73% of the time based purely on ADS-B track data, improved to 95+% by using the callsign vs route lookup
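The exact definitions behind these figures are not stated here. As one plausible reading, the error margin can be taken as the relative difference between the extracted and the scheduled flight count, and the accuracy as the share of flights whose extracted airport matches the reference. A minimal sketch under those assumptions:

```python
from typing import Iterable


def relative_error(extracted: int, reference: int) -> float:
    """Relative difference between an extracted count and a reference count."""
    if reference <= 0:
        raise ValueError("reference count must be positive")
    return abs(extracted - reference) / reference


def endpoint_accuracy(pairs: Iterable[tuple[str, str]]) -> float:
    """Share of (extracted, reference) airport pairs that agree."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no flights to compare")
    matches = sum(1 for extracted, reference in pairs if extracted == reference)
    return matches / len(pairs)
```

For the airline representation check, `relative_error` could be applied per airline to the number of flights found in each source.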
Please use in line with the license defined in this repository. No guarantee, no liability, no warranty. All open-source.
Flight times are runway (RWY) times: lift-off time for departures and touchdown time for arrivals.
However, where ADS-B coverage is more limited, the track may not start or stop at the airport. For those cases, the time at the beginning/end of the track has been used as the time of the flight.
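The departure rule above can be sketched as follows, assuming a track is a time-sorted sequence of `(timestamp, on_ground)` samples. That representation, and the use of an on-ground flag to detect lift-off, are assumptions made for illustration; the actual extraction may rely on altitude, speed or other signals.

```python
from datetime import datetime
from typing import Sequence


def departure_time(track: Sequence[tuple[datetime, bool]]) -> tuple[datetime, bool]:
    """Return (time, is_runway_time) for a time-sorted track of (timestamp, on_ground)."""
    if not track:
        raise ValueError("empty track")
    first_time, first_on_ground = track[0]
    if not first_on_ground:
        # Track starts airborne: coverage began after take-off,
        # so fall back to the first position of the track.
        return first_time, False
    for timestamp, on_ground in track:
        if not on_ground:
            # First airborne sample after ground samples: treated as lift-off.
            return timestamp, True
    # Aircraft was never observed airborne.
    return first_time, False
```

The boolean in the result lets downstream code tell a real runway time from a fallback time.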
Similar to the section above, where the track does not start or stop at the airport, multiple airports in the vicinity of the first/last position of the ADS-B track are listed as options.
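A sketch of how such a list of nearby airports could be produced uses the great-circle (haversine) distance between the first/last track position and each airport. The 50 km radius and the airport table format (ICAO code mapped to latitude/longitude) are assumptions; no actual search radius is stated here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def candidate_airports(lat: float, lon: float,
                       airports: dict[str, tuple[float, float]],
                       radius_km: float = 50.0) -> list[str]:
    """Return codes of airports within radius_km of the position, nearest first."""
    in_range = []
    for code, (a_lat, a_lon) in airports.items():
        distance = haversine_km(lat, lon, a_lat, a_lon)
        if distance <= radius_km:
            in_range.append((distance, code))
    return [code for _, code in sorted(in_range)]
```

Sorting by distance puts the most likely airport first while keeping the alternatives available.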
In case of go-arounds, touch-and-gos or balked landings, only the final touchdown counts as the touchdown time of the flight (with commercial flights in mind).
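The "final touchdown" rule can be illustrated as below, again assuming a time-sorted track of `(timestamp, on_ground)` samples. That representation is an assumption for illustration, not the project's actual data model.

```python
from typing import Optional, Sequence


def final_touchdown(track: Sequence[tuple[float, bool]]) -> Optional[float]:
    """Timestamp of the last airborne-to-ground transition, or None if none was observed."""
    touchdown = None
    for (_, prev_on_ground), (timestamp, on_ground) in zip(track, track[1:]):
        if on_ground and not prev_on_ground:
            # Later transitions overwrite earlier ones, so a touch-and-go
            # is replaced by the landing that follows it.
            touchdown = timestamp
    return touchdown
```

A result of `None` corresponds to a track that ends before the airport, where the end of the track would be used instead.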