Analysis-ready Parquet download? #2
Hi Mark, thank you for making this issue.

While I am in principle not opposed to having other formats of the data, before considering something like this I need the files to have their 'gaps' accounted for.

As you know, when readsb restarts for any reason (a configuration change being the most common), one readsb instance (let's say -0) will go down while the other keeps running. Then, once -0 is back, -1 will go down and restart. This results in a few minutes of unique data in each file, which is why they are both there.

So basically, I need to solve this problem first with the globe_history format before moving forward.

Make sense?
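To make the overlap problem concrete: the two files could, in principle, be reconciled by converting each trace point's offset to an absolute timestamp and deduplicating. This is a minimal sketch, not the project's actual approach; `merge_traces` is a hypothetical helper, and it assumes (per readsb's trace-JSON docs) that each trace point is a list whose first element is a seconds offset from the file's base timestamp.

```python
def merge_traces(base_a, trace_a, base_b, trace_b):
    """Merge two overlapping trace-point lists into one, deduplicated
    by absolute timestamp and sorted chronologically.

    Assumption: each point is a list whose first element (point[0]) is
    the seconds offset from that file's base timestamp.
    """
    seen = {}
    for base, trace in ((base_a, trace_a), (base_b, trace_b)):
        for point in trace:
            abs_ts = round(base + point[0], 3)  # absolute time, ms precision
            seen.setdefault(abs_ts, point)      # keep the first point per timestamp
    merged_base = min(base_a, base_b)
    # Re-express every surviving point as an offset from the earlier base.
    merged = [[ts - merged_base] + pt[1:] for ts, pt in sorted(seen.items())]
    return merged_base, merged
```

For example, two files whose coverage overlaps by one point would merge into a single three-point trace with the duplicate dropped. Real -0/-1 files would also need per-aircraft grouping, which this sketch omits.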
Hey, nice blog post! :) If you're going to make such a nice new format, you should include info on whether the airplane is on the ground:

```python
'altitude': trace[3]
            if str(trace[3]).strip().lower() != 'ground'
            else None,
```

I didn't see that saved anywhere. You probably already referenced it while using the data, but here is some explanation of the format: https://github.com/wiedehopf/readsb/blob/dev/README-json.md#trace-jsons

Also, sorry for the format, it's a bit of a mess.
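The snippet above discards the ground state when it nulls the altitude. One way to keep both pieces of information would be to split the field into two columns. This is a hedged sketch, not code from the blog's ETL script; `split_altitude` is a hypothetical helper, and the behaviour of trace[3] (a barometric altitude in feet, the string 'ground' on the surface, or null when unknown) is taken from readsb's README-json.md.

```python
def split_altitude(raw):
    """Split readsb's trace altitude field (trace[3]) into a numeric
    altitude column and an on-ground flag column.

    raw is an altitude in feet, the string 'ground', or None.
    """
    if raw is None:
        return None, None       # altitude unknown, ground state unknown
    if str(raw).strip().lower() == 'ground':
        return None, True       # on the ground, no barometric altitude
    return raw, False           # airborne at the given altitude
```

Storing the flag as its own boolean column keeps the altitude column purely numeric, which columnar formats like Parquet handle far better than a mixed string/number column.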
@marklit of course nothing is preventing you from tackling this project yourself and making the Parquet-ready data available, similar to this repo. :)
@marklit, I've created a ClickHouse database with the data and also added ADSB-E: https://github.com/ClickHouse/adsb.exposed/
I built an ETL script that turns the current download into a Parquet file. It has names for every field, is columnar-formatted so it is much quicker to query, and it is compressed with ZStandard so a day's worth of data is still around 1.2 GB. There are also H3 indices, which help filter specific geographies quickly.
https://tech.marksblogg.com/global-flight-tracking-adsb.html
Is there any chance the above ETL script could work its way into your infrastructure and produce a daily Parquet file in addition to the current daily download tar file?
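The core of such an ETL step is flattening each aircraft's trace JSON into named rows before handing them to a columnar writer. This is a minimal stdlib-only sketch under stated assumptions, not the blog post's actual script: `flatten_trace`, the field names in `TRACE_FIELDS`, and the input keys (`timestamp`, `icao`, `trace`) are illustrative, loosely following readsb's trace-JSON layout.

```python
# Assumed names for the first positional trace fields (illustrative only).
TRACE_FIELDS = ['seconds_offset', 'lat', 'lon', 'altitude',
                'ground_speed', 'track', 'flags', 'vertical_rate']

def flatten_trace(aircraft):
    """Turn one aircraft's trace JSON into a list of named records,
    ready for a columnar writer such as pyarrow's Parquet writer.

    Assumes aircraft has a base 'timestamp', an 'icao' hex id, and a
    'trace' list of positional points whose first element is a seconds
    offset from the base timestamp.
    """
    base = aircraft['timestamp']
    icao = aircraft['icao']
    rows = []
    for point in aircraft['trace']:
        row = dict(zip(TRACE_FIELDS, point))
        row['icao'] = icao
        # Replace the relative offset with an absolute timestamp.
        row['timestamp'] = base + row.pop('seconds_offset')
        rows.append(row)
    return rows
```

From here, something like `pyarrow.parquet.write_table(..., compression='zstd')` would produce the compressed daily file the issue asks about.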