Analysis-ready Parquet download? #2
Hi Mark, thank you for making this issue.

While I am in principle not opposed to having other formats of the data, before considering something like this I need the files to have their 'gaps' accounted for.

As you know, when readsb restarts for any reason (a configuration change being the most common), one readsb instance (let's say -0) will go down while the other keeps running. Then, once -0 is back, -1 will go down and restart. This results in a few minutes of unique data in each file, which is why they are both there.

So basically, I need to solve this problem first with the globe_history format before moving forward.

Make sense?
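To make the overlap problem concrete: the two files could, in principle, be reconciled by converting each trace point's offset to an absolute timestamp and deduplicating. This is a minimal sketch, not the project's actual approach; `merge_traces` is a hypothetical helper, and it assumes (per readsb's trace-JSON docs) that each trace point is a list whose first element is a seconds offset from the file's base timestamp.

```python
def merge_traces(base_a, trace_a, base_b, trace_b):
    """Merge two overlapping trace-point lists into one, deduplicated
    by absolute timestamp and sorted chronologically.

    Assumption: each point is a list whose first element (point[0]) is
    the seconds offset from that file's base timestamp.
    """
    seen = {}
    for base, trace in ((base_a, trace_a), (base_b, trace_b)):
        for point in trace:
            abs_ts = round(base + point[0], 3)  # absolute time, ms precision
            seen.setdefault(abs_ts, point)      # keep the first point per timestamp
    merged_base = min(base_a, base_b)
    # Re-express every surviving point as an offset from the earlier base.
    merged = [[ts - merged_base] + pt[1:] for ts, pt in sorted(seen.items())]
    return merged_base, merged
```

For example, two files whose coverage overlaps by one point would merge into a single three-point trace with the duplicate dropped. Real -0/-1 files would also need per-aircraft grouping, which this sketch omits.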
Hey, nice blog post! :) If you're going to make such a nice new format, you should include info on whether the airplane is on the ground:

```python
'altitude': trace[3]
            if str(trace[3]).strip().lower() != 'ground'
            else None,
```

I didn't see that saved anywhere. You probably already referenced it while using the data, but here is some explanation of the format: https://github.com/wiedehopf/readsb/blob/dev/README-json.md#trace-jsons

Also, sorry for the format, it's a bit of a mess.
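The snippet above discards the ground state when it nulls the altitude. One way to keep both pieces of information would be to split the field into two columns. This is a hedged sketch, not code from the blog's ETL script; `split_altitude` is a hypothetical helper, and the behaviour of trace[3] (a barometric altitude in feet, the string 'ground' on the surface, or null when unknown) is taken from readsb's README-json.md.

```python
def split_altitude(raw):
    """Split readsb's trace altitude field (trace[3]) into a numeric
    altitude column and an on-ground flag column.

    raw is an altitude in feet, the string 'ground', or None.
    """
    if raw is None:
        return None, None       # altitude unknown, ground state unknown
    if str(raw).strip().lower() == 'ground':
        return None, True       # on the ground, no barometric altitude
    return raw, False           # airborne at the given altitude
```

Storing the flag as its own boolean column keeps the altitude column purely numeric, which columnar formats like Parquet handle far better than a mixed string/number column.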
@marklit of course nothing is preventing you from tackling this project yourself and making the Parquet-ready data available, similar to this repo. :)
@marklit, I've created a ClickHouse database with the data and also added ADSB-E: https://github.com/ClickHouse/adsb.exposed/
I built an ETL script that turns the current download into a Parquet file. It has names for every field, is columnar-formatted so it is much quicker to query, and it is compressed with ZStandard so a day's worth of data is still around 1.2 GB. There are also H3 indices, which help filter specific geographies quickly.
https://tech.marksblogg.com/global-flight-tracking-adsb.html
Is there any chance the above ETL script could work its way into your infrastructure and produce a daily Parquet file in addition to the current daily download tar file?
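The core of such an ETL step is flattening each aircraft's trace JSON into named rows before handing them to a columnar writer. This is a minimal stdlib-only sketch under stated assumptions, not the blog post's actual script: `flatten_trace`, the field names in `TRACE_FIELDS`, and the input keys (`timestamp`, `icao`, `trace`) are illustrative, loosely following readsb's trace-JSON layout.

```python
# Assumed names for the first positional trace fields (illustrative only).
TRACE_FIELDS = ['seconds_offset', 'lat', 'lon', 'altitude',
                'ground_speed', 'track', 'flags', 'vertical_rate']

def flatten_trace(aircraft):
    """Turn one aircraft's trace JSON into a list of named records,
    ready for a columnar writer such as pyarrow's Parquet writer.

    Assumes aircraft has a base 'timestamp', an 'icao' hex id, and a
    'trace' list of positional points whose first element is a seconds
    offset from the base timestamp.
    """
    base = aircraft['timestamp']
    icao = aircraft['icao']
    rows = []
    for point in aircraft['trace']:
        row = dict(zip(TRACE_FIELDS, point))
        row['icao'] = icao
        # Replace the relative offset with an absolute timestamp.
        row['timestamp'] = base + row.pop('seconds_offset')
        rows.append(row)
    return rows
```

From here, something like `pyarrow.parquet.write_table(..., compression='zstd')` would produce the compressed daily file the issue asks about.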