
just max-temp #25

Open · wants to merge 8 commits into main
Conversation

EtomicBomb (Contributor)

The input script

seq $FROM $TO |
sed "s;^;$URL;" |
sed 's;$;/;' |
xargs -n1 -r curl --insecure |
grep gz |
tr -s ' \n' |
cut -d ' ' -f9 |
sed 's;^\(.*\)\(20[0-9][0-9]\).gz;\2/\1\2.gz;' |
sed "s;^;$URL;" |
xargs -n1 curl --insecure |
gunzip > "$input_dir/temperatures.txt"

doesn't just download the temperatures from 2015; it downloads the whole range from $FROM to $TO. It was taken from https://github.com/binpash/benchmarks/blob/6d33e8209fac4a93b93a603f8e619c9a7d9deeb0/max-temp/max-temp.sh
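
For concreteness, here is what the path-rewriting sed stage does to a single listing entry. This is a minimal sketch; the 2015 file name is illustrative, modeled on the 1901 example linked later in this thread:

echo '029070-99999-2015.gz' |
sed 's;^\(.*\)\(20[0-9][0-9]\).gz;\2/\1\2.gz;'
# prints: 2015/029070-99999-2015.gz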

@vagos (Collaborator) commented Oct 22, 2024

> The input script
>
> seq $FROM $TO | sed "s;^;$URL;" | sed 's;$;/;' | xargs -n1 -r curl --insecure | grep gz | tr -s ' \n' | cut -d ' ' -f9 | sed 's;^\(.*\)\(20[0-9][0-9]\).gz;\2/\1\2.gz;' | sed "s;^;$URL;" | xargs -n1 curl --insecure | gunzip > "$input_dir/temperatures.txt"
>
> doesn't just download the temperatures from 2015; it downloads the whole range from $FROM to $TO. It was taken from https://github.com/binpash/benchmarks/blob/6d33e8209fac4a93b93a603f8e619c9a7d9deeb0/max-temp/max-temp.sh

Yep, that's correct.

@vagos (Collaborator) commented Oct 22, 2024

How about we add a --small option to run.sh that runs the script for just two years (2014 and 2015?) and runs it for all years otherwise.
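
A minimal sketch of what that could look like in run.sh, assuming positional flag parsing and placeholder defaults for the full range (neither is taken from the actual script):

# hypothetical run.sh excerpt: --small narrows the year range
FROM=1900   # placeholder full-range start
TO=2020     # placeholder full-range end
if [ "${1:-}" = "--small" ]; then
    FROM=2014
    TO=2015
fi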

@EtomicBomb (Contributor, Author) commented Oct 22, 2024

That makes sense, but the amount of data from just 2015 is already enormous: temperatures.txt is 34 GB, and the download takes longer than you would expect because each segment is fetched from a different URL. Downloading even one year, let alone two, really can't happen in CI, since the download alone takes on the order of an hour. It would be feasible if we just downloaded the data from the atlas server directly, which should be much faster than fetching all of the parts from NOAA.

@vagos (Collaborator) commented Oct 22, 2024

I see. How about we download a subset of a single year, then?
Try downloading just https://atlas.cs.brown.edu/data/noaa/1901/029070-99999-1901.gz

Just make sure that this is indeed a subset of the whole year.
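
If that works, the download step could collapse to something like the sketch below; it reuses curl --insecure, gunzip, and $input_dir from the pipeline above and fetches only the single file linked here:

# fetch one station/year file instead of crawling the whole listing
curl --insecure 'https://atlas.cs.brown.edu/data/noaa/1901/029070-99999-1901.gz' |
gunzip > "$input_dir/temperatures.txt"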

@EtomicBomb (Contributor, Author)

That sounds reasonable!
