New Dataset [E33OMA90D] #126

Open
smhassanerfani opened this issue Jun 5, 2024 · 7 comments

@smhassanerfani

Dataset Name

E33OMA90D

Dataset URL

No response

Description

This dataset represents aerosol compositions simulated by the NASA GISS ModelE ESM for the first three months of 1950, covering three different aerosol species: Sea Salt, Black Carbon, and Clay.

Size

The dataset is a NetCDF file with a total size of 79GB.

License

Unknown

Data Format

NetCDF

Data Format (other)

No response

Access protocol

scp

Source File Organization

There are 48 files per day for velocity fields (including u, v, and w), precipitation, emissions, and concentrations of each aerosol species. The velocity fields and concentrations are defined on 60 vertical pressure levels.
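
As a rough sanity check on the file layout, the sketch below counts the output times the description implies. It assumes (this is not stated in the issue) that the 48 daily files are evenly spaced, one every 30 minutes, and that "the first three months of 1950" means 1950-01-01 through 1950-03-31. How the files are split across variables is not specified here, so only time steps are counted.

```python
from datetime import datetime, timedelta

# Assumptions (not stated in the issue): the 48 daily files are evenly spaced
# (one every 30 minutes) and the run covers 1950-01-01 through 1950-03-31.
START = datetime(1950, 1, 1)
END = datetime(1950, 4, 1)  # exclusive upper bound
STEP = timedelta(minutes=30)


def expected_timestamps(start=START, end=END, step=STEP):
    """Return every expected output time in [start, end)."""
    stamps = []
    current = start
    while current < end:
        stamps.append(current)
        current += step
    return stamps


stamps = expected_timestamps()
n_days = (END - START).days
print(n_days, len(stamps))  # 90 days, 4320 time steps
```

A list like this can be compared against the timestamps actually present in the source files to spot gaps before any conversion.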

Example URLs

No response

Authorization

None

Transformation / Processing

No response

Target Format

Zarr

Comments

The dataset is currently located in Simurgh, Marcus supercomputer.

@jbusecke
Contributor

jbusecke commented Jun 5, 2024

Hey @smhassanerfani. Thanks for raising this.

> The dataset is currently located in Simurgh, Marcus supercomputer.

Can we somehow get access to that computer via HTTP, FTP, or Globus?

@smhassanerfani
Author

Hi Julius,
I checked with Marcus about our options for transferring data in Simurgh. We normally use scp and rsync. Do you think these options could help?
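
For reference, a minimal sketch of what an rsync pull could look like when driven from Python. The user name, host name, and paths are hypothetical placeholders (none of them appear in this thread), and the command only works from a machine that can reach the server over SSH.

```python
import shlex
import subprocess


def build_rsync_command(user, host, remote_dir, local_dir):
    """Build an rsync invocation that pulls remote_dir into local_dir over SSH."""
    # The trailing slash copies the directory's contents rather than the directory itself.
    source = f"{user}@{host}:{remote_dir.rstrip('/')}/"
    # -a: archive mode, -v: verbose, --partial: keep partially transferred files
    # so an interrupted transfer can resume, --progress: show per-file progress.
    return ["rsync", "-av", "--partial", "--progress", source, local_dir]


def pull(user, host, remote_dir, local_dir):
    """Run the transfer; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_rsync_command(user, host, remote_dir, local_dir), check=True)


# Hypothetical values, for illustration only.
cmd = build_rsync_command(
    "username", "simurgh.example.edu", "/path/to/E33OMA90D", "./E33OMA90D"
)
print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to inspect the exact command before starting a transfer of this size.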

@jbusecke
Contributor

jbusecke commented Jun 6, 2024

Unfortunately, pangeo-forge does not currently support scp or rsync as a transfer protocol. I have raised pangeo-forge/pangeo-forge-recipes#753 to discuss.

What is your timeline here? If you have a tight deadline, we can see if we can hack around this for now, and try to implement a cleaner solution later.

@smhassanerfani
Author

smhassanerfani commented Jun 6, 2024

We need it as soon as possible. We could begin with the smaller version, which is around 70GB, and delay the larger version until a clean solution is available. By the way, can we utilize the solution you mentioned in the technical document?

@jbusecke
Contributor

@smhassanerfani yes you could, and should, particularly if this is time sensitive!

But before resorting to 'pushing' the data, maybe we could try out pangeo-forge/pangeo-forge-recipes#753 (comment) first? I could make some time for this tomorrow or Thu.

@smhassanerfani
Author

Awesome! Let me know what time works best for you. I can stop by LEAP, or we can meet over Zoom if needed.

@jbusecke
Contributor

Quick summary of my meeting with @smhassanerfani.

  • We tried to use sshfs, but realized that the server is not accessible from outside the CU network. @mvanlierwalq, if there is any way to make the machine accessible from the public internet (FTP, or some public URL/IP), that would be ideal.
  • To unblock folks, @smhassanerfani and I agreed to be sneaky for now:
    • @smhassanerfani will download and paste the file into the user directory, and then use a script to write it to Zarr in the leap-persistent/... bucket.
    • From there, I can copy it to the readonly bucket and add an entry to our catalog, so other LEAP users can see the data.
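
The "script to write it to zarr" step could look roughly like the sketch below. This is not the script that was actually used. It assumes xarray (with dask and a Zarr backend) is installed, that the downloaded data sits in a local directory as `.nc` files, and that the files share coordinates xarray can combine on. The directory name and the target store are placeholders; the real bucket path is elided in this thread and is deliberately not filled in.

```python
from pathlib import Path


def collect_netcdf_files(local_dir):
    """Return the NetCDF files under local_dir, sorted by path."""
    return sorted(str(p) for p in Path(local_dir).rglob("*.nc"))


def write_to_zarr(files, store):
    """Open the files lazily as one dataset and write it to a Zarr store."""
    import xarray as xr  # imported here so the helper above has no dependencies

    # combine="by_coords" assumes the files carry consistent coordinates
    # (for example a shared time axis) that xarray can use to order them.
    ds = xr.open_mfdataset(files, combine="by_coords")
    # mode="w" overwrites any existing store at the target location.
    ds.to_zarr(store, mode="w", consolidated=True)
    return ds


# Placeholder paths, for illustration only:
# files = collect_netcdf_files("./E33OMA90D")
# write_to_zarr(files, "path/or/url/of/target.zarr")
```

Writing to a cloud bucket rather than a local path would additionally need the matching filesystem package for that storage backend.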


2 participants