Update README.md
Rearrange to show working example at the top with the correct links.
noaaroland authored Oct 28, 2024
1 parent 9ad2949 commit 6694136
Showing 1 changed file with 11 additions and 11 deletions.
22 changes: 11 additions & 11 deletions read_s3/README.md
@@ -2,24 +2,24 @@

## Summary:

(Update: I got it to work with intake. See below. Correct AWS S3 config TBD.)
The goal is to test providing access to large model outputs via S3 and Kerchunk.

The goal is to test providing access to large model outputs via S3 and Kerchunk. Eventually we hope the data will be hosted as part of NODD so that access will be unrestricted for all users. However, we don't know whether the access time will be good enough for use by CEFI scientists and the data portal.
After reading through a lot of xarray bug reports, it's clear that it is difficult to get the right options to the right place to open an xarray dataset from S3. The recommendation is to use a tool like intake to specify the options and have it handle the negotiations with the storage system. I have done that and I can successfully read the files as a Dask-backed xarray dataset.

To test this we want to make a sample available to a variety of users (currently all within NOAA).
## Working Examples

## Results:
[Here is the intake catalog which defines the data and the underlying grid](intake_catalog.yml)

By setting the bucket to "publicly readable" I can read it when [running a notebook](https://github.com/NOAA-PMEL/GoA_xpublish/blob/main/read_s3/read_goa_kerchunk-s3.ipynb) on an EC2 instance at AWS, as far as I can tell without providing credentials.
[Here is the notebook that reads and plots the data.](https://github.com/NOAA-PMEL/GoA_xpublish/blob/main/read_s3/intake_read_s3.ipynb)

However, when I try the [same notebook](https://github.com/NOAA-PMEL/GoA_xpublish/blob/main/read_s3/read_goa_kerchunk-s3_from_pmel.ipynb) from pmel.noaa.gov I get an error:
Eventually we hope the data may be hosted as part of NODD so that access will be unrestricted for all users. However, we don't know whether the access time will be good enough for use by CEFI scientists and the data portal.

NoCredentialsError: Unable to locate credentials
To test this we want to make a sample available to a variety of users (currently all within NOAA).

## Further results:
## Initial Attempts:

After reading through a lot of xarray bug reports, it's clear that it is difficult to get the right options to the right place to open an xarray dataset from S3. The recommendation is to use a tool like intake to specify the options and have it handle the negotiations with the storage system. I have done that and I can successfully read the files as a Dask-backed xarray dataset.
By setting the bucket to "publicly readable" I can read it when [running a notebook](https://github.com/NOAA-PMEL/GoA_xpublish/blob/main/read_s3/read_goa_kerchunk-s3.ipynb) on an EC2 instance at AWS, as far as I can tell without providing credentials.

[Here is the intake catalog which defines the data and the underlying grid](intake_catalog.yml)
However, when I try the [same notebook](https://github.com/NOAA-PMEL/GoA_xpublish/blob/main/read_s3/read_goa_kerchunk-s3_from_pmel.ipynb) from pmel.noaa.gov I get an error:

[Here is the notebook that reads and plots the data.](nb.ipynb)
NoCredentialsError: Unable to locate credentials. This error is what prompted the further reading about S3, kerchunk, and xarray, which led to the suggested solution of using an intake catalog.
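
A `NoCredentialsError` of this kind often appears when the S3 client is not told to make anonymous (unsigned) requests, so it goes looking for credentials even though the bucket is public. The README does not confirm that this was the cause here, and it does not show what the intake catalog actually sets, so the following is only a sketch of how anonymous-access options might be passed to xarray directly. The function names and the example reference URL are hypothetical; `fo`, `target_options`, `remote_protocol`, and `remote_options` are the fsspec reference-filesystem options assumed to apply.

```python
def build_open_kwargs(reference_url, anon=True):
    """Build keyword arguments for xarray.open_dataset("reference://", ...).

    reference_url is the location of a kerchunk reference file (hypothetical
    here). anon=True asks s3fs to skip credential lookup for a public bucket.
    """
    return {
        "engine": "zarr",
        "chunks": {},  # request a Dask-backed dataset
        "backend_kwargs": {
            "consolidated": False,
            "storage_options": {
                # The kerchunk reference file itself
                "fo": reference_url,
                # Options used to fetch the reference file
                "target_options": {"anon": anon},
                # Protocol and options used to fetch the referenced data chunks
                "remote_protocol": "s3",
                "remote_options": {"anon": anon},
            },
        },
    }


def open_reference_dataset(reference_url, anon=True):
    """Open the dataset; requires xarray, zarr, fsspec, and s3fs to be installed."""
    import xarray as xr  # imported lazily so the helper above has no dependencies

    return xr.open_dataset("reference://", **build_open_kwargs(reference_url, anon))


if __name__ == "__main__":
    # Hypothetical URL, for illustration only; this does not touch the network.
    kwargs = build_open_kwargs("s3://example-bucket/goa_reference.json")
    print(kwargs["backend_kwargs"]["storage_options"])
```

An intake catalog can carry the same kind of storage options in its YAML, which keeps them out of the notebook code.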
