Merge pull request #48 from wcschultz/data_access_nb
Adding OpenUniverse Data Access to the Data Access Notebook
tddesjardins authored Sep 24, 2024
2 parents 6aedd1e + d7030cf commit 1ae8cdc
Showing 1 changed file with 102 additions and 5 deletions.
@@ -239,7 +239,9 @@
"\n",
"Though Roman data will eventually be available through MAST, we currently offer a small set of simulated data available in a separate S3 bucket. These files can be streamed in exactly the same way as the HST FITS file above. Additionally, we can browse the available files similarly to a Unix terminal. A full list of commands can be found in the `s3fs` documentation [here](https://s3fs.readthedocs.io/en/latest/api.html#).\n",
"\n",
"The S3 bucket containing the data is currently only open to the public on the science platform where we have managed the permissions so none need to be specified explicitly. Because of the required permissions, the cell below will not work on a private comuter."
"The S3 bucket containing the data is currently only open to the public on the science platform where we have managed the permissions so none need to be specified explicitly. Because of the required permissions, many of the below cells will not work on a private comuter.\n",
"\n",
"There are currently three different data sources within the Roman science platform. We can view them by perfoming a list command (`ls`) on the the main science platform directory."
]
},
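{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of such a listing (the bucket name here is the science platform bucket described below), we can initialize an `S3FileSystem` and call its `ls()` method:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import s3fs\n",
"\n",
"# On the science platform the permissions are already managed,\n",
"# so no credentials need to be passed explicitly\n",
"fs = s3fs.S3FileSystem()\n",
"\n",
"# List the top-level contents of the science platform S3 bucket\n",
"fs.ls('roman-sci-test-data-prod-summer-beta-test')"
]
},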
{
@@ -260,9 +262,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The `fs.ls()` command allows us to list the contents of the URI. In the above example, the `roman-sci-test-data-prod-summer-beta-test` s3 bucket contains two directories:\n",
"The `fs.ls()` command allows us to list the contents of the URI. In the above example, the `roman-sci-test-data-prod-summer-beta-test` S3 bucket contains three directories:\n",
"- `ROMANISIM` contains the simulated WFI-imaging mode Roman Space Telescope data used in this suite of notebooks\n",
"- `STIPS` contains data for the Space Telescope Image Product Simulator (STIPS) notebook (Notebook link: [stips.ipynb](../stips/stips.ipynb))\n",
"- `OPEN_UNIVERSE` contains data from the OpenUniverse 2024 Matched Rubin and Roman Simulation preview provided by NASA/IPAC Infrared Science Archive (IRSA) at Caltech. \n",
"\n",
"In the next subsection we will explore opening data files made using Roman I-Sim, which are stored in the `ROMANISIM` S3 directory. These simulations are saved in the same file formats as observed Roman data will be and thus are useful to help develop file ingestion pipelines. Unfortunately, Roman I-Sim has not been used to extensively simulate survey data. \n",
"\n",
"In the final subsection, we will explore how to open the OpenUniverse preview data (in the `OPEN_UNIVERSE` S3 directory). The OpenUniverse collaboration has simulated extensive datasets from two core community surveys: the High Latitude Time Domain and Wide Area Surveys (HLTDS and HLWAS). Though they have only provided a preview of the full simulation suite, the quantity of data is still sufficient to start creating data pipelines to analyze Roman data.\n",
"\n",
"A full description of the provided data products and simulation methodologies can be found the two linked MNRAS papers in [Additional Resources](#Additional-Resources) below, and an overview is provided in [Simulated Data Products](../../../markdown/simulated-data.md)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Opening Roman I-Sim Models\n",
"\n",
"Diving into the `ROMANISIM` directory, we find three folders:\n",
"- `CATALOGS_SCRIPTS`: contains stellar and galactic catalogs used to create the simulated data stored in the other directories\n",
@@ -288,6 +304,82 @@
"print(dm.info())"
]
},
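{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a brief example (a sketch assuming `dm` is the datamodel opened above and follows the standard `roman_datamodels` WFI image layout), the science array and selected metadata can be accessed as attributes of the datamodel:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Access the science array and a few metadata entries of the datamodel\n",
"# (attribute names assume the standard roman_datamodels WFI image schema)\n",
"print(dm.data.shape)\n",
"print(dm.meta.instrument.name)\n",
"print(dm.meta.instrument.optical_element)"
]
},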
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Opening OpenUniverse Models\n",
"\n",
"The subset of data that IPAC has shared is hosted in their own S3 bucket, detailed on the [OpenUniverse AWS Open Data](https://registry.opendata.aws/openuniverse2024/) website. Additionally, IPAC has created two [OpenUniverse notebooks](https://irsa.ipac.caltech.edu/docs/notebooks/) that highlight how you can interact with their image data and catalog files. In this notebook, we focus on how to access the files and leave the linked notebooks as resources for the user to explore.\n",
"\n",
"The simulations are natively saved as FITS files and are divided by survey (the Wide Area Survey (WAS) or the Time Domain Survey (TDS)), optical element, and HEALPix cell ([HEALPix](https://healpix.sourceforge.io) is a commonly used way to uniformly discretize the area of a sphere). Please see [Simulated Data Products](../../../markdown/simulated-data.md) for more information about the specific products provided in the Open Universe data.\n",
"\n",
"Below we provide an example of streaming a simulated \"calibrated\" image FITS file from their S3 bucket using an alternate way of streaming a FITS file. Here instead of initializing our own `S3FileSystem`, we pass the credentials (anonymous credentails in this case as the data is public) to `fits.open` and allow it to create the file system. This shorthand is covenient when the URI is specifically provided, but it is impossible to explore the S3 directory structure without initializign the `S3FileSystem`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s3bucket = 's3://nasa-irsa-simulations/openuniverse2024/roman/preview/RomanWAS/images/simple_model'\n",
"band = 'F184'\n",
"hpix = '9111'\n",
"sensor = 2\n",
"s3fpath = s3bucket+f'/{band}/{hpix}/Roman_WAS_simple_model_{band}_{hpix}_{sensor}.fits.gz'\n",
"\n",
"fits_file = fits.open(s3fpath, fsspec_kwargs={'anon':True})\n",
"print(fits_file.info())"
]
},
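{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick follow-up, we can locate the first HDU that actually contains an array and inspect its shape (a minimal sketch; the `info()` output above shows the full HDU layout of this particular file):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Find the first HDU that contains an array and inspect it\n",
"# (the info() listing above shows the full HDU layout)\n",
"for hdu in fits_file:\n",
"    if hdu.data is not None:\n",
"        print(hdu.name, hdu.data.shape, hdu.data.dtype)\n",
"        break"
]
},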
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For convenience, we have converted all the simulated \"calibrated\" images from FITS to ASDF files and are hosting them on the science platform's S3 bucket. In addition to the original files' data, we have also included two new features to the ASDF file:\n",
"1. We have unpacked the WCS information from the FITS metadata and created a `gwcs.WCS` object that is saved in `asdf_file['roman']['wcs']`.\n",
"2. We have queried the provided source catalogs and have included all the point sources, galaxies, and transients that are present within the field of view of the detector in `astropy.table.Table` objects that are stored directly in the ASDF files. Below we print the galaxy catalog:\n",
"\n",
"Below is an example of accessing the same file that we opened with the FITS file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s3bucket = 's3://roman-sci-test-data-prod-summer-beta-test/OPEN_UNIVERSE/WAS/images/'\n",
"band = 'F184'\n",
"hpix = '9111'\n",
"sensor = 2\n",
"s3fpath = s3bucket+f'/{band}/{hpix}/roman_was_{band}_{hpix}_wfi{sensor:02D}_simple.asdf'\n",
"\n",
"fs = s3fs.S3FileSystem(anon=True)\n",
"with fs.open(s3fpath, 'rb') as file_path:\n",
" asdf_file = asdf.open(file_path)\n",
"print(asdf_file.info())"
]
},
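{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a short example of the first addition described above, the stored `gwcs.WCS` object follows the standard Astropy WCS interface, so an arbitrary pixel position can be converted to sky coordinates:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Retrieve the GWCS object that was unpacked from the FITS metadata\n",
"wcs = asdf_file['roman']['wcs']\n",
"\n",
"# Convert an arbitrary pixel position to sky coordinates\n",
"sky_coord = wcs.pixel_to_world(2043.5, 2043.5)\n",
"print(sky_coord)"
]
},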
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice the difference when printing the file information between FITS and ASDF. ASDF provides more detail about the contents in a hierarchical structure to FITS's native printing. Additionally we can index the `asdf_file` object similarly to a Python dictionary to access the contents.\n",
"\n",
"Below we print the pre-prepared source catalog of galaxies:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(asdf_file['roman']['catalogs']['galaxies']"
]
},
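{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the catalogs are stored as `astropy.table.Table` objects, the usual Table operations apply; for example, we can check the number of rows and the available columns (the specific column names depend on the OpenUniverse catalog contents):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The catalogs behave like regular astropy Tables\n",
"galaxies = asdf_file['roman']['catalogs']['galaxies']\n",
"print(len(galaxies))\n",
"print(galaxies.colnames)"
]
},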
{
"cell_type": "markdown",
"metadata": {},
@@ -340,11 +432,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Aditional Resources\n",
"## Additional Resources\n",
"Additional information can be found at the following links:\n",
"\n",
"- [`s3fs` Documentation](https://s3fs.readthedocs.io/en/latest/api.html#)\n",
"- [Working with ASDF Notebook](../working_with_asdf/working_with_asdf.ipynb)"
"- [Working with ASDF Notebook](../working_with_asdf/working_with_asdf.ipynb)\n",
"- [OpenUniverse AWS Open Data](https://registry.opendata.aws/openuniverse2024/)\n",
"- [OpenUniverse notebooks](https://irsa.ipac.caltech.edu/docs/notebooks/)\n",
"- [Simulated Data Products Document](../../../markdown/simulated-data.md)\n",
"- [MNRAS paper detailing Open Universe data simulation methods (Troxel et al 2021)](https://ui.adsabs.harvard.edu/abs/2021MNRAS.501.2044T/abstract)\n",
"- [MNRAS paper detailing the previewed Open Universe data (Troxel et al 2023)](https://ui.adsabs.harvard.edu/abs/2023MNRAS.522.2801T/abstract)"
]
},
{
@@ -359,7 +456,7 @@
"The data streaming information from this notebook largely builds off of the TIKE data-acces notebook by Thomas Dutkiewicz.\n",
"\n",
"**Author:** Will C. Schultz \n",
"**Updated On:** 2024-09-16"
"**Updated On:** 2024-09-24"
]
},
{
