ADD: ESGF project abstraction #63

nocollier · 2024-07-29T21:21:58Z

As part of #47 Max and I discussed no longer trying to guess the columns of the dataframe (which define the project Dataset record id) from the response itself. CMIP6 does a good job of this but the prior projects do not and it leads to a good bit of code complexity. Instead, we will just make the projects we support explicit and use an abstract base class to define how facets are used in the code. These are things like, what facet is used for the variable? It is variable in CMIP3 and 5 but variable_id in CMIP6. Technically this limits how intake-esgf can be used but not in a way that limits what any users are doing now. If a user needs another project supported, we simply add a an implementation of the project base class.

Add project abstraction for CMIP{3|5|6} and use it to define columns in the dataframe.
Remove the need to specify particular facets in the code, the project abstraction should serve all these needs (i.e. model groups, relaxation facets, etc)

nocollier · 2024-07-29T21:24:27Z

@mgrover1 See what you think about the projects.py file I added.

…t. This breaks ensemble means, but we will restore it later

mgrover1 · 2024-07-31T16:41:00Z

I like this approach @nocollier - I think docs will be critical here in terms of clarifiying the terminology and such, but the idea of using the base class, adding the not-implemented errors, and building out for CMIP3, 5, and 6 looks great.

The other approach here would be use some sort of json/yaml spec, but the use of classes seems more straightforward/error-prone.

Please let me know when you would me to give a more thorough review - but it looks like this is the approach to go with

nocollier · 2024-07-31T17:55:04Z

Thanks for taking an early look. Just trying not to paint myself into a corner. In reviewing the code I have realized that a lot of the functionality (automatic cell measures for example) have cmip6 specific code anyway which this patch should fix. I agree documentation about the terminology will be key, I have short descriptions started in the base class but maybe they should just be repeated for the particularizations?

…m is that the search facets are not in the attributes. Publishing has been very sloppy. We will fix this in the to_dataset_dict() refactor by inserting all search facets into the global attributes

nocollier · 2024-08-01T19:21:32Z

The more that I look into the code, the more I realize that a lot of the nice features won't work for non-CMIP6 projects. One issue is that I assumed that the control vocabulary was always present in the dataset global attributes--this isn't true. I am disabling some things for now, but will restore them in the to_dataset_dict() rework. If we can insert the facets (those that make up the master_id) into the datasets, this would fix the poor publishing.

initial work to abstract ESGF projects and their facets

ac6f504

nocollier added 3 commits July 30, 2024 10:23

ensure that the project is always part of the returned dataframe

221c57f

combining dataframes now uses the project master_id facets

5d6958e

search criteria now seeks to extract master_id facets from the datase…

e5bf8cb

…t. This breaks ensemble means, but we will restore it later

nocollier added 2 commits August 1, 2024 14:34

remove specific facet references from catalog

5a2f563

Adding cell measures will not work for non-cmip6 projects. The proble…

7de6304

…m is that the search facets are not in the attributes. Publishing has been very sloppy. We will fix this in the to_dataset_dict() refactor by inserting all search facets into the global attributes

nocollier marked this pull request as ready for review August 1, 2024 19:13

nocollier requested a review from mgrover1 as a code owner August 1, 2024 19:13

nocollier merged commit c4848e7 into main Aug 2, 2024
13 checks passed

nocollier deleted the rework branch August 2, 2024 12:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ADD: ESGF project abstraction #63

ADD: ESGF project abstraction #63

nocollier commented Jul 29, 2024 •

edited

Loading

nocollier commented Jul 29, 2024

mgrover1 commented Jul 31, 2024

nocollier commented Jul 31, 2024

nocollier commented Aug 1, 2024

ADD: ESGF project abstraction #63

ADD: ESGF project abstraction #63

Conversation

nocollier commented Jul 29, 2024 • edited Loading

nocollier commented Jul 29, 2024

mgrover1 commented Jul 31, 2024

nocollier commented Jul 31, 2024

nocollier commented Aug 1, 2024

nocollier commented Jul 29, 2024 •

edited

Loading