Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADD: ESGF project abstraction #63

Merged
merged 6 commits into from
Aug 2, 2024
Merged

ADD: ESGF project abstraction #63

merged 6 commits into from
Aug 2, 2024

Conversation

nocollier
Copy link
Member

@nocollier nocollier commented Jul 29, 2024

As part of #47 Max and I discussed no longer trying to guess the columns of the dataframe (which define the project Dataset record id) from the response itself. CMIP6 does a good job of this but the prior projects do not and it leads to a good bit of code complexity. Instead, we will just make the projects we support explicit and use an abstract base class to define how facets are used in the code. These are things like, what facet is used for the variable? It is variable in CMIP3 and 5 but variable_id in CMIP6. Technically this limits how intake-esgf can be used but not in a way that limits what any users are doing now. If a user needs another project supported, we simply add a an implementation of the project base class.

  • Add project abstraction for CMIP{3|5|6} and use it to define columns in the dataframe.
  • Remove the need to specify particular facets in the code, the project abstraction should serve all these needs (i.e. model groups, relaxation facets, etc)

@nocollier
Copy link
Member Author

@mgrover1 See what you think about the projects.py file I added.

@mgrover1
Copy link
Member

I like this approach @nocollier - I think docs will be critical here in terms of clarifiying the terminology and such, but the idea of using the base class, adding the not-implemented errors, and building out for CMIP3, 5, and 6 looks great.

The other approach here would be use some sort of json/yaml spec, but the use of classes seems more straightforward/error-prone.

Please let me know when you would me to give a more thorough review - but it looks like this is the approach to go with

@nocollier
Copy link
Member Author

Thanks for taking an early look. Just trying not to paint myself into a corner. In reviewing the code I have realized that a lot of the functionality (automatic cell measures for example) have cmip6 specific code anyway which this patch should fix. I agree documentation about the terminology will be key, I have short descriptions started in the base class but maybe they should just be repeated for the particularizations?

…m is that the search facets are not in the attributes. Publishing has been very sloppy. We will fix this in the to_dataset_dict() refactor by inserting all search facets into the global attributes
@nocollier nocollier marked this pull request as ready for review August 1, 2024 19:13
@nocollier
Copy link
Member Author

The more that I look into the code, the more I realize that a lot of the nice features won't work for non-CMIP6 projects. One issue is that I assumed that the control vocabulary was always present in the dataset global attributes--this isn't true. I am disabling some things for now, but will restore them in the to_dataset_dict() rework. If we can insert the facets (those that make up the master_id) into the datasets, this would fix the poor publishing.

@nocollier nocollier merged commit c4848e7 into main Aug 2, 2024
13 checks passed
@nocollier nocollier deleted the rework branch August 2, 2024 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants