Update catalogs for xscen #68

Open
aulemahal opened this issue Feb 3, 2023 · 0 comments
Assignees
Labels
enhancement New feature or request question Further information is requested

Comments

aulemahal commented Feb 3, 2023

Original issue:
intake-esm uses cat.esmcat.aggregation_control.variable_column_name to identify the column that lists the variables contained in each entry. When it is set, a search such as cat.search(variable=['tas']).to_dataset_dict() returns only tas, not the other variables stored in the same files. It also enables the DerivedVariableRegistry, which can compute variables on the fly (e.g. dtr from tasmin and tasmax).

The current intake-esm (even on master) breaks if aggregation_control is not given, and a fix is also needed to support OPeNDAP links. But even with both fixed (see my PR on intake-esm), we are excluding the PAVICS catalog from useful features by not setting this field.
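
As a rough illustration of what setting this field could look like, here is a minimal sketch of a catalog description that follows the ESM collection layout intake-esm reads. The column names (dataset_id, variable, path), the catalog id and the file names are assumptions made for the example, not the actual PAVICS ones.

```python
import json

# All column names and file names below are hypothetical placeholders;
# the real PAVICS catalogs may use different ones.
collection = {
    "esmcat_version": "0.1.0",
    "id": "pavics-example",
    "description": "Illustrative catalog description with aggregation control set.",
    "catalog_file": "pavics-example.csv",
    "attributes": [
        {"column_name": "dataset_id"},
        {"column_name": "variable"},
    ],
    # "netcdf" is used here; "opendap" depends on the intake-esm PR mentioned in this issue.
    "assets": {"column_name": "path", "format": "netcdf"},
    "aggregation_control": {
        # The column that lists which variables each entry contains.
        "variable_column_name": "variable",
        # Entries sharing these values are merged into one dataset.
        "groupby_attrs": ["dataset_id"],
        # Variables of one dataset are merged side by side.
        "aggregations": [{"type": "union", "attribute_name": "variable"}],
    },
}

# Serialized form that would be written next to the CSV file.
catalog_json = json.dumps(collection, indent=2)
print(catalog_json)
```

With a description like this saved to disk, the catalog could presumably be opened with intake.open_esm_datastore and searched as in cat.search(variable=['tas']).to_dataset_dict(), which should then return only tas.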

EDIT: Two PRs on intake-esm have been made:

  • Aggregation control is now optional
  • format == "opendap" is supported

However, the current catalogs have a few other caveats.

Dataset "ID"

intake-esm won't build the dataset "keys" from columns that contain both NaNs and values. When aggregation_control is not given, keys are built by concatenating all columns. Thus, to work with intake-esm, the PAVICS catalogs must have a value for every entry in every column. This isn't true for the "biasadjusted" catalog, for example, where driving_institution is empty for some datasets. So we need either to fill the columns or to add another column acting as a complete dataset id (like xscen does). The current dataset_id does not contain the driving_institution information, so if it is used as the key, intake will receive multiple assets without knowing how to merge them.
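
To make the problem concrete, here is a small plain-Python sketch. It does not reproduce intake-esm's actual key-building code; the rows, the NA placeholder and the helper functions are all made up for illustration. It flags partially filled columns, shows that dataset_id alone groups two different assets under one key, and shows that filling the gaps and including driving_institution gives one key per asset.

```python
from collections import defaultdict

# Hypothetical catalog rows; None stands for a missing value (NaN in a DataFrame).
rows = [
    {"dataset_id": "sim-A", "driving_institution": "CCCma", "path": "a_cccma.ncml"},
    {"dataset_id": "sim-A", "driving_institution": "MPI-M", "path": "a_mpi.ncml"},
    {"dataset_id": "sim-B", "driving_institution": None, "path": "b.ncml"},
]


def partially_filled(rows, columns):
    """Return the columns that mix missing and non-missing values."""
    flagged = []
    for col in columns:
        missing = [row[col] is None for row in rows]
        if any(missing) and not all(missing):
            flagged.append(col)
    return flagged


def fill_missing(rows, columns, placeholder="NA"):
    """Return copies of the rows with missing values replaced by a placeholder."""
    return [
        {k: (placeholder if k in columns and v is None else v) for k, v in row.items()}
        for row in rows
    ]


def group_by_key(rows, key_columns, sep="."):
    """Group asset paths under a key made by joining the given columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[sep.join(row[c] for c in key_columns)].append(row["path"])
    return dict(groups)


print(partially_filled(rows, ["dataset_id", "driving_institution"]))

# dataset_id alone is ambiguous: two distinct assets share the key "sim-A".
print(group_by_key(rows, ["dataset_id"]))

# Filling the gaps and adding driving_institution yields one key per asset.
filled = fill_missing(rows, ["driving_institution"])
print(group_by_key(filled, ["dataset_id", "driving_institution"]))
```

A complete id column, as in xscen, amounts to storing the second kind of key directly in the catalog.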

xscen

Overall, the catalogs are not easy to use with xscen. AFAIU, the current catalogs are not used by anyone? If so, I suggest we adopt the xscen vocabulary (column names), which would make interaction easier without losing human-readable information. This might require some complex attribute parsing though, as the NcML files on PAVICS might not carry those attributes as-is.

aulemahal added the enhancement (New feature or request) and question (Further information is requested) labels Feb 3, 2023
aulemahal changed the title from "No variable subsetting possible" to "No variable subsetting, conversion possible" Feb 3, 2023
huard self-assigned this Feb 3, 2023
huard moved this to Todo in WP4 - Climate services Feb 6, 2023
aulemahal changed the title from "No variable subsetting, conversion possible" to "Update catalogs for xscen" Jun 19, 2023