Original issue: intake-esm uses cat.esmcat.aggregation_control.variable_column_name to configure which column lists the variables contained in each entry. When it is set, a search like cat.search(variable=['tas']).to_dataset_dict() returns only tas, not the other variables in the same files. It also enables the DerivedVariableRegistry, which can convert data on the fly (e.g., dtr from tasmin and tasmax).
The current intake-esm (even in master) breaks if aggregation_control is not given. A fix is also needed to support OpenDAP links. But even if we fix those (see my PR on intake-esm), we are excluding the PAVICS catalog from useful features by not setting this field.
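As a rough sketch of what setting this field could look like, the snippet below builds an aggregation_control block for the catalog's JSON description. The column name variable comes from the search example above; the id column in groupby_attrs and the union aggregation are assumptions for illustration only, not the actual PAVICS configuration.

```python
import json

# Hypothetical aggregation_control block for an ESM catalog description.
# "variable" matches the column used in cat.search(variable=["tas"]).
# "id" is an assumed column holding a complete dataset id; it does not
# exist in the current PAVICS catalogs.
aggregation_control = {
    "variable_column_name": "variable",
    "groupby_attrs": ["id"],
    "aggregations": [
        # Merge assets of the same dataset that hold different variables.
        {"type": "union", "attribute_name": "variable"},
    ],
}

print(json.dumps({"aggregation_control": aggregation_control}, indent=2))
```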
EDIT: Two PRs on intake-esm have been made:
Aggregation control is now optional
Format== "opendap" is supported
However, the current catalogs have a few other caveats.
Dataset "ID"
intake-esm won't build the dataset "keys" from columns that contain both NaNs and values. When "aggregation control" is not given, keys are built by concatenating all columns. Thus, to work with intake-esm, the PAVICS catalogs must have a value for every entry and column. This isn't true for the "biasadjusted" catalog, for example, where driving_institution is empty for some datasets. We therefore need either to fill the columns or to add another column acting as a complete dataset id (like xscen does). The current dataset_id does not contain the driving_institution information, so if it is used as a key, intake will receive multiple assets without knowing how to merge them.
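To make the problem concrete, here is a minimal sketch in plain Python (no intake-esm involved) that finds columns mixing missing and non-missing values and builds a complete id column by filling the gaps with a placeholder. The rows, the "unknown" placeholder, the id column name and both helper functions are made up for illustration; they are not how intake-esm or xscen build their keys.

```python
# Toy catalog rows; None stands in for a NaN cell. All values are made up.
rows = [
    {"dataset_id": "ds1", "driving_institution": "inst_a", "variable": "tasmin"},
    {"dataset_id": "ds1", "driving_institution": "inst_b", "variable": "tasmin"},
    {"dataset_id": "ds2", "driving_institution": None, "variable": "tasmax"},
]


def partially_empty_columns(rows):
    """Return the columns that hold both missing and non-missing values."""
    return [
        col
        for col in rows[0]
        if any(r[col] is None for r in rows)
        and any(r[col] is not None for r in rows)
    ]


def complete_id(row, columns, placeholder="unknown"):
    """Join the chosen columns into one id, filling gaps with a placeholder."""
    return ".".join(
        str(row[c]) if row[c] is not None else placeholder for c in columns
    )


# Columns that would be problematic as key components.
problem_columns = partially_empty_columns(rows)

# Hypothetical "id" column that is defined for every row.
ids = [complete_id(r, ["dataset_id", "driving_institution"]) for r in rows]

print(problem_columns)
print(ids)
```

With dataset_id alone, the first two rows would share a key; adding driving_institution (with a placeholder where it is missing) keeps every entry distinct.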
xscen
Overall, the catalogs are not easy to work with in xscen. AFAIU, the current catalogs are not used by anyone? If so, I suggest we adopt the xscen vocabulary (column names), which would allow easier interaction without losing human-readable information. This might require some complex attribute parsing though, as the NcML files on PAVICS might not carry those attributes as-is.