how to handle `required_global_attributes` depending on `activity_id` #19

gnikulin · 2023-10-16T13:00:09Z

Should we rename cordex to cordex-cmip6, both for the repository name and for all CVs? Many CVs are defined only for CORDEX-CMIP6 and will be different in CORDEX-CMIP7.

sethmcg · 2023-10-16T16:14:12Z

I think it makes sense to rename it. CORDEX-CMIP6 is cumbersome, especially when said aloud. How about CORDEX6, in alignment with AR6 & CMIP6?

larsbuntemeyer · 2023-10-16T17:07:03Z

Actually, we discussed this in #5. I think there is more or less a convention to have the activity_id be the filename prefix of the tables (e.g., compare to obs4MIPs, input4MIPs)...

gnikulin · 2023-10-17T14:32:49Z

> I think it makes sense to rename it. CORDEX-CMIP6 is cumbersome, especially when said aloud. How about CORDEX6, in alignment with AR6 & CMIP6?

It has been decided to use "CORDEX-CMIP6" for this activity. Indeed, it's a bit cumbersome, but it provides a good and clear description. From the CORDEX experiment design for dynamical downscaling of CMIP6 (https://cordex.org/wp-content/uploads/2021/05/CORDEX-CMIP6_exp_design_RCM.pdf):

> In addition to the continental-scale downscaling, addressed in this document,
> CORDEX includes many other components. For example, the Flagship Pilot Studies
> (FPS) and regional workshops for climate and VIA communities. CORDEX is a
> continuous activity that is not divided into phases (1st, 2nd, etc.) and not necessarily
> related to the CMIP cycles. The framework described in this document is simply
> referred to as CORDEX-CMIP6.

gnikulin · 2023-10-17T14:59:49Z

> Actually, we discussed this in #5. I think there is more or less a convention to have the activity_id be the filename prefix of the tables (e.g., compare to obs4MIPs, input4MIPs)...

I'm not sure that there are build rules for the file name of the tables. Will the input4MIPs tables have the same names (without mip_era) as now for CMIP7? Other activities don't have their own table at all, e.g. ScenarioMIP etc.

CORDEX is not a CMIP6 project or activity that contributes to CMIP6 and here we have more freedom to define what's better for CORDEX. Regarding activity_id, it was suggested that in CORDEX-CMIP6 activity_id is "an identifier of different CORDEX activities as dynamical downscaling, empirical-statistical downscaling, Flagship Pilot Studies and bias adjustment (e.g. “RCM”, “ESD”, “FPS”, “Adjust”)".

Currently we have CORDEX_source_id.json, which assumes only RCMs as a source. However, when we register ESD methods we will need to distinguish the ESD source table from the RCM one, which adds another level of complexity :-). Perhaps we may even need to add the CORDEX-CMIP6 activity_id to some CV file names, something like CORDEX-CMIP6_RCM_source_id.json, CORDEX-CMIP6_ESD_source_id.json, etc.?

sethmcg · 2023-10-17T16:16:12Z

@gnikulin - That makes sense. CORDEX-CMIP6 it is, then.

With regard to activity_id, I think we need to allow for additional cases. For example, one project I'm involved with aims to include some variable-resolution simulations in the mix for comparison with RCM downscaling. There are also efforts to train Machine Learning models to emulate RCMs. Those both will require expanding the CORDEX_source_id.json file, and in the case of ML methods, I think you have two sources: both the ML setup and the RCM it was trained to emulate. (Or possibly even multiple RCMs, if that proves feasible.)

gnikulin · 2023-10-18T08:47:03Z

I would include both limited area RCMs and VR-GCMs in the same "RCM" source_id file. There is the global attribute source_type which provides a short description of model configuration (e.g. “RCM”, “AGCM”, “RESM”, "VR-GCM", etc., all acronyms should be defined). This information can also be requested during the registration.

Regarding ML methods, I consider them a kind of ESD and suggest including them in the "ESD" source_id file. Information about the datasets (e.g. RCMs) used for training ML methods should be reflected in the metadata (global attributes) and can be different for the same ML method (e.g. https://cordex.org/wp-content/uploads/2017/06/CORDEX_ESD_Experiment1.pdf). Here, it is necessary to get input from the ESD community.

Creating many CVs for specific cases may make the CORDEX data infrastructure too complex. I would vote for the simplest solution.

jesusff · 2023-10-18T11:11:55Z

Promoting specific values (RCM, ESD, ...) of the controlled vocabulary to the filenames seems to break the general build rules for these files. I see no problem in merging all source_id's under a single file, given that we add the source_type to each source. Also, having different model_components or required_global_attributes depending on the source_type should not be a problem. It would be a matter of having a new CORDEX[-CMIP6?]_model_component.json listing the expected components (or global attributes, for the existing CORDEX_required_global_attributes.json) for each source_type. These two files (CORDEX_model_component.json (maybe in plural, components?) and CORDEX_required_global_attributes.json) would kind of define the source_type. Each new source_type created should add its defining components and attributes to those files.
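As a rough illustration of the idea above, the requirements per source_type could be looked up as in the sketch below. Everything here is an assumption made for illustration: the dictionary layout, the attribute names in it (placeholders only), and the helper functions `required_for` and `missing_attributes` are hypothetical and do not reflect the content of any existing CORDEX CV file.

```python
# Hypothetical layout: required global attributes keyed by source_type.
# The attribute names are illustrative placeholders, not an agreed list.
REQUIRED_GLOBAL_ATTRIBUTES = {
    "common": ["activity_id", "source_id", "source_type"],
    "RCM": ["driving_source_id"],
    "ESD": ["training_dataset"],
}


def required_for(source_type, table=REQUIRED_GLOBAL_ATTRIBUTES):
    """Return the common attributes plus those specific to source_type."""
    return list(table["common"]) + list(table.get(source_type, []))


def missing_attributes(global_attrs, source_type=None):
    """List required attributes absent from a file's global attributes."""
    if source_type is None:
        source_type = global_attrs.get("source_type")
    return [name for name in required_for(source_type) if name not in global_attrs]


# Example: an RCM file that lacks the (placeholder) RCM-specific attribute.
attrs = {"activity_id": "RCM", "source_id": "some-model", "source_type": "RCM"}
print(missing_attributes(attrs))
```

With this layout, a new source_type would only add one entry to the table; a source_type without an entry falls back to the common attributes.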

(this thread has gone a bit off-topic from the original post)

larsbuntemeyer · 2023-10-18T13:16:46Z

Yes, I agree that it is sufficient to use source_type to distinguish different types of downscaling methods (dynamical, statistical, ML), so that all types of downscaling sources can go into one source_id table.

For the required attributes to register a source_id (#4), I wouldn't make the model components a requirement but only the most basic ones, e.g., source_id, source, release_year, institution_id.

required global attributes

I am unsure about the required_global_attributes. Having different required_global_attributes depending on activity_id would require something like a new CV table for each activity (in the end, everything ends up in one CV table). In the past, the distinction was made through product and project_id (aka activity_id). I could imagine, e.g., having ML and bias adjustment models producing bias-adjusted or ml-adjusted output if they are based on the output of RCM (dynamical) models. Bias adjustment also had different variable names, e.g., tasAdjust instead of tas. There can still be additional attributes, of course (e.g., in ESD there was bias_adjustment), but I would not make them required global attributes.

A second option would be to have another set of tables for bias adjustment and bias adjust based on the common CV in this repo. For example, for bias adjustment there would be another repo of tables with the same filenames (CORDEX-CMIP6_CV.json, CORDEX-CMIP6_mon.json, etc.) but tailored for bias adjustment and, if necessary, with adjusted output variable names. I wonder how that was done in the past since, at least, the CORDEX-CMIP5 CMOR tables contain no hint of adjusted output. I guess it was done by adjusting those tables?

jesusff · 2023-10-18T17:54:09Z

I would say different activities (see also #20) would need to define their own CV and tables, based on these general ones. It should be a matter of degenerating the CORDEX_activity_id.json to a single value, reducing other elements (e.g. CORDEX_domain_id.json) and generally adapting the rest of the elements (including the required_global_attributes) and tables to the protocols of the particular initiatives. The question is also if here we want to encompass all activities or just focus on providing the example for the dynamical downscaling on continental domains (activity_id = RCM). Even for the continental-scale domains, different domains are managing different output variable lists; mainly based on the general CORDEX one, but removing (no problem) or adding some variables. So, even for the same activity_id a separate set of tables might be needed.
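To make the "degenerating" step more concrete, here is a minimal sketch of deriving an activity-specific CV from a general one. The nested-dictionary layout, the domain names `DOM-A` and `DOM-B`, and the function `derive_activity_cv` are hypothetical and only serve the illustration; the real CV files may be organised differently.

```python
import copy


def derive_activity_cv(general_cv, activity_id, domain_ids):
    """Return a copy of general_cv restricted to one activity and some domains."""
    if activity_id not in general_cv["activity_id"]:
        raise KeyError(f"unknown activity_id: {activity_id}")
    cv = copy.deepcopy(general_cv)
    # Degenerate activity_id to a single value.
    cv["activity_id"] = {activity_id: general_cv["activity_id"][activity_id]}
    # Reduce domain_id to the domains this activity actually uses.
    cv["domain_id"] = {
        key: value for key, value in general_cv["domain_id"].items() if key in domain_ids
    }
    return cv


# Hypothetical general CV with placeholder domain names.
general = {
    "activity_id": {
        "RCM": "dynamical downscaling",
        "ESD": "empirical-statistical downscaling",
    },
    "domain_id": {"DOM-A": "placeholder domain A", "DOM-B": "placeholder domain B"},
}
rcm_cv = derive_activity_cv(general, "RCM", {"DOM-A"})
print(rcm_cv)
```

The general CV is left untouched, so each initiative can derive and then further adapt its own copy.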

larsbuntemeyer · 2023-10-20T07:49:46Z

Yes, I agree with @jesusff, other activities might have different requirements for their vocabulary that we don't even know about yet. And from my experience, users will mess with the tables anyway. The important thing is to have a vocabulary that can be used for QA for ESGF publication, although we don't even have a checker yet 😣 (PrePARE only works for CMIP6).

Thanks for opening #20 !

gnikulin · 2023-10-20T16:21:03Z

If we are back to the original post :-): should we use CORDEX-CMIP6 for all tables and CVs instead of simply CORDEX? My concern is that when we come to CORDEX-CMIP7, it would be bad practice to use the same file names for files with different content.

gnikulin · 2023-10-20T16:25:10Z

> Promoting specific values (RCM, ESD, ...) of the controlled vocabulary to the filenames seems to break the general build rules for these files. I see no problem in merging all source_id's under a single file, given that we add the source_type to each source. Also, having different model_components or required_global_attributes depending on the source_type should not be a problem. It would be a matter of having a new CORDEX[-CMIP6?]_model_component.json listing the expected components (or global attributes, for the existing CORDEX_required_global_attributes.json) for each source_type. These two files (CORDEX_model_component.json (maybe in plural, components?) and CORDEX_required_global_attributes.json) would kind of define the source_type. Each new source_type created should add its defining components and attributes to those files.
>
> (this thread has gone a bit off-topic from the original post)

OK, we can try to merge all source_id's under a single file and distinguish them by source_type.

sethmcg · 2023-10-20T16:47:45Z

> For the required attributes to register a source_id (#4), I wouldn't make the model components a requirement but only the most basic ones, e.g., source_id, source, release_year, institution_id.

Does the source_id identify the model / method used to perform the downscaling? If so, I'm not sure that release_year and institution_id are well-defined for methods that aren't RCMs. For example, what would they be for the (simplistic but still widely-used) ESD method of interpolation + bias-correction?

> Should we use CORDEX-CMIP6 for all tables and CVs instead of simply CORDEX?

CORDEX-CMIP6 makes sense for exactly the reason you give, that we don't want ambiguity if/when we do this again later on.

larsbuntemeyer · 2023-10-20T20:25:08Z

> If we are back to the original post :-): should we use CORDEX-CMIP6 for all tables and CVs instead of simply CORDEX? My concern is that when we come to CORDEX-CMIP7, it would be bad practice to use the same file names for files with different content.

OK, agreed, I'll rename them!

gnikulin · 2023-11-13T15:03:25Z

> A second option would be to have another set of tables for bias adjustment and bias adjust based on the common CV in this repo. For example, for bias adjustment there would be another repo of tables with the same filenames (CORDEX-CMIP6_CV.json, CORDEX-CMIP6_mon.json, etc.) but tailored for bias adjustment and, if necessary, with adjusted output variable names. I wonder how that was done in the past since, at least, the CORDEX-CMIP5 CMOR tables contain no hint of adjusted output. I guess it was done by adjusting those tables?

Regarding bias-adjusted variables, all modifications of their acronyms and long names are very simple and are described in the DRS for bias-adjusted CORDEX simulations (http://is-enes-data.github.io/CORDEX_adjust_drs.pdf):

Variable names are modified by appending Adjust to the variable name DRS element in file names and in NetCDF files: pr -> prAdjust, tas -> tasAdjust.

Long names (the long_name NetCDF attribute) also have to be modified by adding Bias-Adjusted in front of the long name: Near-Surface Air Temperature -> Bias-Adjusted Near-Surface Air Temperature.

There were no specific CORDEX-CMIP5 CMOR tables for bias-adjustment.
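The two renaming rules described above are simple enough to sketch in a few lines. The function name `to_bias_adjusted` is made up for this illustration; it only applies the suffix and prefix rules quoted above and is not taken from any existing CORDEX tool.

```python
def to_bias_adjusted(name, long_name):
    """Apply the bias-adjustment naming rules to one variable.

    The variable name gets the suffix "Adjust" and the long name gets the
    prefix "Bias-Adjusted ". Already adjusted names are returned unchanged.
    """
    if name.endswith("Adjust"):
        return name, long_name
    return f"{name}Adjust", f"Bias-Adjusted {long_name}"


print(to_bias_adjusted("tas", "Near-Surface Air Temperature"))
# ('tasAdjust', 'Bias-Adjusted Near-Surface Air Temperature')
```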

gnikulin · 2023-11-13T15:06:57Z

> I would say different activities (see also #20) would need to define their own CV and tables, based on these general ones. It should be a matter of degenerating the CORDEX_activity_id.json to a single value, reducing other elements (e.g. CORDEX_domain_id.json) and generally adapting the rest of the elements (including the required_global_attributes) and tables to the protocols of the particular initiatives. The question is also if here we want to encompass all activities or just focus on providing the example for the dynamical downscaling on continental domains (activity_id = RCM). Even for the continental-scale domains, different domains are managing different output variable lists; mainly based on the general CORDEX one, but removing (no problem) or adding some variables. So, even for the same activity_id a separate set of tables might be needed.

Actually, there is no need to create new CMOR tables for different domains if output variable lists are different. All variables should be included in the CORDEX-CMIP6 CMOR tables, and each domain post-processes only a subset of them.
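A minimal sketch of this "one full table, subset per domain" approach follows. The function `select_variables` and the variable name `xyz` are hypothetical; the point is only that a domain picks names from the common tables and anything not defined there is flagged for registration instead of being put in a separate table.

```python
def select_variables(table_variables, requested):
    """Split requested names into those defined in the tables and those missing."""
    selected = [name for name in requested if name in table_variables]
    unknown = [name for name in requested if name not in table_variables]
    return selected, unknown


# Names defined in the common tables (tiny illustrative set).
common_table = {"tas", "pr"}
# One domain's variable list, including a name that is not yet in the tables.
selected, unknown = select_variables(common_table, ["tas", "xyz"])
print(selected, unknown)
```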

larsbuntemeyer · 2024-04-12T11:08:49Z

I agree, so maybe we can also set up a simple registration process for the data-request table (just give variable, frequency and some basic details) from which the tables are updated.
It would be much nicer, since converting from Google spreadsheets is a pain.

gnikulin · 2024-04-12T13:59:56Z

Yes, it's a good idea. The atmospheric variable spreadsheet was the first human-readable step to discuss what variables should be archived. Adding new variables can indeed be done more efficiently with a registration process for the data request table. There is an ocean variable list (https://doi.org/10.5281/zenodo.8207553), again a spreadsheet :-), that should be included.

larsbuntemeyer · 2024-04-12T14:02:03Z

OK, I can add the ocean variables. I still have the script that can tackle the spreadsheets, and it seems to be a similar format...

gnikulin · 2024-04-12T14:08:46Z

The format should be the same, as the atmospheric variable spreadsheet was used as a template. The ocean list published on Zenodo is a PDF, but there is a Google spreadsheet as well.

gnikulin · 2024-04-12T14:11:51Z

There are also lists of aerosol variables (https://doi.org/10.5281/zenodo.7112860) and river variables (https://doi.org/10.5281/zenodo.7112673) that should be checked to avoid duplication.

larsbuntemeyer changed the title ~~cordex to cordex-cmip6~~ how to handle required_global_attributes depending on activity_id Oct 18, 2023

jesusff mentioned this issue Oct 18, 2023

product vs activity_id vs source_type #20

Closed

larsbuntemeyer mentioned this issue Oct 23, 2023

change table prefix to CORDEX-CMIP6 #21

Merged

jesusff mentioned this issue Oct 23, 2023

source_id: what info is required for registration? #4

Closed

larsbuntemeyer mentioned this issue Apr 12, 2024

setup registration procedure for new variables WCRP-CORDEX/data-request-table#30

Open
