Regex in CMIP6_CV.json to test *_index attributes #281

Open
neumannd opened this issue Jan 9, 2020 · 1 comment

neumannd commented Jan 9, 2020

The CMIP6_CV.json contains regular expressions to test the global attributes physics_index, initialization_index, forcing_index and realization_index for correctness. These global attributes should be integers (see CMIP6 Global Attributes, DRS, Filenames, Directory Structure, and CV’s). Therefore, the CMOR script PrePARE.py only checks the type of these attributes and does not use the regex from CMIP6_CV.json.
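For comparison, a type-only check of the kind described above might look like the following minimal Python sketch. It assumes the attribute value has already been read from the file into a Python object. `is_valid_index` is a hypothetical name, and this is not the actual code in PrePARE.py.

```python
import numbers


def is_valid_index(value) -> bool:
    """Hypothetical type-only check for a CMIP6 *_index global attribute.

    Python ints and NumPy integer scalars pass (NumPy registers its
    integer types as numbers.Integral). Strings such as "1" or "[1]"
    and floats such as 1.0 fail. bool is rejected explicitly because it
    is a subclass of int.
    """
    return isinstance(value, numbers.Integral) and not isinstance(value, bool)


for candidate in (1, 42, "1", "[1]", 1.0, True):
    print(f"{candidate!r:6} -> {is_valid_index(candidate)}")
```

With a check like this, a bracketed string is rejected on its type alone, so no regex is involved.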

However, the regular expression provided in CMIP6_CV.json seems to allow an arbitrary number of `[` in front of the integer and `]` behind it. I don't understand why this is done; it seems to contradict CMIP6 Global Attributes, DRS, Filenames, Directory Structure, and CV’s.

evaluation of the regular expression

In CMIP6_CV.json, the regex for testing the *_index attributes is written as:

^\\[\\{0,\\}[[:digit:]]\\{1,\\}\\]\\{0,\\}$

The first \ of each \\ escapes the second \ (JSON string escaping). That's clear. Without these escapes, we have

^\[\{0,\}[[:digit:]]\{1,\}\]\{0,\}$
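The unescaping step can be reproduced with Python's `json` module. This sketch assumes the pattern appears in CMIP6_CV.json as a JSON string literal exactly as quoted above; the variable names are illustrative only.

```python
import json

# The pattern as quoted from CMIP6_CV.json, wrapped in the double quotes
# of a JSON string literal (a raw Python string keeps every backslash).
json_literal = r'"^\\[\\{0,\\}[[:digit:]]\\{1,\\}\\]\\{0,\\}$"'

# json.loads turns each JSON escape \\ into a single backslash.
bre = json.loads(json_literal)
print(bre)  # ^\[\{0,\}[[:digit:]]\{1,\}\]\{0,\}$
```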

I assume that this is a POSIX Basic Regular Expression (BRE). That means that \[ and \] are taken literally. \{n,\} is interpreted as: "the character (or bracket expression) immediately to the left of this expression may appear n or more times". The ^ and $ match the beginning and end of the line, respectively. Thus, we have

^                 : beginning of the line
\[\{0,\}          : `[` appears zero to infinite times
[[:digit:]]\{1,\} : a digit between `0` and `9` appears one to infinite times
\]\{0,\}          : `]` appears zero to infinite times
$                 : end of the line
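Python's `re` module supports neither POSIX BRE syntax nor POSIX character classes such as `[[:digit:]]`, so testing the pattern there needs a translation. The sketch below uses a hypothetical helper that rewrites only the two constructs occurring in this pattern: `[[:digit:]]` becomes the equivalent `[0-9]`, and each BRE interval `\{m,n\}` becomes the ERE-style `{m,n}`, which Python also understands. It is not a general BRE converter.

```python
import re


def translate_cv_index_bre(bre: str) -> str:
    """Hypothetical helper: rewrite the BRE constructs used in the
    CMIP6_CV.json *_index pattern into Python re syntax.

    Only two constructs are handled, which is enough for this pattern
    but not for BRE in general:
      * the POSIX class [[:digit:]] becomes [0-9]
      * the BRE interval \\{m,n\\} becomes {m,n}, as in ERE
    Escaped literals such as \\[ and \\] are left unchanged, because
    Python also reads them as literal brackets.
    """
    out = bre.replace("[[:digit:]]", "[0-9]")
    # Match a backslash + "{", the bounds, then a backslash + "}".
    return re.sub(r"\\\{(\d+),(\d*)\\\}", r"{\1,\2}", out)


bre = r"^\[\{0,\}[[:digit:]]\{1,\}\]\{0,\}$"
print(translate_cv_index_bre(bre))  # ^\[{0,}[0-9]{1,}\]{0,}$
```

Each row of the breakdown maps onto one piece of the output: `\[{0,}`, `[0-9]{1,}` and `\]{0,}`.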

These values would be matched by the regular expression:

1
123
42
53253262

But these values would also be matched by the regular expression:

[1435]
[[123]]
[[123]
[123]]
[123]]]]]]]]]
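The two lists of values can be checked with a minimal Python sketch. It assumes the BRE behaves like the hand-written Python pattern `\[{0,}[0-9]{1,}\]{0,}` (BRE intervals written as `{m,}`, `[[:digit:]]` as `[0-9]`) and uses `re.fullmatch` in place of the `^` and `$` anchors.

```python
import re

# Python re stand-in for the BRE ^\[\{0,\}[[:digit:]]\{1,\}\]\{0,\}$
CV_PATTERN = re.compile(r"\[{0,}[0-9]{1,}\]{0,}")

expected_ints = ["1", "123", "42", "53253262"]
bracketed = ["[1435]", "[[123]]", "[[123]", "[123]]", "[123]]]]]]]]]"]

for value in expected_ints + bracketed:
    matched = CV_PATTERN.fullmatch(value) is not None
    print(f"{value!r:18} matches: {matched}")
```

Every value in both lists is reported as a match, including the unbalanced `[[123]` and `[123]]`.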

I would have expected a regular expression such as

^[[:digit:]]\\{1,\\}$

or

^[0-9]\\{1,\\}$

or, in POSIX Extended Regular Expression (ERE) syntax, where + means "one or more":

^[[:digit:]]+$
^[0-9]+$
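A digits-only check in the spirit of the patterns proposed above, sketched in Python: `[0-9]+` with `re.fullmatch` corresponds to the ERE `^[0-9]+$`. The names are illustrative, and Python's `re` stands in for whatever regex engine a CV checker actually uses.

```python
import re

# Digits only: one or more ASCII digits and nothing else.
# re.fullmatch replaces the ^...$ anchors and, unlike a trailing "$"
# in Python, does not accept a trailing newline.
STRICT_INDEX = re.compile(r"[0-9]+")


def matches_strict(value: str) -> bool:
    return STRICT_INDEX.fullmatch(value) is not None


for value in ["1", "123", "42", "53253262", "[1435]", "[[123]]", "[123]]"]:
    print(f"{value!r:12} matches: {matches_strict(value)}")
```

With this pattern, the plain integers still match, while none of the bracketed values is accepted.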

neumannd commented Jan 9, 2020

Or is this something that should be mentioned in https://github.com/PCMDI/cmor/issues/256?
