Add documentation for usage of "extra_types" #148

Open
ollie-bell opened this issue Nov 26, 2022 · 3 comments

@ollie-bell

The parse function docstring mentions documentation for the extra_types argument, but I cannot find it. Can it be added?

The default types are not sufficient for my needs. For example, I would like to enforce letters and numbers ONLY in a field of the format string (there is currently a type for letters, numbers and underscores, which doesn't work for me because my format strings are all underscore-delimited). I also have datetime strings in the %Y%m%d format, which doesn't seem to be supported for datetime parsing.

@jenisys
Contributor

jenisys commented Nov 26, 2022

The module docstring indirectly provides examples of how to use it, in the @parse.with_pattern decorator section (it does not mention the extra_types parameter by name; it just uses the parameter in function calls).

SEE ALSO:

OTHERWISE:
Provide your own types / type-converters via extra_types.

@jenisys
Contributor

jenisys commented Nov 26, 2022

@ollie-bell
The ti type (ISO date or ISO timestamp, like: 2022-11-26 or 2022-11-26T12:04:23) comes pretty close to what you need.

@ollie-bell
Author

ollie-bell commented Nov 27, 2022

@jenisys thanks for pointing that out to me. Tbh I'm not really experienced with string manipulation and regex etc. so I don't understand the usage very well. I'm hoping someone can just help me out with a question :D

I have formats like

"{variable}_{domain}_{GCMsource}_{scenario}_{member}_{RCMsource}_{RCMversion}_{frequency}_{start:d}-{end:d}.nc"
"{variable}_{frequency}_{source}_{scenario}_{member}_{grid}_{start:d}-{end:d}.nc"

and strings to parse like

"pr_EUR-11_CNRM-CERFACS-CNRM-CM5_rcp45_r1i1p1_CLMcom-CCLM4-8-17_v1_day_20060101-20101231.nc"
"pr_day_MIROC-ESM-CHEM_rcp45_r1i1p1_gr1_20310101-20310101.nc"

As you can see, each {field} in the format is made up of a mix of letters, numbers and dashes (-). Underscores are always used to separate fields and never within the fields themselves.

I'd like to add a format specification within each {field} which constrains it to letters, numbers and dashes only. Currently the :w specification matches letters, numbers and underscores.

Thanks!
