Proposal: Publish a list of parser-challenging valid & invalid UCUM codes #291

Open
dalito opened this issue Jan 6, 2024 · 3 comments


dalito commented Jan 6, 2024

When implementing UCUM in software, the collection of common UCUM unit codes is very helpful for creating unit tests.

A list of invalid UCUM codes would also be very useful, including cases such as:

m(/s)
m.(/s)
(m/s)2

Good sources for such a list are bug reports against UCUM implementations and old issues that led to clarifications of the specification.
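As a rough illustration of how such a list could be consumed, the three invalid codes above can be kept as data and run against any validator. This is only a sketch: `Case`, `CASES`, and `find_mismatches` are hypothetical names, and `validate` stands for whatever string-checking function a given UCUM library exposes.

```python
from typing import Callable, Iterable, List, NamedTuple


class Case(NamedTuple):
    code: str    # the UCUM string under test
    valid: bool  # expected verdict from a correct parser
    note: str    # where the case comes from


# Only the three invalid examples named in this issue; a published
# list would add more cases, including challenging valid ones.
CASES = [
    Case("m(/s)", False, "listed as invalid in this issue"),
    Case("m.(/s)", False, "listed as invalid in this issue"),
    Case("(m/s)2", False, "listed as invalid in this issue"),
]


def find_mismatches(
    validate: Callable[[str], bool], cases: Iterable[Case]
) -> List[Case]:
    """Return the cases where the validator disagrees with the expectation."""
    return [c for c in cases if bool(validate(c.code)) != c.valid]
```

A parser that wrongly accepts any of these strings shows up in the returned list. The same `CASES` data could also feed a parametrized test in a framework such as pytest.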


incansvl commented Jan 16, 2024

(Note: this response is based on the original issue title, which referred to a "list of invalid UCUM codes".)

The exact aim/intent here needs to be clarified.

The number of possible valid UCUM codes is infinite, so the number of "all possible text strings in the universe" MINUS that number is just another infinity.

A well-implemented UCUM parser will confirm whether a specific string is a valid UCUM expression. What it WON'T tell you is whether a particular expression is misleading, i.e. whether what it conveys to a typical human reader differs from what a UCUM library will make of the same expression.

It would certainly be possible to produce a "rogues' gallery" of UCUM codes that either illustrate specific "foot guns" in the syntax or are dangerous examples seen in live use. I have listed some such examples myself in the past, although now that I'm retired I no longer have easy access to that work.


dalito commented Jan 16, 2024

I meant it as you describe in your last paragraph: a rather short list (tens) of UCUM codes that challenge the parser and help implementers reach the goal you stated:

> A well-implemented UCUM parser will confirm whether a specific string is a valid UCUM expression.

I have some more invalid UCUM codes in the test suite of my attempt at a "well-implemented" UCUM parser in Python here.

There are also some challenging valid UCUM codes, such as "dar", which is parsed correctly only if prefixes (or at least "da") have lower priority than unit atoms. Knowing about such cases would also help when working on a parser or validator. Some examples are in the same test file.
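A minimal sketch of the "dar" problem, assuming a tiny hand-picked subset of UCUM prefixes and atoms (`d` deci, `da` deka, `ar` are, and a few others) and ignoring the rule that only metric atoms take prefixes. The function names are hypothetical and this is not how any particular parser is implemented; it only contrasts committing to the longest prefix with a lookup that accepts a prefix only when the remainder is a known atom.

```python
from typing import Optional, Tuple

# Illustrative subset only; the real UCUM tables are much larger.
PREFIXES = {"d": "deci", "da": "deka", "k": "kilo", "m": "milli"}
ATOMS = {"m": "meter", "g": "gram", "s": "second", "ar": "are", "d": "day"}

# (prefix or None, atom), or None when the code cannot be split.
Split = Optional[Tuple[Optional[str], str]]


def split_greedy(code: str) -> Split:
    """Commit to the longest matching prefix, with no backtracking."""
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if code.startswith(prefix) and len(code) > len(prefix):
            rest = code[len(prefix):]
            return (prefix, rest) if rest in ATOMS else None
    return (None, code) if code in ATOMS else None


def split_atom_first(code: str) -> Split:
    """Prefer a bare atom; accept a prefix only if an atom follows it."""
    if code in ATOMS:
        return (None, code)
    for prefix in PREFIXES:
        rest = code[len(prefix):]
        if code.startswith(prefix) and rest in ATOMS:
            return (prefix, rest)
    return None
```

With these toy tables, `split_greedy("dar")` returns `None` because it commits to `da` and is left with `r`, while `split_atom_first("dar")` returns `("d", "ar")`. Both functions agree on `"dam"`, returning `("da", "m")`.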

Since I have completed the parser, such a list is of less value to me now. But others may find it useful when they start out or want to validate existing code (related: #157).

dalito changed the title from "Proposal: Publish a list of invalid UCUM codes" to "Proposal: Publish a list of parser-challenging valid & invalid UCUM codes" on Jan 16, 2024
@incansvl

Ah, so you WERE really talking about invalid codes to challenge the parser, while I had taken it to lean more towards "valid but misleading" codes, which are really a different issue.

So, my mistake, but a useful clarification. Thank you, @dalito.
