Re-enable the LightRDF-based RDF/XML check. #928

Merged
merged 2 commits into from
Nov 24, 2023

Conversation

gouttegd
Contributor

The ODK used to have a validation check that ensured that RDF/XML files were valid. The check was based on LightRDF, which was fast enough to make it reasonable to enable the check by default.

We had to disable that check because of a bug in LightRDF that caused the library to sometimes fail to parse perfectly valid RDF/XML files (#745). That bug has been fixed, so now we can re-enable the check by default.

The check-rdfxml script is now invoked by default, and by default it uses the fast LightRDF-based check. A new ODK option is added (extra_rdfxml_checks): when enabled, it instructs the check-rdfxml script to also perform the Jena- and RDFLib-based checks (which are not enabled by default as they are more time-consuming).

closes #892
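
The control flow described above (always run the fast check, and run the slower checks only when the option is enabled) can be sketched roughly as follows. This is not the actual check-rdfxml script: the function name `check_rdfxml`, the `run_extra` parameter, and the idea of passing each checker in as a callable are assumptions made for illustration only.

```python
from typing import Callable, Dict, List

# A checker takes a file path and raises an exception if the file is invalid.
Checker = Callable[[str], None]


def check_rdfxml(path: str,
                 fast_check: Checker,
                 extra_checks: Dict[str, Checker],
                 run_extra: bool = False) -> List[str]:
    """Return the names of the checks that failed for `path`.

    Hypothetical sketch: `fast_check` stands in for the LightRDF-based
    check, `extra_checks` for the Jena- and RDFLib-based ones, and
    `run_extra` for the effect of the extra_rdfxml_checks option.
    """
    checks = {"lightrdf": fast_check}
    if run_extra:
        # The slower checks are added only when explicitly requested.
        checks.update(extra_checks)

    failures = []
    for name, check in checks.items():
        try:
            check(path)
        except Exception:
            failures.append(name)
    return failures
```

With `run_extra` left at its default, only the fast checker is ever called, which mirrors the default behaviour described in this PR.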

@gouttegd gouttegd self-assigned this Sep 26, 2023
@gouttegd
Contributor Author

PR blocked until the next time the Python package constraints are updated, so that the ODK gets the new version of LightRDF (either 0.3.2 or 0.4.0) in which the aforementioned bug has been fixed.

@gouttegd gouttegd marked this pull request as ready for review October 3, 2023 13:47
@gouttegd gouttegd requested a review from matentzn October 3, 2023 13:48
@matentzn
Contributor

matentzn commented Oct 3, 2023

Have you tried validating a bunch of release files, like Uberon, CL, and FBbt, to see that any issues they might reveal are actually solvable?

@gouttegd
Contributor Author

gouttegd commented Oct 5, 2023

Not sure I understand your question. As far as I know FBbt, Uberon, and CL have never produced invalid RDF/XML files.
What potentially (un)solvable issues are you talking about?

@matentzn
Contributor

matentzn commented Oct 5, 2023

Concretely: Did you check that the current OWL API serialiser produces valid RDF/XML according to LightRDF for a variety of key ontologies?

@gouttegd
Contributor Author

gouttegd commented Oct 5, 2023

No, but if the OWL API was producing invalid RDF/XML, I sure hope it would have been caught before by the OWL API’s own test suite. The goal here is not to check the OWL API, but to catch possibly invalid contents that the OWL API will not produce itself but that it may let pass (typically an invalid IRI).

I have not tested now, but no such content was present in Uberon, CL, or FBbt back when that check was first implemented in September 2022, nor before it was removed because of the LightRDF bug 2 months later.
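
To illustrate the kind of content meant here, the sketch below looks for whitespace inside IRIs in an RDF/XML document that is otherwise well-formed XML. It is only an illustration using the Python standard library, not how LightRDF or the check-rdfxml script detect the problem; the function name `iris_with_spaces` is made up, and only the `rdf:about` and `rdf:resource` attributes are inspected.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# ElementTree exposes namespaced attributes as "{namespace}localname".
IRI_ATTRS = (f"{{{RDF_NS}}}about", f"{{{RDF_NS}}}resource")


def iris_with_spaces(rdfxml: str) -> list:
    """Return IRIs from rdf:about / rdf:resource that contain whitespace.

    Hypothetical helper for illustration; a real RDF/XML parser checks
    far more than this.
    """
    root = ET.fromstring(rdfxml)
    bad = []
    for elem in root.iter():
        for attr in IRI_ATTRS:
            value = elem.get(attr)
            if value is not None and any(c.isspace() for c in value):
                bad.append(value)
    return bad
```

A document containing `rdf:about="http://example.org/A B"` parses fine as XML, but the helper reports that IRI, which is the sort of error a serialiser may let through without producing it itself.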

Contributor

@matentzn matentzn left a comment


Let's merge to dev and we will see immediately.

@gouttegd
Contributor Author

@matentzn Did you get to test that on -dev? It’s been working fine for me so far.

@matentzn
Contributor

I tested this with two ontologies, good to go!

@gouttegd gouttegd merged commit 3b11a54 into master Nov 24, 2023
1 check passed
@gouttegd gouttegd deleted the re-enable-lightrdf-based-check branch November 24, 2023 16:04
Development

Successfully merging this pull request may close these issues.

add an RDF validator to check for spaces in IRIs
2 participants