-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add --input-format
option
#1056
base: master
Are you sure you want to change the base?
Conversation
I think this addresses the use case in the issue. Thanks! I looked into the suggestion about storing the |
@beckyjackson I am very happy you have been able to work on this - from a user perspective this will solve many of my debugging problems. Thank you :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite some changes to the core loading process. I went through it line by line, but remind me to do some extensive testing before the next release.
case "owx": | ||
return Sets.newHashSet(new OWLXMLParserFactory()); | ||
case "owl": | ||
return Sets.newHashSet(new RDFXMLParserFactory(), new RioRDFXMLParserFactory()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to add the RIO parser here? Shouldnt we be more strict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have strong feelings one way or another. I figured it would be used if RDFXML failed without --input-format
anyway? @jamesaoverton do you have a preference for excluding Rio parsers?
case "ttl": | ||
return Sets.newHashSet(new RioTurtleParserFactory(), new TurtleOntologyParserFactory()); | ||
case "json": | ||
return Sets.newHashSet(new RioJsonParserFactory(), new RioJsonLDParserFactory()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the OBOGraphs parser (stand-alone) or none at all I think.
TBH, I think the --input-format options should reflect identically the -f
option. Why would we care about RioJson and not about the other more outlandish parsers like KRSS? I would want my parsing to fail if it try parsing with the exact parsers I specify - not because for some reason the second json parser in the list was successful because of its crappy implementation..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that we should just divide up the parsers that OWLAPI currently uses, but you're right that if we're adding a new option then we don't have to be constrained in that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way you do it for remove
is IMO the best. We use the same extensions:parser mappings here as in -f
, but, if we ever so want to (I don't), add a bunch more switches later for compare complex combinations any-owl
for ofn, owx, rdfxml, KRSSParser for krss etc (analogous to axiomtype option in --axioms
). I would not bother here - the -f
should be by far sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent quite a bit of time digging around OWLAPI and could not find anything for OBOGraphs. Here's a full list of the parsers in a default manager (OWLManager.createOWLOntologyManager()
):
- RDFXMLParserFactory
- OWLXMLParserFactory
- OWLFunctionalSyntaxOWLParserFactory
- RioTurtleParserFactory
- ManchesterOWLSyntaxOntologyParserFactory
- RioNQuadsParserFactory
- RioJsonParserFactory
- RioNTriplesParserFactory
- RioTrigParserFactory
- RioBinaryRdfParserFactory
- RioJsonLDParserFactory
- RioN3ParserFactory
- RioRDFXMLParserFactory
- RioTrixParserFactory
- TurtleOntologyParserFactory
- OBOFormatOWLAPIParserFactory
- KRSS2OWLParserFactory
- RioRDFaParserFactory
Unless I'm missing something, I only see the Rio JSON parsers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OBOgraphs parser is not in owlapi.. If it is anywhere, it is in the obographs package..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had checked there as well (https://github.com/geneontology/obographs), but didn't see anything about a parser. The manager currently doesn't use OBOGraphs anyway when loading any JSON files, so it seems we're using one of the Rio parsers when we're loading in any JSON. We just use the OgJsonGenerator
to write OBO graphs JSON.
The OG reader (OgJsonReader) doesn't produce an OWLOntology.
I was curious to see what's going on... Interestingly, if you load an OG JSON file into ROBOT and print out the format:
// Load the ontology
OWLOntology loadedOntology = manager.loadOntologyFromOntologyDocument(source, config);
// Check for unparsed triples - get the document format and then the loader metadata
OWLDocumentFormat f = manager.getOntologyFormat(loadedOntology);
System.out.println(f);
...it tells you it's RDF/XML. Looks like at some point while loading, it's converted:
DEBUG Converting from JSON to RDF
...
RDF/XML Syntax
I created an issue to discuss auto-detection of formats. @beckyjackson Don't feel obligated to dive even deeper into this -- this is already a great start! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at this code because I wanted to use the same approach in a different project. But I ran into a problem there which I confirmed also is a problem here. If you limit the parsers to a particular set, all the imported ontologies must also use that format. :-(
Try making an ro.ttl and setting input format to ttl. When it tries to import http://purl.obolibrary.org/obo/ro/annotations.owl
(RDF/XML), it will fail.
@balhoff There is nothing though we can do about it, right? The only thing we can do is document that behaviour on the docs pages and that's that... |
@matentzn I think you're right, but I'm not sure we can add this feature if it is inherently broken. |
I am not going to fight for this feature. I just want to point out it is inherently broken only in cases where owl:imports are involved.. But yeah, I am not sure if this downside completely invalidates this PR. |
I think we might want to go ahead with this if we document the limitation well. I'm running into a problem in another project where I really need to specify the format. It's a huge ontology and running through all the parsers is hitting memory problems. |
I don't have any plans for this, so I'm going to convert to Draft. If someone really wants this, feel free to push it forward. |
Resolves #1038
docs/
have been added/updatedmvn verify
says all tests passmvn site
says all JavaDocs correctCHANGELOG.md
has been updatedCreates a new global
--input-format
option which accepts one of the valid formats (owl
,obo
,owx
,ofn
,omn
,ttl
, orjson
) to specify which parser(s) to use when loading the ontology.I would prefer to add
manager
to IOHelper instead of as a parameter forloadOntology
, but that would require a big refactoring of the IOHelper class and potentially introduce some breaking changes? On the other hand, I don't want to build up a bunch of tech debt and this introduces a bunch more 'loadOntology' overrides. @jamesaoverton what do you think?Last note - there were two test commands in the docs that were overwriting the example files. I fixed these so that they write to the results folder instead (had to remove an axiom to get the reason test to work).