Add OWL2 representation #30

Closed
jamesamcl opened this issue Oct 9, 2024 · 12 comments · Fixed by #32

Comments

@jamesamcl

It would be nice to have an OWL2 RDF/XML representation alongside the OBO file. Then we can add medgen to OLS.

@joeflack4 joeflack4 self-assigned this Oct 10, 2024
@joeflack4 joeflack4 added the enhancement New feature or request label Oct 10, 2024
@joeflack4
Contributor

That sounds like a great idea!

@joeflack4
Contributor

Hey @jamesamcl, unfortunately this is a bigger ask than I realized. Adding the robot convert step took just a couple of lines, but the problem is file size. While medgen.obo is ~450 MB, my local medgen.owl is 2.7 GB. Similarly, medgen-disease-extract.obo is 200 MB but medgen-disease-extract.owl is 2.2 GB. These sizes exceed what can be uploaded to GitHub releases (2 GB).

I wonder if the problem could be solved simply by minifying. I've never done this to XML, personally, much less RDF/XML.

Do you know if OLS supports any other formats? I wonder if there are any others that would be small enough when converted.

@jamesamcl
Author

I think the easiest option would be to gzip it and add support to OLS for gzipped files. I’ll open an issue on OLS and cross reference. Thanks so much for looking into this!
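For illustration, the gzip approach could be sketched roughly as follows. This is a minimal sketch and not the project's actual build step: the function name `gzip_and_check` is hypothetical, and the default limit is an assumption taken from the "2g" figure quoted above.

```shell
# Sketch: compress a converted ontology and compare it with an upload limit.
# LIMIT_BYTES is an assumption based on the 2 GB figure quoted in this thread.
LIMIT_BYTES=$((2 * 1024 * 1024 * 1024))

# gzip_and_check FILE [LIMIT]  (hypothetical helper)
# Writes FILE.gz next to FILE, keeping the original, and returns 0 if the
# compressed file is strictly smaller than the limit, 1 otherwise.
gzip_and_check() {
  local file="$1" limit="${2:-$LIMIT_BYTES}" size
  gzip -c "$file" > "$file.gz" || return 2
  # tr strips the padding some wc implementations put before the count
  size=$(wc -c < "$file.gz" | tr -d ' ')
  echo "$file.gz: $size bytes (limit $limit)"
  [ "$size" -lt "$limit" ]
}

# Usage, assuming medgen.owl has already been produced:
# gzip_and_check medgen.owl && echo "small enough to upload"
```

Whether the compressed file actually falls under the limit depends on the data; the thread does not report a compressed size.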

@joeflack4
Contributor

Oh, OK cool! I like this solution. Looks like you've already made some headway. I'll implement that on this side shortly.

@joeflack4 joeflack4 linked a pull request Oct 18, 2024 that will close this issue
@joeflack4
Contributor

Hey @jamesamcl, I tried to set us up for .owl.gz today, but the GH action is failing. I forgot, there is a blocker:

I tried macos as well, which has more memory than ubuntu, but still no go.

I could potentially try using some alternative robot convert formats. I think it will have the same problem though. If you want, I could give it a go, though. Do you know if OLS would support any of these other formats?

I know OLS won't support this format:

Tried already but running out of memory:

Haven't tried yet, though I imagine will have the same problem:

@joeflack4
Contributor

@twhetzel FYI I spent a short amount of time on this but hit a block due to memory issues. If you happen to know an alternate way to do a conversion and use less memory, let me know.

I could possibly try owltools instead, but I'm not optimistic it'll solve.

@jamesamcl
Author

jamesamcl commented Oct 18, 2024

I just tried locally:

robot convert --input medgen.obo --output medgen.owl peaked at 16.1 GB
robot convert --input medgen.obo --output medgen.ttl peaked at 14.5 GB

So it seems like ttl might get us just under the memory ceiling which is apparently 16 GB for a standard GitHub runner. Turtle is supported by OLS.

@joeflack4
Contributor

joeflack4 commented Oct 18, 2024

Thanks for looking into this! That's great to see that .ttl is clocking in under that limit; hopefully that remains consistent.

Last time I looked into runners, I think the memory limits were a lot lower. Perhaps we can manage!


I changed the runner back to ubuntu which has the 16G memory, but unfortunately for whatever reason it's still running out of memory in the action.

Other options:
a. FastOBO: Chris recommended it, but I haven't looked into it.
b. Smaller structure: Nico:

I think in this case the best way to deal with the memory issue is to drastically reduce the size of the medgen dump you load into OLS: just labels, xrefs and subclass. That should shrink the size
c. ROBOT_JAVA_ARGS

@balhoff
Member

balhoff commented Oct 18, 2024

@joeflack4 are you setting the max heap size? This succeeded for me:

export ROBOT_JAVA_ARGS=-Xmx12G
robot convert -i medgen.obo -o medgen.ttl

@joeflack4
Contributor

I was thinking of trying that, but in the past when I've tried that with robot I've had it still run out of memory. But perhaps you're right! I'll give it a go.

@balhoff
Member

balhoff commented Oct 19, 2024

@joeflack4 for Java programs I would never leave that to chance, but you have to make sure it fits within the machine you're running on. There is some overhead, so always make it less than the actual RAM.
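One way to keep the heap below physical RAM is to derive the flag from the machine's total memory. The sketch below is illustrative only: the helper name `heap_flag_for_kb` is hypothetical, and the 75% default is an assumption, not a documented ROBOT or JVM recommendation. For a 16 GB machine it happens to give the same 12 GB used above.

```shell
# Sketch: build a JVM max-heap flag that leaves headroom below physical RAM.
# The 75% default is an assumption chosen for illustration.

# heap_flag_for_kb TOTAL_KB [PERCENT]  (hypothetical helper)
# Prints an -Xmx flag, in mebibytes, for PERCENT of TOTAL_KB.
heap_flag_for_kb() {
  local total_kb="$1" percent="${2:-75}"
  echo "-Xmx$(( total_kb * percent / 100 / 1024 ))m"
}

# Usage on Linux, where /proc/meminfo reports MemTotal in kB:
# total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
# export ROBOT_JAVA_ARGS="$(heap_flag_for_kb "$total_kb")"
# robot convert -i medgen.obo -o medgen.ttl
```

The percentage may need lowering if other processes on the runner also use significant memory.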

@joeflack4
Contributor

Yeah, I'm not sure why this hasn't helped in the past. Perhaps because what I was running just required way too much memory, and couldn't be set to the threshold I wanted.

In any case, good news is that it worked this time!:

@joeflack4 joeflack4 linked a pull request Oct 19, 2024 that will close this issue