Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix linkml_files #310

Merged
merged 4 commits into from
Mar 28, 2024
Merged

Conversation

sneakers-the-rat
Copy link
Contributor

@sneakers-the-rat sneakers-the-rat commented Mar 18, 2024

on windows, the paths for local model files look like this:

'D:\\a\\linkml\\linkml\\.venv\\lib\\site-packages\\linkml_runtime\\linkml_model\\model/schema/types.yaml'

which seems to work, but it probably shouldn't.

change hardcoded / separators to use os.path.join to make paths correct on windows

Related to: linkml/linkml#1976

Copy link

codecov bot commented Mar 18, 2024

Codecov Report

Attention: Patch coverage is 78.75000% with 17 lines in your changes are missing coverage. Please review.

Project coverage is 64.33%. Comparing base (33ca663) to head (c072024).
Report is 12 commits behind head on main.

Files Patch % Lines
linkml_runtime/linkml_model/linkml_files.py 78.75% 17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #310      +/-   ##
==========================================
+ Coverage   62.92%   64.33%   +1.40%     
==========================================
  Files          62       62              
  Lines        8545     8580      +35     
  Branches     2437     2440       +3     
==========================================
+ Hits         5377     5520     +143     
+ Misses       2557     2450     -107     
+ Partials      611      610       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@cmungall cmungall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx!

I think we should use pathib over os.path but either is infinitely better than hardcoding slash

@sneakers-the-rat
Copy link
Contributor Author

I think we should use pathib over os.path but either is infinitely better than hardcoding slash

just trying to follow the surrounding conventions :) ez to change

@sneakers-the-rat
Copy link
Contributor Author

OK i went to swap out os.path.join with pathlib, and when i went to write a simple test that the generated paths exist, it turns out that most of them don't and there were a lot of other problems with the module, and since this is something i'd like to be able to rely on i went ahead and rewrote most of it.

Problems:

  • Many of the paths were wrong, the directory structure changed, particularly for the github paths.
  • Several formats were missing - in the generated output, but not in the format mapping
  • HTML didn't exist
  • you can't have an enum that has multiple items with the same value, they end up shadowing each other, so the Format and Path enum didn't work.
  • several of the formats only have meta, but there was no indication of that
  • the URL building methods were mixed up with the path building methods

So this PR

  • swaps out os.path.join with pathlib
  • Replaces Format with an Enum that keeps all the items
  • Moves the extension specification out of Format and into _Path
  • Specifies directories and extensions with FormatPath within _Path
  • Return PathParts from _build_path rather than a string, because paths and URLs need to be built differently
  • corrects github paths
  • corrects github_io paths
  • indicates which formats only generate META
  • tests all of the above.

You'll notice a lot of skipped tests. those are the ones that make network requests, which we need to cache in the same way we do with linkml. I noticed this repo is still using unittest in its CI, and didn't want to swap that out here too, so whenever we switch to pytest and add requests_cache those tests shoudl automatically switch back on.

@sneakers-the-rat sneakers-the-rat changed the title Use os.path.join instead of hardcoding slash in local path functions Fix linkml_files Mar 21, 2024
@sneakers-the-rat
Copy link
Contributor Author

ope i guess the tests aren't being run? idk how unittest works. ¯_(ツ)_/¯ if ya want 'em they're there!

@cmungall
Copy link
Member

Thanks!

I'd like to test this with the main linkml project before merging into main, as there are possibly some assumptions on the enum values in linkml.__init()__ and possibly other places.

I realize the current SOP for these kinds of tests is suboptimal, so we can strategize ways to improve things.

I also realize I'm being very conservative here because as you point out a lot of this is legacy and things didn't break when paths changed..

I'm pretty certain this PR represents a much better way of doing things than before. But another approach might be to eliminate dependency on this file altogether. The metamodel should largely not operate so differently from other models, and including the cross product of all schema modules times formats is not a scalable strategy.

See
linkml/linkml#546

@sneakers-the-rat
Copy link
Contributor Author

I realize the current SOP for these kinds of tests is suboptimal, so we can strategize ways to improve things.

We could make a manually triggered action that runs the linkml tests along with the PR version of linkml-runtime, and make a branch protection rule that says that action must be run and passing before merging to main? that would make it so we don't have to run all the tests all the time, but since these are tightly coupled packages, ensures that changes here don't break linkml.

But another approach might be to eliminate dependency on this file altogether. The metamodel should largely not operate so differently from other models,

I think it would be very useful to be able to at least get a reliable handle to the metamodel schema files from linkml-runtime - if not for perf reasons like using local schema rather than always dereferencing, then for introspection reasons. i agree that metamodel shouldn't be that special cased, but it still is a little special.

I could go either way on the github links part, i am not really sure when that would be useful.

including the cross product of all schema modules times formats is not a scalable strategy.

true, but with caching these tests would be pretty fast:

  • the file existence tests take essentially no time
  • the github API call is the slowest part, but is the same across all tests and so can be cached
  • the simple gets are not that expensive, but still sorta expensive, but if we have a module that says it provides links, then we should ensure that they work.

so if perf is a concern, imo we should drop the link generation part.

@cmungall
Copy link
Member

I think it would be very useful to be able to at least get a reliable handle to the metamodel schema files from linkml-runtime

oh absolutely - my point was just that having an enormous list of N x M constants is not necessarily the best way to do this

@sneakers-the-rat
Copy link
Contributor Author

sneakers-the-rat commented Mar 22, 2024

ah i see. yes totally agreed - ideally this metadata should come from the generators, no? or maybe projectgen? i'm actually still a little foggy on the workflow for updating the linkml_model module but ya for sure on the same page re: a ton of constants not being a good longterm strategy

the thing we want here is 'given the metamodel and {projectgen or however the linkml_model is generated}, we should expect to have these files and these links,' so if it was possible to introspect that expectation from the generation process that sounds more sustainable. maybe this PR is just to fix this module for now and we make that as a TODO on it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants