Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[perf] Use yamllib if available #306

Merged
merged 4 commits into from
Mar 18, 2024
Merged

Conversation

sneakers-the-rat
Copy link
Contributor

@sneakers-the-rat sneakers-the-rat commented Mar 13, 2024

Loading yaml is actually really slow in pure python! and we load a lot of yaml!

There's a free perf boost we can get by just using the compiled yamllib versions if they are available (they usually are).

The only failure I noticed was that there was a single \t character at the start of a multiline string in the biolink model that cause s parsing error, but once that was fixed it was exactly the same just way faster.

For YAMLLoader.load_as_dict in main linkml tests, (cumulative time in seconds, --with-slow)

Total test time load_as_dict per call
Before PR 1527s 451.3s .1136s
This PR 1334s 243.9s .062s
Change. -193s -207s -.0516
Change: -12.6% -46%

all ~ for free ~

Same as #307 , i have no idea why these tests are failing. i didn't touch anything near that test, and this is a totally orthogonal change to 307 having the same error, and one of the tests here miraculously passed, so i think this is a flaky test rather than anything I did.

Copy link

codecov bot commented Mar 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.90%. Comparing base (27b9158) to head (82fcfa6).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #306      +/-   ##
==========================================
+ Coverage   62.88%   62.90%   +0.01%     
==========================================
  Files          62       62              
  Lines        8528     8532       +4     
  Branches     2436     2436              
==========================================
+ Hits         5363     5367       +4     
  Misses       2554     2554              
  Partials      611      611              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@cmungall
Copy link
Member

cmungall commented Mar 13, 2024

thx!

Looks like one failing test

We also want to consider whether to incorporate ruamel.yaml into our strategy. We use it in some places as it has nice features like comment preservations (very useful for diff/patch workflows). Not sure how it compares on performance

https://yaml.readthedocs.io/en/latest/pyyaml/

We currently have incomplete behavior with decimals using pyyaml:
yaml/pyyaml#255

See test_type_range in the compliance tests

@cmungall cmungall merged commit fad7720 into linkml:main Mar 18, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants