
Update Cache To Support > 128 Entries #309

Merged

Conversation

timothy-trinidad-ps (Contributor)

Fixes linkml/linkml#1973.

I'm starting with a really naive fix of just removing the entry limit for the lru_cache, but I'm open to pivoting to other solutions.
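
For illustration, a minimal sketch of the two behaviors being compared, assuming the cached helpers in schemaview.py use functools.lru_cache with its default size (the function names below are made up):

```python
from functools import lru_cache

# Default: lru_cache() keeps at most 128 entries, so the 129th distinct
# argument starts evicting least-recently-used entries and the cache
# thrashes on schemas with more than 128 classes.
@lru_cache()
def induced_slots_bounded(class_name: str) -> str:
    return class_name  # stand-in for an expensive SchemaView computation

# The naive fix: maxsize=None lifts the bound entirely; every distinct
# argument stays cached for the lifetime of the process.
@lru_cache(maxsize=None)
def induced_slots_unbounded(class_name: str) -> str:
    return class_name
```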

With this change, running gen-doc on the following YAML file goes down from 3+ minutes to 7s.

test.yaml.txt


codecov bot commented Mar 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.88%. Comparing base (27b9158) to head (0be1913).
Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #309   +/-   ##
=======================================
  Coverage   62.88%   62.88%           
=======================================
  Files          62       62           
  Lines        8528     8528           
  Branches     2436     2436           
=======================================
  Hits         5363     5363           
  Misses       2554     2554           
  Partials      611      611           


@sierra-moxon (Member)

@timothy-trinidad-ps - looking at the lru_cache docs, it seems like the default is 128 items in the cache, and this PR takes it to limitless items in the cache.

Do you think it would be helpful to raise this limit in increments of 100 or so until we see a significant speedup, without going limitless?
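
One way to make that choice less arbitrary: functools.lru_cache exposes cache_info(), so a quick probe like the sketch below (the cached function is a stand-in) can show whether a candidate maxsize is thrashing before committing to it:

```python
from functools import lru_cache

@lru_cache(maxsize=256)  # candidate limit under evaluation
def get_class(class_name: str) -> str:
    return class_name.upper()  # stand-in for an expensive lookup

for i in range(300):  # more distinct keys than the cache can hold
    get_class(f"Class{i}")
    get_class(f"Class{i}")  # immediate re-call should hit the cache

# Prints CacheInfo(hits=300, misses=300, maxsize=256, currsize=256);
# currsize pinned at maxsize with a low hit rate suggests thrashing.
print(get_class.cache_info())
```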

@timothy-trinidad-ps (Contributor, Author)

I'm open to it, but any other number seemed just as arbitrary to me as 128. If I up it to 256, I know that we will run into this problem again within the next 6 months (we're in the process of annotating our database tables).

The obvious downside to this would be unbounded memory growth for very large schemas, but I'm not entirely sure how to measure that. I think memory usage growing in proportion to the schema size would be more expected than an arbitrary threshold after which the program effectively stops working.
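
On measuring it: Python's built-in tracemalloc reports current and peak traced allocations, so something like the sketch below could bracket a gen-doc run. (The DocGenerator import and call are an assumption about how gen-doc is driven programmatically; adjust to the actual API.)

```python
import tracemalloc

# Assumption: linkml exposes a programmatic entry point for gen-doc;
# swap in the real generator API if this import or call differs.
from linkml.generators.docgen import DocGenerator

tracemalloc.start()

gen = DocGenerator("test.yaml", directory="docs_out")  # assumed signature
gen.serialize()

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
```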

@sierra-moxon (Member)

I can agree with that :) - is there anything we can do to prevent a very large schema from running into problems with too big a cache? Said another way, what is the largest schema size you've tested this on so far?

@timothy-trinidad-ps (Contributor, Author)

Just this one with 129 classes so far. I can automate creating a schema with 1024 tables, but the specific use case I'm testing is gen-doc and the functions called by its dependent templates. What I would need help with is figuring out how to test other use cases of schemaview.py when it comes to memory usage.
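
A sketch of how such a schema could be generated, assuming one slot per class is representative enough for a gen-doc benchmark (all names below are made up):

```python
import yaml  # PyYAML

def make_schema(n_classes: int) -> dict:
    """Build a minimal LinkML schema dict with n_classes classes."""
    schema = {
        "id": "https://example.org/bench",
        "name": "bench",
        "prefixes": {"linkml": "https://w3id.org/linkml/"},
        "imports": ["linkml:types"],
        "default_range": "string",
        "classes": {},
        "slots": {},
    }
    for i in range(n_classes):
        slot = f"slot_{i}"
        schema["slots"][slot] = {"description": f"Synthetic slot {i}"}
        schema["classes"][f"Class{i}"] = {
            "description": f"Synthetic class {i}",
            "slots": [slot],
        }
    return schema

with open("bench_1024.yaml", "w") as f:
    yaml.safe_dump(make_schema(1024), f, sort_keys=False)
```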

@timothy-trinidad-ps (Contributor, Author) commented Mar 15, 2024

I just ran a benchmark for a file with 2,048 classes and 10,240 slots - it took about 4 minutes with a peak of about 450 MB memory (at least for the gen-doc use case).
test.yaml.txt

(screenshot attached: Python_47340)

@sierra-moxon self-requested a review on March 15, 2024 20:20
@cmungall merged commit 5403612 into linkml:main on Mar 18, 2024 (9 checks passed)
Development

Successfully merging this pull request may close these issues.

gen-doc Hangs / Slows Down With >128 Classes