Update Cache To Support > 128 Entries #309
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff           @@
##             main     #309   +/-   ##
=======================================
  Coverage   62.88%   62.88%
=======================================
  Files          62       62
  Lines        8528     8528
  Branches     2436     2436
=======================================
  Hits         5363     5363
  Misses       2554     2554
  Partials      611      611

☔ View full report in Codecov by Sentry.
@timothy-trinidad-ps - looking at the lru_cache docs, it seems like the default is 128 items in the cache, and this PR takes it to limitless items in the cache. Do you think it would be helpful to take this item count up in increments of 100 or so until we see a significant speedup, without going limitless?
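For context, a minimal sketch of the default limit under discussion, using Python's functools.lru_cache (the cache named in the PR description below); the cached function here is a hypothetical stand-in, not actual linkml code:

```python
from functools import lru_cache

@lru_cache  # functools default: maxsize=128
def lookup(element_name: str) -> str:
    # hypothetical stand-in for the expensive schema lookup being cached
    return element_name.upper()

# Touch 200 distinct keys: every key past the 128th evicts the least
# recently used entry, so a large schema never fits and the cache thrashes.
for i in range(200):
    lookup(f"class_{i}")

print(lookup.cache_info())
# CacheInfo(hits=0, misses=200, maxsize=128, currsize=128)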
I'm open to it, but any other number seemed just as arbitrary to me as 128. If I up it to 256, I know that we will run into this problem again in the next 6 months (we're in the process of annotating our database tables). The obvious downside to this would be unbounded memory growth for very large schemas, but I'm not entirely sure how to measure that. I think memory usage growing in proportion to the schema size would be more expected than an arbitrary threshold after which the program effectively stops working.
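On the "not sure how to measure that" point, tracemalloc from the standard library can put a rough number on how an unbounded cache grows with schema size — a minimal sketch, again with a stand-in cached function:

```python
import tracemalloc
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded, as in this PR
def lookup(element_name: str) -> str:
    # hypothetical stand-in; the real cached values would be schema views
    return element_name * 100

tracemalloc.start()
for i in range(10_000):  # simulate a schema with 10k elements
    lookup(f"slot_{i}")
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"cache entries: {lookup.cache_info().currsize}")
print(f"traced memory: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
```

Running this against schemas of increasing size would show whether memory grows roughly linearly with element count, as hoped, rather than pathologically.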
I can agree with that :) - is there anything we can do to prevent a very large schema from running into problems with too big a cache? Said another way, what is the largest schema size you've tested this on so far?
Just this one, at 129 entries. I can automate creating a schema with 1024 tables, but the specific use case I'm testing with is the test.yaml attached below.
I just ran a benchmark for a file with 2,048 classes and 10,240 slots - it took about 4 minutes with a peak of about 450 MB of memory (at least for the […]).
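A sketch of how that benchmark could be automated end to end: generate a synthetic schema with N classes, then time gen-doc and read the child process's peak RSS (Unix-only, via the resource module). The minimal schema shape and gen-doc's -d output-directory flag are assumptions to verify against the LinkML docs:

```python
import resource
import subprocess
import sys
import time

N_CLASSES = 1024  # scale this up to probe behavior past the old 128-entry limit

def write_schema(path: str, n: int) -> None:
    # Emit a minimal LinkML-style schema with n classes of 5 attributes each
    # (assumed shape; adjust to match your LinkML version).
    lines = [
        "id: https://example.org/bench",
        "name: bench",
        "prefixes:",
        "  linkml: https://w3id.org/linkml/",
        "imports:",
        "  - linkml:types",
        "default_range: string",
        "classes:",
    ]
    for i in range(n):
        lines.append(f"  Class{i}:")
        lines.append("    attributes:")
        for j in range(5):
            lines.append(f"      slot_{i}_{j}:")
            lines.append("        range: string")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

write_schema("bench.yaml", N_CLASSES)

start = time.perf_counter()
subprocess.run(["gen-doc", "-d", "bench_docs", "bench.yaml"], check=True)
elapsed = time.perf_counter() - start

# ru_maxrss is reported in KiB on Linux and in bytes on macOS.
peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
unit = "B" if sys.platform == "darwin" else "KiB"
print(f"{N_CLASSES} classes: {elapsed:.1f}s, peak child RSS ~{peak} {unit}")
```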
Fixes linkml/linkml#1973.
I'm starting with a really naive fix of just removing the entry limit for the lru_cache, but I'm open to pivoting to other solutions. With this change, running gen-doc on the following YAML file goes down from 3+ minutes to 7s.

test.yaml.txt
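For readers skimming the diff, the "naive fix" amounts to a one-line change of this shape (the decorated function is a hypothetical stand-in, not the actual code touched by the PR):

```python
from functools import lru_cache

# Before: bare @lru_cache uses the default maxsize=128, which silently
# evicts entries once a schema has more than 128 cached elements.
# After: maxsize=None removes the limit, so the cache grows with the schema.
@lru_cache(maxsize=None)
def expensive_schema_lookup(element_name: str):
    ...  # stand-in for the cached computation
```

On Python 3.9+, functools.cache is shorthand for lru_cache(maxsize=None).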