
Update Cache To Support > 128 Entries #309

Merged

Conversation

timothy-trinidad-ps (Contributor)

Fixes linkml/linkml#1973.

I'm starting with a really naive fix of just removing the entry limit for the lru_cache, but I'm open to pivoting to other solutions.
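
For illustration, a minimal sketch of the two behaviors being compared, assuming the cached helpers in schemaview.py use functools.lru_cache with its default size (the function names below are made up):

```python
from functools import lru_cache

# Default: lru_cache() keeps at most 128 entries, so the 129th distinct
# argument starts evicting least-recently-used entries and the cache
# thrashes on schemas with more than 128 classes.
@lru_cache()
def induced_slots_bounded(class_name: str) -> str:
    return class_name  # stand-in for an expensive SchemaView computation

# The naive fix: maxsize=None lifts the bound entirely; every distinct
# argument stays cached for the lifetime of the process.
@lru_cache(maxsize=None)
def induced_slots_unbounded(class_name: str) -> str:
    return class_name
```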

With this change, running gen-doc on the following YAML file goes down from 3+ minutes to 7s.

test.yaml.txt


codecov bot commented Mar 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.88%. Comparing base (27b9158) to head (0be1913).
Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #309   +/-   ##
=======================================
  Coverage   62.88%   62.88%           
=======================================
  Files          62       62           
  Lines        8528     8528           
  Branches     2436     2436           
=======================================
  Hits         5363     5363           
  Misses       2554     2554           
  Partials      611      611           


@sierra-moxon (Member)

@timothy-trinidad-ps - looking at the lru_cache docs, it seems like the default is 128 items in the cache, and this PR takes it to limitless items in the cache.

Do you think it would be helpful to raise this limit in increments of 100 or so until we see a significant speedup, without going limitless?
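
One way to make that choice less arbitrary: functools.lru_cache exposes cache_info(), so a quick probe like the sketch below (the cached function is a stand-in) can show whether a candidate maxsize is thrashing before committing to it:

```python
from functools import lru_cache

@lru_cache(maxsize=256)  # candidate limit under evaluation
def get_class(class_name: str) -> str:
    return class_name.upper()  # stand-in for an expensive lookup

for i in range(300):  # more distinct keys than the cache can hold
    get_class(f"Class{i}")
    get_class(f"Class{i}")  # immediate re-call should hit the cache

# Prints CacheInfo(hits=300, misses=300, maxsize=256, currsize=256);
# currsize pinned at maxsize with a low hit rate suggests thrashing.
print(get_class.cache_info())
```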

@timothy-trinidad-ps (Contributor, Author)

I'm open to it, but any other number seemed just as arbitrary to me as 128. If I up it to 256, I know that we will run into this problem again within the next 6 months (we're in the process of annotating our database tables).

The obvious downside to this would be unbounded memory growth for very large schemas, but I'm not entirely sure how to measure that. I think memory usage growing in proportion to the schema size would be more expected than an arbitrary threshold after which the program effectively stops working.
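
On measuring it: Python's built-in tracemalloc reports current and peak traced allocations, so something like the sketch below could bracket a gen-doc run. (The DocGenerator import and call are an assumption about how gen-doc is driven programmatically; adjust to the actual API.)

```python
import tracemalloc

# Assumption: linkml exposes a programmatic entry point for gen-doc;
# swap in the real generator API if this import or call differs.
from linkml.generators.docgen import DocGenerator

tracemalloc.start()

gen = DocGenerator("test.yaml", directory="docs_out")  # assumed signature
gen.serialize()

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
```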

@sierra-moxon (Member)

I can agree with that :) - is there anything we can do to prevent a very large schema from running into problems with too big a cache? Said another way, what is the largest schema size you've tested this on so far?

@timothy-trinidad-ps (Contributor, Author)

Just this one with 129 classes so far. I can automate creating a schema with 1024 tables, but the specific use case I'm testing is gen-doc and the functions called by its dependent templates. What I would need help with is figuring out how to test other use cases of schemaview.py when it comes to memory usage.
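
A sketch of how such a schema could be generated, assuming one slot per class is representative enough for a gen-doc benchmark (all names below are made up):

```python
import yaml  # PyYAML

def make_schema(n_classes: int) -> dict:
    """Build a minimal LinkML schema dict with n_classes classes."""
    schema = {
        "id": "https://example.org/bench",
        "name": "bench",
        "prefixes": {"linkml": "https://w3id.org/linkml/"},
        "imports": ["linkml:types"],
        "default_range": "string",
        "classes": {},
        "slots": {},
    }
    for i in range(n_classes):
        slot = f"slot_{i}"
        schema["slots"][slot] = {"description": f"Synthetic slot {i}"}
        schema["classes"][f"Class{i}"] = {
            "description": f"Synthetic class {i}",
            "slots": [slot],
        }
    return schema

with open("bench_1024.yaml", "w") as f:
    yaml.safe_dump(make_schema(1024), f, sort_keys=False)
```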

@timothy-trinidad-ps (Contributor, Author) commented Mar 15, 2024

I just ran a benchmark for a file with 2,048 classes and 10,240 slots - it took about 4 minutes with a peak of about 450 MB memory (at least for the gen-doc use case).
test.yaml.txt

(screenshot attached: Python_47340)

@sierra-moxon self-requested a review on March 15, 2024 20:20
@cmungall merged commit 5403612 into linkml:main on Mar 18, 2024 (9 checks passed)
Development

Successfully merging this pull request may close these issues.

gen-doc Hangs / Slows Down With >128 Classes