Implement sequence prefetching with in-memory cache #89
Labels: `keep alive` (exempt issue from staleness checks)

Activity:
- Sep 20, 2023: github-actions bot added the `stale` label with the comment "This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days."
- The issue was then closed because it had been stalled for 7 days with no activity.
- Nov 27, 2023: reece added and removed the `stale closed` label (issue was closed automatically due to inactivity).
- Dec 9, 2023: github-actions bot removed the `stale` label.
- Mar 29, 2024: github-actions bot added the `stale` label with the comment "This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days."
SeqRepo is capable of >1500 queries/second single-threaded with local data. At this rate, sequence fetching is likely to be a small component of the overall execution time of a typical analysis pipeline.
Optimizing significantly beyond current performance requires loading sequences in memory. However, it's not generally feasible or useful to prefetch all sequences. Current human databases are ~12GB compressed. Prefetching selected sequences on first access could be very beneficial for certain access patterns.
Prefetching might work as follows. The client would be instantiated with a prefetch cache size, which would control the number of sequences held in the prefetch cache. The default would be 0 (no prefetching).
When a client requests a slice of a sequence, the entire sequence would be read speculatively, anticipating that subsequent queries are likely to be on the same sequence (e.g., on a single chromosome). Subsequent lookups on that sequence would then be served entirely from memory.
The cache would operate in a typical LRU sense, automatically evicting the least recently accessed sequence once the cache reaches its target size.
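A minimal sketch of the proposed behavior, not SeqRepo's actual API: `PrefetchingSequenceCache` and `fetch_fn` are hypothetical names, where `fetch_fn` stands in for any backend that returns a full sequence for an alias. The first slice request triggers a speculative whole-sequence read; later slices of the same sequence are served from an in-memory LRU cache:

```python
from collections import OrderedDict


class PrefetchingSequenceCache:
    """Hypothetical LRU prefetch cache sketch.

    On the first slice request for a sequence, the entire sequence is
    fetched and cached; later slices are served from memory.
    cache_size=0 disables prefetching (the proposed default).
    """

    def __init__(self, fetch_fn, cache_size=0):
        self._fetch_fn = fetch_fn          # alias -> full sequence string
        self._cache_size = cache_size
        self._cache = OrderedDict()        # alias -> full sequence, LRU order

    def fetch(self, alias, start=None, end=None):
        if self._cache_size == 0:
            # Prefetching disabled: pass through (a real backend would
            # slice at the storage layer rather than in Python).
            return self._fetch_fn(alias)[start:end]
        if alias in self._cache:
            self._cache.move_to_end(alias)  # mark as most recently used
        else:
            # Speculative whole-sequence read on first access.
            self._cache[alias] = self._fetch_fn(alias)
            while len(self._cache) > self._cache_size:
                self._cache.popitem(last=False)  # evict least recently used
        return self._cache[alias][start:end]
```

With `cache_size=1`, repeated slices of one chromosome hit the backend only once; switching to a second sequence evicts the first, which illustrates why interleaved access patterns can defeat the cache.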
Importantly, prefetching can degrade performance if accesses are not suitably ordered, e.g., queries interleaved across more sequences than the cache holds would evict and refetch whole sequences repeatedly.