Implement sequence prefetching with in-memory cache #89
Labels: `keep alive` (exempt issue from staleness checks)

Activity:
- Sep 20, 2023: github-actions bot added the `stale` label with the comment "This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days."
- The issue was then closed because it had been stalled for 7 days with no activity.
- Nov 27, 2023: reece added and removed the `stale closed` label (issue was closed automatically due to inactivity).
- Dec 9, 2023: github-actions bot removed the `stale` label.
- Mar 29, 2024: github-actions bot added the `stale` label with the comment "This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days."
SeqRepo is capable of >1500 queries/second single-threaded with local data. At this rate, sequence fetching is likely to be a small component of the overall execution time of a typical analysis pipeline.
Optimizing significantly beyond current performance requires loading sequences in memory. However, it's not generally feasible or useful to prefetch all sequences. Current human databases are ~12GB compressed. Prefetching selected sequences on first access could be very beneficial for certain access patterns.
Prefetching might work as follows. The client would be instantiated with a prefetch cache size, which would control the number of sequences held in the prefetch cache. The default would be 0 (no prefetching).
When a client requests a slice of a sequence, the entire sequence would be read speculatively, anticipating that subsequent queries are likely to be on the same sequence (e.g., on a single chromosome). Subsequent lookups on that sequence would then be served entirely from memory.
The cache would operate in a typical LRU sense, automatically evicting the least recently accessed sequence once the cache reaches its target size.
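A minimal sketch of the proposed behavior, not SeqRepo's actual API: `PrefetchingSequenceCache` and `fetch_fn` are hypothetical names, where `fetch_fn` stands in for any backend that returns a full sequence for an alias. The first slice request triggers a speculative whole-sequence read; later slices of the same sequence are served from an in-memory LRU cache:

```python
from collections import OrderedDict


class PrefetchingSequenceCache:
    """Hypothetical LRU prefetch cache sketch.

    On the first slice request for a sequence, the entire sequence is
    fetched and cached; later slices are served from memory.
    cache_size=0 disables prefetching (the proposed default).
    """

    def __init__(self, fetch_fn, cache_size=0):
        self._fetch_fn = fetch_fn          # alias -> full sequence string
        self._cache_size = cache_size
        self._cache = OrderedDict()        # alias -> full sequence, LRU order

    def fetch(self, alias, start=None, end=None):
        if self._cache_size == 0:
            # Prefetching disabled: pass through (a real backend would
            # slice at the storage layer rather than in Python).
            return self._fetch_fn(alias)[start:end]
        if alias in self._cache:
            self._cache.move_to_end(alias)  # mark as most recently used
        else:
            # Speculative whole-sequence read on first access.
            self._cache[alias] = self._fetch_fn(alias)
            while len(self._cache) > self._cache_size:
                self._cache.popitem(last=False)  # evict least recently used
        return self._cache[alias][start:end]
```

With `cache_size=1`, repeated slices of one chromosome hit the backend only once; switching to a second sequence evicts the first, which illustrates why interleaved access patterns can defeat the cache.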
Importantly, prefetching can degrade performance if accesses are not suitably ordered, e.g., queries interleaved across more sequences than the cache holds would evict and refetch whole sequences repeatedly.