Add `ProtocolDAGResult` caching to user-facing client #58

dotsdl · 2023-01-10T04:15:18Z

ProtocolDAGResults are pulled by users to evaluate free energy differences for Transformations they are interested in. These can be rather large, and many can be pulled for a given Transformation in a single client method call. This can cause slow behavior for users, killing productivity.

Because ProtocolDAGResults never change once created, there is no risk of a stale cache. This makes cache invalidation a non-issue for these objects.

Caching should be two-tier:

1. an in-memory cache, of finite size; this can be an LRU cache, for example
2. an on-disk cache, of finite size; max size can be set on client instantiation

When a ProtocolDAGResult ScopedKey is slated for retrieval:

1. the in-memory cache (1) is hit first; if present, the ProtocolDAGResult is returned; if not,
2. the on-disk cache (2) is hit; if present, the ProtocolDAGResult is returned; if not,
3. the request goes to the user-facing API
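
The lookup order above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the client's implementation: `TwoTierCache`, `fetch`, `cache_dir`, `max_memory_items`, and `max_disk_bytes` are hypothetical names, plain strings stand in for ScopedKeys, `pickle` stands in for whatever serialization the client actually uses, and `fetch` stands in for the request to the user-facing API.

```python
import pickle
import tempfile
from collections import OrderedDict
from hashlib import sha256
from pathlib import Path


class TwoTierCache:
    """Hypothetical two-tier cache: an in-memory LRU in front of an on-disk store."""

    def __init__(self, fetch, cache_dir, max_memory_items=128, max_disk_bytes=10**9):
        self._fetch = fetch  # callable: key -> object (stand-in for the API request)
        self._dir = Path(cache_dir)
        self._dir.mkdir(parents=True, exist_ok=True)
        self._memory = OrderedDict()  # insertion order doubles as recency order
        self._max_memory_items = max_memory_items
        self._max_disk_bytes = max_disk_bytes

    def _path(self, key):
        # Hash the key so that any string maps to a safe file name.
        return self._dir / sha256(key.encode()).hexdigest()

    def _remember(self, key, value):
        # Insert as most recently used, then evict least recently used entries.
        self._memory[key] = value
        self._memory.move_to_end(key)
        while len(self._memory) > self._max_memory_items:
            self._memory.popitem(last=False)

    def _write_disk(self, key, value):
        self._path(key).write_bytes(pickle.dumps(value))
        # Delete the oldest files until the directory fits the size budget.
        files = sorted(self._dir.iterdir(), key=lambda p: p.stat().st_mtime)
        total = sum(p.stat().st_size for p in files)
        for p in files:
            if total <= self._max_disk_bytes:
                break
            total -= p.stat().st_size
            p.unlink()

    def get(self, key):
        # Tier 1: in-memory cache.
        if key in self._memory:
            self._memory.move_to_end(key)
            return self._memory[key]
        # Tier 2: on-disk cache.
        path = self._path(key)
        if path.exists():
            value = pickle.loads(path.read_bytes())
        else:
            # Last resort: ask the API, then populate the disk tier.
            value = self._fetch(key)
            self._write_disk(key, value)
        self._remember(key, value)
        return value
```

Because the results are immutable, neither tier needs an invalidation path; entries only leave a tier when it exceeds its size limit. A quick usage check with a fake fetch function:

```python
calls = []

def fake_fetch(key):
    calls.append(key)
    return {"key": key}

with tempfile.TemporaryDirectory() as d:
    cache = TwoTierCache(fake_fetch, d, max_memory_items=1)
    cache.get("a")  # miss in both tiers: goes to fake_fetch
    cache.get("b")  # evicts "a" from memory; "a" stays on disk
    cache.get("a")  # served from disk, no new fetch
    print(calls)
```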


ianmkenney · 2024-04-24T16:44:23Z

@dotsdl it looks like we're already doing (a)LRU caching for PDRs (although we have up to 10000 in maxsize, which I think is pretty aggressive). That said, we could just add a disk check inside the relevant methods. I think that will achieve what you proposed. Does that sound good to you?

dotsdl · 2024-04-24T19:04:41Z

@ianmkenney our use of the (a)LRU cache there is rather crude: AlchemiscaleClient.get_network_results (a method users commonly call to pull all results for a network) uses a ProcessPoolExecutor for retrieval, and this may mean that each subprocess gets its own in-memory cache.

Do you think it's feasible to create a single in-memory cache using e.g. a WeakValueDictionary that child processes would be able to automatically use (at least on POSIX systems)? That should generally yield higher performance with fewer in-memory cache misses when using get_network_results, requiring less use of the disk cache for repeated calls by the subprocesses. Thoughts?

dotsdl · 2024-04-24T19:06:10Z

Can we also expose the cache settings (e.g. location, max records, etc.) as kwargs to the AlchemiscaleClient so users can alter these if needed?

dotsdl added performance component-user-client labels Jan 10, 2023

dotsdl modified the milestones: Release 0.4.0 - "living networks" and automated strategies enablement, Release 0.3.0 - new features, optimizations, targeted refactors Jan 11, 2024

dotsdl assigned ianmkenney Apr 23, 2024

ianmkenney linked a pull request Apr 25, 2024 that will close this issue

Use diskcache for caching ProtocolDAGResults in the Alchemiscale client #271

Open

ianmkenney linked a pull request Apr 30, 2024 that will close this issue

Use diskcache for caching ProtocolDAGResults in the Alchemiscale client #271

Open

dotsdl modified the milestones: Release 0.7.0 - "living networks" and automated strategies enablement, Release 0.6.0 - result retrieval optimizations, server-side task restart policies Sep 17, 2024
