
[FEATURE] Concurrency optimizations with native memory graph loading and force eviction #2265

Open
kotwanikunal opened this issue Nov 9, 2024 · 2 comments

Comments

@kotwanikunal
Member

Is your feature request related to a problem?

  • With the introduction of the Lucene-compatible loading layer in NativeMemoryLoadStrategy, IndexLoadStrategy.load() takes care of loading the graph file into the native memory cache using an IndexInput.
  • With force eviction, the synchronized block contains the logic both for inserting the entry into the cache and for loading it into memory.
  • Because the block is synchronized to manage cache sizing, the memory load becomes a premature bottleneck, especially with concurrent segment search, where multiple threads are serialized on graph load operations.
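To make the bottleneck concrete, here is a minimal sketch of the contended pattern described above. All names (`ContendedCache`, `loadIntoNativeMemory`) are illustrative, not the actual k-NN classes: the cache-size bookkeeping, eviction, and the expensive load all sit under one lock, so every concurrent searcher waits in line.

```java
// Illustrative sketch only -- not the actual NativeMemoryCacheManager code.
// Both cache-size accounting and the slow graph load happen under one lock,
// so concurrent search threads serialize on loadIntoNativeMemory().
class ContendedCache {
    private final Object lock = new Object();
    private long usedBytes = 0;
    private final long capacityBytes;

    ContendedCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Loads an entry, force-evicting until it fits; returns bytes in use. */
    long load(long entryBytes) {
        synchronized (lock) {                  // every caller blocks here...
            while (usedBytes > 0 && usedBytes + entryBytes > capacityBytes) {
                evictOne(entryBytes);          // force eviction to make room
            }
            loadIntoNativeMemory();            // ...including for the slow load
            usedBytes += entryBytes;
            return usedBytes;
        }
    }

    private void evictOne(long bytes) {
        usedBytes = Math.max(0, usedBytes - bytes);
    }

    private void loadIntoNativeMemory() {
        // Stand-in for the expensive graph-file read / JNI load;
        // in the real code this is the slow part every thread waits on.
    }
}
```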

What solution would you like?

  • Ideally, the preload into memory and any preliminary operations (e.g., downloading segments in the case of remote store or searchable snapshots, or checksumming) would be performed outside the synchronized block to allow for better parallelism.
  • A suggested approach is to add a new API, ensureLoadable, to NativeMemoryEntryContext, which would ensure the graph file is accessible and ready to be loaded into memory once space is available.
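A rough, hypothetical sketch of what such an `ensureLoadable` check could look like (the class and method shapes here are assumptions for illustration, not the real NativeMemoryEntryContext API): it verifies the graph file is present and readable before the caller ever takes the cache lock, which is also the natural hook for triggering a remote-store or searchable-snapshot download.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch -- illustrative names, not the actual k-NN API.
// ensureLoadable() does the cheap-to-parallelize prep work (file present,
// readable, non-empty) outside the synchronized cache block.
class NativeMemoryEntryContextSketch {
    private final Path graphFile;

    NativeMemoryEntryContextSketch(Path graphFile) {
        this.graphFile = graphFile;
    }

    /** Returns true once the graph file is accessible and ready to load. */
    boolean ensureLoadable() {
        try {
            // In the remote-store / searchable-snapshot case, a missing file
            // is where the download would be kicked off before re-checking.
            return Files.exists(graphFile)
                    && Files.isReadable(graphFile)
                    && Files.size(graphFile) > 0;
        } catch (IOException e) {
            return false;  // unreadable metadata -> not loadable yet
        }
    }
}
```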

What alternatives have you considered?
N/A

Do you have any additional context?
N/A

@Gankris96

You can assign this to me

@Gankris96

Looking into the code more, I feel there is a bottleneck when we try to open the IndexInput before loading the index here:
https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/memory/NativeMemoryLoadStrategy.java#L91-L92

My proposal is to refactor the load functionality into two steps:

  1. preload, which will happen outside the synchronized block.
  2. load, which will use the JNI service to get the mapped address of the graph file and then proceed to createIndexAllocation. (This will still happen inside the synchronized block.)

This ensures that the index is loadable in all scenarios:

  1. For a regular index that is readily available locally, there is no change in behavior.
  2. For the remote-store and searchable-snapshot cases, preload will ensure the data is downloaded into memory before the load phase begins.
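The two-step split above could be sketched roughly as follows (class and method names are mine, for illustration only): the slow `preload` runs concurrently across threads, and only the short map-and-register step stays in the critical section.

```java
// Illustrative sketch of the proposed two-phase load -- not actual plugin code.
class TwoPhaseLoader {
    private final Object cacheLock = new Object();
    private long cachedBytes = 0;

    /** Loads one graph entry; returns total bytes tracked in the cache. */
    long loadGraph(long entryBytes) {
        preload();                        // step 1: slow I/O, outside the lock
        synchronized (cacheLock) {        // step 2: short critical section
            // Stand-in for the JNI service returning the mapped address of
            // the graph file and for createIndexAllocation() registering it.
            cachedBytes += entryBytes;
            return cachedBytes;
        }
    }

    private void preload() {
        // Stand-in for ensuring the graph file is local and readable
        // (e.g., downloading from remote store or a searchable snapshot);
        // many threads can execute this concurrently.
    }
}
```

The design point is that the lock hold time shrinks from "download + read + map" to just "map and account", which is what restores parallelism under concurrent segment search.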
