
[FEATURE] Concurrency optimizations with native memory graph loading and force eviction #2265

Open
kotwanikunal opened this issue Nov 9, 2024 · 2 comments

Comments

@kotwanikunal
Member

Is your feature request related to a problem?

  • With the introduction of the Lucene-compatible loading layer in NativeMemoryLoadStrategy, IndexLoadStrategy.load() takes care of loading the graph file into the native memory cache using an IndexInput.
  • With force eviction, the synchronized block contains the logic both for inserting the entry into the cache and for loading it into memory.
  • Because the block is synchronized to manage cache sizing, the memory load becomes a premature bottleneck, especially with concurrent segment search, where multiple threads are serialized on graph load operations.
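To make the bottleneck concrete, here is a minimal sketch of the contended pattern described above. All names (`ContendedCache`, `loadIntoNativeMemory`) are illustrative, not the actual k-NN classes: the cache-size bookkeeping, eviction, and the expensive load all sit under one lock, so every concurrent searcher waits in line.

```java
// Illustrative sketch only -- not the actual NativeMemoryCacheManager code.
// Both cache-size accounting and the slow graph load happen under one lock,
// so concurrent search threads serialize on loadIntoNativeMemory().
class ContendedCache {
    private final Object lock = new Object();
    private long usedBytes = 0;
    private final long capacityBytes;

    ContendedCache(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Loads an entry, force-evicting until it fits; returns bytes in use. */
    long load(long entryBytes) {
        synchronized (lock) {                  // every caller blocks here...
            while (usedBytes > 0 && usedBytes + entryBytes > capacityBytes) {
                evictOne(entryBytes);          // force eviction to make room
            }
            loadIntoNativeMemory();            // ...including for the slow load
            usedBytes += entryBytes;
            return usedBytes;
        }
    }

    private void evictOne(long bytes) {
        usedBytes = Math.max(0, usedBytes - bytes);
    }

    private void loadIntoNativeMemory() {
        // Stand-in for the expensive graph-file read / JNI load;
        // in the real code this is the slow part every thread waits on.
    }
}
```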

What solution would you like?

  • Ideally, the preload into memory and any preliminary operations (e.g., downloading segments in the case of remote store or searchable snapshots, or checksumming) would be performed outside the synchronized block to allow for better parallelism.
  • A suggested approach is to add a new API, ensureLoadable, to NativeMemoryEntryContext, which would ensure the graph file is accessible and ready to be loaded into memory once space is available.
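A rough, hypothetical sketch of what such an `ensureLoadable` check could look like (the class and method shapes here are assumptions for illustration, not the real NativeMemoryEntryContext API): it verifies the graph file is present and readable before the caller ever takes the cache lock, which is also the natural hook for triggering a remote-store or searchable-snapshot download.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch -- illustrative names, not the actual k-NN API.
// ensureLoadable() does the cheap-to-parallelize prep work (file present,
// readable, non-empty) outside the synchronized cache block.
class NativeMemoryEntryContextSketch {
    private final Path graphFile;

    NativeMemoryEntryContextSketch(Path graphFile) {
        this.graphFile = graphFile;
    }

    /** Returns true once the graph file is accessible and ready to load. */
    boolean ensureLoadable() {
        try {
            // In the remote-store / searchable-snapshot case, a missing file
            // is where the download would be kicked off before re-checking.
            return Files.exists(graphFile)
                    && Files.isReadable(graphFile)
                    && Files.size(graphFile) > 0;
        } catch (IOException e) {
            return false;  // unreadable metadata -> not loadable yet
        }
    }
}
```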

What alternatives have you considered?
N/A

Do you have any additional context?
N/A

@Gankris96

You can assign this to me

@Gankris96

Looking into the code more, I feel there is a bottleneck when we try to open the IndexInput before loading the index here:
https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/memory/NativeMemoryLoadStrategy.java#L91-L92

My proposal is to refactor the load functionality into two steps:

  1. preload, which will happen outside the synchronized block.
  2. load, which will use the JNI service to get the mapped address of the graph file and then proceed to createIndexAllocation. (This will still happen inside the synchronized block.)

This ensures that the index is loadable in all scenarios:

  1. For a regular index that is readily available locally, there is no change in behavior.
  2. For the remote-store and searchable-snapshot cases, preload will ensure the data is downloaded into memory before the load phase begins.
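The two-step split above could be sketched roughly as follows (class and method names are mine, for illustration only): the slow `preload` runs concurrently across threads, and only the short map-and-register step stays in the critical section.

```java
// Illustrative sketch of the proposed two-phase load -- not actual plugin code.
class TwoPhaseLoader {
    private final Object cacheLock = new Object();
    private long cachedBytes = 0;

    /** Loads one graph entry; returns total bytes tracked in the cache. */
    long loadGraph(long entryBytes) {
        preload();                        // step 1: slow I/O, outside the lock
        synchronized (cacheLock) {        // step 2: short critical section
            // Stand-in for the JNI service returning the mapped address of
            // the graph file and for createIndexAllocation() registering it.
            cachedBytes += entryBytes;
            return cachedBytes;
        }
    }

    private void preload() {
        // Stand-in for ensuring the graph file is local and readable
        // (e.g., downloading from remote store or a searchable snapshot);
        // many threads can execute this concurrently.
    }
}
```

The design point is that the lock hold time shrinks from "download + read + map" to just "map and account", which is what restores parallelism under concurrent segment search.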
