-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add DocValuesProducers for releasing memory when close index #1946
Add DocValuesProducers for releasing memory when close index #1946
Conversation
@luyuncheng can we fix the build CIs. BWC CIs is failing across other PRs will check whats happening there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this @luyuncheng. Overall looks pretty good. A few comments. I think it is okay to initially focus on DocValues and then add functionality for KnnVectorsFormat in another PR (and get rid of file watcher in this PR).
|
||
@Override | ||
public IndexInput openInput(String name, IOContext context) throws IOException { | ||
return delegate.openInput(name, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future extendability, could we check if this is a k-NN file and open it from the directory? I know we typically dont open k-NN files via openInput, but if we do, this might fail:
if (KNNEngine.getEnginesThatCreateCustomSegmentFiles().stream().anyMatch(engine -> name.endsWith(engine.getCompoundExtension()))) {
return directory.openInput(name, context);
}
return delegate.openInput(name, context);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmazanec15 why we want to do this? this can lead to unnecessary mapping of the file in memory. My thoughts will be not to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think anyone should call openInput on k-NN files. But if they do, return delegate.openInput(name, context);
will throw exception because file isnt actually in compound file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if exception will come. @luyuncheng have you seen an exception here? I think it will open the input.
But even if there is no exception we should not open the input, it is just unnecessary mapping of file. One we start to use IndexInput to read the graph file(ref: #1951) we can start opening the files here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i did not find any exception. but i do want to implement an nativeEngine indexInput like #1951 says.
how about:
- we return
return delegate.openInput(name, context);
- and added a assert when name is native engines.
- also added an TODO comments here: TODO: using the native engine IndexInput.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@navneet1v See how we write: https://github.com/luyuncheng/k-NN-1/blob/DocValuesProducers/src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80CompoundFormat.java#L48-L53. We dont use the delegate to write the file, we use directory directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmazanec15 I understand we would be directly writing the file. I have tested a small code change where in KNNWeight class I was opening .hnsw file using IndexInput and it worked. I didn't get the exception. So want to know when you say there will be exception what will be that exception.
Neverthless I believe we should not even open the nativeIndex files in the producer as it will unnecessary map the file on RAM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your test was it an .hnsw file or .hnswc? I think .hnswc should fail with this line
src/main/java/org/opensearch/knn/index/memory/NativeMemoryAllocation.java
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesProducer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesProducer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80CompoundDirectory.java
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesProducer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesProducer.java
Outdated
Show resolved
Hide resolved
Added some thoughts related to the cache free up: #1885 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment - other than that looks good!
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesProducer.java
Outdated
Show resolved
Hide resolved
@jmazanec15 at 2bcc139 how about added try catch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks good. Thanks approving!
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesProducer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesProducer.java
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesProducer.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/knn/index/memory/NativeMemoryAllocation.java
Show resolved
Hide resolved
12b96e4
to
7932452
Compare
@luyuncheng can you please look at the comment added reply on them. |
f043fc4
to
439c5b7
Compare
@navneet1v Sorry for the late reply, i just get back from vacation. i'll do it ASAP. |
Signed-off-by: luyuncheng <[email protected]>
Signed-off-by: luyuncheng <[email protected]>
Signed-off-by: luyuncheng <[email protected]>
Signed-off-by: luyuncheng <[email protected]>
Signed-off-by: luyuncheng <[email protected]>
Signed-off-by: luyuncheng <[email protected]>
Signed-off-by: luyuncheng <[email protected]>
Signed-off-by: luyuncheng <[email protected]>
Signed-off-by: luyuncheng <[email protected]>
Signed-off-by: luyuncheng <[email protected]>
Signed-off-by: luyuncheng <[email protected]>
Signed-off-by: luyuncheng <[email protected]>
e8310c2
to
1edcdc8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@jmazanec15 @navneet1v Thanks guys for reviewing |
…2109) Add DocValuesProducers for releasing memory when close index #1946 (cherry picked from commit 004fcc0) Co-authored-by: luyuncheng <[email protected]>
Hi @luyuncheng
|
because in model engine, do not need docValuesProducer for merge. so i filter it out |
@luyuncheng
Thank you! |
i think only faiss can have a model engine?
sorry i do not understand this question.
i see there are 3 memory allocation |
@luyuncheng Could you give more contexts why did you add the logic so that those 'model engines' are excluded from invalidating the cache??
|
Hi @luyuncheng, What I understand is modelId coming from cluster state also I confirmed OpneSearch build its index internally. Thank you |
Description
in #1885 we talk about a method that we need release memory when a producer closed. so i open this PR and added
KNN80DocValuesProducer
. this producer can release memory when reader closed a segment.also i added
refCount
as the comments cares about.i think this pr is the 1st step, we only added a producers, because we talked in #1885, that we need get
binaryDocValues
inDocValuesProducers
from native engine.i see there is an #1853 on going, so i prefer to in next step to read binaryDocValues from native engines.
Related Issues
#1885
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.