-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding 'dense_vector' field type #3659
Adding 'dense_vector' field type #3659
Conversation
✅ Gradle Check success e6f79103191b342d6dd26eca23fb31f23609aaf7 |
b46bdb0
to
521644c
Compare
✅ Gradle Check success b46bdb08b912c7cb65813b54bda729d766843966 |
✅ Gradle Check success 521644c537396b2ccdc00fb057761a3270afdb2b |
* compatible open source license. | ||
*/ | ||
|
||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is new code, isn't it? Doesn't need an ES license grant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm adding several new classes. I actually wasn't sure on which one to use, checked few recent check-ins and classes in the same package, they used ES style license.
Let me update PR and exclude ES part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only SPDX header is required for new code AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind changing, it's the same we use in kNN repo. I just checked few random PRs for this repo before making this change, found the header I used here, e.g. #3004 or #690. At the same time I haven't found any example with the compact header.
Do you mean it's some recent requirement for the SPDX header and there might be no examples in the repo yet?
d1203f8
to
92ba3a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @martin-gaievski this is coming together! I left some initial feedback. Nothing too major. Mostly around test scaffolding and some extensibility protections.
import static org.opensearch.index.mapper.KnnAlgorithmContextFactory.HNSW_PARAMETER_BEAM_WIDTH; | ||
import static org.opensearch.index.mapper.KnnAlgorithmContextFactory.HNSW_PARAMETER_MAX_CONNECTIONS; | ||
|
||
public class DenseVectorFieldTypeTests extends OpenSearchSingleNodeTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of these are FieldMapperTests
(parsing the mapping) not FieldTypeTests
so lets refactor them over to DenseVectorFieldMapperTests
.
Also, OpenSearchSingleNodeTestCase
is slow and big so we try to avoid it. Let's refactor this test class as follows:
public class DenseVectorFieldTypeTests extends OpenSearchSingleNodeTestCase { | |
public class DenseVectorFieldTypeTests extends FieldTypeTestCase { |
For guidance on this test see the CollationFieldTypeTests as an example.
import static org.opensearch.index.mapper.KnnAlgorithmContextFactory.HNSW_PARAMETER_BEAM_WIDTH; | ||
import static org.opensearch.index.mapper.KnnAlgorithmContextFactory.HNSW_PARAMETER_MAX_CONNECTIONS; | ||
|
||
public class DenseVectorMapperTests extends MapperServiceTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public class DenseVectorMapperTests extends MapperServiceTestCase { | |
public class DenseVectorFieldMapperTests extends FieldMapperTestCase2<DenseVectorFieldMapper.Builder> { |
We should use the FieldMapperTestCase2
helper class here to ensure we covering mapper merge conflict and serialization tests that are important for the REST client interfaces.
For guidance on this test see the LegacyGeoShapeFieldMapperTests example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't use FieldMapperTestCase2
directly as DenseVectorFieldMapper.Builder
extends ParametrizedFieldMapper.Builder that is not supported by FieldMapperTestCase2
, seems that majority of other mapper builders extend ParametrizedFieldMapper.Builder
as well.
@nknize what would be your suggestion, is it better to keep new mapper for sense_vector as ParametrizedFieldMapper.Builder
and write custom test cases for merge conflicts, or change it to FieldMapper.Builder
? I do not see any strong argument for keeping it as ParametrizedFieldMapper.Builder
except that it will require additional efforts to re-write current implementation, while with FieldMapper.Builder
we obviously will have merge and serialization tests for free?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done refactoring, but to b honest I'm not sure if that is better than a previous version. While test coverage is better, there is a lot of additional code added that has been taking care of by the ParametrizedFieldMapper
. Please check and let me know what do you think. Thx!
} | ||
|
||
public DenseVectorFieldType(String name, Map<String, String> meta, int dimension, KnnContext knnContext, boolean hasDocValues) { | ||
super(name, false, false, false, TextSearchInfo.NONE, meta); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super(name, false, false, false, TextSearchInfo.NONE, meta); | |
super(name, false, false, hasDocValues, TextSearchInfo.NONE, meta); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
|
||
private final int dimension; | ||
private final KnnContext knnContext; | ||
private final boolean hasDocValues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MappedFieldType already takes care of hasDocValues
for you. Let's remove this variable and use the built in mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me check MappedFieldType logic, initially I had to add custom code because unit tests were failing. Probably some piece was missing.
import java.util.Map; | ||
|
||
/** | ||
* Field Mapper for Dense vector type. Extends ParametrizedFieldMapper in order to easily configure mapping parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want external plugins (e.g., KNN) to support extending this class to add plugin specific logic? If we don't have anything as of yet, then can we make this final
? Also let's annotate w/ @opensearch.internal
.
* Field Mapper for Dense vector type. Extends ParametrizedFieldMapper in order to easily configure mapping parameters. | |
* Field Mapper for Dense vector type. Extends ParametrizedFieldMapper in order to easily configure mapping parameters. | |
* | |
* @opensearch.internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
/** | ||
* Abstracts KNN segment of dense_vector field type | ||
*/ | ||
public class KnnContext implements ToXContentFragment, Writeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we mark this class as final to prevent any subclassing fragmentation?
public class KnnContext implements ToXContentFragment, Writeable { | |
public final class KnnContext implements ToXContentFragment, Writeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
import java.util.Objects; | ||
|
||
/** | ||
* Abstracts KNN segment of dense_vector field type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Abstracts KNN segment of dense_vector field type | |
* Abstracts KNN segment of dense_vector field type | |
* | |
* @opensearch.internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
/** | ||
* Abstracts supported metrics for dense_vector and KNN search | ||
*/ | ||
public enum Metric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be package private?
public enum Metric { | |
final enum Metric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think package private should be possible, let me check. I'm not sure if we can/need do enum final
import java.util.Optional; | ||
|
||
/** | ||
* Abstracts supported metrics for dense_vector and KNN search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Abstracts supported metrics for dense_vector and KNN search | |
* Abstracts supported metrics for dense_vector and KNN search | |
* | |
* @opensearch.internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
int value = XContentMapValues.nodeIntegerValue(o); | ||
if (value > MAX_DIMENSION) { | ||
throw new IllegalArgumentException( | ||
String.format(Locale.ROOT, "[dimension] value cannot be greater than %d for vector [%s]", MAX_DIMENSION, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String.format(Locale.ROOT, "[dimension] value cannot be greater than %d for vector [%s]", MAX_DIMENSION, name) | |
String.format(Locale.ROOT, "[dimension] value %d cannot be greater than %d for vector [%s]", value, MAX_DIMENSION, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
} | ||
if (value <= 0) { | ||
throw new IllegalArgumentException( | ||
String.format(Locale.ROOT, "[dimension] value must be greater than 0 for vector [%s]", name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String.format(Locale.ROOT, "[dimension] value must be greater than 0 for vector [%s]", name) | |
String.format(Locale.ROOT, "[dimension] value %d must be greater than 0 for vector [%s]", value, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
} | ||
|
||
private boolean isDenseVectorFieldType(final MappedFieldType mappedFieldType) { | ||
if (mappedFieldType != null && mappedFieldType instanceof DenseVectorFieldMapper.DenseVectorFieldType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need explicit null check before instanceof ? i think instanceof takes care of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, I learned a new thing today thanks to you - https://docs.oracle.com/javase/specs/jls/se11/html/jls-15.html#jls-15.20.2. I'll update the code
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
…avadocs Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
3682a66
to
ecb18fd
Compare
Signed-off-by: Martin Gaievski <[email protected]>
private final Method method; | ||
private final Map<String, Object> parameters; | ||
|
||
private static final int MAX_NUMBER_OF_ALGORITHM_PARAMETERS = 50; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should we move all constants to beginning of class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
private static final int MAX_NUMBER_OF_ALGORITHM_PARAMETERS = 50; | ||
|
||
public KnnAlgorithmContext(Method method, Map<String, Object> parameters) { | ||
this.method = method; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requireNonNull check for params?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
Gradle Check (Jenkins) Run Completed with:
|
Object v = e.getValue(); | ||
if (v instanceof Map) { | ||
throw new MapperParsingException( | ||
String.format(Locale.ROOT, "Unable to parse parameter [%s] for [algorithm]", e.getValue()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String.format(Locale.ROOT, "Unable to parse parameter [%s] for [algorithm]", e.getValue()) | |
String.format(Locale.ROOT, "Unable to parse parameter [%s] for [algorithm]", v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
if (parameters == null) { | ||
builder.field(PARAMETERS, (String) null); | ||
} else { | ||
builder.startObject(PARAMETERS); | ||
parameters.forEach((key, value) -> { | ||
try { | ||
builder.field(key, value); | ||
} catch (IOException ioe) { | ||
throw new RuntimeException("Unable to generate xcontent for method component"); | ||
} | ||
|
||
}); | ||
builder.endObject(); | ||
} | ||
return builder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (parameters == null) { | |
builder.field(PARAMETERS, (String) null); | |
} else { | |
builder.startObject(PARAMETERS); | |
parameters.forEach((key, value) -> { | |
try { | |
builder.field(key, value); | |
} catch (IOException ioe) { | |
throw new RuntimeException("Unable to generate xcontent for method component"); | |
} | |
}); | |
builder.endObject(); | |
} | |
return builder; | |
if (parameters == null) { | |
return builder.field(PARAMETERS, (String) null); | |
} | |
builder.startObject(PARAMETERS); | |
parameters.forEach((key, value) -> { | |
try { | |
builder.field(key, value); | |
} catch (IOException ioe) { | |
throw new RuntimeException("Unable to generate xcontent for method component"); | |
} | |
}); | |
return builder.endObject(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
Gradle Check (Jenkins) Run Completed with:
|
* | ||
* @opensearch.internal | ||
*/ | ||
public class KnnVectorFormatFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT?
public class KnnVectorFormatFactory { | |
public class KNNVectorFormatFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure to be honest - we're wrapping Lucene classes and they have notation as "Knn" (e.g. https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/document/KnnVectorField.java), so I've follow them
if (mappedFieldType instanceof DenseVectorFieldMapper.DenseVectorFieldType) { | ||
return true; | ||
} | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (mappedFieldType instanceof DenseVectorFieldMapper.DenseVectorFieldType) { | |
return true; | |
} | |
return false; | |
return (mappedFieldType instanceof DenseVectorFieldMapper.DenseVectorFieldType); |
or you don't need method if it is used once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
Gradle Check (Jenkins) Run Completed with:
|
Objects.requireNonNull(method, "[method] for knn algorithm context cannot be null"); | ||
Objects.requireNonNull(parameters, "[parameters] for knn algorithm context cannot be null"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
Objects.requireNonNull(method, "[method] for knn algorithm context cannot be null"); | |
Objects.requireNonNull(parameters, "[parameters] for knn algorithm context cannot be null"); | |
this.method = Objects.requireNonNull(method, "[method] for knn algorithm context cannot be null"); | |
this.parameters = Objects.requireNonNull(parameters, "[parameters] for knn algorithm context cannot be null"); | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
Signed-off-by: Martin Gaievski <[email protected]>
dc6128c
to
24dfcdb
Compare
Gradle Check (Jenkins) Run Completed with:
|
@nknize Take a look at the updates? |
Closing this PR as team decided to go with implementation in k-NN plugin |
Signed-off-by: Martin Gaievski [email protected]
Description
Adding a new field type
dense_vector
. It's a first step in enabling KNN search in core OpenSearch. This implementation is based on Lucene HNSW, more details in corresponding RFC #3545We'll be adding query type support in next PR, limiting this change to field type only to keep PR/change size smaller. With this change user will be able to :
dense_vector
with an option to index for the KNN searchdense_vector
fields to the indexIssues Resolved
opensearch-project/k-NN#39
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.