-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a multivalued property to field mappings (#16420) #16601
Add a multivalued property to field mappings (#16420) #16601
Conversation
❌ Gradle check result for 70d0100: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
❌ Gradle check result for fbc6855: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
* Can only be used for field types that support multiple values * If a field has the multivalued property, then new documents must have an array for its value Signed-off-by: Norman Jordan <[email protected]>
Signed-off-by: Norman Jordan <[email protected]>
fbc6855
to
98d377b
Compare
@@ -135,6 +137,7 @@ public Builder(String name, boolean ignoreMalformedByDefault, boolean coerceByDe | |||
ignoreMalformedByDefault | |||
); | |||
this.coerce = Parameter.explicitBoolParam("coerce", true, m -> toType(m).coerce, coerceByDefault); | |||
this.multivalued = Parameter.explicitBoolParam("multivalued", true, m -> toType(m).multivalued, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than adding an explicitBoolParam
to each FieldMapper
, I would suggest that we treat this similarly to indexed
, hasDocValues
, and stored
. That is, I would add a new static multivaluedParam
helper function to the Parameter
class.
@Override | ||
public boolean isMultivalue() { | ||
return multivalued.explicit() && multivalued.value() != null && multivalued.value(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little concerned with the idea of treating "unset" the same as false
.
In particular, existing fields may or may not be multi-valued. We just don't know. I think we should/must treat the null
(or unset) case as ambiguous.
In my opinion, the more interesting case is when a user defines a new mapping and explicitly says "multivalued":false
. At that point, if the field is specified multiple times in a document, or specified as an array, we should throw an exception. If a field is explicitly specified as "multivalued":true
, I'm inclined to accept a single value, not passed in an array -- but I would always return it wrapped in an array.
@normanj-bitquill -- what do you think? I know it's different from the approach you took, but I think it will provide both backward-compatibility and let us clearly identify single-valued fields in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, the more interesting case is when a user defines a new mapping and explicitly says
"multivalued":false
. At that point, if the field is specified multiple times in a document, or specified as an array, we should throw an exception.
I am inclined to agree, except I suggest that we make the system more forgiving. If "multivalued":false
, then if the field is in fact multi-valued, then we should return the first item encountered without guaranteeing any order. This may mean that the user will get back different values from different requests, but that's what they're explicitly requesting (any value is good enough).
If a field is explicitly specified as
"multivalued":true
, I'm inclined to accept a single value, not passed in an array -- but I would always return it wrapped in an array.
This seems more obvious and straight-forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@msfroh I'm OK with inverting the ingestion check. By that I mean only give an error when a field contains multiple values, but "multivalued": false
is set. No errors in other cases. @acarbonetto was curious about if we could do anything different with the results that are returned. Once that discussion is resolved, I'll update this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to see a hint that affects how results are returned from OpenSearch. We don't have strict array objects, but this should hint at returning arrays or non-single values.
For example, if a min/max aggregation is performed on a field that has multivalued=true
, I would expect that the results would be an array/multivalued.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Closing this PR as discussed. If there ever is a fix related to this in the future it would be around how aggregates handle multi-valued fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, if a min/max aggregation is performed on a field that has multivalued=true, I would expect that the results would be an array/multivalued.
I think that makes a lot of sense. Part of the original motivation for the #16420 was that @anirudha was asking how the SQL plugin could return results in the appropriate format.
Moving forward, I would love to change the default behavior from "unknown" to multivalued=false
, so that you would need to explicitly mark a multivalued field as such. Unfortunately, that's a (potentially) disruptive change for dynamically-added fields. I'll write down some more thoughts on #16420 on how to handle that.
The good news is that we can address this purely at the OpenSearch mappings layer, leaving the underlying Lucene logic unchanged. (Lucene implicitly allows any field to be multi-valued, though there are some optimizations for cases where we can guarantee at search time that a field is single-valued.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@normanj-bitquill -- do you plan to open another PR to flip the logic?
Once we have the property in place, we should be able to use it for rendering at query time (especially from the SQL plugin).
Incidentally, I think I would add multivalued
as a Boolean
field on MappedFieldType
, since I believe it could be applicable to every subclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@YANG-DB suggested mapping._meta#field.name.type:array
. Is that better than multivalued
?
opensearch-project/sql#3138 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that should at least address the SQL side of things.
I would still like to address this in OpenSearch core, as we move towards a more strictly-defined schema. (There are some indexing ideas that I've been discussing with @backslasht that involve packing fixed-width values for fields, which means that you need to know if a field is single- or multi-valued.) That said, if we have a workaround for the more immediate SQL problem, that's a lower priority.
Thanks, @acarbonetto and @normanj-bitquill!
Description
A mapping for a field can now contain a property
multivalued
.multivalued
must have a boolean value. It can only be applied to field types that support multiple values.If the
multivalued
property has never been set for a field, then it is assumed to be false and will not be returned in the index mapping.If
multivalued
is set to false, then there is not change in behaviour except for includingmultivalued
in the index mapping.If
multivalued
is set to true, then it is returned in the index mapping. In addition, any new documents inserted must have an array value for the field withmultivalued
set to true.The
multivalued
property is intended to be useful to services that consume index mappings. It indicates that the field will should only contain array values. As an example, the SQL plugin can this to decide on how to process an aggregate operation such asMIN
orMAX
.An example mapping that uses
multivalued
:Related Issues
Resolves #16420
Relates to: SQL #3137
Relates to: SQL #3138
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.