-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a multivalued property to field mappings (#16420) #16601
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -120,6 +120,8 @@ public static class Builder extends ParametrizedFieldMapper.Builder { | |
m -> toType(m).nullValue | ||
).acceptsNull(); | ||
|
||
private final Parameter<Explicit<Boolean>> multivalued; | ||
|
||
private final Parameter<Map<String, String>> meta = Parameter.metaParam(); | ||
|
||
public Builder(String name, Settings settings) { | ||
|
@@ -135,6 +137,7 @@ public Builder(String name, boolean ignoreMalformedByDefault, boolean coerceByDe | |
ignoreMalformedByDefault | ||
); | ||
this.coerce = Parameter.explicitBoolParam("coerce", true, m -> toType(m).coerce, coerceByDefault); | ||
this.multivalued = Parameter.explicitBoolParam("multivalued", true, m -> toType(m).multivalued, false); | ||
} | ||
|
||
Builder scalingFactor(double scalingFactor) { | ||
|
@@ -149,7 +152,7 @@ Builder nullValue(double nullValue) { | |
|
||
@Override | ||
protected List<Parameter<?>> getParameters() { | ||
return Arrays.asList(indexed, hasDocValues, stored, ignoreMalformed, meta, scalingFactor, coerce, nullValue); | ||
return Arrays.asList(indexed, hasDocValues, stored, ignoreMalformed, meta, scalingFactor, coerce, nullValue, multivalued); | ||
} | ||
|
||
@Override | ||
|
@@ -372,6 +375,8 @@ public double toDoubleValue(long value) { | |
private final boolean ignoreMalformedByDefault; | ||
private final boolean coerceByDefault; | ||
|
||
private final Explicit<Boolean> multivalued; | ||
|
||
private ScaledFloatFieldMapper( | ||
String simpleName, | ||
ScaledFloatFieldType mappedFieldType, | ||
|
@@ -389,6 +394,7 @@ private ScaledFloatFieldMapper( | |
this.coerce = builder.coerce.getValue(); | ||
this.ignoreMalformedByDefault = builder.ignoreMalformed.getDefaultValue().value(); | ||
this.coerceByDefault = builder.coerce.getDefaultValue().value(); | ||
this.multivalued = builder.multivalued.getValue(); | ||
} | ||
|
||
boolean coerce() { | ||
|
@@ -399,6 +405,10 @@ boolean ignoreMalformed() { | |
return ignoreMalformed.value(); | ||
} | ||
|
||
boolean multivalued() { | ||
return multivalued.value(); | ||
} | ||
|
||
@Override | ||
public ScaledFloatFieldType fieldType() { | ||
return (ScaledFloatFieldType) super.fieldType(); | ||
|
@@ -414,6 +424,11 @@ public ParametrizedFieldMapper.Builder getMergeBuilder() { | |
return new Builder(simpleName(), ignoreMalformedByDefault, coerceByDefault).init(this); | ||
} | ||
|
||
@Override | ||
public boolean isMultivalue() { | ||
return multivalued.explicit() && multivalued.value() != null && multivalued.value(); | ||
} | ||
Comment on lines
+427
to
+430
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a little concerned with the idea of treating "unset" the same as In particular, existing fields may or may not be multi-valued. We just don't know. I think we should/must treat the In my opinion, the more interesting case is when a user defines a new mapping and explicitly says @normanj-bitquill -- what do you think? I know it's different from the approach you took, but I think it will provide both backward-compatibility and let us clearly identify single-valued fields in future. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I am inclined to agree, except I suggest that we make the system more forgiving. If
This seems more obvious and straight-forward. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @msfroh I'm OK with inverting the ingestion check. By that I mean only give an error when a field contains multiple values, but There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would prefer to see a hint that affects how results are returned from OpenSearch. We don't have strict array objects, but this should hint at returning arrays or non-single values. For example, if a min/max aggregation is performed on a field that has There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Closing this PR as discussed. If there ever is a fix related to this in the future it would be around how aggregates handle multi-valued fields. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think that makes a lot of sense. Part of the original motivation for the #16420 was that @anirudha was asking how the SQL plugin could return results in the appropriate format. Moving forward, I would love to change the default behavior from "unknown" to The good news is that we can address this purely at the OpenSearch mappings layer, leaving the underlying Lucene logic unchanged. (Lucene implicitly allows any field to be multi-valued, though there are some optimizations for cases where we can guarantee at search time that a field is single-valued.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @normanj-bitquill -- do you plan to open another PR to flip the logic? Once we have the property in place, we should be able to use it for rendering at query time (especially from the SQL plugin). Incidentally, I think I would add There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @YANG-DB suggested There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, that should at least address the SQL side of things. I would still like to address this in OpenSearch core, as we move towards a more strictly-defined schema. (There are some indexing ideas that I've been discussing with @backslasht that involve packing fixed-width values for fields, which means that you need to know if a field is single- or multi-valued.) That said, if we have a workaround for the more immediate SQL problem, that's a lower priority. Thanks, @acarbonetto and @normanj-bitquill! |
||
|
||
@Override | ||
protected ScaledFloatFieldMapper clone() { | ||
return (ScaledFloatFieldMapper) super.clone(); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,169 @@ | ||
/* | ||
* SPDX-License-Identifier: Apache-2.0 | ||
* | ||
* The OpenSearch Contributors require contributions made to | ||
* this file be licensed under the Apache-2.0 license or a | ||
* compatible open source license. | ||
*/ | ||
|
||
package org.opensearch.index.mapper; | ||
|
||
import org.opensearch.common.xcontent.XContentFactory; | ||
import org.opensearch.core.xcontent.XContentBuilder; | ||
import org.opensearch.test.OpenSearchIntegTestCase; | ||
|
||
import java.io.IOException; | ||
import java.util.List; | ||
|
||
import static org.opensearch.test.hamcrest.OpenSearchAssertions.assertAcked; | ||
import static org.hamcrest.Matchers.equalTo; | ||
|
||
public class MultivaluedFieldsIntegrationIT extends OpenSearchIntegTestCase { | ||
public void testMultivaluedFields() throws Exception { | ||
assertAcked(client().admin().indices().prepareCreate("my-index-multivalued").setMapping(createMultivaluedTypeSource())); | ||
XContentBuilder singleValueSource = XContentFactory.jsonBuilder().startObject().field("title", "Hello world").endObject(); | ||
assertThat( | ||
client().prepareBulk() | ||
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(singleValueSource)) | ||
.get() | ||
.hasFailures(), | ||
equalTo(true) | ||
); | ||
XContentBuilder multiValueSource = XContentFactory.jsonBuilder() | ||
.startObject() | ||
.field("title", List.of("Hello world", "abcdef")) | ||
.endObject(); | ||
assertThat( | ||
client().prepareBulk() | ||
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(multiValueSource)) | ||
.get() | ||
.hasFailures(), | ||
equalTo(false) | ||
); | ||
} | ||
|
||
public void testGeoPointMultivaluedField() throws Exception { | ||
assertAcked(client().admin().indices().prepareCreate("my-index-multivalued").setMapping(createMappingSource("geo_point"))); | ||
XContentBuilder singleValueSource = XContentFactory.jsonBuilder() | ||
.startObject() | ||
.startObject("a") | ||
.field("lat", 40.71) | ||
.field("lon", 74.0) | ||
.endObject() | ||
.endObject(); | ||
assertThat( | ||
client().prepareBulk() | ||
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(singleValueSource)) | ||
.get() | ||
.hasFailures(), | ||
equalTo(true) | ||
); | ||
XContentBuilder multiValueSource = XContentFactory.jsonBuilder() | ||
.startObject() | ||
.startArray("a") | ||
.startObject() | ||
.field("lat", 40.71) | ||
.field("lon", 74.0) | ||
.endObject() | ||
.startObject() | ||
.field("lat", 63.45) | ||
.field("lon", 123.79) | ||
.endObject() | ||
.endArray() | ||
.endObject(); | ||
assertThat( | ||
client().prepareBulk() | ||
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(multiValueSource)) | ||
.get() | ||
.hasFailures(), | ||
equalTo(false) | ||
); | ||
} | ||
|
||
public void testCompletionMultivaluedField() throws Exception { | ||
assertAcked(client().admin().indices().prepareCreate("my-index-multivalued").setMapping(createMappingSource("completion"))); | ||
XContentBuilder singleValueSource = XContentFactory.jsonBuilder() | ||
.startObject() | ||
.startObject("a") | ||
.array("input", "foo", "bar") | ||
.field("weight", 10) | ||
.endObject() | ||
.endObject(); | ||
assertThat( | ||
client().prepareBulk() | ||
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(singleValueSource)) | ||
.get() | ||
.hasFailures(), | ||
equalTo(true) | ||
); | ||
XContentBuilder multiValueSource = XContentFactory.jsonBuilder() | ||
.startObject() | ||
.startArray("a") | ||
.startObject() | ||
.array("input", "foo", "bar") | ||
.field("weight", 10) | ||
.endObject() | ||
.startObject() | ||
.array("input", "baz", "xyz") | ||
.field("weight", 10) | ||
.endObject() | ||
.endArray() | ||
.endObject(); | ||
assertThat( | ||
client().prepareBulk() | ||
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(multiValueSource)) | ||
.get() | ||
.hasFailures(), | ||
equalTo(false) | ||
); | ||
} | ||
|
||
public void testIpMultivaluedField() throws Exception { | ||
assertAcked(client().admin().indices().prepareCreate("my-index-multivalued").setMapping(createMappingSource("ip"))); | ||
XContentBuilder singleValueSource = XContentFactory.jsonBuilder().startObject().field("a", "127.0.0.1").endObject(); | ||
assertThat( | ||
client().prepareBulk() | ||
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(singleValueSource)) | ||
.get() | ||
.hasFailures(), | ||
equalTo(true) | ||
); | ||
XContentBuilder multiValueSource = XContentFactory.jsonBuilder().startObject().array("a", "127.0.0.1", "127.0.0.1").endObject(); | ||
assertThat( | ||
client().prepareBulk() | ||
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(multiValueSource)) | ||
.get() | ||
.hasFailures(), | ||
equalTo(false) | ||
); | ||
} | ||
|
||
private XContentBuilder createMultivaluedTypeSource() throws IOException { | ||
return XContentFactory.jsonBuilder() | ||
.startObject() | ||
.startObject("properties") | ||
.startObject("title") | ||
.field("type", "text") | ||
.field("multivalued", true) | ||
.startObject("fields") | ||
.startObject("not_analyzed") | ||
.field("type", "keyword") | ||
.endObject() | ||
.endObject() | ||
.endObject() | ||
.endObject() | ||
.endObject(); | ||
} | ||
|
||
private XContentBuilder createMappingSource(String fieldType) throws IOException { | ||
return XContentFactory.jsonBuilder() | ||
.startObject() | ||
.startObject("properties") | ||
.startObject("a") | ||
.field("type", fieldType) | ||
.field("multivalued", true) | ||
.endObject() | ||
.endObject() | ||
.endObject(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than adding an
explicitBoolParam
to eachFieldMapper
, I would suggest that we treat this similarly toindexed
,hasDocValues
, andstored
. That is, I would add a new staticmultivaluedParam
helper function to theParameter
class.