Add a multivalued property to field mappings (#16420) #16601

Closed
@@ -120,6 +120,8 @@ public static class Builder extends ParametrizedFieldMapper.Builder {
m -> toType(m).nullValue
).acceptsNull();

private final Parameter<Explicit<Boolean>> multivalued;

private final Parameter<Map<String, String>> meta = Parameter.metaParam();

public Builder(String name, Settings settings) {
@@ -135,6 +137,7 @@ public Builder(String name, boolean ignoreMalformedByDefault, boolean coerceByDe
ignoreMalformedByDefault
);
this.coerce = Parameter.explicitBoolParam("coerce", true, m -> toType(m).coerce, coerceByDefault);
this.multivalued = Parameter.explicitBoolParam("multivalued", true, m -> toType(m).multivalued, false);
Collaborator

Rather than adding an explicitBoolParam to each FieldMapper, I would suggest that we treat this similarly to indexed, hasDocValues, and stored. That is, I would add a new static multivaluedParam helper function to the Parameter class.
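
To illustrate the suggestion above, here is a minimal, self-contained sketch of a shared factory. `Param`, `DemoMapper`, and `multivaluedParam` are hypothetical stand-ins written for this example; they are not the OpenSearch `Parameter` API, and the `false` default simply mirrors the default used in this PR.

```java
import java.util.function.Function;

// Hypothetical, simplified stand-ins for illustration; NOT the OpenSearch classes.
final class MultivaluedParamSketch {

    /** Minimal stand-in for a mapping parameter. */
    static final class Param<M, T> {
        final String name;
        final boolean updateable;
        final T defaultValue;
        final Function<M, T> initializer; // reads the current value from an existing mapper

        Param(String name, boolean updateable, T defaultValue, Function<M, T> initializer) {
            this.name = name;
            this.updateable = updateable;
            this.defaultValue = defaultValue;
            this.initializer = initializer;
        }

        /** Shared factory: the name, updateability, and default live in one place. */
        static <X> Param<X, Boolean> multivaluedParam(Function<X, Boolean> initializer) {
            return new Param<>("multivalued", true, Boolean.FALSE, initializer);
        }
    }

    /** Stand-in for a field mapper that stores the flag. */
    static final class DemoMapper {
        final Boolean multivalued;

        DemoMapper(Boolean multivalued) {
            this.multivalued = multivalued;
        }
    }

    public static void main(String[] args) {
        Param<DemoMapper, Boolean> param = Param.multivaluedParam(m -> m.multivalued);
        System.out.println(param.name + " defaults to " + param.defaultValue);
        System.out.println("existing mapper value: " + param.initializer.apply(new DemoMapper(true)));
    }
}
```

With a factory like this, each mapper builder would only supply the accessor lambda, so the parameter name and default cannot drift between field types.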

}

Builder scalingFactor(double scalingFactor) {
@@ -149,7 +152,7 @@ Builder nullValue(double nullValue) {

@Override
protected List<Parameter<?>> getParameters() {
- return Arrays.asList(indexed, hasDocValues, stored, ignoreMalformed, meta, scalingFactor, coerce, nullValue);
+ return Arrays.asList(indexed, hasDocValues, stored, ignoreMalformed, meta, scalingFactor, coerce, nullValue, multivalued);
}

@Override
@@ -372,6 +375,8 @@ public double toDoubleValue(long value) {
private final boolean ignoreMalformedByDefault;
private final boolean coerceByDefault;

private final Explicit<Boolean> multivalued;

private ScaledFloatFieldMapper(
String simpleName,
ScaledFloatFieldType mappedFieldType,
@@ -389,6 +394,7 @@ private ScaledFloatFieldMapper(
this.coerce = builder.coerce.getValue();
this.ignoreMalformedByDefault = builder.ignoreMalformed.getDefaultValue().value();
this.coerceByDefault = builder.coerce.getDefaultValue().value();
this.multivalued = builder.multivalued.getValue();
}

boolean coerce() {
@@ -399,6 +405,10 @@ boolean ignoreMalformed() {
return ignoreMalformed.value();
}

boolean multivalued() {
return multivalued.value();
}

@Override
public ScaledFloatFieldType fieldType() {
return (ScaledFloatFieldType) super.fieldType();
@@ -414,6 +424,11 @@ public ParametrizedFieldMapper.Builder getMergeBuilder() {
return new Builder(simpleName(), ignoreMalformedByDefault, coerceByDefault).init(this);
}

@Override
public boolean isMultivalue() {
return multivalued.explicit() && multivalued.value() != null && multivalued.value();
}
Comment on lines +427 to +430
Collaborator

I'm a little concerned with the idea of treating "unset" the same as false.

In particular, existing fields may or may not be multi-valued. We just don't know. I think we should/must treat the null (or unset) case as ambiguous.

In my opinion, the more interesting case is when a user defines a new mapping and explicitly says "multivalued":false. At that point, if the field is specified multiple times in a document, or specified as an array, we should throw an exception. If a field is explicitly specified as "multivalued":true, I'm inclined to accept a single value, not passed in an array -- but I would always return it wrapped in an array.

@normanj-bitquill -- what do you think? I know it's different from the approach you took, but I think it will provide both backward-compatibility and let us clearly identify single-valued fields in future.
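
The semantics proposed above could be sketched roughly as follows. This is an illustration only: `MultivaluedPolicySketch`, `checkIngest`, and `render` are hypothetical names, a `null` flag stands for "unset", and the way unset fields are rendered is an assumption of the sketch rather than something settled in this thread.

```java
import java.util.List;

// Hypothetical sketch of the tri-state semantics; not OpenSearch code.
final class MultivaluedPolicySketch {

    /**
     * Ingest-time check. Only an explicit "multivalued": false rejects
     * several values; unset (null) and explicit true accept anything.
     */
    static void checkIngest(Boolean multivalued, String field, int valueCount) {
        if (Boolean.FALSE.equals(multivalued) && valueCount > 1) {
            throw new IllegalArgumentException(
                "field [" + field + "] is mapped with multivalued=false but received " + valueCount + " values"
            );
        }
    }

    /** Output shaping. Explicit true always yields a list, even for one value. */
    static Object render(Boolean multivalued, List<?> values) {
        if (Boolean.TRUE.equals(multivalued)) {
            return values;
        }
        // Assumption of this sketch: for false or unset, a lone value is
        // returned bare and anything else is returned as a list.
        return values.size() == 1 ? values.get(0) : values;
    }

    public static void main(String[] args) {
        checkIngest(null, "price", 3); // unset: ambiguous, so accepted
        checkIngest(true, "price", 1); // explicit true accepts a lone value
        System.out.println(render(true, List.of(1.34)));
        System.out.println(render(false, List.of(1.34)));
        try {
            checkIngest(false, "price", 2);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the flag as a nullable `Boolean` is what lets existing mappings stay in the "ambiguous" state instead of being silently treated as single-valued.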

Contributor

> In my opinion, the more interesting case is when a user defines a new mapping and explicitly says "multivalued":false. At that point, if the field is specified multiple times in a document, or specified as an array, we should throw an exception.

I am inclined to agree, except that I suggest we make the system more forgiving. If "multivalued":false is set and the field is in fact multi-valued, we should return the first item encountered, without guaranteeing any order. This may mean that the user gets back different values from different requests, but that is what they are explicitly requesting (any value is good enough).

> If a field is explicitly specified as "multivalued":true, I'm inclined to accept a single value, not passed in an array -- but I would always return it wrapped in an array.

This seems more obvious and straightforward.

Author

@msfroh I'm OK with inverting the ingestion check. By that I mean giving an error only when a field contains multiple values but "multivalued": false is set, and no errors in other cases. @acarbonetto was curious whether we could do anything different with the results that are returned. Once that discussion is resolved, I'll update this PR.

Contributor

I would prefer to see a hint that affects how results are returned from OpenSearch. We don't have strict array objects, but this should hint at returning arrays or non-single values.

For example, if a min/max aggregation is performed on a field that has multivalued=true, I would expect that the results would be an array/multivalued.

Author

Closing this PR as discussed. If there is ever a fix related to this in the future, it would be around how aggregates handle multi-valued fields.

Collaborator

@msfroh, Dec 16, 2024

> For example, if a min/max aggregation is performed on a field that has multivalued=true, I would expect that the results would be an array/multivalued.

I think that makes a lot of sense. Part of the original motivation for #16420 was that @anirudha was asking how the SQL plugin could return results in the appropriate format.

Moving forward, I would love to change the default behavior from "unknown" to multivalued=false, so that you would need to explicitly mark a multivalued field as such. Unfortunately, that's a (potentially) disruptive change for dynamically-added fields. I'll write down some more thoughts on #16420 on how to handle that.

The good news is that we can address this purely at the OpenSearch mappings layer, leaving the underlying Lucene logic unchanged. (Lucene implicitly allows any field to be multi-valued, though there are some optimizations for cases where we can guarantee at search time that a field is single-valued.)

Collaborator

@normanj-bitquill -- do you plan to open another PR to flip the logic?

Once we have the property in place, we should be able to use it for rendering at query time (especially from the SQL plugin).

Incidentally, I think I would add multivalued as a Boolean field on MappedFieldType, since I believe it could be applicable to every subclass.
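
As a rough picture of what a type-level flag might look like, the sketch below puts a nullable `Boolean` on a base class so that every subclass inherits it. The classes `FieldType` and `KeywordType` and the method `isKnownSingleValued` are hypothetical stand-ins invented for this example; they do not reproduce the real `MappedFieldType` hierarchy.

```java
// Hypothetical stand-ins for illustration; not the real MappedFieldType hierarchy.
final class MappedFieldTypeSketch {

    /** Base type: every subclass inherits the nullable flag. */
    abstract static class FieldType {
        private final String name;
        private final Boolean multivalued; // null means the mapping did not say

        protected FieldType(String name, Boolean multivalued) {
            this.name = name;
            this.multivalued = multivalued;
        }

        String name() {
            return name;
        }

        /** Nullable on purpose: callers must handle the "unknown" case. */
        Boolean multivalued() {
            return multivalued;
        }

        /** True only when the mapping explicitly promised a single value. */
        boolean isKnownSingleValued() {
            return Boolean.FALSE.equals(multivalued);
        }
    }

    /** Example subclass; it needs no extra code to carry the flag. */
    static final class KeywordType extends FieldType {
        KeywordType(String name, Boolean multivalued) {
            super(name, multivalued);
        }
    }

    public static void main(String[] args) {
        FieldType legacy = new KeywordType("tags", null);
        FieldType strict = new KeywordType("id", false);
        System.out.println(legacy.name() + " known single-valued: " + legacy.isKnownSingleValued());
        System.out.println(strict.name() + " known single-valued: " + strict.isKnownSingleValued());
    }
}
```

A query-time consumer such as a SQL layer could then ask the field type, rather than each mapper class, whether a result should be rendered as an array.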

Contributor

@YANG-DB suggested mapping._meta#field.name.type:array. Is that better than multivalued?
opensearch-project/sql#3138 (comment)

Collaborator

Oh, that should at least address the SQL side of things.

I would still like to address this in OpenSearch core, as we move towards a more strictly-defined schema. (There are some indexing ideas that I've been discussing with @backslasht that involve packing fixed-width values for fields, which means that you need to know if a field is single- or multi-valued.) That said, if we have a workaround for the more immediate SQL problem, that's a lower priority.

Thanks, @acarbonetto and @normanj-bitquill!


@Override
protected ScaledFloatFieldMapper clone() {
return (ScaledFloatFieldMapper) super.clone();
@@ -59,6 +59,7 @@
import org.apache.lucene.util.automaton.Automata;
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.Operations;
import org.opensearch.common.Explicit;
import org.opensearch.common.collect.Iterators;
import org.opensearch.common.lucene.search.AutomatonQueries;
import org.opensearch.index.analysis.AnalyzerScope;
@@ -156,6 +157,12 @@ public static class Builder extends ParametrizedFieldMapper.Builder {
final Parameter<String> indexOptions = TextParams.indexOptions(m -> toType(m).indexOptions);
final Parameter<Boolean> norms = TextParams.norms(true, m -> ft(m).getTextSearchInfo().hasNorms());
final Parameter<String> termVectors = TextParams.termVectors(m -> toType(m).termVectors);
final Parameter<Explicit<Boolean>> multivalued = Parameter.explicitBoolParam(
"multivalued",
true,
m -> toType(m).multivalued,
false
);

private final Parameter<Map<String, String>> meta = Parameter.metaParam();

@@ -178,6 +185,7 @@ protected List<Parameter<?>> getParameters() {
indexOptions,
norms,
termVectors,
multivalued,
meta
);
}
@@ -628,6 +636,8 @@ public SpanQuery spanPrefixQuery(String value, SpanMultiTermQueryWrapper.SpanRew

private final IndexAnalyzers indexAnalyzers;

private final Explicit<Boolean> multivalued;

public SearchAsYouTypeFieldMapper(
String simpleName,
SearchAsYouTypeFieldType mappedFieldType,
@@ -646,6 +656,7 @@ public SearchAsYouTypeFieldMapper(
this.indexOptions = builder.indexOptions.getValue();
this.termVectors = builder.termVectors.getValue();
this.indexAnalyzers = builder.analyzers.indexAnalyzers;
this.multivalued = builder.multivalued.getValue();
}

@Override
@@ -684,6 +695,15 @@ public SearchAsYouTypeFieldType fieldType() {
return (SearchAsYouTypeFieldType) super.fieldType();
}

boolean multivalued() {
return multivalued.value();
}

@Override
public boolean isMultivalue() {
return multivalued.explicit() && multivalued.value() != null && multivalued.value();
}

public int maxShingleSize() {
return maxShingleSize;
}
@@ -86,6 +86,7 @@ protected void registerParameters(ParameterChecker checker) throws IOException {
b -> b.field("ignore_malformed", true),
m -> assertTrue(((ScaledFloatFieldMapper) m).ignoreMalformed())
);
checker.registerUpdateCheck(b -> b.field("multivalued", true), m -> assertTrue(((ScaledFloatFieldMapper) m).multivalued()));
}

public void testExistsQueryDocValuesDisabled() throws IOException {
@@ -359,6 +360,37 @@ private void doTestIgnoreMalformed(Object value, String exceptionMessageContains
assertEquals(0, fields.length);
}

public void testMultivalued() throws Exception {
DocumentMapper mapper = createDocumentMapper(
fieldMapping(b -> b.field("type", "scaled_float").field("scaling_factor", 10.0).field("multivalued", true))
);
ThrowingRunnable runnable = () -> mapper.parse(
new SourceToParse(
"test",
"1",
BytesReference.bytes(XContentFactory.jsonBuilder().startObject().field("field", "1.34").endObject()),
MediaTypeRegistry.JSON
)
);
MapperParsingException e = expectThrows(MapperParsingException.class, runnable);
assertThat(
e.getMessage(),
containsString("object mapping [field] trying to serialize a scalar value [1.34] for a multi-valued field")
);

ParsedDocument doc = mapper.parse(
new SourceToParse(
"test",
"1",
BytesReference.bytes(XContentFactory.jsonBuilder().startObject().field("field", List.of("1.34", "2.35")).endObject()),
MediaTypeRegistry.JSON
)
);

IndexableField[] fields = doc.rootDoc().getFields("field");
assertEquals(4, fields.length);
}

public void testNullValue() throws IOException {
DocumentMapper mapper = createDocumentMapper(fieldMapping(this::minimalMapping));
ParsedDocument doc = mapper.parse(
@@ -52,7 +52,9 @@
import org.apache.lucene.search.SynonymQuery;
import org.apache.lucene.search.TermQuery;
import org.opensearch.common.lucene.search.MultiPhrasePrefixQuery;
import org.opensearch.common.xcontent.XContentFactory;
import org.opensearch.core.common.Strings;
import org.opensearch.core.common.bytes.BytesReference;
import org.opensearch.core.xcontent.MediaTypeRegistry;
import org.opensearch.core.xcontent.XContentBuilder;
import org.opensearch.index.IndexSettings;
@@ -129,6 +131,8 @@ protected void registerParameters(ParameterChecker checker) throws IOException {
b.field("search_quote_analyzer", "keyword");
}, m -> assertEquals("keyword", m.fieldType().getTextSearchInfo().getSearchQuoteAnalyzer().name()));

checker.registerUpdateCheck(b -> b.field("multivalued", true), m -> assertTrue(((SearchAsYouTypeFieldMapper) m).multivalued()));

}

protected void writeFieldValue(XContentBuilder builder) throws IOException {
@@ -636,6 +640,35 @@ public void testMultiMatchBoolPrefix() throws IOException {
);
}

public void testMultivalued() throws Exception {
DocumentMapper mapper = createDocumentMapper(fieldMapping(b -> b.field("type", "search_as_you_type").field("multivalued", true)));
ThrowingRunnable runnable = () -> mapper.parse(
new SourceToParse(
"test",
"1",
BytesReference.bytes(XContentFactory.jsonBuilder().startObject().field("field", "foo").endObject()),
MediaTypeRegistry.JSON
)
);
MapperParsingException e = expectThrows(MapperParsingException.class, runnable);
assertThat(
e.getMessage(),
containsString("object mapping [field] trying to serialize a scalar value [foo] for a multi-valued field")
);

ParsedDocument doc = mapper.parse(
new SourceToParse(
"test",
"1",
BytesReference.bytes(XContentFactory.jsonBuilder().startObject().field("field", List.of("foo", "bar")).endObject()),
MediaTypeRegistry.JSON
)
);

IndexableField[] fields = doc.rootDoc().getFields("field");
assertEquals(2, fields.length);
}

public void testAnalyzerSerialization() throws IOException {
MapperService ms = createMapperService(fieldMapping(b -> {
b.field("type", "search_as_you_type");
@@ -0,0 +1,169 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

package org.opensearch.index.mapper;

import org.opensearch.common.xcontent.XContentFactory;
import org.opensearch.core.xcontent.XContentBuilder;
import org.opensearch.test.OpenSearchIntegTestCase;

import java.io.IOException;
import java.util.List;

import static org.opensearch.test.hamcrest.OpenSearchAssertions.assertAcked;
import static org.hamcrest.Matchers.equalTo;

public class MultivaluedFieldsIntegrationIT extends OpenSearchIntegTestCase {
public void testMultivaluedFields() throws Exception {
assertAcked(client().admin().indices().prepareCreate("my-index-multivalued").setMapping(createMultivaluedTypeSource()));
XContentBuilder singleValueSource = XContentFactory.jsonBuilder().startObject().field("title", "Hello world").endObject();
assertThat(
client().prepareBulk()
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(singleValueSource))
.get()
.hasFailures(),
equalTo(true)
);
XContentBuilder multiValueSource = XContentFactory.jsonBuilder()
.startObject()
.field("title", List.of("Hello world", "abcdef"))
.endObject();
assertThat(
client().prepareBulk()
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(multiValueSource))
.get()
.hasFailures(),
equalTo(false)
);
}

public void testGeoPointMultivaluedField() throws Exception {
assertAcked(client().admin().indices().prepareCreate("my-index-multivalued").setMapping(createMappingSource("geo_point")));
XContentBuilder singleValueSource = XContentFactory.jsonBuilder()
.startObject()
.startObject("a")
.field("lat", 40.71)
.field("lon", 74.0)
.endObject()
.endObject();
assertThat(
client().prepareBulk()
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(singleValueSource))
.get()
.hasFailures(),
equalTo(true)
);
XContentBuilder multiValueSource = XContentFactory.jsonBuilder()
.startObject()
.startArray("a")
.startObject()
.field("lat", 40.71)
.field("lon", 74.0)
.endObject()
.startObject()
.field("lat", 63.45)
.field("lon", 123.79)
.endObject()
.endArray()
.endObject();
assertThat(
client().prepareBulk()
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(multiValueSource))
.get()
.hasFailures(),
equalTo(false)
);
}

public void testCompletionMultivaluedField() throws Exception {
assertAcked(client().admin().indices().prepareCreate("my-index-multivalued").setMapping(createMappingSource("completion")));
XContentBuilder singleValueSource = XContentFactory.jsonBuilder()
.startObject()
.startObject("a")
.array("input", "foo", "bar")
.field("weight", 10)
.endObject()
.endObject();
assertThat(
client().prepareBulk()
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(singleValueSource))
.get()
.hasFailures(),
equalTo(true)
);
XContentBuilder multiValueSource = XContentFactory.jsonBuilder()
.startObject()
.startArray("a")
.startObject()
.array("input", "foo", "bar")
.field("weight", 10)
.endObject()
.startObject()
.array("input", "baz", "xyz")
.field("weight", 10)
.endObject()
.endArray()
.endObject();
assertThat(
client().prepareBulk()
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(multiValueSource))
.get()
.hasFailures(),
equalTo(false)
);
}

public void testIpMultivaluedField() throws Exception {
assertAcked(client().admin().indices().prepareCreate("my-index-multivalued").setMapping(createMappingSource("ip")));
XContentBuilder singleValueSource = XContentFactory.jsonBuilder().startObject().field("a", "127.0.0.1").endObject();
assertThat(
client().prepareBulk()
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(singleValueSource))
.get()
.hasFailures(),
equalTo(true)
);
XContentBuilder multiValueSource = XContentFactory.jsonBuilder().startObject().array("a", "127.0.0.1", "127.0.0.1").endObject();
assertThat(
client().prepareBulk()
.add(client().prepareIndex().setIndex("my-index-multivalued").setSource(multiValueSource))
.get()
.hasFailures(),
equalTo(false)
);
}

private XContentBuilder createMultivaluedTypeSource() throws IOException {
return XContentFactory.jsonBuilder()
.startObject()
.startObject("properties")
.startObject("title")
.field("type", "text")
.field("multivalued", true)
.startObject("fields")
.startObject("not_analyzed")
.field("type", "keyword")
.endObject()
.endObject()
.endObject()
.endObject()
.endObject();
}

private XContentBuilder createMappingSource(String fieldType) throws IOException {
return XContentFactory.jsonBuilder()
.startObject()
.startObject("properties")
.startObject("a")
.field("type", fieldType)
.field("multivalued", true)
.endObject()
.endObject()
.endObject();
}
}