Add a multivalued property to field mappings (#16420) #16601

@normanj-bitquill commented Nov 8, 2024

Description

A field mapping can now contain a multivalued property. The value of multivalued must be a boolean, and the property can only be applied to field types that support multiple values.

If the multivalued property has never been set for a field, then it is assumed to be false and will not be returned in the index mapping.

If multivalued is set to false, there is no change in behaviour, except that multivalued is included in the index mapping.

If multivalued is set to true, then it is returned in the index mapping. In addition, any new documents inserted must have an array value for the field with multivalued set to true.
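The ingestion rule for multivalued set to true can be sketched as a small check. This is a minimal sketch, not the PR's implementation: it assumes the incoming document has already been parsed into a Map with JSON arrays represented as java.util.List, and the names MultivaluedIngestCheck and validate are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the ingestion rule from this PR: a field whose mapping sets
// multivalued=true must receive an array. Parsed JSON arrays are assumed to be Lists.
final class MultivaluedIngestCheck {
    private MultivaluedIngestCheck() {}

    // Throws IllegalArgumentException if a multivalued=true field has a non-array value.
    static void validate(Map<String, Boolean> multivaluedByField, Map<String, Object> document) {
        for (Map.Entry<String, Object> entry : document.entrySet()) {
            // Fields without an explicit setting are treated as multivalued=false: no check.
            boolean multivalued = multivaluedByField.getOrDefault(entry.getKey(), false);
            if (multivalued && !(entry.getValue() instanceof List)) {
                throw new IllegalArgumentException(
                        "field [" + entry.getKey() + "] is multivalued and requires an array value");
            }
        }
    }
}
```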

The multivalued property is intended to be useful to services that consume index mappings. It indicates that the field should only contain array values. For example, the SQL plugin can use this to decide how to process an aggregate operation such as MIN or MAX.

An example mapping that uses multivalued:

{
  "test_integer": {
    "mappings": {
      "properties": {
        "x": {
          "type": "long"
        },
        "y": {
          "type": "integer",
          "multivalued": true
        }
      }
    }
  }
}
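A service that consumes the index mapping could look for the multivalued flag roughly as follows. This is a hypothetical sketch: it assumes the "mappings" object from the example has been parsed into nested Maps, only inspects top-level properties (not fields nested inside object fields), and treats a missing flag as false, as the description states. MultivaluedFields and find are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical consumer-side helper: list the top-level fields whose mapping sets
// "multivalued": true. The mapping is assumed to be parsed into nested Maps.
final class MultivaluedFields {
    private MultivaluedFields() {}

    @SuppressWarnings("unchecked")
    static List<String> find(Map<String, Object> mappings) {
        List<String> result = new ArrayList<>();
        Object properties = mappings.get("properties");
        if (!(properties instanceof Map)) {
            return result; // no properties, so nothing is multivalued
        }
        for (Map.Entry<String, Object> field : ((Map<String, Object>) properties).entrySet()) {
            Object definition = field.getValue();
            // A missing "multivalued" key counts as false, as the PR description states.
            if (definition instanceof Map
                    && Boolean.TRUE.equals(((Map<String, Object>) definition).get("multivalued"))) {
                result.add(field.getKey());
            }
        }
        return result;
    }
}
```

For the example mapping, find returns a list containing only y.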

Related Issues

Resolves #16420
Relates to: SQL #3137
Relates to: SQL #3138

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Contributor

github-actions bot commented Nov 8, 2024

❌ Gradle check result for 70d0100: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for fbc6855: FAILURE

* Can only be used for field types that support multiple values
* If a field has the multivalued property, then new documents must have an array for its value

Signed-off-by: Norman Jordan <[email protected]>
Signed-off-by: Norman Jordan <[email protected]>
@normanj-bitquill force-pushed the field-mapping-multivalued branch from fbc6855 to 98d377b on December 5, 2024 23:05
Contributor

github-actions bot commented Dec 6, 2024

✅ Gradle check result for 98d377b: SUCCESS


codecov bot commented Dec 6, 2024

Codecov Report

Attention: Patch coverage is 87.73585% with 13 lines in your changes missing coverage. Please review.

Project coverage is 72.06%. Comparing base (42dc22e) to head (98d377b).
Report is 15 commits behind head on main.

File with missing lines                                 Patch %   Lines
...pensearch/index/mapper/ScaledFloatFieldMapper.java   80.00%    0 Missing and 1 partial
...earch/index/mapper/SearchAsYouTypeFieldMapper.java   80.00%    0 Missing and 1 partial
...arch/index/mapper/AbstractGeometryFieldMapper.java   95.00%    0 Missing and 1 partial
...org/opensearch/index/mapper/BinaryFieldMapper.java   83.33%    0 Missing and 1 partial
...rg/opensearch/index/mapper/BooleanFieldMapper.java   83.33%    0 Missing and 1 partial
...opensearch/index/mapper/CompletionFieldMapper.java   83.33%    0 Missing and 1 partial
...a/org/opensearch/index/mapper/DateFieldMapper.java   80.00%    0 Missing and 1 partial
...ava/org/opensearch/index/mapper/IpFieldMapper.java   80.00%    0 Missing and 1 partial
...rg/opensearch/index/mapper/KeywordFieldMapper.java   80.00%    0 Missing and 1 partial
...org/opensearch/index/mapper/NumberFieldMapper.java   80.00%    0 Missing and 1 partial
... and 3 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16601      +/-   ##
============================================
+ Coverage     72.05%   72.06%   +0.01%     
- Complexity    65183    65206      +23     
============================================
  Files          5318     5318              
  Lines        303993   304080      +87     
  Branches      43990    43997       +7     
============================================
+ Hits         219028   219123      +95     
+ Misses        67046    66921     -125     
- Partials      17919    18036     +117     

@@ -135,6 +137,7 @@ public Builder(String name, boolean ignoreMalformedByDefault, boolean coerceByDe
ignoreMalformedByDefault
);
this.coerce = Parameter.explicitBoolParam("coerce", true, m -> toType(m).coerce, coerceByDefault);
this.multivalued = Parameter.explicitBoolParam("multivalued", true, m -> toType(m).multivalued, false);

Collaborator

Rather than adding an explicitBoolParam to each FieldMapper, I would suggest that we treat this similarly to indexed, hasDocValues, and stored. That is, I would add a new static multivaluedParam helper function to the Parameter class.
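To illustrate the suggestion of one shared static helper instead of a per-mapper explicitBoolParam, here is a toy version of the pattern. ToyParameter, multivaluedParam, and the toy mappers are hypothetical stand-ins, not OpenSearch's Parameter class, whose real helpers and signatures may differ. Treating an unset value as false mirrors this PR's behaviour at the time of the review.

```java
import java.util.function.Function;

// Toy stand-in for a mapper parameter; NOT OpenSearch's Parameter class.
final class ToyParameter<M> {
    final String name;
    final boolean defaultValue;
    private final Function<M, Boolean> initializer;

    private ToyParameter(String name, boolean defaultValue, Function<M, Boolean> initializer) {
        this.name = name;
        this.defaultValue = defaultValue;
        this.initializer = initializer;
    }

    // Hypothetical shared factory: the parameter name and default live in one place,
    // similar in spirit to shared helpers for indexed/hasDocValues/stored.
    static <M> ToyParameter<M> multivaluedParam(Function<M, Boolean> initializer) {
        return new ToyParameter<>("multivalued", false, initializer);
    }

    // Reads the setting from an existing mapper, falling back to the default when it is null.
    boolean initFrom(M mapper) {
        Boolean value = initializer.apply(mapper);
        return value != null ? value : defaultValue;
    }
}

// Two toy mappers reuse the same helper instead of each declaring its own parameter.
final class ToyNumberMapper {
    Boolean multivalued;
    static final ToyParameter<ToyNumberMapper> MULTIVALUED =
            ToyParameter.multivaluedParam(m -> m.multivalued);
}

final class ToyKeywordMapper {
    Boolean multivalued;
    static final ToyParameter<ToyKeywordMapper> MULTIVALUED =
            ToyParameter.multivaluedParam(m -> m.multivalued);
}
```

The design benefit is that every mapper opting in gets the same name, default, and parsing behaviour without repeating the declaration.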

Comment on lines +427 to +430
@Override
public boolean isMultivalue() {
return multivalued.explicit() && multivalued.value() != null && multivalued.value();
}

Collaborator

I'm a little concerned with the idea of treating "unset" the same as false.

In particular, existing fields may or may not be multi-valued. We just don't know. I think we should/must treat the null (or unset) case as ambiguous.

In my opinion, the more interesting case is when a user defines a new mapping and explicitly says "multivalued":false. At that point, if the field is specified multiple times in a document, or specified as an array, we should throw an exception. If a field is explicitly specified as "multivalued":true, I'm inclined to accept a single value, not passed in an array -- but I would always return it wrapped in an array.

@normanj-bitquill -- what do you think? I know it's different from the approach you took, but I think it will both provide backward compatibility and let us clearly identify single-valued fields in the future.
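The tri-state semantics proposed in this comment (unset is ambiguous, false rejects arrays, true accepts a bare value but always exposes an array) can be sketched as one normalization step. This is a sketch under stated assumptions: a Boolean null stands for "unset", parsed arrays arrive as java.util.List, and TriStateMultivalued and normalize are hypothetical names. It does not detect a field repeated within one document; that case would have to be caught while parsing, before values are collected into a Map.

```java
import java.util.List;

// Hypothetical sketch: null = unset (ambiguous), false = single-valued, true = multivalued.
final class TriStateMultivalued {
    private TriStateMultivalued() {}

    // Returns the value to index/return for a field, or throws if it violates the mapping.
    static Object normalize(String field, Boolean multivalued, Object value) {
        if (multivalued == null) {
            // Unset (e.g. existing or dynamically-added fields): we don't know, so accept as-is.
            return value;
        }
        if (multivalued) {
            // Explicitly multivalued: accept a bare value but always expose it as an array.
            return value instanceof List ? value : List.of(value);
        }
        // Explicitly single-valued: an array is an error.
        if (value instanceof List) {
            throw new IllegalArgumentException(
                    "field [" + field + "] is not multivalued but received an array");
        }
        return value;
    }
}
```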

Contributor

> In my opinion, the more interesting case is when a user defines a new mapping and explicitly says "multivalued":false. At that point, if the field is specified multiple times in a document, or specified as an array, we should throw an exception.

I am inclined to agree, except that I suggest making the system more forgiving. If "multivalued":false is set but the field is in fact multi-valued, we should return the first item encountered, without guaranteeing any order. This may mean that the user gets back different values from different requests, but that's what they're explicitly requesting (any value is good enough).

> If a field is explicitly specified as "multivalued":true, I'm inclined to accept a single value, not passed in an array -- but I would always return it wrapped in an array.

This seems more obvious and straightforward.
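The more forgiving read-side behaviour suggested in this comment could look like the following sketch. It assumes the stored values for a field are available as a List in whatever order they were encountered; ForgivingSingleValue and read are hypothetical names, and no ordering guarantee is implied by taking the first element.

```java
import java.util.List;

// Hypothetical sketch: explicitly single-valued fields return "any" value instead of failing.
final class ForgivingSingleValue {
    private ForgivingSingleValue() {}

    static Object read(Boolean multivalued, List<?> storedValues) {
        if (Boolean.FALSE.equals(multivalued)) {
            // Marked single-valued but several values exist: return the first one encountered.
            return storedValues.isEmpty() ? null : storedValues.get(0);
        }
        // multivalued=true or unset: return all values as an array.
        return storedValues;
    }
}
```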

Author

@msfroh I'm OK with inverting the ingestion check. By that I mean only raising an error when a field contains multiple values and "multivalued": false is set, with no errors in other cases. @acarbonetto was curious whether we could do anything different with the results that are returned. Once that discussion is resolved, I'll update this PR.

Contributor

I would prefer to see a hint that affects how results are returned from OpenSearch. We don't have strict array objects, but this property should hint at returning arrays (that is, non-single values).

For example, if a min/max aggregation is performed on a field that has multivalued=true, I would expect that the results would be an array/multivalued.

Author

Closing this PR as discussed. If there ever is a fix related to this in the future it would be around how aggregates handle multi-valued fields.

Collaborator

@msfroh commented Dec 16, 2024

> For example, if a min/max aggregation is performed on a field that has multivalued=true, I would expect that the results would be an array/multivalued.

I think that makes a lot of sense. Part of the original motivation for #16420 was that @anirudha was asking how the SQL plugin could return results in the appropriate format.

Moving forward, I would love to change the default behavior from "unknown" to multivalued=false, so that you would need to explicitly mark a multivalued field as such. Unfortunately, that's a (potentially) disruptive change for dynamically-added fields. I'll write down some more thoughts on #16420 on how to handle that.

The good news is that we can address this purely at the OpenSearch mappings layer, leaving the underlying Lucene logic unchanged. (Lucene implicitly allows any field to be multi-valued, though there are some optimizations for cases where we can guarantee at search time that a field is single-valued.)

Collaborator

@normanj-bitquill -- do you plan to open another PR to flip the logic?

Once we have the property in place, we should be able to use it for rendering at query time (especially from the SQL plugin).

Incidentally, I think I would add multivalued as a Boolean field on MappedFieldType, since I believe it could be applicable to every subclass.

Contributor

@YANG-DB suggested mapping._meta#field.name.type:array. Is that better than multivalued?
opensearch-project/sql#3138 (comment)

Collaborator

Oh, that should at least address the SQL side of things.

I would still like to address this in OpenSearch core, as we move towards a more strictly-defined schema. (There are some indexing ideas that I've been discussing with @backslasht that involve packing fixed-width values for fields, which means that you need to know if a field is single- or multi-valued.) That said, if we have a workaround for the more immediate SQL problem, that's a lower priority.

Thanks, @acarbonetto and @normanj-bitquill!

Labels: enhancement (Enhancement or improvement to existing feature or request), Indexing (Indexing, Bulk Indexing and anything related to indexing), Search:Query Capabilities

Development: successfully merging this pull request may close the issue "[Feature Request] Add mapping information for single-/multi-valued fields".

3 participants