Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not evaluate shard_size and shard_min_doc_count at segment slice level #9085

Merged
merged 4 commits into from
Aug 9, 2023

Conversation

jed326
Copy link
Collaborator

@jed326 jed326 commented Aug 3, 2023

Description

Refactored InternalTerms and InternalSignificantTerms to take BucketCountThresholds in the constructor. Using this we can determine which values to use when doing reduce at the shard vs coordinator level.

This resolves the test failures in TermsShardMinDocCountIT and ShardSizeTermsIT, however it does not address the failures in TermsDocCountErrorIT.

Related Issues

Resolves #8860

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 3, 2023

Gradle Check (Jenkins) Run Completed with:

@jed326 jed326 changed the title temp Do not evaluate shard_size and shard_min_doc_count at segment slice level Aug 3, 2023
@jed326 jed326 force-pushed the shard-params-fix branch from 95e8eab to d1e5d1c Compare August 3, 2023 18:34
@jed326
Copy link
Collaborator Author

jed326 commented Aug 3, 2023

The TermsShardMinDocCountIT ITs should pass now. There are 2 things that I'm still exploring:

  1. In the reduce method we just use the same min_doc_count and required_size and don't need to check ReduceContext::isSliceLevel.
    • This seems fine for InternalSignificantTerms, however for InternalTerms the min_doc_count evaluation actually happens after reduce so that probably won't work.
  2. In the buildAggregations method, we can pass in different values for min_doc_count and required_size based on whether or not it's at the shard or coordinator level.
    • First of all, it doesn't look like there's currently a great way to determine where buildAggregations is being called. However, even if there is then the problem with InternalTerms in [1] above makes this difficult to use.

@reta
Copy link
Collaborator

reta commented Aug 9, 2023

It looks pretty clean, thanks @jed326 !

@jed326 jed326 force-pushed the shard-params-fix branch from 2b210a6 to 3b55340 Compare August 9, 2023 17:42
@opensearch-trigger-bot
Copy link
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

BUILD SUCCESSFUL in 36m 56s

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2023

Gradle Check (Jenkins) Run Completed with:

@opensearch-trigger-bot
Copy link
Contributor

Compatibility status:



> Task :checkCompatibility
Incompatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git]
Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

BUILD SUCCESSFUL in 35m 33s

@jed326
Copy link
Collaborator Author

jed326 commented Aug 9, 2023

https://build.ci.opensearch.org/job/gradle-check/22262/#showFailuresLink

Build failures are from unrelated cluster routing tests. @reta I think we are good to merge if it looks good to you. Thanks!

@reta
Copy link
Collaborator

reta commented Aug 9, 2023

Build failures are from unrelated cluster routing tests. @reta I think we are good to merge if it looks good to you. Thanks!

Yes, flaky tests but we need green checks, rerurn it

@github-actions
Copy link
Contributor

github-actions bot commented Aug 9, 2023

Gradle Check (Jenkins) Run Completed with:

@reta reta merged commit 15b7de0 into opensearch-project:main Aug 9, 2023
@reta reta added the backport 2.x Backport to 2.x branch label Aug 9, 2023
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-9085-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 15b7de00f2ff75587560ac6e38ffb20da030178e
# Push it to GitHub
git push --set-upstream origin backport/backport-9085-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-9085-to-2.x.

@reta
Copy link
Collaborator

reta commented Aug 9, 2023

@jed326 sadly backport failed, could you please send manual one?

neetikasinghal pushed a commit to neetikasinghal/OpenSearch that referenced this pull request Aug 9, 2023
…evel (opensearch-project#9085)

* Use BucketCountThresholds in InternalTerms and InternalAggregations and do not apply shard level thresholds at slice level for Concurrent Segment Search

Signed-off-by: Jay Deng <[email protected]>

* Addressing comments

Signed-off-by: Jay Deng <[email protected]>

* Re-introduce shardSize member to InternalMultiTerms and InternalMappedTerms

Signed-off-by: Jay Deng <[email protected]>

* Introduce LocalBucketCountThresholds for local size and min_doc_count values

Signed-off-by: Jay Deng <[email protected]>

---------

Signed-off-by: Jay Deng <[email protected]>
jed326 added a commit to jed326/OpenSearch that referenced this pull request Aug 9, 2023
…evel (opensearch-project#9085)

* Use BucketCountThresholds in InternalTerms and InternalAggregations and do not apply shard level thresholds at slice level for Concurrent Segment Search

Signed-off-by: Jay Deng <[email protected]>

* Addressing comments

Signed-off-by: Jay Deng <[email protected]>

* Re-introduce shardSize member to InternalMultiTerms and InternalMappedTerms

Signed-off-by: Jay Deng <[email protected]>

* Introduce LocalBucketCountThresholds for local size and min_doc_count values

Signed-off-by: Jay Deng <[email protected]>

---------

Signed-off-by: Jay Deng <[email protected]>
jed326 added a commit to jed326/OpenSearch that referenced this pull request Aug 9, 2023
…evel (opensearch-project#9085)

* Use BucketCountThresholds in InternalTerms and InternalAggregations and do not apply shard level thresholds at slice level for Concurrent Segment Search

Signed-off-by: Jay Deng <[email protected]>

* Addressing comments

Signed-off-by: Jay Deng <[email protected]>

* Re-introduce shardSize member to InternalMultiTerms and InternalMappedTerms

Signed-off-by: Jay Deng <[email protected]>

* Introduce LocalBucketCountThresholds for local size and min_doc_count values

Signed-off-by: Jay Deng <[email protected]>

---------

Signed-off-by: Jay Deng <[email protected]>
jed326 added a commit to jed326/OpenSearch that referenced this pull request Aug 10, 2023
…evel (opensearch-project#9085)

* Use BucketCountThresholds in InternalTerms and InternalAggregations and do not apply shard level thresholds at slice level for Concurrent Segment Search

Signed-off-by: Jay Deng <[email protected]>

* Addressing comments

Signed-off-by: Jay Deng <[email protected]>

* Re-introduce shardSize member to InternalMultiTerms and InternalMappedTerms

Signed-off-by: Jay Deng <[email protected]>

* Introduce LocalBucketCountThresholds for local size and min_doc_count values

Signed-off-by: Jay Deng <[email protected]>

---------

Signed-off-by: Jay Deng <[email protected]>
reta pushed a commit that referenced this pull request Aug 10, 2023
…evel (#9085) (#9211)

* Use BucketCountThresholds in InternalTerms and InternalAggregations and do not apply shard level thresholds at slice level for Concurrent Segment Search



* Addressing comments



* Re-introduce shardSize member to InternalMultiTerms and InternalMappedTerms



* Introduce LocalBucketCountThresholds for local size and min_doc_count values



---------

Signed-off-by: Jay Deng <[email protected]>
kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
…evel (opensearch-project#9085)

* Use BucketCountThresholds in InternalTerms and InternalAggregations and do not apply shard level thresholds at slice level for Concurrent Segment Search

Signed-off-by: Jay Deng <[email protected]>

* Addressing comments

Signed-off-by: Jay Deng <[email protected]>

* Re-introduce shardSize member to InternalMultiTerms and InternalMappedTerms

Signed-off-by: Jay Deng <[email protected]>

* Introduce LocalBucketCountThresholds for local size and min_doc_count values

Signed-off-by: Jay Deng <[email protected]>

---------

Signed-off-by: Jay Deng <[email protected]>
Signed-off-by: Kaushal Kumar <[email protected]>
@jed326 jed326 deleted the shard-params-fix branch September 13, 2023 19:10
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
…evel (opensearch-project#9085)

* Use BucketCountThresholds in InternalTerms and InternalAggregations and do not apply shard level thresholds at slice level for Concurrent Segment Search

Signed-off-by: Jay Deng <[email protected]>

* Addressing comments

Signed-off-by: Jay Deng <[email protected]>

* Re-introduce shardSize member to InternalMultiTerms and InternalMappedTerms

Signed-off-by: Jay Deng <[email protected]>

* Introduce LocalBucketCountThresholds for local size and min_doc_count values

Signed-off-by: Jay Deng <[email protected]>

---------

Signed-off-by: Jay Deng <[email protected]>
Signed-off-by: Ivan Brusic <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…evel (opensearch-project#9085)

* Use BucketCountThresholds in InternalTerms and InternalAggregations and do not apply shard level thresholds at slice level for Concurrent Segment Search

Signed-off-by: Jay Deng <[email protected]>

* Addressing comments

Signed-off-by: Jay Deng <[email protected]>

* Re-introduce shardSize member to InternalMultiTerms and InternalMappedTerms

Signed-off-by: Jay Deng <[email protected]>

* Introduce LocalBucketCountThresholds for local size and min_doc_count values

Signed-off-by: Jay Deng <[email protected]>

---------

Signed-off-by: Jay Deng <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Concurrent Segment Search] shard_min_doc_count and shard_size should not be evaluated at the slice level
3 participants