Adding access to noSubMatches and noOverlappingMatches in Hyphenation… #13895

hasnain2808
Contributor

@hasnain2808 hasnain2808 commented May 30, 2024

Description

This change exposes two settings (noSubMatches and noOverlappingMatches) that were added to Lucene's HyphenationCompoundWordTokenFilter class.
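To make the two flags concrete, here is a deliberately simplified Java sketch. It is not Lucene's decompounding loop. It assumes the candidate break points are already known (standing in for the hyphenation points a hyphenation grammar would produce), treats every dictionary word between two break points as a candidate subword, and then applies the flags following our reading of the Lucene javadoc: noSubMatches drops a subword enclosed by another emitted subword, and noOverlappingMatches drops a subword that overlaps one already emitted. The class name DecompoundSketch, the Span record, the break points, and the example dictionary are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of noSubMatches / noOverlappingMatches.
// This is NOT Lucene's HyphenationCompoundWordTokenFilter implementation.
class DecompoundSketch {

    /** A candidate subword as the half-open character range [start, end) of the compound word. */
    record Span(int start, int end) {
        /** True if this span fully contains the other, distinct span. */
        boolean encloses(Span other) {
            return start <= other.start() && other.end() <= end && !equals(other);
        }

        /** True if the two spans share at least one character (enclosure counts as overlap). */
        boolean intersects(Span other) {
            return start < other.end() && other.start() < end;
        }
    }

    static List<String> decompose(String word, int[] breaks, Set<String> dictionary,
                                  boolean noSubMatches, boolean noOverlappingMatches) {
        // Collect candidates ordered by start offset, longest first for each start,
        // so an enclosing span is always visited before the spans it encloses.
        List<Span> candidates = new ArrayList<>();
        for (int i = 0; i < breaks.length; i++) {
            for (int j = breaks.length - 1; j > i; j--) {
                if (dictionary.contains(word.substring(breaks[i], breaks[j]))) {
                    candidates.add(new Span(breaks[i], breaks[j]));
                }
            }
        }

        List<Span> kept = new ArrayList<>();
        for (Span s : candidates) {
            // noSubMatches: skip a subword enclosed by a subword already emitted.
            if (noSubMatches && kept.stream().anyMatch(k -> k.encloses(s))) {
                continue;
            }
            // noOverlappingMatches: skip a subword sharing characters with one already emitted.
            if (noOverlappingMatches && kept.stream().anyMatch(k -> k.intersects(s))) {
                continue;
            }
            kept.add(s);
        }

        List<String> subwords = new ArrayList<>();
        for (Span s : kept) {
            subwords.add(word.substring(s.start(), s.end()));
        }
        return subwords;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("foot", "ball", "game", "football", "ballgame");
        int[] breaks = {0, 4, 8, 12}; // assumed break points: foot|ball|game
        System.out.println(decompose("footballgame", breaks, dictionary, false, false));
        // prints [football, foot, ballgame, ball, game]
        System.out.println(decompose("footballgame", breaks, dictionary, true, false));
        // prints [football, ballgame]
        System.out.println(decompose("footballgame", breaks, dictionary, false, true));
        // prints [football, game]
    }
}
```

Because candidates are visited by start offset and then longest first, checking each candidate only against already-emitted spans is enough in this sketch; the real filter's handling of edge cases (for example, a whole token that is itself a dictionary word) may differ.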

Related Issues

Resolves #8796
Based on #10765

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • API changes companion pull request created.
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the enhancement, good first issue, low hanging fruit, Search, and Search:Relevance labels May 30, 2024
@hasnain2808 hasnain2808 force-pushed the issue-8796/expose-new-lucene-filter-settings branch 2 times, most recently from 5abc8ec to 3d5ffdc May 30, 2024 14:16
Contributor

❌ Gradle check result for 5abc8ec: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 7b2142e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Mohammad Hasnain <[email protected]>
@jainankitk
Collaborator

@hasnain2808 - It seems the spotless check is failing. Can you fix those?

Execution failed for task ':modules:analysis-common:spotlessJavaCheck'.
> The following files had format violations:
      src/test/java/org/opensearch/analysis/common/CompoundAnalysisTests.java
          @@ -35,7 +35,6 @@
           import·org.apache.lucene.analysis.Analyzer;
           import·org.apache.lucene.analysis.TokenStream;
           import·org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
          -import·org.junit.Before;
           import·org.opensearch.Version;
           import·org.opensearch.cluster.metadata.IndexMetadata;
           import·org.opensearch.common.settings.Settings;
          @@ -51,6 +50,7 @@
           import·org.opensearch.test.IndexSettingsModule;
           import·org.opensearch.test.OpenSearchTestCase;
           import·org.hamcrest.MatcherAssert;
          +import·org.junit.Before;
           
           import·java.io.IOException;
           import·java.io.InputStream;
  Run './gradlew :modules:analysis-common:spotlessApply' to fix these violations.

Signed-off-by: Mohammad Hasnain <[email protected]>
@jainankitk jainankitk added the backport 2.x label Aug 13, 2024
@hasnain2808
Contributor Author

> @hasnain2808 - It seems the spotless check is failing. Can you fix those?

Done.
Weird that this error was missed.

Collaborator

@jainankitk jainankitk left a comment

@msfroh @mch2 - Can one of you help merge this change?

Contributor

❌ Gradle check result for 6a88bb0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 8752b76: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@hasnain2808
Contributor Author

hasnain2808 commented Aug 19, 2024

> @msfroh @mch2 - Can one of you help merge this change?

@msfroh @mch2 could you please have a look at this mini PR 🙂

Contributor

✅ Gradle check result for 8752b76: SUCCESS

@hasnain2808
Contributor Author

I cannot merge even after approval 😢
Need your help again @msfroh 😄

@jainankitk jainankitk merged commit ce64fac into opensearch-project:main Aug 21, 2024
37 of 39 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 21, 2024
#13895)

* Adding access to noSubMatches and noOverlappingMatches in HyphenationCompoundWordTokenFilter

Signed-off-by: Evan Kielley <[email protected]>

* Add Changelog Entry

Signed-off-by: Mohammad Hasnain Mohsin Rajan <[email protected]>

* test: add hyphenation decompounder tests

Signed-off-by: Mohammad Hasnain <[email protected]>

* test: refactor tests

Signed-off-by: Mohammad Hasnain <[email protected]>

* test: reformat test files

Signed-off-by: Mohammad Hasnain <[email protected]>

* chore: add changelog entry for 2.X

Signed-off-by: Mohammad Hasnain <[email protected]>

* chore: remove 3.x changelog

Signed-off-by: Mohammad Hasnain <[email protected]>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <[email protected]>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <[email protected]>

* chore: linting

Signed-off-by: Mohammad Hasnain <[email protected]>

---------

Signed-off-by: Evan Kielley <[email protected]>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <[email protected]>
Signed-off-by: Mohammad Hasnain <[email protected]>
Co-authored-by: Evan Kielley <[email protected]>
(cherry picked from commit ce64fac)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@jainankitk
Collaborator

> I cannot merge even after approval 😢 Need your help again @msfroh 😄

Merged! :)

jainankitk pushed a commit that referenced this pull request Aug 21, 2024
#13895) (#15329)

* Adding access to noSubMatches and noOverlappingMatches in HyphenationCompoundWordTokenFilter

* Add Changelog Entry

* test: add hyphenation decompounder tests

* test: refactor tests

* test: reformat test files

* chore: add changelog entry for 2.X

* chore: remove 3.x changelog

* chore: commonify settingsarr

* chore: commonify settingsarr

* chore: linting

---------

(cherry picked from commit ce64fac)

Signed-off-by: Evan Kielley <[email protected]>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <[email protected]>
Signed-off-by: Mohammad Hasnain <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Evan Kielley <[email protected]>
wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
shiv0408 added a commit to shiv0408/OpenSearch that referenced this pull request Sep 2, 2024
* Optimize global ordinal includes/excludes for prefix matching (opensearch-project#14371)

* Optimize global ordinal includes/excludes for prefix matching

If an aggregation specifies includes or excludes based on a regular
expression, and the regular expression has a finite expansion followed
by .*, then we can optimize the global ordinal filter.

Specifically, in this case, we can expand the matching prefixes, then
include/exclude the range of global ordinals that start with each
prefix.

Signed-off-by: Michael Froh <[email protected]>

* Add unit test

Signed-off-by: Michael Froh <[email protected]>

* Add changelog entry

Signed-off-by: Michael Froh <[email protected]>

* Improve test coverage

Updated the unit test to be functionally equivalent, but it covers
more of the regex logic.

Signed-off-by: Michael Froh <[email protected]>

* Improve test coverage

Signed-off-by: Michael Froh <[email protected]>

* Fix bug in exclude-only case with no doc values in segment

Signed-off-by: Michael Froh <[email protected]>

* Address comments from @mch2

Signed-off-by: Michael Froh <[email protected]>

---------

Signed-off-by: Michael Froh <[email protected]>
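The prefix optimization described in the #14371 commit message above can be illustrated with a small, self-contained Java sketch. It assumes the regular expression has already been expanded into a finite list of literal prefixes (with the trailing .* dropped), and it models global ordinals simply as positions in a sorted array of distinct terms. Because all terms sharing a prefix are contiguous in sorted order, each prefix maps to one ordinal range found with two binary searches, and an exclude filter is the complement of the union of those ranges. The class and method names are hypothetical; the sketch compares Java Strings rather than Lucene's byte-ordered terms and does not reflect OpenSearch's actual code.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: global ordinals modeled as indices into a sorted array of distinct terms.
// Not OpenSearch's implementation; real terms are byte-ordered, here we use String order.
class PrefixOrdinalFilterSketch {

    /** First index in [from, to) whose term is >= key (standard lower bound). */
    static int lowerBound(String[] sortedTerms, int from, int to, String key) {
        int lo = from, hi = to;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms[mid].compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** First index in [from, to) whose term does not start with prefix; prefixed terms come first. */
    static int endOfPrefix(String[] sortedTerms, int from, int to, String prefix) {
        int lo = from, hi = to;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms[mid].startsWith(prefix)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Include filter: mark the ordinal range [lo, hi) of each prefix instead of testing every term. */
    static BitSet includeOrdinals(String[] sortedTerms, List<String> prefixes) {
        BitSet accepted = new BitSet(sortedTerms.length);
        for (String prefix : prefixes) {
            int lo = lowerBound(sortedTerms, 0, sortedTerms.length, prefix);
            int hi = endOfPrefix(sortedTerms, lo, sortedTerms.length, prefix);
            accepted.set(lo, hi); // no-op when no term has this prefix
        }
        return accepted;
    }

    /** Exclude filter: every ordinal outside the prefix ranges. */
    static BitSet excludeOrdinals(String[] sortedTerms, List<String> prefixes) {
        BitSet accepted = includeOrdinals(sortedTerms, prefixes);
        accepted.flip(0, sortedTerms.length);
        return accepted;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "apricot", "banana", "blueberry", "cherry"}; // ordinals 0..4
        List<String> prefixes = List.of("ap", "bl"); // e.g. expanded from a regex like (ap|bl).*
        System.out.println(includeOrdinals(terms, prefixes)); // prints {0, 1, 3}
        System.out.println(excludeOrdinals(terms, prefixes)); // prints {2, 4}
    }
}
```

In this model each prefix costs two O(log n) searches plus one range operation, rather than, presumably, evaluating the regular expression against every term in the dictionary.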

* Adding access to noSubMatches and noOverlappingMatches in Hyphenation… (opensearch-project#13895)

* Add Settings related to Workload Management feature (opensearch-project#15028)

* add QueryGroup Service tests
Signed-off-by: Ruirui Zhang <[email protected]>

* add PR to changelog
Signed-off-by: Ruirui Zhang <[email protected]>

* change the test directory
Signed-off-by: Ruirui Zhang <[email protected]>

* modify comments to be more specific
Signed-off-by: Ruirui Zhang <[email protected]>

* add test coverage
Signed-off-by: Ruirui Zhang <[email protected]>

* remove QUERY_GROUP_RUN_INTERVAL_SETTING as we'll define it in QueryGroupService
Signed-off-by: Ruirui Zhang <[email protected]>

* address comments
Signed-off-by: Ruirui Zhang <[email protected]>

* Update affiliation for @nknize. (opensearch-project#15322)

Signed-off-by: dblock <[email protected]>

* Add log when download completes with file size (opensearch-project#15224)

Signed-off-by: Gaurav Bafna <[email protected]>

* Support Filtering on Large List encoded by Bitmap (version update) (opensearch-project#15352)

Signed-off-by: Andriy Redko <[email protected]>

* Add support for index level slice count setting (opensearch-project#15336)

Signed-off-by: Ganesh Ramadurai <[email protected]>

* Adding allowlist setting for ingest-useragent and ingest-geoip processors (opensearch-project#15325)

* Adding allowlist setting for user-agent, geo-ip and updated tests for ingest-common.

Signed-off-by: Sarat Vemulapalli <[email protected]>

* Remove duplicate test in ingest-common

Signed-off-by: Sarat Vemulapalli <[email protected]>

* Adding changelog

Signed-off-by: Sarat Vemulapalli <[email protected]>

---------

Signed-off-by: Sarat Vemulapalli <[email protected]>

* Add Delete QueryGroup API Logic (opensearch-project#14735)

* Add Delete QueryGroup API Logic
Signed-off-by: Ruirui Zhang <[email protected]>

* modify changelog
Signed-off-by: Ruirui Zhang <[email protected]>

* include comments from create pr
Signed-off-by: Ruirui Zhang <[email protected]>

* remove delete all
Signed-off-by: Ruirui Zhang <[email protected]>

* rebase and address comments
Signed-off-by: Ruirui Zhang <[email protected]>

* rebase
Signed-off-by: Ruirui Zhang <[email protected]>

* address comments
Signed-off-by: Ruirui Zhang <[email protected]>

* address comments
Signed-off-by: Ruirui Zhang <[email protected]>

* address comments
Signed-off-by: Ruirui Zhang <[email protected]>

* add UT coverage
Signed-off-by: Ruirui Zhang <[email protected]>

* [Star Tree] Lucene Abstractions for Star Tree File Formats (opensearch-project#15278)

---------
Signed-off-by: Sarthak Aggarwal <[email protected]>

* [Star tree] Changes to handle derived metrics such as avg as part of star tree mapping (opensearch-project#15152)

---------
Signed-off-by: Bharathwaj G <[email protected]>

* relaxing the join validation for nodes which have only store disabled but only publication enabled

Signed-off-by: Rajiv Kumar Vaidyanathan <[email protected]>

---------

Signed-off-by: Michael Froh <[email protected]>
Signed-off-by: Evan Kielley <[email protected]>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <[email protected]>
Signed-off-by: Mohammad Hasnain <[email protected]>
Signed-off-by: dblock <[email protected]>
Signed-off-by: Gaurav Bafna <[email protected]>
Signed-off-by: Andriy Redko <[email protected]>
Signed-off-by: Ganesh Ramadurai <[email protected]>
Signed-off-by: Sarat Vemulapalli <[email protected]>
Signed-off-by: Rajiv Kumar Vaidyanathan <[email protected]>
Co-authored-by: Michael Froh <[email protected]>
Co-authored-by: Mohammad Hasnain Mohsin Rajan <[email protected]>
Co-authored-by: Evan Kielley <[email protected]>
Co-authored-by: Ruirui Zhang <[email protected]>
Co-authored-by: Daniel (dB.) Doubrovkine <[email protected]>
Co-authored-by: Gaurav Bafna <[email protected]>
Co-authored-by: Andriy Redko <[email protected]>
Co-authored-by: Ganesh Krishna Ramadurai <[email protected]>
Co-authored-by: Sarat Vemulapalli <[email protected]>
Co-authored-by: Sarthak Aggarwal <[email protected]>
Co-authored-by: Bharathwaj G <[email protected]>
Co-authored-by: Rajiv Kumar Vaidyanathan <[email protected]>
akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this pull request Sep 10, 2024
Labels
backport 2.x, enhancement, good first issue, low hanging fruit, Search:Relevance, Search
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide access to new settings for HyphenationCompoundWordTokenFilter
5 participants