Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Glob matching implementation is inefficient #12101

Closed
jainankitk opened this issue Jan 31, 2024 · 3 comments
Closed

[BUG] Glob matching implementation is inefficient #12101

jainankitk opened this issue Jan 31, 2024 · 3 comments
Labels
bug Something isn't working good first issue Good for newcomers Search:Resiliency

Comments

@jainankitk
Copy link
Collaborator

jainankitk commented Jan 31, 2024

Describe the bug

The current implementation for glob matching is really inefficient, recursively invoking method for every * pattern, when many of those can be skipped altogether. For example, it breaks for simple match all request with 4k * appended together (assuming small size JVM like 512k):

_search?filter_path=hits" + "*" * 4030, json={"query": {"match_all" : {}}}
+    public static boolean globMatch(String pattern, String str) {
        
        /* code */
        
        if (firstIndex == 0) {
            if (pattern.length() == 1) {
                return true;
            }
            int nextIndex = pattern.indexOf('*', firstIndex + 1);
            if (nextIndex == -1) {
                return str.endsWith(pattern.substring(1));
            } else if (nextIndex == 1) {
                // Double wildcard "**" - skipping the first "*"
+                return globMatch(pattern.substring(1), str);
            }
        /* code */
    }

Extract from Glob.java

Solution
Can be written more efficiently by:

  • not using recursion, OR
  • converting glob to regex pattern and using existing more efficient libraries for matching the pattern against string, OR
  • using more efficient library for glob matching (cons: additional dependency)

Related component

Search:Resiliency

To Reproduce

opensearch-node1       | java.lang.StackOverflowError: null
opensearch-node1       |    at java.lang.String.indexOf(String.java:2423) ~[?:?]
opensearch-node1       |    at java.lang.String.indexOf(String.java:2380) ~[?:?]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:55) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]
opensearch-node1       |    at org.opensearch.common.Glob.globMatch(Glob.java:68) ~[opensearch-common-2.11.1.jar:2.11.1]

Expected behavior

There should not be any stack overflow error

@jainankitk jainankitk added bug Something isn't working untriaged labels Jan 31, 2024
@jainankitk jainankitk changed the title [BUG] Glob matching implementation is very inefficient [BUG] Glob matching implementation is inefficient Jan 31, 2024
@jainankitk jainankitk added good first issue Good for newcomers untriaged and removed untriaged labels Jan 31, 2024
@robinf95
Copy link
Contributor

Would take this issue.

@peternied
Copy link
Member

[Triage - attendees 1 2 3 4 5 6 7 8]
@jainankitk Thanks for filing, @robinf95 Look forward to a PR - thanks for volunteering.

@reta
Copy link
Collaborator

reta commented Jan 31, 2024

Closing as duplicate of #12065

@reta reta closed this as completed Jan 31, 2024
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers Search:Resiliency
Projects
Archived in project
Development

No branches or pull requests

4 participants