[BUG] default_search analyzer in index settings overrides the analyzer defined in mapping #11100

Closed
gaobinlong opened this issue Nov 6, 2023 · 6 comments
Labels
bug Something isn't working Indexing & Search

Comments

@gaobinlong
Collaborator

Describe the bug
When an index defines a default_search analyzer in its settings and a field defines an analyzer in its mapping, the mapping's analyzer is used at index time but the default_search analyzer is used at search time, so the search results are not as expected.

To Reproduce

  1. Create an index
PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "default_search": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}
  2. Index a doc
POST test/_doc/1?refresh
{
  "text": "a-11"
}
  3. Search the index
POST test/_search
{
  "query": {
    "match": {
      "text": "a-11"
    }
  }
}

Nothing is returned.
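To see why nothing matches, here is a minimal Python sketch that approximates the two analysis chains in the repro. The function names are hypothetical, and the regex-based tokenizers only approximate Lucene's whitespace and standard tokenizers for this particular input; they are not the real implementations. The field is indexed with whitespace, which keeps "a-11" as a single term, while the default_search chain (standard tokenizer, lowercase, autocomplete_filter) turns the query into edge n-grams, none of which equal the indexed term.

```python
import re

# Simplified stand-ins for the analysis chains in the repro (assumption: these
# only approximate Lucene's tokenizers well enough for the input "a-11").

def whitespace_tokenizer(text):
    # Split on whitespace only, so "a-11" stays a single token.
    return text.split()

def standard_tokenizer_approx(text):
    # Rough approximation: keep runs of word characters and break on
    # punctuation such as "-". The real standard tokenizer follows UAX #29.
    return re.findall(r"\w+", text)

def lowercase(tokens):
    return [t.lower() for t in tokens]

def edge_ngram(tokens, min_gram=1, max_gram=20):
    # Emit prefixes of each token with lengths min_gram..max_gram.
    out = []
    for t in tokens:
        for n in range(min_gram, min(max_gram, len(t)) + 1):
            out.append(t[:n])
    return out

def index_analyzer(text):
    # Field mapping: "analyzer": "whitespace"
    return whitespace_tokenizer(text)

def default_search_analyzer(text):
    # Index setting: standard tokenizer + lowercase + autocomplete_filter
    return edge_ngram(lowercase(standard_tokenizer_approx(text)), 1, 20)

indexed_terms = set(index_analyzer("a-11"))
query_terms = default_search_analyzer("a-11")
print(indexed_terms)                     # {'a-11'}
print(query_terms)                       # ['a', '1', '11']
print(indexed_terms & set(query_terms))  # set()
```

A match query only needs one of its query terms to hit an indexed term; since the intersection is empty, the document is not returned.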

Expected behavior
The analyzer defined in mapping takes precedence over the default_search analyzer in settings.

Host/Environment:

  • Version: 2.9
gaobinlong added the bug and untriaged labels on Nov 6, 2023
@reta
Collaborator

reta commented Nov 6, 2023

@gaobinlong this is expected behaviour: analyzer is the index-time analyzer, so search_analyzer should be set in the mappings instead:

"mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "whitespace",
        "search_analyzer": "whitespace"
      }
    }
  }

@gaobinlong
Collaborator Author

I think that if no search_analyzer is specified, analyzer should be used at both index time and search time. In the case above, if no default_search analyzer is defined in the settings, it works as expected:

PUT test
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}

POST test/_doc/1?refresh
{
  "text": "a-11"
}

POST test/_search
{
  "query": {
    "match": {
      "text": "a-11"
    }
  }
}


POST _analyze
{
  "text":"a-11",
  "analyzer": "whitespace"
}

And if we change default_search to default in the settings, it also works as expected: only the whitespace analyzer is used, at both index time and search time:

PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "whitespace"
      }
    }
  }
}

@reta
Collaborator

reta commented Nov 8, 2023

@gaobinlong shamelessly quoting the Elasticsearch docs [1] (whose behaviour we have inherited): at search time, Elasticsearch determines which analyzer to use by checking the following parameters in order:

  1. The analyzer parameter in the search query. See Specify the search analyzer for a query.

  2. The search_analyzer mapping parameter for the field. See Specify the search analyzer for a field.

  3. The analysis.analyzer.default_search index setting. See Specify the default search analyzer for an index.

  4. The analyzer mapping parameter for the field. See Specify the analyzer for a field.

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/specify-analyzer.html
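The lookup order quoted above can be written as a small Python function. This is a sketch, not OpenSearch's actual code: resolve_search_analyzer and its arguments are hypothetical, and steps 5 and 6 (the index-level default analyzer, then standard) are an assumption beyond the four quoted steps, included because they are consistent with the observation in this thread that renaming default_search to default leaves the whitespace analyzer in effect.

```python
from typing import Optional

def resolve_search_analyzer(field_mapping: dict,
                            custom_analyzers: dict,
                            query_analyzer: Optional[str] = None) -> str:
    """Return the analyzer name used at search time (hypothetical helper).

    field_mapping    -- the field's mapping, e.g. {"type": "text", "analyzer": "whitespace"}
    custom_analyzers -- the index's settings.analysis.analyzer section
    query_analyzer   -- the "analyzer" parameter of the query, if any
    """
    if query_analyzer:                         # 1. analyzer in the search query
        return query_analyzer
    if "search_analyzer" in field_mapping:     # 2. field's search_analyzer
        return field_mapping["search_analyzer"]
    if "default_search" in custom_analyzers:   # 3. analysis.analyzer.default_search
        return "default_search"
    if "analyzer" in field_mapping:            # 4. field's analyzer
        return field_mapping["analyzer"]
    if "default" in custom_analyzers:          # 5. assumed: analysis.analyzer.default
        return "default"
    return "standard"                          # 6. assumed final fallback

field = {"type": "text", "analyzer": "whitespace"}
chain = {"tokenizer": "standard", "filter": ["lowercase", "autocomplete_filter"]}

print(resolve_search_analyzer(field, {"default_search": chain}))  # default_search
fixed = dict(field, search_analyzer="whitespace")
print(resolve_search_analyzer(fixed, {"default_search": chain}))  # whitespace
print(resolve_search_analyzer(field, {"default": chain}))         # whitespace
```

The first call reproduces the reported behaviour, the second shows why adding search_analyzer to the mapping fixes it, and the third mirrors the default (rather than default_search) experiment.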

@msfroh
Collaborator

msfroh commented Nov 8, 2023

Yeah, while (perhaps) confusing, it's longstanding behavior that search analyzers take precedence over index-time analyzers.

Maybe we could define a new field-level analyzer parameter (field_analyzer?) that implicitly sets both the index and search analyzer for the field, such that it would override the index-wide default search analyzer.
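A field_analyzer parameter like the one suggested could be handled as a mapping-time expansion. The sketch below is purely hypothetical: field_analyzer does not exist today, expand_field_analyzer is an invented helper, and letting an explicitly set analyzer or search_analyzer win over field_analyzer is an assumed design choice. Because the expanded mapping always carries a search_analyzer, it would take precedence over the index-wide default_search analyzer, just like the explicit workaround suggested earlier.

```python
def expand_field_analyzer(field_mapping: dict) -> dict:
    """Hypothetical: expand a proposed "field_analyzer" parameter into explicit
    "analyzer" and "search_analyzer" entries. "field_analyzer" is only the name
    floated in this discussion, not an existing OpenSearch mapping parameter."""
    mapping = dict(field_mapping)  # copy so the caller's mapping is not mutated
    name = mapping.pop("field_analyzer", None)
    if name is not None:
        # Use the same analyzer at index and search time; an explicitly set
        # analyzer or search_analyzer is kept (assumed design choice).
        mapping.setdefault("analyzer", name)
        mapping.setdefault("search_analyzer", name)
    return mapping

print(expand_field_analyzer({"type": "text", "field_analyzer": "whitespace"}))
# {'type': 'text', 'analyzer': 'whitespace', 'search_analyzer': 'whitespace'}
```

Mappings without field_analyzer pass through unchanged, so existing indices would keep today's precedence rules.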

@gaobinlong
Collaborator Author

In my understanding, the analyzer defined in the field's mapping is already field-level, so I don't see why the index-level default_search analyzer should override the implicit search analyzer that the mapping defines. The ES documentation also says: Unless overridden with the search_analyzer mapping parameter, this analyzer is used for both index and search analysis.

@gaobinlong
Collaborator Author

Closing this issue since we didn't reach consensus; I will open a new one if users complain about it.

@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Jul 24, 2024
Projects
Archived in project

5 participants