Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update knn query spec #538

Merged
merged 4 commits into from
Aug 29, 2024
Merged

Conversation

alex-keeler
Copy link
Contributor

Description

Updates the knn query spec to match how it is currently implemented in OpenSearch.

Issues Resolved

Closes #535

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Alex Keeler <[email protected]>
@alex-keeler
Copy link
Contributor Author

The change was a bit more involved than I had originally anticipated.
Based off what I can tell, it appears the existing spec was ripped from the equivalent Elasticsearch spec. The OpenSearch implementation differs pretty significantly from this spec -- notably that the fields parameter has been replaced with a nested object, which uses the field name as the key. This nested object is what contains the rest of the parameters. Such behavior provided some challenges in designing the OpenAPI spec, so if there's a better alternative to my solution I'd be happy to update it. I also split the API between the KnnQuery and a new KnnField component to try and simplify the nested object business -- that can also be consolidated if deemed necessary.

Copy link
Contributor

github-actions bot commented Aug 28, 2024

Changes Analysis

Commit SHA: 7f5b7b8
Comparing To SHA: 19421f5

API Changes

Summary

├─┬Paths
│ ├─┬/{index}/_search
│ │ ├─┬GET
│ │ │ └─┬Requestbody
│ │ │   └─┬application/json
│ │ │     └─┬Schema
│ │ │       └─┬knn
│ │ │         ├─┬ONEOF
│ │ │         │ └──[🔀] $ref (37228:13)❌ 
│ │ │         └─┬ONEOF
│ │ │           └──[🔀] $ref (24545:21)❌ 
│ │ └─┬POST
│ │   └─┬Requestbody
│ │     └─┬application/json
│ │       └─┬Schema
│ │         └─┬knn
│ │           ├─┬ONEOF
│ │           │ └──[🔀] $ref (37228:13)❌ 
│ │           └─┬ONEOF
│ │             └──[🔀] $ref (24545:21)❌ 
│ └─┬/_search
│   ├─┬GET
│   │ └─┬Requestbody
│   │   └─┬application/json
│   │     └─┬Schema
│   │       └─┬knn
│   │         ├─┬ONEOF
│   │         │ └──[🔀] $ref (37228:13)❌ 
│   │         └─┬ONEOF
│   │           └──[🔀] $ref (24545:21)❌ 
│   └─┬POST
│     └─┬Requestbody
│       └─┬application/json
│         └─┬Schema
│           └─┬knn
│             ├─┬ONEOF
│             │ └──[🔀] $ref (37228:13)❌ 
│             └─┬ONEOF
│               └──[🔀] $ref (24545:21)❌ 
└─┬Components
  ├──[➖] schemas (29757:7)❌ 
  ├──[➖] schemas (28587:7)❌ 
  ├──[➖] schemas (28996:7)❌ 
  ├──[➕] schemas (37228:7)
  ├──[➕] schemas (28587:7)
  ├─┬_common.query_dsl:QueryContainer
  │ └──[➕] properties (37774:9)
  └─┬_core.msearch:MultisearchBody
    └─┬knn
      ├─┬ONEOF
      │ └──[🔀] $ref (37228:13)❌ 
      └─┬ONEOF
        └──[🔀] $ref (38960:15)❌ 

Document Element Total Changes Breaking Changes
paths 8 8
components 8 5
  • BREAKING Changes: 13 out of 16
  • Modifications: 10
  • Removals: 3
  • Additions: 3
  • Breaking Removals: 3
  • Breaking Modifications: 10

Report

The full API changes report is available at: https://github.com/opensearch-project/opensearch-api-specification/actions/runs/10615313001/artifacts/1869972004

API Coverage

Before After Δ
Covered (%) 533 (52.2 %) 533 (52.2 %) 0 (0 %)
Uncovered (%) 488 (47.8 %) 488 (47.8 %) 0 (0 %)
Unknown 26 26 0

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 great work

  • Update CHANGELOG.
  • Annotate the APIs with x-version-added and such, 1.3.x tests fail with "[knn] query does not support [min_score])" for example.

@nhtruong @Xtansia might have some suggestions around the spec itself.

num_candidates:
description: The number of nearest neighbor candidates to consider per shard
min_score:
description: The minimum similarity score for a neighbor to be considered a hit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add periods at the end of these descriptions.

dblock
dblock previously approved these changes Aug 28, 2024
@dblock
Copy link
Member

dblock commented Aug 28, 2024

For lints, probably just easier to sort values.

/home/runner/work/opensearch-api-specification/opensearch-api-specification/tests/default/_core/search/knn.yaml
Error:    36:31  error  Expected sequence values to be in ascending order. '2.3' should be before '3.5'  yml/sort-sequence-values

Tests also need to be limited to >= 2.14 with version:, it's not smart enough to know that you're trying to test APIs that were added later to skip them.

@dblock
Copy link
Member

dblock commented Aug 28, 2024

Looks like the filter query doesn't work in these old versions either. Not sure whether that's expected.

[INFO] => POST /movies/_search ({}) [application/json] {
  "query": {
    "knn": {
      "embedding": {
        "vector": [
          1.4,
          2.3,
          3.5,
          4.1,
          9.2
        ],
        "k": 1,
        "filter": {
          "term": {
            "year": 2012
          }
        }
      }
    }
  }
}
[INFO] <= 400 (application/json) | {
  "root_cause": [
    {
      "type": "parsing_exception",
      "reason": "[knn] unknown token [START_OBJECT] after [filter]",
      "line": 1,
      "col": 77
    }
  ],
  "type": "parsing_exception",
  "reason": "[knn] unknown token [START_OBJECT] after [filter]",
  "line": 1,
  "col": 77
}

You can reproduce these failures locally pretty easily btw.

OPENSEARCH_VERSION=1.3.7 docker compose up
npm run test:spec--insecure -- --tests tests/default/_core/search/knn.yaml --verbose

@alex-keeler
Copy link
Contributor Author

Thanks for the heads up, I've only been testing against 2.16.0 locally. I'll go through and make sure all is well with earlier releases. I'm sure the failures are as annoying for you as they are for me.
I'll have to do some digging to see when the filter parameter was added. I didn't realize it came in a later release (I'm assuming, anyway).

@dblock
Copy link
Member

dblock commented Aug 28, 2024

@navneet1v from k-nn might be interested in this and can help make sure things are properly version-annotated

@navneet1v
Copy link

Looks like the filter query doesn't work in these old versions either. Not sure whether that's expected.

[INFO] => POST /movies/_search ({}) [application/json] {
  "query": {
    "knn": {
      "embedding": {
        "vector": [
          1.4,
          2.3,
          3.5,
          4.1,
          9.2
        ],
        "k": 1,
        "filter": {
          "term": {
            "year": 2012
          }
        }
      }
    }
  }
}
[INFO] <= 400 (application/json) | {
  "root_cause": [
    {
      "type": "parsing_exception",
      "reason": "[knn] unknown token [START_OBJECT] after [filter]",
      "line": 1,
      "col": 77
    }
  ],
  "type": "parsing_exception",
  "reason": "[knn] unknown token [START_OBJECT] after [filter]",
  "line": 1,
  "col": 77
}

You can reproduce these failures locally pretty easily btw.

OPENSEARCH_VERSION=1.3.7 docker compose up
npm run test:spec--insecure -- --tests tests/default/_core/search/knn.yaml --verbose

The filter query specifically provided here doesn't work for old version because filters works for 2.4 and above.

@alex-keeler
Copy link
Contributor Author

Looks like the filter query doesn't work in these old versions either. Not sure whether that's expected.

[INFO] => POST /movies/_search ({}) [application/json] {
  "query": {
    "knn": {
      "embedding": {
        "vector": [
          1.4,
          2.3,
          3.5,
          4.1,
          9.2
        ],
        "k": 1,
        "filter": {
          "term": {
            "year": 2012
          }
        }
      }
    }
  }
}
[INFO] <= 400 (application/json) | {
  "root_cause": [
    {
      "type": "parsing_exception",
      "reason": "[knn] unknown token [START_OBJECT] after [filter]",
      "line": 1,
      "col": 77
    }
  ],
  "type": "parsing_exception",
  "reason": "[knn] unknown token [START_OBJECT] after [filter]",
  "line": 1,
  "col": 77
}

You can reproduce these failures locally pretty easily btw.

OPENSEARCH_VERSION=1.3.7 docker compose up
npm run test:spec--insecure -- --tests tests/default/_core/search/knn.yaml --verbose

The filter query specifically provided here doesn't work for old version because filters works for 2.4 and above.

Thank you for the info. I ended up restraining the filter test to >= 2.9, as it appears the faiss engine didn't get filter support until then. I tried switching the embedding to lucene, but found out that lucene knn support wasn't added until version 2.2, so I figured it was better to keep things as is. For extra safety, I also constrained the entire test to >= 1.2, which is when faiss k-NN support was added.

Copy link
Contributor

Spec Test Coverage Analysis

Total Tested
559 360 (64.4 %)

@dblock dblock merged commit 14d819b into opensearch-project:main Aug 29, 2024
16 checks passed
@dblock
Copy link
Member

dblock commented Aug 29, 2024

Thanks so much for your help @alex-keeler. Please don't hesitate to pickup other missing/potentially incorrect parts of the search DSL, that would be much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] KnnQuery radial search spec inconsistency
3 participants