Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes NeuralQuery Schema #512

Merged
merged 6 commits into from
Aug 19, 2024

Conversation

urinud
Copy link
Contributor

@urinud urinud commented Aug 14, 2024

Description

Fixes NeuralQuery definition based on the documentation (and usage of OpenSearch).

Issues Resolved

  • query_image: format is not required since this is Base64 encoded image.
  • model_id: is required if there is no default. thus we cannot enforce it here.
  • query_text or query_image: at least one of them needs to be specified (both allowed).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: uri.nudelman <[email protected]>
Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Can we please test/exercise this change similarly to

in either that test or a new one?

Add to CHANGELOG.

spec/schemas/_common.query_dsl.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

github-actions bot commented Aug 14, 2024

Changes Analysis

Commit SHA: 1e38f88
Comparing To SHA: 7dee041

API Changes

Summary

├─┬Paths
│ ├─┬/{index}/_search
│ │ ├─┬GET
│ │ │ └─┬Responses
│ │ │   ├──[➕] codes (26251:7)
│ │ │   └──[➕] codes (26256:7)
│ │ └─┬POST
│ │   └─┬Responses
│ │     ├──[➕] codes (26256:7)
│ │     └──[➕] codes (26251:7)
│ └─┬/_search
│   ├─┬GET
│   │ └─┬Responses
│   │   ├──[➕] codes (26256:7)
│   │   └──[➕] codes (26251:7)
│   └─┬POST
│     └─┬Responses
│       ├──[➕] codes (26251:7)
│       └──[➕] codes (26256:7)
└─┬Components
  ├──[➕] responses (26251:7)
  ├──[➕] responses (26256:7)
  ├─┬query._common:ErrorResponse
  │ ├─┬reason
  │ │ └──[➖] example (48527:20)
  │ └─┬type
  │   └──[➖] example (48524:20)
  ├─┬_common.query_dsl:NeuralQueryVectorField
  │ ├──[➖] required (36852:11)❌ 
  │ └─┬query_image
  │   ├──[➖] format (36840:19)❌ 
  │   └──[➕] contentEncoding (36866:28)❌ 
  └─┬query._common:RootCause
    ├─┬reason
    │ └──[➖] example (48540:20)
    └─┬type
      └──[➖] example (48537:20)

Document Element Total Changes Breaking Changes
paths 8 0
components 9 3
  • BREAKING Changes: 3 out of 17
  • Removals: 6
  • Additions: 11
  • Breaking Removals: 2
  • Breaking Additions: 1

Report

The full API changes report is available at: https://github.com/opensearch-project/opensearch-api-specification/actions/runs/10444364681/artifacts/1827424595

API Coverage

Before After Δ
Covered (%) 524 (51.32 %) 524 (51.32 %) 0 (0 %)
Uncovered (%) 497 (48.68 %) 497 (48.68 %) 0 (0 %)
Unknown 26 26 0

Signed-off-by: uri.nudelman <[email protected]>
spec/schemas/_common.query_dsl.yaml Show resolved Hide resolved
Copy link
Contributor

github-actions bot commented Aug 14, 2024

Spec Test Coverage Analysis

Total Tested
550 331 (60.18 %)

Signed-off-by: uri.nudelman <[email protected]>
Signed-off-by: uri.nudelman <[email protected]>
@urinud
Copy link
Contributor Author

urinud commented Aug 15, 2024

Hello @dblock & @Xtansia , I been trying to add a test for query_image. The query is not a problem, but was not able to find any OpenSearch-provided pretrained model with image processing support that I can use.
If anyone know how what I can use here will appreciate it.

@dblock
Copy link
Member

dblock commented Aug 15, 2024

Hello @dblock & @Xtansia , I been trying to add a test for query_image. The query is not a problem, but was not able to find any OpenSearch-provided pretrained model with image processing support that I can use. If anyone know how what I can use here will appreciate it.

Let's ask @navneet1v and try the #ml channel on the public slack.

Is that required to make the API call successfully? I understand it's required to get some search results, but as long as we can exercise the actual change here (fails before this change, passes after this change) we will be able to merge it.

@navneet1v
Copy link

Hello @dblock & @Xtansia , I been trying to add a test for query_image. The query is not a problem, but was not able to find any OpenSearch-provided pretrained model with image processing support that I can use. If anyone know how what I can use here will appreciate it.

Let's ask @navneet1v and try the #ml channel on the public slack.

Is that required to make the API call successfully? I understand it's required to get some search results, but as long as we can exercise the actual change here (fails before this change, passes after this change) we will be able to merge it.

There is no pretrained model from Opensearch side that can process images as per my knowledge. @martin-gaievski during your testing for neural search do you know any Opensource model you used? or we were always using some closed source models.

@dblock
Copy link
Member

dblock commented Aug 15, 2024

Thanks @kolchfa-aws for pointing us to opensearch-project/ml-commons#2598

@martin-gaievski
Copy link
Member

Hello @dblock & @Xtansia , I been trying to add a test for query_image. The query is not a problem, but was not able to find any OpenSearch-provided pretrained model with image processing support that I can use. If anyone know how what I can use here will appreciate it.

Let's ask @navneet1v and try the #ml channel on the public slack.
Is that required to make the API call successfully? I understand it's required to get some search results, but as long as we can exercise the actual change here (fails before this change, passes after this change) we will be able to merge it.

There is no pretrained model from Opensearch side that can process images as per my knowledge. @martin-gaievski during your testing for neural search do you know any Opensource model you used? or we were always using some closed source models.

We used closed source models only

@urinud
Copy link
Contributor Author

urinud commented Aug 16, 2024

Thanks everyone for your answers.
Based on the current conditions, I do not know what I can use to test query_image in neural query due to lack of available model to use.
I created a test, but logically fails using the existing model in tests/default/ingest/pipeline/neural_search.yaml.
If anyone knows what I can use to add a test great, I can give it a try.
Otherwise, not sure what else can be done to complete and merge this PR.
Thank you

@dblock
Copy link
Member

dblock commented Aug 16, 2024

Otherwise, not sure what else can be done to complete and merge this PR.

How about writing a negative test that only does search with a model that doesn't exist?

              - script_score:
                  query:
                    neural:
                      passage_embedding:
                        query_image: ...data
                        model_id: does-not-exist # TODO: OpenSearch needs ..., see https://github/...
                        k: 100

This should respond with a predictable error, but one that's not a validation error for query_image which is your fix (test should fail without your fix).

@urinud
Copy link
Contributor Author

urinud commented Aug 16, 2024

This should respond with a predictable error, but one that's not a validation error for query_image which is your fix (test should fail without your fix).
Thanks, good idea. I tested:

  • If we have a wrong field name we get a 400 error parsing_exception.
  • If we set and invalid model_id we get a 404 error status_exception (Failed to find model).
    Will create a test based on this.

Signed-off-by: uri.nudelman <[email protected]>
@urinud
Copy link
Contributor Author

urinud commented Aug 16, 2024

@dblock , added the test case. I'm getting the following response and do not know how to skip the RESPONSE PAYLOAD SCHEMA.
image

@dblock
Copy link
Member

dblock commented Aug 18, 2024

The underlying issue is described in #445. For now you should be able to add a 404 response to the search request/response like here. A trivial type: object schema will suffice.

@@ -1248,8 +1248,6 @@ components:
type: number
filter:
$ref: '#/components/schemas/QueryContainer'
required:
- model_id
Copy link
Member

@dblock dblock Aug 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is model ID not required? AFAIK it is. What's a valid query without a model ID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per documentation, it is required if the default model id is not set. Here is explained how to configure it.

@@ -79,10 +79,10 @@ components:
$ref: '#/components/schemas/RootCause'
type:
type: string
example: status_exception
# example: status_exception
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented out because fails with strict mode as follows:
Error: strict mode: unknown keyword: "example"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open an issue for these?

@urinud urinud requested review from Xtansia and dblock August 18, 2024 22:58
@dblock dblock dismissed Xtansia’s stale review August 19, 2024 12:59

Issue noted above fixed.

@dblock dblock merged commit 52ec375 into opensearch-project:main Aug 19, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants