[BUG] Setting data_type to byte returns illegal_argument_exception #347

Closed
juntezhang opened this issue Sep 27, 2023 · 2 comments
Labels: bug, untriaged


@juntezhang

What is the bug?

Since OpenSearch 2.9.0, the data_type of a knn_vector field can be set to byte in order to use the Lucene byte vector. The documentation explains this here: https://opensearch.org/docs/latest/field-types/supported-field-types/knn-vector/#lucene-byte-vector

I quote:

In k-NN benchmarking tests, the use of byte rather than float vectors resulted in a significant reduction in storage and memory usage as well as improved indexing throughput and reduced query latency. Additionally, precision on recall was not greatly affected

However, when we enable this in the mapping, indexing a document fails with a mapper_parsing_exception caused by an illegal_argument_exception:

"error": {
  "type": "mapper_parsing_exception",
  "reason": "failed to parse field [_fulltext_vectorized.knn] of type [knn_vector] in document with id '4'. Preview of field's value: '-0.0010178537'",
  "caused_by": {
    "type": "illegal_argument_exception",
    "reason": "[data_type] field was set as [byte] in index mapping. But, KNN vector values are floats instead of byte integers"
  }
}
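
For illustration, the check that this message describes can be sketched in Python. This is an assumption based only on the error text and the signed byte range, not the k-NN plugin's actual code, and the function name `is_valid_byte_component` is hypothetical.

```python
def is_valid_byte_component(value: float) -> bool:
    """Return True if value is a whole number within the signed byte range.

    Hypothetical helper: it mirrors what the error message implies,
    not the plugin's real validation logic.
    """
    return float(value).is_integer() and -128 <= value <= 127


# The value from the error preview is fractional, so it is rejected.
print(is_valid_byte_component(-0.0010178537))
# Whole numbers inside the byte range are accepted.
print(is_valid_byte_component(127))
```

Under this reading, any embedding component with a fractional part fails the check, which matches the value shown in the preview above.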

How can one reproduce the bug?

Define a knn_vector field like the following and use it as the vector output field of the Neural Search plugin:

"knn": {
  "type": "knn_vector",
  "dimension": 768,
  "data_type": "byte",
  "method": {
    "name": "hnsw",
    "space_type": "l2",
    "engine": "lucene"
  }
}

What is the expected behavior?

The Neural Search plugin should produce byte vectors instead of float vectors when byte is used as the data_type in the mapping. Alternatively, allow us to configure this as a property in the ingest pipeline.

What is your host/environment?

Running OpenSearch 2.10.0 in Docker with the latest version of Ubuntu.

Do you have any screenshots?

n/a

Do you have any additional context?

n/a

@juntezhang added the bug and untriaged labels on Sep 27, 2023
@heemin32
Collaborator

@juntezhang, I think this behavior is expected. As I understand it, the neural search processor takes the vector data returned by the model and ingests it as is. If the model returns float values, indexing throws this error, because a byte vector field accepts only integer values between -128 and 127.
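
One possible client-side workaround is to quantize the float embeddings into the byte range before indexing them. The Neural Search plugin does not do this for you. The sketch below is a minimal example under stated assumptions: the function name `quantize_to_bytes` and the `max_abs` parameter are hypothetical, the scaling scheme is a simple symmetric scalar quantization chosen for illustration, and it assumes you generate and ingest the vectors yourself rather than through the neural search processor.

```python
from typing import List, Sequence


def quantize_to_bytes(vector: Sequence[float], max_abs: float) -> List[int]:
    """Map floats in [-max_abs, max_abs] to integers in [-128, 127].

    Hypothetical helper for illustration. max_abs is an assumed bound on
    the magnitude of the embedding components; values beyond it are clamped.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    scale = 127.0 / max_abs
    quantized = []
    for value in vector:
        scaled = round(value * scale)  # nearest integer after scaling
        quantized.append(max(-128, min(127, scaled)))  # clamp to byte range
    return quantized


# Components at the bound map to the ends of the scaled range.
print(quantize_to_bytes([1.0, -1.0, 0.0], max_abs=1.0))
```

The same `max_abs` would need to be used for every document vector and every query vector, otherwise distances between quantized vectors are not comparable. Quantization loses precision, so recall should be checked on your own data.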

@juntezhang
Author

Thanks for the clarification. I will close this ticket.
