[BUG] Your request contained invalid JSON: 'utf-8' codec can't decode byte 0xeb in position xx: invalid continuation byte #1884

Closed
ulan-yisaev opened this issue Jan 17, 2024 · 4 comments
Labels
bug Something isn't working


ulan-yisaev (Contributor) commented Jan 17, 2024

What is the bug?
This is the same bug as described in #1666, but with a connector to the Azure OpenAI embedding model. I was able to add a connector for the Azure OpenAI Ada embedding model by following this issue: #1367

When I request an embedding for a string containing accented (non-ASCII) characters such as ë or ä, I get the following error:

[2024-01-17T13:39:57,049][ERROR][o.o.m.e.a.r.RemoteModel ] [05f4d0a3acfe] Failed to call remote model
org.opensearch.OpenSearchStatusException: Error from remote service: {
  "error": {
    "message": "Your request contained invalid JSON: 'utf-8' codec can't decode byte 0xeb in position 43: invalid continuation byte",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}
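The byte named in the error is a useful clue. In UTF-8, ë is the two-byte sequence 0xC3 0xAB; the single byte 0xEB is how ë is encoded in ISO-8859-1 (Latin-1) and Windows-1252. The minimal Python sketch below assumes, as an inference not confirmed in this thread, that the connector serialized the request body as Latin-1 instead of UTF-8, and shows that a UTF-8 decoder then produces exactly this kind of error:

```python
text = "Moët"

# Correct encoding: UTF-8 represents ë as two bytes, 0xC3 0xAB.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'Mo\xc3\xabt'

# Assumed faulty path: the body is encoded as Latin-1, where ë is the single byte 0xEB.
latin1_bytes = text.encode("latin-1")
print(latin1_bytes)  # b'Mo\xebt'

# A UTF-8 decoder on the receiving side rejects these bytes: 0xEB is the lead
# byte of a three-byte sequence, but the next byte ('t', 0x74) is not a
# continuation byte (continuation bytes are in the range 0x80-0xBF).
try:
    latin1_bytes.decode("utf-8")
    error_message = None
except UnicodeDecodeError as err:
    error_message = str(err)
    print(error_message)
```

The printed message has the same shape as the one returned by Azure OpenAI; only the position differs, because the real request body also contains the surrounding JSON.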

When I remove the accented character, the request succeeds. I am retrieving the embedding from the _predict endpoint:

POST http://localhost:9200/_plugins/_ml/models/{model_id}/_predict
{
    "parameters": {
        "input": ["This is a string containing Moët Hennessy"]
    }
}
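To script the reproduction, the same _predict call can be built in Python. This is a minimal sketch assuming an unsecured local cluster at http://localhost:9200 and a placeholder model ID (both hypothetical). The body is encoded explicitly as UTF-8, the encoding JSON requires for exchange between systems (RFC 8259), which rules out the client as the source of the 0xEB byte. It does not work around the bug itself: per this thread, the failure is in the request that OpenSearch sends on to Azure OpenAI.

```python
import json
import urllib.request


def build_predict_request(base_url: str, model_id: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST to the ML Commons _predict endpoint."""
    payload = {"parameters": {"input": texts}}
    # ensure_ascii=False keeps "ë" as a literal character; encoding the JSON
    # string as UTF-8 then yields the bytes 0xC3 0xAB for it.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_plugins/_ml/models/{model_id}/_predict",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# "MODEL_ID" is a hypothetical placeholder for the ID returned at model registration.
req = build_predict_request(
    "http://localhost:9200", "MODEL_ID", ["This is a string containing Moët Hennessy"]
)
print(req.get_method(), req.full_url)
print(req.data)
# To actually send it against a running cluster: urllib.request.urlopen(req)
```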

Replacing the special character with a plain e works. Requesting the embedding directly from Azure OpenAI with the special character also works fine.

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Set up the Azure OpenAI connector as described in #1367.
  2. Request an embedding for the string shown above.

What is the expected behavior?
An embedding should be returned for strings containing special characters.

What is your host/environment?

  • OS: Ubuntu
  • OpenSearch 2.11 (latest version)



ulan-yisaev added the bug and untriaged labels Jan 17, 2024
ulan-yisaev (Contributor, Author) commented Jan 17, 2024

Hmm, do I understand correctly that the fix for the previous bug (#1666) is not yet included in the 2.11.1.0 release?

ulan-yisaev (Contributor, Author) commented Jan 18, 2024

It seems I have to wait until "OpenSearch 2.12.0 release is currently scheduled to be released on Jan 23 2024"

ylwu-amzn (Collaborator) commented

> It seems I have to wait until "OpenSearch 2.12.0 release is currently scheduled to be released on Jan 23 2024"

Yes, the fix for this bug (#1691) will be released in 2.12.0.

ylwu-amzn moved this from Untriaged to In Progress in ml-commons projects Feb 2, 2024
github-project-automation bot moved this from In Progress to Done in ml-commons projects May 8, 2024
Tiberiu07 commented
@ylwu-amzn Thanks for fixing the issue. Unfortunately, the problem still persists when the OpenSearch cluster is deployed in the cloud and managed by AWS (OpenSearch version 2.13).
(Two screenshots attached, dated 2024-06-07; images not preserved.)

4 participants