[INFO] custom pre-processing function in ML connectors is returning Invalid JSON in payload
#2346
Comments
So your pre-processing function is meant to translate the text docs into your model's input, but it doesn't look correct. Can you try this?
For the post-processing function, I need to know your model's output. Can you share the raw model output?
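(Cleaned up from the compile error shown further below, the suggested pre-process function is roughly this Painless script, which JSON-encodes params.text_docs into an input array plus a length parameter; escape is the ml-commons helper discussed next.)

    // Build a JSON array of the escaped text docs, e.g. ["doc one","doc two"]
    StringBuilder builder = new StringBuilder('[');
    for (int i = 0; i < params.text_docs.length; i++) {
        builder.append('"');
        builder.append(escape(params.text_docs[i]));
        builder.append('"');
        if (i < params.text_docs.length - 1) {
            builder.append(',');
        }
    }
    builder.append(']');
    // Expose the document count and the encoded array to the connector's request_body template
    def parameters = '{"length": ' + params.text_docs.length + ', "input": ' + builder + '}';
    return '{"parameters": ' + parameters + '}';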
hi @ylwu-amzn, thank you for looking into this! the raw output would look like
also I noticed an error about the escape call (see below); or perhaps I could use the Apache Commons escaping utilities, if that package is installed in the OpenSearch tool?
{
"docs": [
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "compile error",
"script_stack": [
"""... ');
builder.append(escape(params.text_docs[i ...""",
" ^---- HERE"
],
"script": """ StringBuilder builder = new StringBuilder('[');
for (int i=0; i<params.text_docs.length; i ++) {
builder.append('"');
builder.append(escape(params.text_docs[i]));
builder.append('"');
if (i<params.text_docs.length - 1) {
builder.append(',');
}
}
builder.append(']');
def parameters = '{"length": ' + params.text_docs.length + ', "input": ' + builder + ' }';
return '{"parameters": ' + parameters + '}';
""",
"lang": "painless",
"position": {
"offset": 155,
"start": 130,
"end": 180
}
}
],
"type": "script_exception",
"reason": "compile error",
"script_stack": [
"""... ');
builder.append(escape(params.text_docs[i ...""",
" ^---- HERE"
],
"script": """ StringBuilder builder = new StringBuilder('[');
for (int i=0; i<params.text_docs.length; i ++) {
builder.append('"');
builder.append(escape(params.text_docs[i]));
builder.append('"');
if (i<params.text_docs.length - 1) {
builder.append(',');
}
}
builder.append(']');
def parameters = '{"length": ' + params.text_docs.length + ', "input": ' + builder + ' }';
return '{"parameters": ' + parameters + '}';
""",
"lang": "painless",
"position": {
"offset": 155,
"start": 130,
"end": 180
},
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Unknown call [escape] with [[org.opensearch.painless.node.EBrace@449daf]] arguments."
}
}
}
]
}
No, that's a function we added in ml-commons (code link), which was added in the 2.12 release. If you are using an older version, you can manually copy the escape function into your pre/post process function. You can't use the Apache Commons package.
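(For illustration only, and not the actual ml-commons implementation: a minimal JSON string escaper that could be pasted at the top of a pre/post process script on versions older than 2.12, where the built-in escape function is unavailable.)

    // Minimal JSON string escaping: backslash first, then quotes and common control characters.
    String escape(String input) {
        return input
            .replace('\\', '\\\\')
            .replace('"', '\\"')
            .replace('\n', '\\n')
            .replace('\r', '\\r')
            .replace('\t', '\\t');
    }

Remember that inside a connector definition the whole script has to be embedded as a single escaped JSON string, so it is easiest to write and test it as a plain script first.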
Just to confirm: the whole raw output doesn't contain any key? For example, if the raw output is just [ float[], float[] ], I think you can try the default post-process function.
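(For reference, built-in functions are referenced by name in the connector's action definition instead of as inline scripts. A minimal sketch; the default function names below are what I'd expect in recent ml-commons releases and may vary by version.)

    "actions": [
      {
        "action_type": "predict",
        "method": "POST",
        "url": "<your SageMaker endpoint URL>",
        "headers": { "content-type": "application/json" },
        "request_body": "...",
        "pre_process_function": "connector.pre_process.default.embedding",
        "post_process_function": "connector.post_process.default.embedding"
      }
    ]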
Thank you @ylwu-amzn! The pre-process function almost worked. However, the request body sent to my SageMaker endpoint now looks like:
{
"inputs": [
{
"name": "query",
"shape": [
2.0, // error is because this is not an integer
1
],
"datatype": "BYTES",
"data": [
"brand 1",
"category 1"
]
}
]
}
I have tried
All 3 approaches were still returning a non-integer in the request body for SageMaker. See the error below when simulating the ingest pipeline:
{
"docs": [
{
"error": {
"root_cause": [
{
"type": "status_exception",
"reason": """Error from remote service: {"ErrorCode":"CLIENT_ERROR_FROM_MODEL","LogStreamArn":"log_arn","Message":"Received client error (400) from primary with message \"{\"error\":\"Unable to parse 'shape': attempt to access JSON non-unsigned-integer as unsigned-integer\"}\". See log_url in account account_number for more information.","OriginalMessage":"{\"error\":\"Unable to parse 'shape': attempt to access JSON non-unsigned-integer as unsigned-integer\"}","OriginalStatusCode":400}"""
}
],
"type": "status_exception",
"reason": """Error from remote service: {"ErrorCode":"CLIENT_ERROR_FROM_MODEL","LogStreamArn":"my_log_arn","Message":"Received client error (400) from primary with message \"{\"error\":\"Unable to parse 'shape': attempt to access JSON non-unsigned-integer as unsigned-integer\"}\". See log_url in account account_number for more information.","OriginalMessage":"{\"error\":\"Unable to parse 'shape': attempt to access JSON non-unsigned-integer as unsigned-integer\"}","OriginalStatusCode":400}"""
}
}
]
}
Can you try this? Wrap the
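(As one possible workaround only, not necessarily the wrapping that was suggested: since string parameters appear to be substituted into the request_body template verbatim, the pre-process function can build the whole shape value as a string, the same way it already builds input, so no bare number goes through parameter substitution. A sketch, reusing the function from earlier in the thread and assuming the request_body references ${parameters.shape}.)

    StringBuilder builder = new StringBuilder('[');
    for (int i = 0; i < params.text_docs.length; i++) {
        builder.append('"');
        builder.append(escape(params.text_docs[i]));
        builder.append('"');
        if (i < params.text_docs.length - 1) {
            builder.append(',');
        }
    }
    builder.append(']');
    // Emit shape as a pre-built string such as [2, 1] instead of a bare number,
    // and reference it in the connector request_body as ${parameters.shape}.
    def shape = '[' + params.text_docs.length + ', 1]';
    def parameters = '{"shape": ' + shape + ', "input": ' + builder + '}';
    return '{"parameters": ' + parameters + '}';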
I tried your suggestion. The request body is now correct ✅:
{
"inputs": [
{
"name": "query",
"shape": [
2, // correct, it is now integer
1
],
"datatype": "BYTES",
"data": [
"brand 1",
"category 1"
]
}
]
}
but I am getting this error now in the simulator:
{
"docs": [
{
"error": {
"root_cause": [
{
"type": "class_cast_exception",
"reason": "class java.lang.String cannot be cast to class java.util.List (java.lang.String and java.util.List are in module java.base of loader 'bootstrap')"
}
],
"type": "class_cast_exception",
"reason": "class java.lang.String cannot be cast to class java.util.List (java.lang.String and java.util.List are in module java.base of loader 'bootstrap')"
}
}
]
}
It is hard to figure out why from this error alone. Can you share the exception trace from the logs?
I do not have a log trace because the request did not reach the SageMaker endpoint. I was expecting the error to contain
It will be hard to guess what's wrong. If possible, you can reach out to me on OpenSearch Slack and we can jump on a call? You can join the public
thank you @ylwu-amzn for joining a call to debug the issue! here is the raw output of the SageMaker endpoint:
{
"model_name": "search_ensemble",
"model_version": "1",
"parameters": {
"sequence_id": 0,
"sequence_start": false,
"sequence_end": false
},
"outputs": [
{
"name": "outputs",
"datatype": "FP32",
"shape": [
2,
512
],
"data": [ // (1024 x 1) array
-0.3834260106086731,
-0.36356380581855776,
-0.25114601850509646,
-0.12556827068328858,
-0.0514649897813797,
...
]
}
]
}
It turns out the raw output is a JSON object, not a bare 2D array, so the default post-process function does not apply. Thanks again 🙏🏿
@toyaokeke Can you try these pre/post process functions?
Edit: As you are using OS 2.11, you should add the escape function to the script manually (it was only added to ml-commons in the 2.12 release).
The above
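(As a rough sketch only, not the exact functions shared on the call: assuming the SageMaker response fields are exposed to the script as params.outputs, and that the installed ml-commons version accepts a JSON string of tensors from a custom post_process_function, the reshaping logic would look something like this; the expected return format may differ between versions.)

    // Split the flat data array (shape [num_docs, dim]) into one embedding per input document.
    def out = params.outputs[0];
    def docCount = out.shape[0];
    def dim = out.shape[1];
    StringBuilder tensors = new StringBuilder('[');
    for (int d = 0; d < docCount; d++) {
        def vector = out.data.subList(d * dim, (d + 1) * dim);
        tensors.append('{"name":"sentence_embedding","data_type":"FLOAT32","shape":[' + dim + '],"data":' + vector + '}');
        if (d < docCount - 1) {
            tensors.append(',');
        }
    }
    tensors.append(']');
    return tensors.toString();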
@ylwu-amzn this is working as expected!! Thank you very much 🙏🏿
Closing this issue as the problem is solved.
What is the bug?
I am trying to deploy an ML model that connects to an external resource.
I am trying to write a pre-process function that will pass the following request body to my SageMaker endpoint:
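(From the request bodies shown earlier in the thread, the intended body, later confirmed correct, looks roughly like this.)

    {
      "inputs": [
        {
          "name": "query",
          "shape": [2, 1],
          "datatype": "BYTES",
          "data": ["brand 1", "category 1"]
        }
      ]
    }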
I have created the ML connector and deployed the model following the steps below. When I do this I get the following Invalid JSON in payload error. Could someone assist me in understanding why the pre-processing function is not working as expected?
How can one reproduce the bug?
Steps to reproduce the behavior:
What is the expected behavior?
Expected generated vector embeddings brand_name_vector and category_name_vector
What is your host/environment?
Do you have any screenshots?
Do you have any additional context?
I am aware of signing the request using AWS SigV4 and providing the correct keys. The issue is not with creating the connector; I am able to create the connector fine.
My issue is that when I deploy my model using the connector and simulate the ingest pipeline, that is when I get the error described.
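(For completeness, the simulation call that triggers the error is the standard ingest pipeline _simulate API; a sketch, where the pipeline name and source field names are placeholders rather than values taken from this issue.)

    POST _ingest/pipeline/text-embedding-pipeline/_simulate
    {
      "docs": [
        {
          "_source": {
            "brand_name": "brand 1",
            "category_name": "category 1"
          }
        }
      ]
    }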