[BUG] Neural search: 4xx error ingesting data with Sagemaker external model #2249
Comments
@tiagoshin There was a known issue in ml-commons 2.11 with handling batch inference traffic, which was fixed in 2.12. Is it possible for you to upgrade to 2.12 and retry? |
@tiagoshin Your connector request body and pre/post-process functions are wrong. Can you try this?
|
Hi @ylwu-amzn, I tried to run it with the code you provided but I got, for all documents in ingestion:
So I think it's not able to parse the data correctly. |
@tiagoshin Can you share your sample predict request ? |
@tiagoshin ,can you check the sample workflows results that I provided below? The below model and connector work fine from my side. Are you having problems when the _bulk request takes more than say 200 records using the same model/workflow?
|
Seems model input |
I just verified with the above test case with 120 records and results show that all 120 records ingested correctly through _bulk. |
Hi @Zhangxunmt, I noticed that you're using the same pre-process and post-process scripts as I provided in the description. I tested the workflow you provided and it works for me, but that's a small scale. |
@ylwu-amzn I understand your approach and I tested that the model in the Sagemaker endpoint can indeed receive an input like:
However, with the code you provided, I get
And I got the same error. |
@tiagoshin, how did you run the bulk ingestion with the 500 records? Was it through Python code, a Lambda function, Postman, or the OS dashboard? Also, is it possible to share your ingested data so we can reproduce it more easily? I verified ingesting 260 records in the latest AOS 2.11 version and in open source 2.11/2.13, and I didn't get a single error. Judging from the performance, I don't think the error would happen even if I added more data. I tested through the OS dashboard, though. |
I'm running this test using a file in Postman in AOS 2.11. |
@tiagoshin , I have used exactly the same dataset you shared with us (a set processor to generate passage_text, total 251 records), but still couldn't duplicate the error from my side. Actually all elements are ingested correctly with this response.
Is it possible to setup a call so we can go over the scenario and find out the difference? |
@Zhangxunmt Sure, let's set up a call.
|
@tiagoshin The SageMaker endpoint uses the ml.g5.xlarge instance type. I used model_id = "huggingface-sentencesimilarity-bge-base-en-v1-5" in the notebook for deploying the model with the same script in your description. The OpenSearch cluster is a 3 data-node cluster with instance type r6g.large
|
@tiagoshin As we discussed over the call, please either send me the ticket/issue for the patch upgrade or try a standard 2.11 version from your side. |
@Zhangxunmt
|
@tiagoshin Have you already verified with a standard 2.11 version that the issue occurs only in the patched version? |
@Zhangxunmt Not yet, I'm still waiting for an answer from Nikhil regarding this topic |
@tiagoshin, I used a 2.11 patched version but still didn't duplicate the error. The primary shard number of the index is 5, and the replica shards number is 2. |
Thanks for the investigation @Zhangxunmt. |
Hi @Zhangxunmt, I got the cluster here updated to 2.11.
When I define the ingestion pipeline like this and use the same setup to try to ingest the data that I sent you with 250 records, I get the same errors that I was getting before.
And index as:
Could you test with any of these setups on your side to see if you're able to reproduce it? Thank you! |
Thanks @tiagoshin. I think we are getting close to the root cause. At least our environments are identical now.
I can test with your latest setup later. Will update soon. |
Hi @Zhangxunmt |
I ran the two experiments below with the same connector. The data is from the dashboard. Neither case returned an error on my side. Can you check if your connector is the same? The credential is hidden, but it's just a roleArn defined inside. I used the same 250 data records that you sent me.
Pipeline 1: (I also tried using the exact pipeline that you shared with me last week)
Create the index 1
Ingest with _bulk
Pipeline 2:
Index 2:
Ingest with _bulk
Both experiments returned:
|
So you weren't able to reproduce the bug using it, right?
I cannot query using the way you defined; json doesn't recognize triple double quotes. |
@tiagoshin , No I still didn't get any errors. It's the same connector that I shared in the earlier comment. I didn't use any special cluster configuration. I didn't specify the primary shards and replicas so they're all auto set when the index is created. Do you want to setup another call to cross check again? We can go over from the very beginning step again one by one. |
@Zhangxunmt I think so. Let's go on a call and check again. That's really weird because our setup looks pretty much the same! |
@tiagoshin, I see you have 6 data nodes in your domain. I only have 3. So one explanation is that you have 6 nodes sending traffic to the SageMaker host in parallel, which is double the throughput that I send from my cluster. The SageMaker model service throttled some of your requests, and that's why you see this "error from remote service" error. |
I have kicked off a cluster config change to 6 data nodes using the same instance type as yours. When the new data nodes are ready I will test again with 6 nodes. |
@tiagoshin , I was able to reproduce the same error as below using the scaled up cluster with 6 data nodes. So this proves the hypothesis that those errors come from the SageMaker throttling.
|
Can we try this autoscaling in SageMaker to mitigate this issue? https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html |
Another way: reduce the bulk size so that it does not exceed the SageMaker model's throughput limit. The documents in a bulk request are processed in parallel on multiple data nodes. If your cluster has multiple data nodes, these data nodes will send requests to the SageMaker model in parallel, which may exceed the model's capacity. For example, suppose your SageMaker model can handle 10 requests per second. You have 200 docs in one bulk request, and these documents are processed by 6 data nodes. Assuming each data node sends out 3 requests per second, the SageMaker model will receive 3 * 6 = 18 requests per second. That exceeds the model's throughput limit of 10 requests per second. |
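The arithmetic in this example can be sketched in a few lines of Python. The figures (6 data nodes, 3 requests per second per node, a limit of 10 requests per second) are the illustrative numbers from the comment above, not measured SageMaker limits, and the function names are hypothetical helpers, not part of ml-commons or any client library.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def aggregate_rps(data_nodes: int, per_node_rps: float) -> float:
    """Requests per second reaching the model when all data nodes send in parallel."""
    return data_nodes * per_node_rps


def exceeds_limit(data_nodes: int, per_node_rps: float, model_limit_rps: float) -> bool:
    """True when the combined traffic is above the (assumed) model throughput limit."""
    return aggregate_rps(data_nodes, per_node_rps) > model_limit_rps


def chunked(docs: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Split documents into smaller batches, one batch per _bulk request."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(docs), batch_size):
        yield list(docs[start:start + batch_size])


# Illustrative numbers only: 6 nodes at 3 requests/second each against a limit of 10.
print(aggregate_rps(6, 3), exceeds_limit(6, 3, 10))
# The same per-node rate on a 3-node cluster stays under that limit.
print(aggregate_rps(3, 3), exceeds_limit(3, 3, 10))
```

Sending the batches from `chunked` one after another keeps fewer documents in flight at a time. Whether a given batch size is small enough depends on the real throttling limit, which this thread does not state.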
Hi @Zhangxunmt , thanks for the investigations and I'm glad you were able to reproduce it. |
@tiagoshin, my perception is that SageMaker metrics are not reliable at all. In my testing account, the SageMaker metrics don't show any 4xx errors either. They only show spikes to "1" in "invocations per instance" when the model is invoked, which still doesn't make sense because the invocation number is definitely more than 1 per instance. I think the 10 requests/second figure was just an example for explanation and is likely not the real throttling limit. |
https://github.com/opensearch-project/ml-commons/blob/2.11/ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/AwsConnectorExecutor.java#L106. The error comes from this line, which indicates that the remote service returned an error code, and which returns the remote error message. From the response posted earlier, we can see that SageMaker returned a 400 status code with "message":null in the response body. |
@tiagoshin Another workaround is to use a different model, such as the Bedrock text embedding model. Tutorials: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/cfn-template.html, https://github.com/opensearch-project/ml-commons/blob/main/docs/tutorials/aws/semantic_search_with_bedrock_cohere_embedding_model.md |
@Zhangxunmt I tried it before, but Bedrock has really low quotas. For the Cohere embedding model for example, they only support 2k requests per minute. That's not enough for a full ingestion. |
@tiagoshin SageMaker team has identified that the throttling from their side caused your ingestion error. I think we can close this issue once the concurrency limit is increased for you. |
What is the bug?
I'm trying to use a model hosted in a Sagemaker endpoint in the same AWS Account as the Opensearch cluster to perform a Neural search.
While ingesting data into the index, I observe the following error for many documents:
I don't see any logs in OpenSearch error logs, and I don't see any 4xx or 5xx requests in Sagemaker.
This error only happens with a reasonable amount of data in bulk ingestion, which in this case is 250 records. When I ingest only 20 records, it works.
I already tested getting some documents that failed and tried to ingest them separately, and it worked. So, the issue is not with the document or with the Sagemaker model.
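A minimal sketch of how the failed documents could be picked out of a `_bulk` response for separate re-ingestion. It relies on the general shape of the Bulk API response (a top-level `errors` flag and an `items` list with one single-key entry per action); the helper name and the sample response, including its error reason, are made up for illustration and are not taken from this issue.

```python
from typing import Any, Dict, List


def failed_positions(bulk_response: Dict[str, Any]) -> List[int]:
    """Return the positions (in request order) of bulk items that failed."""
    if not bulk_response.get("errors"):
        return []
    failed = []
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a single-key dict such as {"index": {...}}.
        result = next(iter(item.values()))
        if "error" in result or result.get("status", 200) >= 400:
            failed.append(position)
    return failed


# Hypothetical response: the second of three documents failed.
sample_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 400,
                   "error": {"type": "placeholder_type", "reason": "placeholder reason"}}},
        {"index": {"_id": "3", "status": 201}},
    ],
}
print(failed_positions(sample_response))
```

The returned positions line up with the order of the documents in the original request, so the matching documents can be re-sent in a smaller batch.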
How can one reproduce the bug?
bge-base-en-v1.5
embedding model in Sagemaker using this python script:
What is the expected behavior?
It's expected that all documents have the following status in ingestion:
What is your host/environment?
I'm running it in the AWS OpenSearch managed version 2.11.
Do you have any screenshots?
Not applicable
Do you have any additional context?
Not applicable