
[BUG] Neural search: 4xx error ingesting data with Sagemaker external model #12774

Closed
tiagoshin opened this issue Mar 19, 2024 · 5 comments
Labels
bug Something isn't working Plugins untriaged

Comments

@tiagoshin

tiagoshin commented Mar 19, 2024

Describe the bug

I'm trying to use a model hosted on a SageMaker endpoint in the same AWS account as the OpenSearch cluster to perform a neural search.
While ingesting data into the index, I observe the following error for many documents:

{
    "index": {
        "_index": "my-index",
        "_id": "id",
        "status": 400,
        "error": {
            "type": "status_exception",
            "reason": "Error from remote service: {\"message\":null}"
        }
    }
}

I don't see anything in the OpenSearch error logs, and I don't see any 4xx or 5xx requests in SageMaker.
This error only happens when bulk ingesting a reasonable amount of data, in this case 250 records; when I ingest only 20 records, it works.
I already took some of the documents that failed and ingested them separately, and that worked. So the issue is not with the documents or with the SageMaker model.

Related component

Plugins

To Reproduce

  1. First, deploy the bge-base-en-v1.5 embedding model in SageMaker using this Python script:
from sagemaker.jumpstart.model import JumpStartModel

model_id = "huggingface-sentencesimilarity-bge-base-en-v1-5"
env = {
    'MMS_JOB_QUEUE_SIZE': '100000',
}
text_embedding_model = JumpStartModel(
    model_id=model_id,
    env=env,
    role="<YOUR-SAGEMAKER-ROLE>",
)

predictor = text_embedding_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g5.xlarge'
)
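Once the endpoint is up, it can be worth invoking it directly with the same payload shape the connector below will send ("text_inputs" plus "mode": "embedding") to confirm it returns an embedding outside of OpenSearch. A minimal sketch with boto3; the endpoint name and region are placeholders, and the exact response schema depends on the JumpStart container:

import json
import boto3

# Sketch: call the endpoint with the same payload the OpenSearch connector sends.
runtime = boto3.client("sagemaker-runtime", region_name="<YOUR-REGION>")
response = runtime.invoke_endpoint(
    EndpointName="<YOUR-SAGEMAKER-ENDPOINT-NAME>",
    ContentType="application/json",
    Body=json.dumps({"text_inputs": "hello world", "mode": "embedding"}),
)
# The container is expected to return an "embedding" field (an assumption,
# mirrored by the post_process_function in the connector below).
print(json.loads(response["Body"].read()))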

  2. Once it's deployed, get the SageMaker endpoint.
  3. Create a SageMaker connector in OpenSearch:
POST {{host}}/_plugins/_ml/connectors/_create
{
  "name": "Amazon Sagemaker connector",
  "description": "The connector to Sagemaker",
  "version": 1,
  "protocol": "aws_sigv4",
  "credential": {
    "roleArn": "<YOUR-ROLE>"
  },
  "parameters": {
    "region": "<YOUR-REGION>",
    "service_name": "sagemaker"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "headers": {
        "content-type": "application/json"
      },
      "URL": "<YOUR-SAGEMAKER-ENDPOINT>",
      "request_body": "{ \"text_inputs\": \"${parameters.text_inputs}\", \"mode\": \"embedding\" }",
      "pre_process_function": "\n    StringBuilder builder = new StringBuilder();\n    builder.append(\"\\\"\");\n    String first = params.text_docs[0];\n    builder.append(first);\n    builder.append(\"\\\"\");\n    def parameters = \"{\" +\"\\\"text_inputs\\\":\" + builder + \"}\";\n    return  \"{\" +\"\\\"parameters\\\":\" + parameters + \"}\";",
      "post_process_function": "\n      def name = \"sentence_embedding\";\n      def dataType = \"FLOAT32\";\n      if (params.embedding == null || params.embedding.length == 0) {\n        return params.message;\n      }\n      def shape = [params.embedding.length];\n      def json = \"{\" +\n                 \"\\\"name\\\":\\\"\" + name + \"\\\",\" +\n                 \"\\\"data_type\\\":\\\"\" + dataType + \"\\\",\" +\n                 \"\\\"shape\\\":\" + shape + \",\" +\n                 \"\\\"data\\\":\" + params.embedding +\n                 \"}\";\n      return json;\n    "
    }
  ]
}
  4. Get the connector id from the response.
  5. Create a model group:
POST {{host}}/_plugins/_ml/model_groups/_register
{
    "name": "sagemaker-model-group",
    "description": "Semantic search model group sagemaker"
}
  6. Get the model group id from the response.
  7. Register the model in OpenSearch:
POST {{host}}/_plugins/_ml/models/_register
{
    "name": "bge-base",
    "function_name": "remote",
    "model_group_id": "<YOUR-MODEL-GROUP-ID>",
    "description": "test model",
    "connector_id": "<YOUR-CONNECTOR-ID>"
}
  8. Get the model_id.
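If the register call returns a task_id rather than the model_id directly, the model_id can be looked up through the tasks API (a sketch, assuming the standard ML Commons tasks endpoint):

GET {{host}}/_plugins/_ml/tasks/<YOUR-TASK-ID>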
  9. Load the model:
POST {{host}}/_plugins/_ml/models/<YOUR-MODEL-ID>/_load
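Before wiring the model into a pipeline, it can help to call the predict API directly and confirm the connector round trip works (a sketch; the input text is arbitrary and the response shape depends on the pre/post process functions defined in the connector above):

POST {{host}}/_plugins/_ml/models/<YOUR-MODEL-ID>/_predict
{
    "parameters": {
        "text_inputs": "hello world"
    }
}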
  10. Create an ingestion pipeline:
PUT {{host}}/_ingest/pipeline/{{pipeline_name}}
{
    "description": "pipeline",
    "processors": [
        {
            "set": {
                "field": "passage_text",
                "value": "{{{field1}}}, {{{field2}}}"
            }
        },
        {
            "text_embedding": {
                "model_id": "<YOUR-MODEL-ID>",
                "field_map": {
                    "passage_text": "passage_embedding"
                }
            }
        }
    ]
}
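The pipeline can be dry-run with the simulate API before it is attached to any index (a sketch; field1 and field2 stand in for whatever source fields the set processor above concatenates):

POST {{host}}/_ingest/pipeline/{{pipeline_name}}/_simulate
{
    "docs": [
        {
            "_source": {
                "field1": "some text",
                "field2": "more text"
            }
        }
    ]
}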
  11. Create an index:
PUT {{host}}/<YOUR-INDEX-NAME>

{
    "settings": ...,
    "mappings": {
        ...<YOUR-MAPPINGS>...,
        "properties": {
            "passage_embedding": {
                "type": "knn_vector",
                "dimension": 768,
                "method": {
                    "engine": "nmslib",
                    "space_type": "cosinesimil",
                    "name": "hnsw",
                    "parameters": {
                        "ef_construction": 512,
                        "m": 16
                    }
                }
            },
            "passage_text": {
                "type": "text"
            }
        }
    }
}
  12. Bulk ingest the data:
PUT {{host}}/<YOUR-INDEX-NAME>/_bulk
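For reference, the bulk body is NDJSON: one action line followed by one document line per record. A sketch is below; attaching the ingest pipeline via the ?pipeline request parameter is an assumption, since it could equally be set as index.default_pipeline in the elided index settings:

PUT {{host}}/<YOUR-INDEX-NAME>/_bulk?pipeline={{pipeline_name}}
{ "index": { "_id": "1" } }
{ "field1": "some text", "field2": "more text" }
{ "index": { "_id": "2" } }
{ "field1": "other text", "field2": "further text" }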

Expected behavior

It's expected that all documents have the following status after ingestion:

{
    "index": {
        "_index": "index",
        "_id": "id",
        "_version": 1,
        "result": "created",
        "_shards": {
            "total": 2,
            "successful": 2,
            "failed": 0
        },
        "_seq_no": 1,
        "_primary_term": 1,
        "status": 201
    }
}

Additional Details

Plugins
Neural Search plugin

Host/Environment:
I'm running the AWS-managed OpenSearch service, version 2.11.

@tiagoshin tiagoshin added bug Something isn't working untriaged labels Mar 19, 2024
@navneet1v
Contributor

This issue needs to be moved to @opensearch-project/ml-commons.

@chishui
Contributor

chishui commented Mar 20, 2024

Your requests most likely got throttled by SageMaker, either because they hit a rate limit or because CPU usage on the endpoint was high.

@tiagoshin
Author

Moved to @opensearch-project/ml-commons at opensearch-project/ml-commons#2249.

@tiagoshin
Author

@chishui the SageMaker rate limit for endpoint requests is 10,000 per second, and we're ingesting only 250 documents.
CPU, GPU, and memory usage are very low during the execution, and SageMaker doesn't register any 4xx or 5xx requests.

@andrross
Member

[Triage - attendees 1 2 3 4 5 6]
Thanks @tiagoshin, closing this issue since it is now a duplicate of opensearch-project/ml-commons#2249.
