[BUG] Batch ingestion API bugs #2930

ylwu-amzn · 2024-09-11T17:59:23Z

Tested with OpenSearch 2.17 RC4.

POST /_plugins/_ml/_batch_ingestion
{
  "index_name": "my-test1",
  "field_map": {
    "_id": "$.recordId",
    "embedding": "source[0].$.modelOutput.embedding"
  },
  "credential": {
    "region": "us-east-1",
    "access_key": "xxx",
    "secret_key": "xxx",
    "session_token": "xxx"
  },
  "data_source": {
    "type": "s3",
    "source": ["s3://ylwu-model-test-output/cszce2bsex07/my_batch2.jsonl.out"]
  }
}

Sample data from my_batch2.jsonl.out:

{"modelInput":{"inputText":"hello word 1"},"modelOutput":{"embedding":[-0.034975495,0.072906666],"inputTextTokenCount":5},"recordId":"CALL0000001"}

It returns task id xHk64pEBG9EkCQDLzc-I

But the task stays in the CREATED state forever. I checked the log and found the error below.

Removing source[0]. from the embedding field mapping makes it work.

[2024-09-11T17:58:07,399][ERROR][o.o.m.e.i.S3DataIngestion] [client1] Missing property in path $['source']
[2024-09-11T17:58:07,400][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [client1] uncaught exception in thread [opensearch[client1][opensearch_ml_train][T#2]]
org.opensearch.OpenSearchStatusException: Failed to batch ingest: Missing property in path $['source']
	at org.opensearch.ml.engine.ingest.S3DataIngestion.ingestSingleSource(S3DataIngestion.java:148) ~[?:?]
	at org.opensearch.ml.engine.ingest.S3DataIngestion.ingest(S3DataIngestion.java:66) ~[?:?]
	at org.opensearch.ml.action.batch.TransportBatchIngestionAction.lambda$doExecute$0(TransportBatchIngestionAction.java:96) ~[?:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:946) ~[opensearch-2.17.0.jar:2.17.0]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]

Suggestions:

Support the source[0] prefix even when there is only one source file.
Update the task status to failed when ingestion fails.
I remember you planned to add a new dedicated thread pool for batch ingestion. Why does the log still show the train thread pool opensearch_ml_train? Can you confirm whether we have a dedicated thread pool?
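For the first suggestion, here is a rough sketch of how an optional source[N]. prefix could be split from the JSONPath before evaluation. This is not the ml-commons implementation: the class name FieldPathSketch and both methods are hypothetical, and defaulting an unprefixed path to source 0 is my assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of ml-commons. */
final class FieldPathSketch {
    // Matches "source[N]." followed by a JSONPath expression starting with "$".
    private static final Pattern PREFIXED = Pattern.compile("^source\\[(\\d+)\\]\\.(\\$.*)$");

    /** Index of the source file the path refers to; assumed to be 0 when no prefix is given. */
    static int sourceIndex(String fieldPath) {
        Matcher m = PREFIXED.matcher(fieldPath);
        return m.matches() ? Integer.parseInt(m.group(1)) : 0;
    }

    /** The JSONPath part, with any "source[N]." prefix removed. */
    static String jsonPath(String fieldPath) {
        Matcher m = PREFIXED.matcher(fieldPath);
        return m.matches() ? m.group(2) : fieldPath;
    }
}
```

With this, "source[0].$.modelOutput.embedding" and "$.modelOutput.embedding" would both resolve to the same JSONPath against the first source file, so the prefix would no longer end up inside the path that is evaluated against the record.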


ylwu-amzn · 2024-09-11T18:17:55Z

Another issue: ingest_fields doesn't work.

  "field_map": {
    "_id": "$.recordId",
    "embedding": "$.modelOutput.embedding"
  },
  "ingest_fields": ["$.modelInput.inputText"],

This works instead:

  "field_map": {
    "_id": "$.recordId",
    "embedding": "$.modelOutput.embedding",
    "input": "$.modelInput.inputText"
  },

Zhangxunmt · 2024-09-11T18:27:47Z

Yes, it's still using the TRAIN thread pool. The initial code didn't run ingestion on the TRAIN thread pool, so exceptions were caught in the main thread and the ML tasks were updated to "Failed". After I moved ingestion onto the TRAIN thread pool, the exceptions are thrown in the TRAIN thread, so they are no longer caught in the main thread. I forgot to move the exception handling from the main thread to the TRAIN thread. After the load tests, I will create a new thread pool just for ingestion.
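A minimal sketch of the fix described above, using a plain java.util.concurrent executor rather than the OpenSearch thread pool. The class TaskStatusSketch, the method runAndReport, and the string states are hypothetical stand-ins for the real task update logic. The point is only that the try/catch has to live inside the runnable that executes on the worker thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration; plain strings stand in for the ML task state. */
final class TaskStatusSketch {
    /** Runs the job on a worker thread and returns the task state recorded for it. */
    static String runAndReport(Runnable ingestJob) {
        AtomicReference<String> state = new AtomicReference<>("CREATED");
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.execute(() -> {
            try {
                ingestJob.run();
                state.set("COMPLETED");
            } catch (Exception e) {
                // Caught on the worker thread itself, so the failure is recorded
                // instead of reaching the uncaught exception handler.
                state.set("FAILED");
            }
        });
        pool.shutdown(); // already submitted work still runs to completion
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return state.get();
    }
}
```

A try/catch placed around pool.execute(...) in the calling thread would not see the exception, which matches the behavior in the log above: the task stayed on CREATED and the error surfaced only through the uncaught exception handler.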

ylwu-amzn · 2024-09-11T18:52:32Z

A Bedrock batch inference job returns a jobArn like this:

{
  "jobArn": "arn:aws:bedrock:us-east-1:<account_id>:model-invocation-job/cszce2bsex07"
}

But the code currently only parses TransformJobArn and id: https://github.com/opensearch-project/ml-commons/blob/main/plugin/src/main/java/org/opensearch/ml/task/MLPredictTaskRunner.java#L367. Please enhance this part to make the parsing more general.

I suggest changing this line https://github.com/opensearch-project/ml-commons/blob/main/plugin/src/main/java/org/opensearch/ml/task/MLPredictTaskRunner.java#L367C44-L367C52

  if (dataAsMap != null
      && (dataAsMap.containsKey("TransformJobArn") || dataAsMap.containsKey("id"))) {

to

  Integer statusCode = tensorOutput.getMlModelOutputs().get(0).getStatusCode();
  if (dataAsMap != null
      && statusCode != null && statusCode >= 200 && statusCode < 300) {

?
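To make the difference concrete, here is a small standalone sketch comparing the two conditions on a Bedrock-style response. The class BatchResponseCheckSketch and its method names are hypothetical. Only the two boolean expressions come from the snippets above.

```java
import java.util.Map;

/** Hypothetical illustration of the two conditions; not the MLPredictTaskRunner code. */
final class BatchResponseCheckSketch {
    /** Current condition: requires one of two known keys in the response body. */
    static boolean hasKnownJobKey(Map<String, ?> dataAsMap) {
        return dataAsMap != null
            && (dataAsMap.containsKey("TransformJobArn") || dataAsMap.containsKey("id"));
    }

    /** Suggested condition: accepts any non-null response body with a 2xx status code. */
    static boolean isSuccessful(Map<String, ?> dataAsMap, Integer statusCode) {
        return dataAsMap != null && statusCode != null && statusCode >= 200 && statusCode < 300;
    }
}
```

For a body that contains only jobArn, hasKnownJobKey returns false while isSuccessful returns true for status 200, so the status-based check would not need a new key added for every provider.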

ylwu-amzn added the bug and untriaged labels Sep 11, 2024

ylwu-amzn assigned Zhangxunmt Sep 11, 2024

Zhangxunmt removed the untriaged label Sep 11, 2024

ylwu-amzn changed the title ~~[BUG] Batch ingestion API task stays on created~~ [BUG] Batch ingestion API bugs Sep 11, 2024

Zhangxunmt mentioned this issue Sep 12, 2024

fix field mapping, add more error handling and remove checking jobId … #2933

Merged

5 tasks

Zhangxunmt closed this as completed Sep 13, 2024
