[BUG] ML Inference Search Processors should have different model input format when input mapping has dollar symbols #2974

mingshl · 2024-09-19T23:07:03Z

What is the bug?
In the ML inference processors, the ingest side and the search side resolve input_map paths differently.

On the ingest side, we cover both reading a value from a plain object and reading values from nested objects, so we support dot path notation to get a field value as well as JSON path notation, which can collect nested values from a list.

The search processors also support JSON path to read the object, but the behavior is different: they always return the data in its original format. We want the same two ways of reading the object that the ingest side offers.

How can one reproduce the bug?

In 2.14, using the ml_inference ingest processor:

For a nested document like this sample from a book index,

{
  "book": [
    {
      "chunk": {
        "text": [
          {
            "chapter": "first chapter",
            "context": "this is the first part"
          },
          {
            "chapter": "first chapter",
            "context": "this is the second part"
          }
        ]
      }
    },
    {
      "chunk": {
        "text": [
          {
            "chapter": "second chapter",
            "context": "this is the third part"
          },
          {
            "chapter": "second chapter",
            "context": "this is the fourth part"
          }
        ]
      }
    }
  ]
}

we can configure the input_map as {"input": "$.book.*.chunk.text.*.context"}, which produces the model input:

{
  "input": [
    "this is the first part",
    "this is the second part",
    "this is the third part",
    "this is the fourth part"
  ]
}
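To make the wildcard case concrete, here is a minimal Python sketch of how a path such as $.book.*.chunk.text.*.context can flatten nested values into one list. It is only an illustration: resolve_wildcard_path is a hypothetical helper that handles a leading '$', dot-separated keys, and '*', and it is not the JSON path library that ml-commons uses.

```python
def resolve_wildcard_path(document, path):
    """Collect every value matched by a simplified JSON path.

    Hypothetical helper for illustration: supports only a leading '$',
    dot-separated keys, and the '*' wildcard.
    """
    segments = path.split(".")
    if segments[0] != "$":
        raise ValueError("path must start with '$'")
    matches = [document]
    for segment in segments[1:]:
        next_matches = []
        for node in matches:
            if segment == "*":
                # A wildcard fans out over list items or dict values.
                if isinstance(node, list):
                    next_matches.extend(node)
                elif isinstance(node, dict):
                    next_matches.extend(node.values())
            elif isinstance(node, dict) and segment in node:
                next_matches.append(node[segment])
        matches = next_matches
    return matches


book_doc = {
    "book": [
        {"chunk": {"text": [
            {"chapter": "first chapter", "context": "this is the first part"},
            {"chapter": "first chapter", "context": "this is the second part"},
        ]}},
        {"chunk": {"text": [
            {"chapter": "second chapter", "context": "this is the third part"},
            {"chapter": "second chapter", "context": "this is the fourth part"},
        ]}},
    ]
}

model_input = {"input": resolve_wildcard_path(book_doc, "$.book.*.chunk.text.*.context")}
print(model_input)
```

Each '*' fans out over the current matches, which is why the four context strings from two different chunks end up in a single flat list.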

For a simple object like this sample from an item index,


{
  "item": {
    "text": "red shoes"
  }
}

If input_map is configured as {"input": "item.text"}, the model input is a string:

{
  "input": "red shoes"
}

If input_map is configured as {"input": "$.item.text"}, the model input is a list:

{
  "input": ["red shoes"] 
}

But in the search processors, with the same item index and input_map configured as {"input": "$.item.text"}, the model input is a string:

{
  "input": "red shoes"
}

What is the expected behavior?
We would like logic similar to the ingest processors: a dot path without the dollar symbol '$' returns the value in its original format, while a path that uses the dollar symbol '$' returns a list of values.

In the search processors, with the same item index and input_map configured as {"input": "$.item.text"}, the model input should be a list:

{
  "input": ["red shoes"]
}

With the same item index and input_map configured as {"input": "item.text"}, the model input should remain a string:

{
  "input": "red shoes"
}
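The expected behavior can be sketched in a few lines of Python. This is an illustration only, not the processor's implementation: read_dot_path and resolve_input are hypothetical names, wildcards are left out, and the sample document is assumed to nest text under item so that the path item.text resolves.

```python
def read_dot_path(document, path):
    """Follow dot-separated keys and return the value unchanged."""
    node = document
    for key in path.split("."):
        node = node[key]
    return node


def resolve_input(document, path):
    """Return a list for '$'-prefixed paths, the original value otherwise."""
    if path.startswith("$."):
        # Dollar notation: wrap the matched value in a list.
        return [read_dot_path(document, path[2:])]
    # Dot notation: keep the original data format.
    return read_dot_path(document, path)


# Assumed document shape, chosen so that 'item.text' resolves.
item_doc = {"item": {"text": "red shoes"}}

list_input = {"input": resolve_input(item_doc, "$.item.text")}
string_input = {"input": resolve_input(item_doc, "item.text")}
print(list_input)
print(string_input)
```

The only branch is the check for the '$' prefix, so the same field yields either a list or the raw string depending on which notation the input_map uses.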

What is your host/environment?

OS: [2.17]
Plugins


mingshl · 2024-09-24T18:14:24Z

Will add documentation for both the ingest and search pipelines.

mingshl · 2024-09-24T18:22:11Z

Need to check the JSON path logic in other APIs, for example the predict API and create connector.

mingshl · 2024-09-24T21:16:18Z

Checked ConnectorUtils: it returns the object in its original form.

ml-commons/ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/ConnectorUtils.java

Line 222 in 6a6cac1

Object filteredOutput = JsonPath.read(modelResponse, responseFilter);

mingshl · 2024-09-25T18:40:00Z

Instead of allowing different configurations depending on whether the path is given with or without the dollar symbol, it's better to use one standard configuration across the ml-commons repo.

If users would like to use a different JSON path configuration, we should open up a new parameter to change the JSON path configuration settings.
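As a rough sketch of that alternative, the return format could depend on an explicit option instead of on the '$' prefix. Everything named here is hypothetical: PathReadOptions, always_return_list, and read_path are made up for illustration and are not ml-commons parameters, and the document shape is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathReadOptions:
    # Hypothetical setting: wrap every result in a list when True.
    always_return_list: bool = False


def read_path(document, path, options=PathReadOptions()):
    """Read a definite path; '$.a.b' and 'a.b' are treated identically."""
    keys = path[2:] if path.startswith("$.") else path
    node = document
    for key in keys.split("."):
        node = node[key]
    return [node] if options.always_return_list else node


# Assumed document shape for illustration.
doc = {"item": {"text": "red shoes"}}

default_result = read_path(doc, "$.item.text")
list_result = read_path(doc, "item.text", PathReadOptions(always_return_list=True))
print(default_result)
print(list_result)
```

With this shape, the default stays the same everywhere, and a caller who wants list output opts in explicitly.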

mingshl · 2024-10-02T20:47:49Z

Fixed in #2985.

mingshl added the bug and untriaged labels Sep 19, 2024

mingshl changed the title ~~[BUG] ML Inference Search Processors Cannot Get Model Input as List~~ [BUG] ML Inference Search Processors should have different model input format when input mapping has dollar symbols Sep 19, 2024

mingshl mentioned this issue Sep 23, 2024

[ML Inference Search Processors] Always return list when using dollar symbol in input_maps #2978

Closed

5 tasks

ylwu-amzn added this to ml-commons projects Sep 24, 2024

mingshl moved this to In Progress in ml-commons projects Sep 24, 2024

mingshl removed the untriaged label Sep 24, 2024

mingshl self-assigned this Sep 24, 2024

mingshl closed this as completed Oct 2, 2024

github-project-automation bot moved this from In Progress to Done in ml-commons projects Oct 2, 2024
