[BUG] ML Inference Search Processors should have different model input format when input mapping has dollar symbols #2974

Closed
mingshl opened this issue Sep 19, 2024 · 5 comments

mingshl commented Sep 19, 2024

What is the bug?
In the ML Inference Processors:

On the ingest side, we considered both getting a value from a nested object and getting a value from a plain object. We therefore support dot path notation, which gets a field value, and JSON path notation, which can also collect nested values inside a list.

On the search side, however, the processors read the object with JSON path, and the behavior is different: they always return the value in its original data format. We want search processors to support the same two ways of reading the object as the ingest side.

How can one reproduce the bug?

In 2.14, using the ml_inference ingest processor:

For a nested document, such as this sample book index:

{
  "book": [
    {
      "chunk": {
        "text": [
          {
            "chapter": "first chapter",
            "context": "this is the first part"
          },
          {
            "chapter": "first chapter",
            "context": "this is the second part"
          }
        ]
      }
    },
    {
      "chunk": {
        "text": [
          {
            "chapter": "second chapter",
            "context": "this is the third part"
          },
          {
            "chapter": "second chapter",
            "context": "this is the fourth part"
          }
        ]
      }
    }
  ]
} 

we can configure the input_map as {"input": "$.book.*.chunk.text.*.context"}, which fetches the model input as:

{
  "input": [
    "this is the first part",
    "this is the second part",
    "this is the third part",
    "this is the fourth part"
  ]
}
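To illustrate why the wildcard path produces one flat list, here is a minimal sketch of a path evaluator over nested Java maps and lists. It is a hypothetical, simplified stand-in and not the JsonPath library that ml-commons uses: it supports only a "$." root, ".key" steps, and ".*" wildcards over lists. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class WildcardPathSketch {

    // Hypothetical, minimal evaluator: supports only a "$." root followed by
    // ".key" steps and ".*" wildcards over lists. No filters, indexes, or deep scans.
    static List<Object> read(Object root, String path) {
        if (!path.startsWith("$.")) {
            throw new IllegalArgumentException("expected a path like $.a.*.b");
        }
        List<Object> current = new ArrayList<>();
        current.add(root);
        for (String step : path.substring(2).split("\\.")) {
            List<Object> next = new ArrayList<>();
            for (Object node : current) {
                if ("*".equals(step) && node instanceof List) {
                    // A wildcard over a list fans out to every element, which is
                    // what flattens the nested chunks into a single result list.
                    next.addAll((List<?>) node);
                } else if (node instanceof Map && ((Map<?, ?>) node).containsKey(step)) {
                    next.add(((Map<?, ?>) node).get(step));
                }
            }
            current = next;
        }
        return current;
    }

    // The sample book index from the issue, as nested maps and lists.
    static Map<String, Object> sampleBook() {
        return Map.of("book", List.of(
                chunk(List.of(
                        part("first chapter", "this is the first part"),
                        part("first chapter", "this is the second part"))),
                chunk(List.of(
                        part("second chapter", "this is the third part"),
                        part("second chapter", "this is the fourth part")))));
    }

    static Map<String, Object> chunk(List<Map<String, Object>> text) {
        return Map.of("chunk", Map.of("text", text));
    }

    static Map<String, Object> part(String chapter, String context) {
        return Map.of("chapter", chapter, "context", context);
    }

    public static void main(String[] args) {
        // Prints: [this is the first part, this is the second part, this is the third part, this is the fourth part]
        System.out.println(read(sampleBook(), "$.book.*.chunk.text.*.context"));
    }
}
```

Each ".*" step replaces every list in the current result set with its elements, so two levels of wildcards yield all four context strings in document order.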

For a simple object, such as this sample item index:

{
  "item": {
    "text": "red shoes"
  }
}

If input_map is configured as {"input": "item.text"}, the model input is a string:

{
  "input": "red shoes"
}

If input_map is configured as {"input": "$.item.text"}, the model input is a list:

{
  "input": ["red shoes"] 
}
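The ingest-side rule above, where the dot path keeps the original format and the "$" path always yields a list, can be sketched as follows. This is a hypothetical illustration, not the ml-commons implementation: it assumes the item document is {"item": {"text": "red shoes"}}, and it handles only definite paths without wildcards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class InputMapShapeSketch {

    // Walks nested maps one key at a time and returns the value untouched.
    static Object readDotPath(Map<String, ?> doc, String path) {
        Object node = doc;
        for (String key : path.split("\\.")) {
            if (!(node instanceof Map)) {
                return null; // the path runs through a missing or non-object value
            }
            node = ((Map<?, ?>) node).get(key);
        }
        return node;
    }

    // Hypothetical resolver for the rule described in the issue:
    //   "item.text"   -> original format (a String here)
    //   "$.item.text" -> always a list
    // Only definite paths are handled; wildcards such as "$.a.*" are not.
    static Object resolveModelInput(Map<String, ?> doc, String path) {
        if (path.startsWith("$.")) {
            Object value = readDotPath(doc, path.substring(2));
            List<Object> values = new ArrayList<>();
            if (value != null) {
                values.add(value);
            }
            return values;
        }
        return readDotPath(doc, path);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("item", Map.of("text", "red shoes"));
        System.out.println(resolveModelInput(doc, "item.text"));   // red shoes
        System.out.println(resolveModelInput(doc, "$.item.text")); // [red shoes]
    }
}
```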

In search processors, however, for the same item index, if input_map is configured as {"input": "$.item.text"}, the model input is still a string:

{
  "input": "red shoes"
}

What is the expected behavior?
We would like search processors to follow the same logic as ingest processors: a dot path without the dollar symbol '$' returns the value in its original data format, while a path starting with '$' returns a list of values.

In search processors, for the same item index, if input_map is configured as {"input": "$.item.text"}, the model input should be a list:

{
  "input": ["red shoes"]
}

If input_map is configured as {"input": "item.text"}, the model input should remain a string:

{
  "input": "red shoes"
}

What is your host/environment?

  • Version: 2.17
  • Plugins
mingshl added the "bug" (Something isn't working) and "untriaged" labels Sep 19, 2024
mingshl changed the title from [BUG] ML Inference Search Processors Cannot Get Model Input as List to [BUG] ML Inference Search Processors should have different model input format when input mapping has dollar symbols Sep 19, 2024

mingshl commented Sep 24, 2024

Will add documentation for both the ingest and search pipelines.

mingshl moved this to In Progress in ml-commons projects Sep 24, 2024
mingshl removed the "untriaged" label Sep 24, 2024
mingshl self-assigned this Sep 24, 2024

mingshl commented Sep 24, 2024

Need to check the JsonPath logic in other APIs, for example the Predict API and the Create Connector API.

mingshl commented Sep 24, 2024

Checked ConnectorUtils: it returns the object in its original form.

Object filteredOutput = JsonPath.read(modelResponse, responseFilter);

mingshl commented Sep 25, 2024

Instead of behaving differently depending on whether the path starts with the dollar symbol, it's better to use one standard JsonPath configuration across the ml-commons repo.

If users want a different JsonPath configuration, we should expose a new parameter for changing the JsonPath configuration settings.
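A minimal sketch of that proposal, under stated assumptions: one standard default that returns values in their original form (the behavior observed in ConnectorUtils), with list wrapping only when the caller opts in explicitly. The flag name alwaysReturnList is hypothetical, loosely echoing Jayway JsonPath's Option.ALWAYS_RETURN_LIST. It does not reproduce the fix in #2985, and the path walker handles only definite dot-separated paths over nested maps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ConfigurableReadSketch {

    // Hypothetical read: the "$." prefix is accepted but no longer changes the
    // result shape; only the explicit alwaysReturnList flag does.
    static Object read(Map<String, ?> doc, String path, boolean alwaysReturnList) {
        String dotPath = path.startsWith("$.") ? path.substring(2) : path;
        Object node = doc;
        for (String key : dotPath.split("\\.")) {
            node = (node instanceof Map) ? ((Map<?, ?>) node).get(key) : null;
        }
        if (!alwaysReturnList || node instanceof List) {
            return node; // default, or already a list: return the value as-is
        }
        List<Object> wrapped = new ArrayList<>();
        if (node != null) {
            wrapped.add(node);
        }
        return wrapped;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("item", Map.of("text", "red shoes"));
        System.out.println(read(doc, "$.item.text", false)); // red shoes
        System.out.println(read(doc, "item.text", false));   // red shoes
        System.out.println(read(doc, "item.text", true));    // [red shoes]
    }
}
```

With this design, the same input_map yields the same shape everywhere unless the caller changes the setting, which keeps ingest processors, search processors, and connectors consistent.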

mingshl commented Oct 2, 2024

Fixed in #2985.

mingshl closed this as completed Oct 2, 2024
github-project-automation bot moved this from In Progress to Done in ml-commons projects Oct 2, 2024