
Support ML Inference Search Processor Writing to Search Extension #3061

Open · wants to merge 1 commit into base: main
Conversation

@mingshl (Collaborator) commented Oct 3, 2024

Description

Previously, the ML inference search processor wrote prediction results only to the search hits. This change adds support for the ML inference search processor writing to the search extension for many-to-one inference.

Note that this is not supported for one-to-one inference: the order of results matters for one-to-one inference, and other processors might rerank the hits, breaking the correspondence between model inputs and prediction results.

Related Issues

#2878

Sample Test Case


PUT /review_string_index/_doc/1
{
  "review": "Dr. Eric Goldberg is a fantastic doctor who has correctly diagnosed every issue that my wife and I have had. Unlike many of my past doctors, Dr. Goldberg is very accessible and we have been able to schedule appointments with him and his staff very quickly. We are happy to have him in the neighborhood and look forward to being his patients for many years to come." ,
  "label":"5 stars"
}

PUT /review_string_index/_doc/2
{
  "review": "happy visit" ,
  "label":"5 stars"
}


PUT /review_string_index/_doc/3
{
  "review": "sad place" ,
  "label":"1 stars"
}

PUT /_search/pipeline/my_pipeline_request_review_llm
{
  "response_processors": [
    {
      "ml_inference": {
        "tag": "ml_inference",
        "description": "This processor is going to run llm",
        "model_id": "uhkETJIB5-xYSMo_SPet",
        "function_name": "REMOTE",
        "input_map": [
          {
            "context": "review"
          }
        ],
        "output_map": [
          {
            "ext.ml_inference.params.model_response": "response" 
          }
        ],
        "model_config": {
          "prompt":"\n\nHuman: You are a professional data analyst. You will always answer question based on the given context first. If the answer is not directly shown in the context, you will analyze the data and find the answer. If you don't know the answer, just say I don't know. Context: ${parameters.context.toString()}. \n\n Human: please summarize the documents \n\n Assistant:"
        },
        "ignore_missing": false,
        "ignore_failure": false,
        "one_to_one":false
      }
    }
  ]
}

GET /review_string_index/_search?search_pipeline=my_pipeline_request_review_llm
{"query":{
  "match_all": {}
}
}

Returning:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "review_string_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "review": "Dr. Eric Goldberg is a fantastic doctor who has correctly diagnosed every issue that my wife and I have had. Unlike many of my past doctors, Dr. Goldberg is very accessible and we have been able to schedule appointments with him and his staff very quickly. We are happy to have him in the neighborhood and look forward to being his patients for many years to come.",
          "label": "5 stars"
        }
      },
      {
        "_index": "review_string_index",
        "_id": "2",
        "_score": 1,
        "_source": {
          "review": "happy visit",
          "label": "5 stars"
        }
      },
      {
        "_index": "review_string_index",
        "_id": "3",
        "_score": 1,
        "_source": {
          "review": "sad place",
          "label": "1 stars"
        }
      }
    ]
  },
  "ext": {
    "ml_inference": {
      "llm_response": """ Based on the context provided:

- The first document is a positive review of Dr. Eric Goldberg from a patient. It praises Dr. Goldberg for correctly diagnosing issues for the patient and their wife. It also notes that Dr. Goldberg is very accessible and appointments can be scheduled quickly with him and his staff. The patient expresses happiness that Dr. Goldberg is in their neighborhood and looks forward to being his patient for many years.

- The second document just says "happy visit". 

- The third document says "sad place".

- In summary, the first document positively reviews a doctor, Dr. Eric Goldberg. The other two documents don't provide much context on their own, just mentioning a "happy visit" and "sad place"."""
    }
  }
}

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ohltyler (Member) commented Oct 3, 2024

Does this change the default functionality of many-to-one, such that the outputs in the output map will always be placed outside of the individual document _sources, or is that still an option? I don't know if there's any good use cases to support the latter.

@mingshl (Collaborator, Author) commented Oct 3, 2024

> Does this change the default functionality of many-to-one, such that the outputs in the output map will always be placed outside of the individual document _sources, or is that still an option? I don't know if there's any good use cases to support the latter.

The output mapping lets users choose whether to write to the search extension or to the document source. If an output mapping key starts with the prefix ext.ml_inference, the model output is written to the search extension; all other mappings default to writing to the document _source.
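
For example, a single output_map could route the same model response to both destinations (a hypothetical sketch based on the prefix rule described above; the field name summary_in_source is illustrative, not part of this PR):

```json
{
  "output_map": [
    {
      "ext.ml_inference.summary": "response",
      "summary_in_source": "response"
    }
  ]
}
```

Here the first mapping lands under ext.ml_inference in the search response, while the second is written into each hit's _source as usual.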

@ohltyler (Member) commented Oct 3, 2024

> Does this change the default functionality of many-to-one, such that the outputs in the output map will always be placed outside of the individual document _sources, or is that still an option? I don't know if there's any good use cases to support the latter.
>
> The output mapping lets users choose whether to write to the search extension or to the document source. If an output mapping key starts with the prefix ext.ml_inference, the model output is written to the search extension; all other mappings default to writing to the document _source.

Got it, thanks.
