[Feature Request] Mapping fashion configuration for pipeline processors #13160

Open
zane-neo opened this issue Apr 11, 2024 · 5 comments
Labels
enhancement Enhancement or improvement to existing feature or request ingest-pipeline Other

Comments

@zane-neo
Contributor

Is your feature request related to a problem? Please describe

Background

OpenSearch core currently supports field value configuration in several processors, e.g. AppendProcessor and SetProcessor. For example:

{
  "append": {
    "field": "your_target_field",
    "value": "{{{tenure}}}"
  }
}

With this configuration, a user can append to a field or transform an existing field. Sometimes, though, a user needs another pattern: creating a new field based on the value of an existing field. In the example below, several new fields are derived from existing fields:

  1. From the existing field a, generate a new field a_wc that records the number of words in a.
  2. From the existing field b, generate a new field b_wc; both b and b_wc are lists.
  3. From the existing nested field c -> d, generate a new field d_wc; c is a map, and d_wc is written next to d inside it.

{
  "a": "hello world",
  "a_wc": 2,
  "b": ["hello", "world"],
  "b_wc": [1, 1],
  "c": {
    "d": "foo bar",
    "d_wc": 2
  }
}
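
To make the three cases concrete, here is a minimal Python sketch that derives the `_wc` fields from the existing ones. The `word_count` helper is hypothetical and exists only for illustration; it is not an OpenSearch API, and a real processor would be written in Java inside core or a plugin.

```python
def word_count(value):
    """Count whitespace-separated words; for a list, count each element."""
    if isinstance(value, list):
        return [word_count(item) for item in value]
    return len(str(value).split())


# Source document containing only the existing fields.
doc = {
    "a": "hello world",
    "b": ["hello", "world"],
    "c": {"d": "foo bar"},
}

# Case 1: string field -> integer count.
doc["a_wc"] = word_count(doc["a"])
# Case 2: list field -> list of counts, one per element.
doc["b_wc"] = word_count(doc["b"])
# Case 3: nested field -> count stored next to it inside the same map.
doc["c"]["d_wc"] = word_count(doc["c"]["d"])
```

Each processor that needs this pattern currently has to repeat the same source-to-target wiring by hand.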

Problem statement

Currently, processor configuration does not support this style of multi-field mapping, so every processor that needs it has to implement similar logic on its own.

Describe the solution you'd like

We can support multi-field mapping configuration in OpenSearch core so that it can be reused by different processors across different plugins. Two configuration styles could be supported, a nested one and a dot-notation one, e.g.:

{
  "field_map": {
    "a": "a_wc",
    "b": "b_wc",
    "c": {
      "d": "d_wc"
    }
  }
}

{
  "field_map": {
    "a": "a_wc",
    "b": "b_wc",
    "c.d": "c.d_wc"
  }
}
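
As a rough illustration of how a shared implementation might treat both styles uniformly, the Python sketch below flattens either form into (source path, target path) pairs and then applies a transform. All function names are hypothetical, and the sketch assumes that in the nested style a target name is relative to its enclosing map (so `"c": {"d": "d_wc"}` means `c.d -> c.d_wc`); the proposal above does not spell out these semantics.

```python
def flatten_field_map(field_map, prefix=""):
    """Return (source_path, target_path) pairs for either configuration style."""
    pairs = []
    for key, target in field_map.items():
        source = prefix + key
        if isinstance(target, dict):
            # Nested style: descend, carrying the parent path along.
            pairs.extend(flatten_field_map(target, prefix=source + "."))
        else:
            # Assumption: the target is relative to the enclosing map.
            # At the top level the prefix is empty, so dotted targets are kept as-is.
            pairs.append((source, prefix + target))
    return pairs


def get_path(doc, path):
    """Read a value at a dot-separated path."""
    node = doc
    for part in path.split("."):
        node = node[part]
    return node


def set_path(doc, path, value):
    """Write a value at a dot-separated path, creating parent maps as needed."""
    *parents, leaf = path.split(".")
    node = doc
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def apply_field_map(doc, field_map, transform):
    """Set each target field to transform(source field value)."""
    for source, target in flatten_field_map(field_map):
        set_path(doc, target, transform(get_path(doc, source)))
    return doc


def word_count(value):
    """Example transform: word count, applied element-wise to lists."""
    if isinstance(value, list):
        return [word_count(item) for item in value]
    return len(str(value).split())


nested = {"a": "a_wc", "b": "b_wc", "c": {"d": "d_wc"}}
dotted = {"a": "a_wc", "b": "b_wc", "c.d": "c.d_wc"}

result = apply_field_map(
    {"a": "hello world", "b": ["hello", "world"], "c": {"d": "foo bar"}},
    nested,
    word_count,
)
```

Under that assumption, both styles reduce to the same list of pairs, so a processor would only need to supply the transform.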

Related component

Other

Describe alternatives you've considered

No response

Additional context

No response

@zane-neo zane-neo added enhancement Enhancement or improvement to existing feature or request untriaged labels Apr 11, 2024
@github-actions github-actions bot added the Other label Apr 11, 2024
@shwetathareja
Member

@zane-neo can you give an example of existing processors and how they support this multi-field mapping configuration without common support?

@shwetathareja
Member

Also @zane-neo, you mentioned

> create a new field based on an existing field value

Are you planning to create a new processor to support this?

@zane-neo
Contributor Author

> @zane-neo can you give example of existing processors and how they support this multiple fields mapping configuration without common support?

@shwetathareja Currently the text_embedding processor does this: https://opensearch.org/docs/latest/ingest-pipelines/processors/text-embedding/

@zane-neo
Contributor Author

> Also @zane-neo, you mentioned
>
> > create a new field based on an existing field value
>
> are you planning to create a new processor to support this?

In fact, the text_embedding processor already does this, and a new text chunking processor (https://opensearch.org/docs/latest/search-plugins/text-chunking/) does it as well.

@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7 8]
@zane-neo Thanks for creating this issue
