[Feature]: add ignore missing to text chunking processor (opensearch-…
…project#8266)

* feat: add ignore missing to text chunking processor

Signed-off-by: Ian Menendez <[email protected]>

* Update _ingest-pipelines/processors/text-chunking.md

Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: Ian Menendez <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
IanMenendez and kolchfa-aws authored Oct 8, 2024
1 parent 59bea71 commit 8feea9e
Showing 1 changed file with 11 additions and 10 deletions.
21 changes: 11 additions & 10 deletions _ingest-pipelines/processors/text-chunking.md
@@ -31,16 +31,17 @@ The following is the syntax for the `text_chunking` processor:

The following table lists the required and optional parameters for the `text_chunking` processor.

-| Parameter | Data type | Required/Optional | Description |
-|:---|:---|:---|:---|
-| `field_map` | Object | Required | Contains key-value pairs that specify the mapping of a text field to the output field. |
-| `field_map.<input_field>` | String | Required | The name of the field from which to obtain text for generating chunked passages. |
-| `field_map.<output_field>` | String | Required | The name of the field in which to store the chunked results. |
-| `algorithm` | Object | Required | Contains at most one key-value pair that specifies the chunking algorithm and parameters. |
-| `algorithm.<name>` | String | Optional | The name of the chunking algorithm. Valid values are [`fixed_token_length`](#fixed-token-length-algorithm) or [`delimiter`](#delimiter-algorithm). Default is `fixed_token_length`. |
-| `algorithm.<parameters>` | Object | Optional | The parameters for the chunking algorithm. By default, contains the default parameters of the `fixed_token_length` algorithm. |
-| `description` | String | Optional | A brief description of the processor. |
-| `tag` | String | Optional | An identifier tag for the processor. Useful when debugging in order to distinguish between processors of the same type. |
+| Parameter | Data type | Required/Optional | Description |
+|:----------------------------|:----------|:---|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `field_map` | Object | Required | Contains key-value pairs that specify the mapping of a text field to the output field. |
+| `field_map.<input_field>` | String | Required | The name of the field from which to obtain text for generating chunked passages. |
+| `field_map.<output_field>` | String | Required | The name of the field in which to store the chunked results. |
+| `algorithm` | Object | Required | Contains at most one key-value pair that specifies the chunking algorithm and parameters. |
+| `algorithm.<name>` | String | Optional | The name of the chunking algorithm. Valid values are [`fixed_token_length`](#fixed-token-length-algorithm) or [`delimiter`](#delimiter-algorithm). Default is `fixed_token_length`. |
+| `algorithm.<parameters>` | Object | Optional | The parameters for the chunking algorithm. By default, contains the default parameters of the `fixed_token_length` algorithm. |
+| `ignore_missing` | Boolean | Optional | If `true`, empty fields are excluded from the output. If `false`, the output will contain an empty list for every empty field. Default is `false`. |
+| `description` | String | Optional | A brief description of the processor. |
+| `tag` | String | Optional | An identifier tag for the processor. Useful when debugging in order to distinguish between processors of the same type. |

To perform chunking on nested fields, specify `input_field` and `output_field` values as JSON objects. Dot paths of nested fields are not supported. For example, use `"field_map": { "foo": { "bar": "bar_chunk"} }` instead of `"field_map": { "foo.bar": "foo.bar_chunk"}`.
{: .note}
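
As a rough illustration of where the new `ignore_missing` parameter might sit in a pipeline definition, the following sketch shows a request body with a single `text_chunking` processor. The field names `passage_text` and `passage_chunk` are hypothetical, and `token_limit` is a `fixed_token_length` parameter that does not appear in this diff; both are included only to make the example complete. Because JSON has no comment syntax, the assumption is also stated in the `description` value.

```json
{
  "description": "Illustrative pipeline only; field names are hypothetical",
  "processors": [
    {
      "text_chunking": {
        "ignore_missing": true,
        "field_map": {
          "passage_text": "passage_chunk"
        },
        "algorithm": {
          "fixed_token_length": {
            "token_limit": 10
          }
        }
      }
    }
  ]
}
```

Going by the table's description, a document with no `passage_text` value would be expected to come out without a `passage_chunk` field under this setting, whereas with `ignore_missing` left at its default of `false`, `passage_chunk` would be an empty list.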