From c2b1de70f7d1a9e93b02cd98ae460629e0d3f46a Mon Sep 17 00:00:00 2001
From: Daniel Widdis <widdis@gmail.com>
Date: Mon, 22 Jul 2024 09:17:38 -0700
Subject: [PATCH] Document new Split and Sort SearchResponseProcessors (#7767)

* Add documentation for Sort SearchResponseProcessor

Signed-off-by: Daniel Widdis <widdis@gmail.com>

* Add documentation for Split SearchResponseProcessor

Signed-off-by: Daniel Widdis <widdis@gmail.com>

* Doc review

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Update _ingest-pipelines/processors/split.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Daniel Widdis <widdis@gmail.com>

* Update _search-plugins/search-pipelines/sort-processor.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Daniel Widdis <widdis@gmail.com>

* Update _search-plugins/search-pipelines/split-processor.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Daniel Widdis <widdis@gmail.com>

* Update _search-plugins/search-pipelines/split-processor.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Daniel Widdis <widdis@gmail.com>

* Update _search-plugins/search-pipelines/split-processor.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Daniel Widdis <widdis@gmail.com>

* Update _search-plugins/search-pipelines/split-processor.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Daniel Widdis <widdis@gmail.com>

---------

Signed-off-by: Daniel Widdis <widdis@gmail.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>
---
 _ingest-pipelines/processors/split.md         |   2 +-
 .../search-pipelines/search-processors.md     |   7 +-
 .../search-pipelines/sort-processor.md        | 251 ++++++++++++++++++
 .../search-pipelines/split-processor.md       | 234 ++++++++++++++++
 4 files changed, 491 insertions(+), 3 deletions(-)
 create mode 100644 _search-plugins/search-pipelines/sort-processor.md
 create mode 100644 _search-plugins/search-pipelines/split-processor.md
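
Note: the behavior documented in the two new pages can be approximated
outside OpenSearch. The Python sketch below is only an illustration under
stated assumptions. `sort_values` and `split_value` are hypothetical helper
names, not OpenSearch APIs. It assumes that "sorted lexicographically" means
comparing the string form of each item, and it uses Python's `re` module,
whose pattern syntax may differ from the one OpenSearch accepts.

```python
import re


def sort_values(values, order="asc"):
    """Sketch of the documented `sort` semantics (hypothetical helper)."""
    if not isinstance(values, list):
        # The sort page states that a non-array input is an error.
        raise TypeError("field must be an array")
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    # Assumption: string and mixed arrays are compared by their string form.
    key = None if numeric else str
    return sorted(values, key=key, reverse=(order == "desc"))


def split_value(text, separator, preserve_trailing=False):
    """Sketch of the documented `split` semantics (hypothetical helper)."""
    parts = re.split(separator, text)
    if not preserve_trailing:
        # Drop empty trailing items, as described for `preserve_trailing: false`.
        while parts and parts[-1] == "":
            parts.pop()
    return parts


print(sort_values(["one", "two", "three", "four"]))
print(sort_values([10, 9, 2], order="desc"))
print(split_value("ingest, search, visualize, and analyze data", ", "))
print(split_value("a,b,,", ",", preserve_trailing=True))
```

The first and third calls reuse the sample `message` values from the new
pages, so their output can be compared with the `sorted_message` and
`split_message` arrays shown in the documented responses.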

diff --git a/_ingest-pipelines/processors/split.md b/_ingest-pipelines/processors/split.md
index c424ef671c..cdb0cfe3de 100644
--- a/_ingest-pipelines/processors/split.md
+++ b/_ingest-pipelines/processors/split.md
@@ -30,7 +30,7 @@ Parameter  | Required/Optional  | Description
 :--- | :--- | :--- 
 `field` | Required | The field containing the string to be split. 
 `separator` | Required | The delimiter used to split the string. This can be a regular expression pattern. 
-`preserve_field` | Optional | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, empty trailing fields are removed from the resulting array. Default is `false`. 
+`preserve_trailing` | Optional | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, then empty trailing fields are removed from the resulting array. Default is `false`. 
 `target_field` | Optional | The field where the array of substrings is stored. If not specified, then the field is updated in-place. 
 `ignore_missing` | Optional	| Specifies whether the processor should ignore documents that do not contain the specified field. If set to `true`, then the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`.  
 `description` | Optional | A brief description of the processor. 
diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md
index 4630ab950c..ad515cc541 100644
--- a/_search-plugins/search-pipelines/search-processors.md
+++ b/_search-plugins/search-pipelines/search-processors.md
@@ -37,13 +37,16 @@ The following table lists all supported search response processors.
 
 Processor | Description | Earliest available version
 :--- | :--- | :---
+[`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12
 [`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9
-[`retrieval_augmented_generation`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rag-processor/) | Used for retrieval-augmented generation (RAG) in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | 2.10 (generally available in 2.12)
 [`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8
 [`rerank`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/)| Reranks search results using a cross-encoder model. | 2.12
-[`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12
+[`retrieval_augmented_generation`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rag-processor/) | Used for retrieval-augmented generation (RAG) in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | 2.10 (generally available in 2.12)
+[`sort`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/sort-processor/)| Sorts an array of items in either ascending or descending order. | 2.16
+[`split`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/split-processor/)| Splits a string field into an array of substrings based on a specified delimiter. | 2.16
 [`truncate_hits`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/)| Discards search hits after a specified target count is reached. Can undo the effect of the `oversample` request processor.  | 2.12
 
+
 ## Search phase results processors
 
 A search phase results processor runs between search phases at the coordinating node level. It intercepts the results retrieved from one search phase and transforms them before passing them to the next search phase.
diff --git a/_search-plugins/search-pipelines/sort-processor.md b/_search-plugins/search-pipelines/sort-processor.md
new file mode 100644
index 0000000000..dde05c1b3a
--- /dev/null
+++ b/_search-plugins/search-pipelines/sort-processor.md
@@ -0,0 +1,251 @@
+---
+layout: default
+title: Sort
+nav_order: 32
+has_children: false
+parent: Search processors
+grand_parent: Search pipelines
+---
+
+# Sort processor
+
+The `sort` processor sorts an array of items in either ascending or descending order. Numeric arrays are sorted numerically, while string or mixed arrays (strings and numbers) are sorted lexicographically. The processor throws an error if the input is not an array.
+
+## Request fields
+
+The following table lists all available request fields.
+
+Field | Data type | Description
+:--- | :--- | :---
+`field`  | String | The field to be sorted. Must be an array. Required.
+`order`  | String | The sort order to apply. Accepts `asc` for ascending or `desc` for descending. Optional. Default is `asc`.
+`target_field` | String | The name of the field in which the sorted array is stored. If not specified, then the sorted array is stored in the same field as the original array (the field named in `field`). Optional.
+`tag` | String | The processor's identifier. Optional.
+`description` | String | A description of the processor. Optional.
+`ignore_failure` | Boolean | If `true`, then OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
+
+## Example 
+
+The following example demonstrates using a search pipeline with a `sort` processor.
+
+### Setup
+
+Create an index named `my_index` and index a document with the field `message` that contains an array of strings:
+
+```json
+POST /my_index/_doc/1
+{
+  "message": ["one", "two", "three", "four"], 
+  "visibility": "public"
+}
+```
+{% include copy-curl.html %}
+
+### Creating a search pipeline 
+
+Create a search pipeline with a `sort` response processor that sorts the `message` field and stores the sorted results in the `sorted_message` field:
+
+```json
+PUT /_search/pipeline/my_pipeline
+{
+  "response_processors": [
+    {
+      "sort": {
+        "field": "message",
+        "target_field": "sorted_message"
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+### Using a search pipeline
+
+Search for documents in `my_index` without a search pipeline:
+
+```json
+GET /my_index/_search
+```
+{% include copy-curl.html %}
+
+The response contains the field `message`:
+
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 1,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "message": [
+            "one",
+            "two",
+            "three",
+            "four"
+          ],
+          "visibility": "public"
+        }
+      }
+    ]
+  }
+}
+```
+</details>
+
+To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:
+
+```json
+GET /my_index/_search?search_pipeline=my_pipeline
+```
+{% include copy-curl.html %}
+
+The `sorted_message` field contains the strings from the `message` field sorted lexicographically in ascending order (the default):
+
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 3,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "visibility": "public",
+          "sorted_message": [
+            "four",
+            "one",
+            "three",
+            "two"
+          ],
+          "message": [
+            "one",
+            "two",
+            "three",
+            "four"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+</details>
+
+You can also use the `fields` option to retrieve specific fields from a document:
+
+```json
+POST /my_index/_search?pretty&search_pipeline=my_pipeline
+{
+    "fields": ["visibility", "message"]
+}
+``` 
+{% include copy-curl.html %}
+
+In the response, the `message` field is sorted and the results are stored in the `sorted_message` field:
+
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 2,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "visibility": "public",
+          "sorted_message": [
+            "four",
+            "one",
+            "three",
+            "two"
+          ],
+          "message": [
+            "one",
+            "two",
+            "three",
+            "four"
+          ]
+        },
+        "fields": {
+          "visibility": [
+            "public"
+          ],
+          "sorted_message": [
+            "four",
+            "one",
+            "three",
+            "two"
+          ],
+          "message": [
+            "one",
+            "two",
+            "three",
+            "four"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+</details>
\ No newline at end of file
diff --git a/_search-plugins/search-pipelines/split-processor.md b/_search-plugins/search-pipelines/split-processor.md
new file mode 100644
index 0000000000..6830f81ec3
--- /dev/null
+++ b/_search-plugins/search-pipelines/split-processor.md
@@ -0,0 +1,234 @@
+---
+layout: default
+title: Split
+nav_order: 33
+has_children: false
+parent: Search processors
+grand_parent: Search pipelines
+---
+
+# Split processor
+
+The `split` processor splits a string field into an array of substrings based on a specified delimiter.
+
+## Request fields
+
+The following table lists all available request fields.
+
+Field | Data type | Description
+:--- | :--- | :---
+`field` | String | The field containing the string to be split. Required.
+`separator` | String | The delimiter used to split the string. Specify either a single separator character or a regular expression pattern. Required.
+`preserve_trailing` | Boolean | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, then empty trailing fields are removed from the resulting array. Optional. Default is `false`.
+`target_field` | String | The field in which the array of substrings is stored. If not specified, then the field is updated in place. Optional.
+`tag` | String | The processor's identifier. Optional.
+`description` | String | A description of the processor. Optional.
+`ignore_failure` | Boolean | If `true`, then OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
+
+## Example 
+
+The following example demonstrates using a search pipeline with a `split` processor.
+
+### Setup
+
+Create an index named `my_index` and index a document containing the field `message`:
+
+```json
+POST /my_index/_doc/1
+{
+  "message": "ingest, search, visualize, and analyze data",
+  "visibility": "public"
+}
+```
+{% include copy-curl.html %}
+
+### Creating a search pipeline 
+
+The following request creates a search pipeline with a `split` response processor that splits the `message` field and stores the results in the `split_message` field:
+
+```json
+PUT /_search/pipeline/my_pipeline
+{
+  "response_processors": [
+    {
+      "split": {
+        "field": "message",
+        "separator": ", ",
+        "target_field": "split_message"
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+### Using a search pipeline
+
+Search for documents in `my_index` without a search pipeline:
+
+```json
+GET /my_index/_search
+```
+{% include copy-curl.html %}
+
+The response contains the field `message`:
+
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+```json
+{
+  "took": 3,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "message": "ingest, search, visualize, and analyze data",
+          "visibility": "public"
+        }
+      }
+    ]
+  }
+}
+```
+</details>
+
+To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:
+
+```json
+GET /my_index/_search?search_pipeline=my_pipeline
+```
+{% include copy-curl.html %}
+
+The `message` field is split and the results are stored in the `split_message` field:
+
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 6,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "visibility": "public",
+          "message": "ingest, search, visualize, and analyze data",
+          "split_message": [
+            "ingest",
+            "search",
+            "visualize",
+            "and analyze data"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+</details>
+
+You can also use the `fields` option to retrieve specific fields from a document:
+
+```json
+POST /my_index/_search?pretty&search_pipeline=my_pipeline
+{
+    "fields": ["visibility", "message"]
+}
+``` 
+{% include copy-curl.html %}
+
+In the response, the `message` field is split and the results are stored in the `split_message` field:
+
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 7,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "visibility": "public",
+          "message": "ingest, search, visualize, and analyze data",
+          "split_message": [
+            "ingest",
+            "search",
+            "visualize",
+            "and analyze data"
+          ]
+        },
+        "fields": {
+          "visibility": [
+            "public"
+          ],
+          "message": [
+            "ingest, search, visualize, and analyze data"
+          ],
+          "split_message": [
+            "ingest",
+            "search",
+            "visualize",
+            "and analyze data"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+</details>
\ No newline at end of file