Merge branch 'main' into 7507-collapse-search-results

opensearch-project · Jul 22, 2024 · 412f5f1
2 parents b35d5fd + 1734199
Showing 11 changed files with 534 additions and 5 deletions.
diff --git a/.gitignore b/.gitignore
@@ -6,3 +6,4 @@ Gemfile.lock
 .idea
 *.iml
 .jekyll-cache
+.project
diff --git a/_automating-configurations/api/deprovision-workflow.md b/_automating-configurations/api/deprovision-workflow.md
@@ -9,7 +9,9 @@ nav_order: 70
 
 When you no longer need a workflow, you can deprovision its resources. Most workflow steps that create a resource have corresponding workflow steps to reverse that action. To retrieve all resources currently created for a workflow, call the [Get Workflow Status API]({{site.url}}{{site.baseurl}}/automating-configurations/api/get-workflow-status/). When you call the Deprovision Workflow API, resources included in the `resources_created` field of the Get Workflow Status API response will be removed using a workflow step corresponding to the one that provisioned them.
 
-The workflow executes the provisioning workflow steps in reverse order. If failures occur because of resource dependencies, such as preventing deletion of a registered model if it is still deployed, the workflow attempts retries.
+The workflow executes the provisioning steps in reverse order. If a failure occurs because of a resource dependency, such as trying to delete a registered model that is still deployed, then the workflow retries the failing step as long as at least one resource was deleted.
+
+To prevent data loss, resources created using the `create_index`, `create_search_pipeline`, and `create_ingest_pipeline` steps require the resource ID to be included in the `allow_delete` parameter.
 
 ## Path and HTTP methods
 
@@ -24,6 +26,7 @@ The following table lists the available path parameters.
 | Parameter | Data type | Description |
 | :--- | :--- | :--- |
 | `workflow_id` | String | The ID of the workflow to be deprovisioned. Required. |
+| `allow_delete` | String | A comma-separated list of resource IDs to be deprovisioned. Required if deleting resources of type `index_name` or `pipeline_id`. |
 
 ### Example request
 
@@ -53,6 +56,14 @@ If deprovisioning did not completely remove all resources, OpenSearch responds w
 In some cases, the failure happens because of another dependent resource that took some time to be removed. In this case, you can attempt to send the same request again.
 {: .tip}
 
+If a resource requires the `allow_delete` parameter and the request does not include it, then OpenSearch responds with a `403 (FORBIDDEN)` status and identifies the resources that were not deprovisioned:
+
+```json
+{
+    "error": "These resources require the allow_delete parameter to deprovision: [index_name my-index]."
+}
+```
+
 To obtain a more detailed deprovisioning status than is provided by the summary in the error response, query the [Get Workflow Status API]({{site.url}}{{site.baseurl}}/automating-configurations/api/get-workflow-status/). 
 
 On success, the workflow returns to a `NOT_STARTED` state. If some resources have not yet been removed, they are provided in the response.
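
As a minimal sketch of the `allow_delete` behavior described in this file, a deprovision request that permits deleting an index might look like the following. The endpoint path and the `<workflow_id>` placeholder are assumptions, because the hunks above do not show the request path. The index name `my-index` is taken from the error example.

```json
POST /_plugins/_flow_framework/workflow/<workflow_id>/_deprovision?allow_delete=my-index
```

Because the parameter is described as a comma-separated list, several resource IDs could presumably be passed in one request, for example `allow_delete=my-index,my-pipeline`, where `my-pipeline` is a hypothetical pipeline ID.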
diff --git a/_ingest-pipelines/processors/split.md b/_ingest-pipelines/processors/split.md
@@ -30,7 +30,7 @@ Parameter  | Required/Optional  | Description
 :--- | :--- | :--- 
 `field` | Required | The field containing the string to be split. 
 `separator` | Required | The delimiter used to split the string. This can be a regular expression pattern. 
-`preserve_field` | Optional | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, empty trailing fields are removed from the resulting array. Default is `false`. 
+`preserve_trailing` | Optional | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, then empty trailing fields are removed from the resulting array. Default is `false`. 
 `target_field` | Optional | The field where the array of substrings is stored. If not specified, then the field is updated in-place. 
 `ignore_missing` | Optional	| Specifies whether the processor should ignore documents that do not contain the specified field. If set to `true`, then the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`.  
 `description` | Optional | A brief description of the processor. 
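
To illustrate the renamed `preserve_trailing` parameter, the following is a minimal sketch of an ingest pipeline that uses the `split` processor. The pipeline name `split_pipeline` and the field names `tags` and `tag_list` are hypothetical and do not appear in the changed file.

```json
PUT /_ingest/pipeline/split_pipeline
{
  "description": "Hypothetical pipeline that splits a comma-separated string",
  "processors": [
    {
      "split": {
        "field": "tags",
        "separator": ",",
        "preserve_trailing": true,
        "target_field": "tag_list"
      }
    }
  ]
}
```

With `preserve_trailing` set to `true`, a value that ends in one or more separators should keep its empty trailing entries in `tag_list`. With the default of `false`, those entries are dropped.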

diff --git a/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md b/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md
@@ -212,6 +212,7 @@ Parameter	| Type | Required/Optional | Description
 `name` | String  | Optional | The tool name. Useful when an LLM needs to select an appropriate tool for a task.
 `description` | String | Optional | A description of the tool. Useful when an LLM needs to select an appropriate tool for a task.
 `doc_size` | Integer | Optional | The number of documents to fetch. Default is `2`.
+`nested_path` | String | Optional | The path to the nested object for the nested query. Only used for nested fields. Default is `null`.
 
 ## Execute parameters
 

diff --git a/_ml-commons-plugin/agents-tools/tools/rag-tool.md b/_ml-commons-plugin/agents-tools/tools/rag-tool.md
@@ -136,6 +136,7 @@ Parameter	| Type | Required/Optional | Description
 `prompt` | String | Optional | The prompt to provide to the LLM.
 `k` | Integer | Optional | The number of nearest neighbors to search for when performing neural search. Default is 10.
 `enable_Content_Generation` | Boolean | Optional | If `true`, returns results generated by an LLM. If `false`, returns results directly without LLM-assisted content generation. Default is `true`.
+`nested_path` | String | Optional | The path to the nested object for the nested query. Only used for nested fields. Default is `null`.
 
 ## Execute parameters
 

diff --git a/_ml-commons-plugin/agents-tools/tools/vector-db-tool.md b/_ml-commons-plugin/agents-tools/tools/vector-db-tool.md
@@ -225,6 +225,7 @@ Parameter	| Type | Required/Optional | Description
 `input` | String | Required for flow agent | Runtime input sourced from flow agent parameters. If using a large language model (LLM), this field is populated with the LLM response.
 `doc_size` | Integer | Optional | The number of documents to fetch. Default is `2`.
 `k` | Integer | Optional | The number of nearest neighbors to search for when performing neural search. Default is `10`.
+`nested_path` | String | Optional | The path to the nested object for the nested query. Only used for nested fields. Default is `null`.
 
 ## Execute parameters
 

diff --git a/_search-plugins/cross-cluster-search.md b/_search-plugins/cross-cluster-search.md
@@ -9,7 +9,7 @@ redirect_from:
 
 # Cross-cluster search
 
-You can use the cross-cluster search feature in OpenSearch to search and analyze data across multiple clusters, enabling you to gain insights from distributed data sources. Cross-cluster search is available by default with the Security plugin, but you need to configure each cluster to allow remote connections from other clusters. This involves setting up remote cluster connections and configuring access permissions.
+You can use cross-cluster search (CCS) in OpenSearch to search and analyze data across multiple clusters, enabling you to gain insights from distributed data sources. Cross-cluster search is available by default with the Security plugin, but you need to configure each cluster to allow remote connections from other clusters. This involves setting up remote cluster connections and configuring access permissions.
 
 ---
 

diff --git a/_search-plugins/search-pipelines/deleting-search-pipeline.md b/_search-plugins/search-pipelines/deleting-search-pipeline.md
@@ -0,0 +1,26 @@
+---
+layout: default
+title: Deleting search pipelines
+nav_order: 30
+has_children: false
+parent: Search pipelines
+grand_parent: Search
+---
+
+# Deleting search pipelines
+
+Use the following request to delete a pipeline.
+
+To delete a specific search pipeline, pass the pipeline ID as a parameter:
+
+```json
+DELETE /_search/pipeline/<pipeline-id>
+```
+{% include copy-curl.html %}
+
+To delete all search pipelines in a cluster, use the wildcard character (`*`):
+
+```json
+DELETE /_search/pipeline/*
+```
+{% include copy-curl.html %}
diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md
@@ -37,13 +37,16 @@ The following table lists all supported search response processors.
 
 Processor | Description | Earliest available version
 :--- | :--- | :---
+[`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12
 [`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9
-[`retrieval_augmented_generation`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rag-processor/) | Used for retrieval-augmented generation (RAG) in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | 2.10 (generally available in 2.12)
 [`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8
 [`rerank`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/)| Reranks search results using a cross-encoder model. | 2.12
-[`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12
+[`retrieval_augmented_generation`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rag-processor/) | Used for retrieval-augmented generation (RAG) in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | 2.10 (generally available in 2.12)
+[`sort`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/sort-processor/)| Sorts an array of items in either ascending or descending order. | 2.16
+[`split`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/split-processor/)| Splits a string field into an array of substrings based on a specified delimiter. | 2.16
 [`truncate_hits`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/)| Discards search hits after a specified target count is reached. Can undo the effect of the `oversample` request processor.  | 2.12
 
+
 ## Search phase results processors
 
 A search phase results processor runs between search phases at the coordinating node level. It intercepts the results retrieved from one search phase and transforms them before passing them to the next search phase.

diff --git a/_search-plugins/search-pipelines/sort-processor.md b/_search-plugins/search-pipelines/sort-processor.md
@@ -0,0 +1,251 @@
+---
+layout: default
+title: Sort
+nav_order: 32
+has_children: false
+parent: Search processors
+grand_parent: Search pipelines
+---
+
+# Sort processor
+
+The `sort` processor sorts an array of items in either ascending or descending order. Numeric arrays are sorted numerically, while string or mixed arrays (strings and numbers) are sorted lexicographically. The processor throws an error if the input is not an array.
+
+## Request fields
+
+The following table lists all available request fields.
+
+Field | Data type | Description
+:--- | :--- | :---
+`field`  | String | The field to be sorted. Must be an array. Required.
+`order`  | String | The sort order to apply. Accepts `asc` for ascending or `desc` for descending. Default is `asc`.
+`target_field` | String | The name of the field in which the sorted array is stored. If not specified, then the sorted array is stored in the same field as the original array (the `field` variable). 
+`tag` | String | The processor's identifier. 
+`description` | String | A description of the processor. 
+`ignore_failure` | Boolean | If `true`, then OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`.
+
+## Example 
+
+The following example demonstrates using a search pipeline with a `sort` processor.
+
+### Setup
+
+Create an index named `my_index` and index a document with the field `message` that contains an array of strings:
+
+```json
+POST /my_index/_doc/1
+{
+  "message": ["one", "two", "three", "four"], 
+  "visibility": "public"
+}
+```
+{% include copy-curl.html %}
+
+### Creating a search pipeline 
+
+Create a search pipeline with a `sort` response processor that sorts the `message` field and stores the sorted results in the `sorted_message` field:
+
+```json
+PUT /_search/pipeline/my_pipeline
+{
+  "response_processors": [
+    {
+      "sort": {
+        "field": "message",
+        "target_field": "sorted_message"
+      }
+    }
+  ]
+}
+```
+{% include copy-curl.html %}
+
+### Using a search pipeline
+
+Search for documents in `my_index` without a search pipeline:
+
+```json
+GET /my_index/_search
+```
+{% include copy-curl.html %}
+
+The response contains the field `message`:
+
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 1,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "message": [
+            "one",
+            "two",
+            "three",
+            "four"
+          ],
+          "visibility": "public"
+        }
+      }
+    ]
+  }
+}
+```
+</details>
+
+To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter:
+
+```json
+GET /my_index/_search?search_pipeline=my_pipeline
+```
+{% include copy-curl.html %}
+
+The `sorted_message` field contains the strings from the `message` field sorted alphabetically:
+
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 3,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "visibility": "public",
+          "sorted_message": [
+            "four",
+            "one",
+            "three",
+            "two"
+          ],
+          "message": [
+            "one",
+            "two",
+            "three",
+            "four"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+</details>
+
+You can also use the `fields` option to retrieve specific fields from a document:
+
+```json
+POST /my_index/_search?pretty&search_pipeline=my_pipeline
+{
+    "fields": ["visibility", "message"]
+}
+``` 
+{% include copy-curl.html %}
+
+In the response, the `message` field is sorted and the results are stored in the `sorted_message` field:
+
+<details open markdown="block">
+  <summary>
+    Response
+  </summary>
+  {: .text-delta}
+
+```json
+{
+  "took": 2,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "my_index",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "visibility": "public",
+          "sorted_message": [
+            "four",
+            "one",
+            "three",
+            "two"
+          ],
+          "message": [
+            "one",
+            "two",
+            "three",
+            "four"
+          ]
+        },
+        "fields": {
+          "visibility": [
+            "public"
+          ],
+          "sorted_message": [
+            "four",
+            "one",
+            "three",
+            "two"
+          ],
+          "message": [
+            "one",
+            "two",
+            "three",
+            "four"
+          ]
+        }
+      }
+    ]
+  }
+}
+```
+</details>
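
The example in this file relies on the default ascending order. As a sketch of the `order` request field listed in the table, a pipeline that sorts in descending order might look like the following. The pipeline name `my_desc_pipeline` is hypothetical. The field names are reused from the example above.

```json
PUT /_search/pipeline/my_desc_pipeline
{
  "response_processors": [
    {
      "sort": {
        "field": "message",
        "order": "desc",
        "target_field": "sorted_message"
      }
    }
  ]
}
```

Assuming the lexicographic ordering described for string arrays, the `sorted_message` field for the sample document would be expected to contain `two`, `three`, `one`, `four`.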