From a927f0e86a176f902acde3dc735c68d2ccae39a7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Luk=C3=A1=C5=A1=20Vl=C4=8Dek?= <lukas.vlcek@aiven.io>
Date: Wed, 29 May 2024 15:32:57 +0200
Subject: [PATCH 01/10] Add missing cluster_manager_throttling nodes stats
 (#7241)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Add missing cluster_manager_throttling nodes stats

Closes #7240

Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>

* Update _api-reference/nodes-apis/nodes-stats.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>

* Update _api-reference/nodes-apis/nodes-stats.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>

* Update _api-reference/nodes-apis/nodes-stats.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>

* Update _api-reference/nodes-apis/nodes-stats.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>

* Update _api-reference/nodes-apis/nodes-stats.md

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>

* Update nodes-stats.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>

---------

Signed-off-by: Lukáš Vlček <lukas.vlcek@aiven.io>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
---
 _api-reference/nodes-apis/nodes-stats.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/_api-reference/nodes-apis/nodes-stats.md b/_api-reference/nodes-apis/nodes-stats.md
index f28d30c0af..1d504aae2e 100644
--- a/_api-reference/nodes-apis/nodes-stats.md
+++ b/_api-reference/nodes-apis/nodes-stats.md
@@ -53,6 +53,7 @@ script_cache | Statistics about script cache.
 indexing_pressure | Statistics about the node's indexing pressure.
 shard_indexing_pressure | Statistics about shard indexing pressure.
 search_backpressure | Statistics related to search backpressure.
+cluster_manager_throttling | Statistics related to throttled tasks on the cluster manager node.
 resource_usage_stats | Node-level resource usage statistics, such as CPU and JVM memory.
 admission_control | Statistics about admission control.
 caches | Statistics about caches. 
@@ -832,6 +833,7 @@ http.total_opened | Integer | The total number of HTTP connections the node has
 [indexing_pressure](#indexing_pressure) | Object | Statistics related to the node's indexing pressure.
 [shard_indexing_pressure](#shard_indexing_pressure) | Object | Statistics related to indexing pressure at the shard level.
 [search_backpressure]({{site.url}}{{site.baseurl}}/opensearch/search-backpressure#search-backpressure-stats-api) | Object | Statistics related to search backpressure.
+[cluster_manager_throttling](#cluster_manager_throttling) | Object | Statistics related to throttled tasks on the cluster manager node.
 [resource_usage_stats](#resource_usage_stats) | Object | Statistics related to resource usage for the node.
 [admission_control](#admission_control) | Object | Statistics related to admission control for the node.
 [caches](#caches) | Object | Statistics related to caches on the node.
@@ -1282,6 +1284,16 @@ total_rejections_breakup_shadow_mode.throughput_degradation_limits | Integer | T
 enabled | Boolean | Specifies whether the shard indexing pressure feature is turned on for the node.
 enforced | Boolean | If true, the shard indexing pressure runs in enforced mode (there are rejections). If false, the shard indexing pressure runs in shadow mode (there are no rejections, but statistics are recorded and can be retrieved in the `total_rejections_breakup_shadow_mode` object). Only applicable if shard indexing pressure is enabled. 
 
+### `cluster_manager_throttling`
+
+The `cluster_manager_throttling` object contains statistics about throttled tasks on the cluster manager node. It is populated only for the node that is currently elected as the cluster manager.  
+
+Field | Field type | Description
+:--- | :--- | :---
+stats | Object | Statistics about throttled tasks on the cluster manager node.
+stats.total_throttled_tasks | Long | The total number of throttled tasks.
+stats.throttled_tasks_per_task_type | Object | A breakdown of statistics by individual task type, specified as key-value pairs. The keys are individual task types, and their values represent the number of requests that were throttled.
+
 ### `resource_usage_stats`
 
 The `resource_usage_stats` object contains the resource usage statistics. Each entry is specified by the node ID and has the following properties.

From 04e6902d08185a46395ccf99ed36674520531cfa Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Wed, 29 May 2024 11:40:52 -0600
Subject: [PATCH 02/10] Add doc review changes to contributor's blocked PR
 #6808 (#7265)

* Add doc review changes to contributors blocked PR

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _aggregations/metric/median-absolute-deviation.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _aggregations/metric/median-absolute-deviation.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _aggregations/metric/median-absolute-deviation.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _aggregations/metric/median-absolute-deviation.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

---------

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
---
 .../metric/median-absolute-deviation.md       | 158 ++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 _aggregations/metric/median-absolute-deviation.md

diff --git a/_aggregations/metric/median-absolute-deviation.md b/_aggregations/metric/median-absolute-deviation.md
new file mode 100644
index 0000000000..7332d7eb2f
--- /dev/null
+++ b/_aggregations/metric/median-absolute-deviation.md
@@ -0,0 +1,158 @@
+---
+layout: default
+title: Median absolute deviation
+parent: Metric aggregations
+grand_parent: Aggregations
+nav_order: 65
+redirect_from:
+  - /query-dsl/aggregations/metric/median-absolute-deviation/
+---
+
+# Median absolute deviation aggregations
+
+The `median_absolute_deviation` metric is a single-value metric aggregation that returns the median absolute deviation of a field. Median absolute deviation is a statistical measure of data variability. Because it measures dispersion from the median rather than the mean, it is a more robust measure of variability and is less affected by outliers in a dataset.
+
+Median absolute deviation is calculated as follows:<br>
+median_absolute_deviation = median(|X<sub>i</sub> - median(X)|)
+
+The following example calculates the median absolute deviation of the `DistanceMiles` field in the sample dataset `opensearch_dashboards_sample_data_flights`:
+
+
+```json
+GET opensearch_dashboards_sample_data_flights/_search
+{
+  "size": 0,
+  "aggs": {
+    "median_absolute_deviation_DistanceMiles": {
+      "median_absolute_deviation": {
+        "field": "DistanceMiles"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+#### Example response
+
+```json
+{
+  "took": 35,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 10000,
+      "relation": "gte"
+    },
+    "max_score": null,
+    "hits": []
+  },
+  "aggregations": {
+    "median_absolute_deviation_DistanceMiles": {
+      "value": 1829.8993624441966
+    }
+  }
+}
+```
+
+### Missing
+
+By default, if a field is missing or has a null value in a document, it is ignored during computation. However, you can specify a value to be used for those missing or null fields by using the `missing` parameter, as shown in the following request:
+
+```json
+GET opensearch_dashboards_sample_data_flights/_search
+{
+  "size": 0,
+  "aggs": {
+    "median_absolute_deviation_distanceMiles": {
+      "median_absolute_deviation": {
+        "field": "DistanceMiles",
+        "missing": 1000
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+#### Example response
+
+```json
+{
+  "took": 7,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 10000,
+      "relation": "gte"
+    },
+    "max_score": null,
+    "hits": []
+  },
+  "aggregations": {
+    "median_absolute_deviation_distanceMiles": {
+      "value": 1829.6443646143355
+    }
+  }
+}
+```
+
+### Compression
+
+The median absolute deviation is calculated using the [t-digest](https://github.com/tdunning/t-digest/tree/main) data structure, which balances performance and estimation accuracy through the `compression` parameter (default value: `1000`). Lower `compression` values improve performance but may reduce estimation accuracy, while higher values improve accuracy at the cost of increased computational overhead. The following request sets `compression` to `10`:
+
+```json
+GET opensearch_dashboards_sample_data_flights/_search
+{
+  "size": 0,
+  "aggs": {
+    "median_absolute_deviation_DistanceMiles": {
+      "median_absolute_deviation": {
+        "field": "DistanceMiles",
+        "compression": 10
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+#### Example response
+
+```json
+{
+  "took": 1,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 10000,
+      "relation": "gte"
+    },
+    "max_score": null,
+    "hits": []
+  },
+  "aggregations": {
+    "median_absolute_deviation_DistanceMiles": {
+      "value": 1836.265614211182
+    }
+  }
+}
+```

From 35a2e8cb3ddb0ec3b2dd1bb08fed14b9b71600bc Mon Sep 17 00:00:00 2001
From: Heather Halter <HDHALTER@AMAZON.COM>
Date: Wed, 29 May 2024 10:50:34 -0700
Subject: [PATCH 03/10] Update knn-vector-quantization.md (#7262)

Removed comma after "name": "sq"

Signed-off-by: Heather Halter <HDHALTER@AMAZON.COM>
---
 _search-plugins/knn/knn-vector-quantization.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_search-plugins/knn/knn-vector-quantization.md b/_search-plugins/knn/knn-vector-quantization.md
index 96db75b3eb..549437f346 100644
--- a/_search-plugins/knn/knn-vector-quantization.md
+++ b/_search-plugins/knn/knn-vector-quantization.md
@@ -51,7 +51,7 @@ PUT /test-index
           "space_type": "l2",
           "parameters": {
             "encoder": {
-              "name": "sq",
+              "name": "sq"
             },
             "ef_construction": 256,
             "m": 8

From 69f9cf01f84552cc55b587848c99a4b662e96619 Mon Sep 17 00:00:00 2001
From: zane-neo <zaniu@amazon.com>
Date: Thu, 30 May 2024 09:03:59 +0800
Subject: [PATCH 04/10] Update documentation to add skills plugin to bundled
 plugins (#7269)

Signed-off-by: zane-neo <zaniu@amazon.com>
---
 _install-and-configure/plugins.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/_install-and-configure/plugins.md b/_install-and-configure/plugins.md
index 6b0b28769e..bbfbce9796 100644
--- a/_install-and-configure/plugins.md
+++ b/_install-and-configure/plugins.md
@@ -285,6 +285,7 @@ The following plugins are bundled with all OpenSearch distributions except for m
 | Job Scheduler | [opensearch-job-scheduler](https://github.com/opensearch-project/job-scheduler) | 1.0.0 |
 | k-NN | [opensearch-knn](https://github.com/opensearch-project/k-NN) | 1.0.0 |
 | ML Commons | [opensearch-ml](https://github.com/opensearch-project/ml-commons) | 1.3.0 |
+| Skills | [opensearch-skills](https://github.com/opensearch-project/skills) | 2.12.0 |
 | Neural Search | [neural-search](https://github.com/opensearch-project/neural-search) | 2.4.0 |
 | Observability | [opensearch-observability](https://github.com/opensearch-project/observability) | 1.2.0 |
 | Performance Analyzer<sup>2</sup> | [opensearch-performance-analyzer](https://github.com/opensearch-project/performance-analyzer) | 1.0.0 |

From 078a3677704a41db0f7d2fe1462c807ba9540e17 Mon Sep 17 00:00:00 2001
From: Melissa Vagi <vagimeli@amazon.com>
Date: Thu, 30 May 2024 08:56:45 -0600
Subject: [PATCH 05/10] [DOC] Add join processor documentation (#5985)

* Add join processor documentation

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Add examples and explanatory text

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Address tech review comments

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _ingest-pipelines/processors/join.md

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _ingest-pipelines/processors/join.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _ingest-pipelines/processors/join.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _ingest-pipelines/processors/join.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _ingest-pipelines/processors/join.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _ingest-pipelines/processors/join.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _ingest-pipelines/processors/join.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _ingest-pipelines/processors/join.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

* Update _ingest-pipelines/processors/join.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Melissa Vagi <vagimeli@amazon.com>

---------

Signed-off-by: Melissa Vagi <vagimeli@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
---
 _ingest-pipelines/processors/join.md | 135 +++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)
 create mode 100644 _ingest-pipelines/processors/join.md

diff --git a/_ingest-pipelines/processors/join.md b/_ingest-pipelines/processors/join.md
new file mode 100644
index 0000000000..c2cdcfe4de
--- /dev/null
+++ b/_ingest-pipelines/processors/join.md
@@ -0,0 +1,135 @@
+---
+layout: default
+title: Join
+parent: Ingest processors
+nav_order: 160
+---
+
+# Join processor
+
+The `join` processor concatenates the elements of an array into a single string value, using a specified separator between each element. It throws an exception if the provided input is not an array.
+
+The following is the syntax for the `join` processor:
+
+```json
+{
+  "join": {
+    "field": "field_name",
+    "separator": "separator_string"
+  }
+}
+```
+{% include copy-curl.html %}
+
+## Configuration parameters
+
+The following table lists the required and optional parameters for the `join` processor.
+
+Parameter | Required/Optional | Description |
+|-----------|-----------|-----------|
+`field` | Required | The name of the field to which the join operator is applied. Must be an array.
+`separator` | Required | A string separator to use when joining field values.
+`target_field` | Optional | The field to assign the joined value to. If not specified, then the field is updated in place.
+`description` | Optional | A description of the processor's purpose or configuration.
+`if` | Optional | A condition for running the processor.
+`ignore_failure` | Optional | Specifies whether to ignore failures for the processor. See [Handling pipeline failures]({{site.url}}{{site.baseurl}}/ingest-pipelines/pipeline-failures/).
+`on_failure` | Optional | A list of processors to run if the processor fails. See [Handling pipeline failures]({{site.url}}{{site.baseurl}}/ingest-pipelines/pipeline-failures/).
+`tag` | Optional | An identifier for the processor. Useful for debugging and metrics.
+
+## Using the processor
+
+Follow these steps to use the processor in a pipeline.
+
+### Step 1: Create a pipeline
+
+The following query creates a pipeline named `example-join-pipeline` that uses the `join` processor to concatenate all the values of the `uri` field, separating them with the specified separator `/`:
+
+```json
+PUT _ingest/pipeline/example-join-pipeline  
+{  
+  "description": "Example pipeline using the join processor",  
+  "processors": [  
+    {  
+      "join": {  
+        "field": "uri",  
+        "separator": "/"  
+      }  
+    }  
+  ]  
+}  
+```
+{% include copy-curl.html %}
+
+### Step 2 (Optional): Test the pipeline
+
+It is recommended that you test your pipeline before you ingest documents.
+{: .tip}
+
+To test the pipeline, run the following query:
+
+```json
+POST _ingest/pipeline/example-join-pipeline/_simulate  
+{  
+  "docs": [  
+    {  
+      "_source": {  
+        "uri": [  
+          "app",  
+          "home",  
+          "overview"  
+        ]  
+      }  
+    }  
+  ]  
+}
+```
+{% include copy-curl.html %}
+
+#### Response
+
+The following example response confirms that the pipeline is working as expected:
+
+```json
+{  
+  "docs": [  
+    {  
+      "doc": {  
+        "_index": "_index",  
+        "_id": "_id",  
+        "_source": {  
+          "uri": "app/home/overview"  
+        },  
+        "_ingest": {  
+          "timestamp": "2024-05-24T02:16:01.00659117Z"  
+        }  
+      }  
+    }  
+  ]  
+}  
+```
+{% include copy-curl.html %}
+
+### Step 3: Ingest a document 
+
+The following query ingests a document into an index named `testindex1`:
+
+```json
+POST testindex1/_doc/1?pipeline=example-join-pipeline  
+{  
+  "uri": [  
+    "app",  
+    "home",  
+    "overview"  
+  ]  
+} 
+```
+{% include copy-curl.html %}
+
+### Step 4 (Optional): Retrieve the document
+
+To retrieve the document, run the following query:
+
+```json
+GET testindex1/_doc/1
+```
+{% include copy-curl.html %}

From 8e049cd1b77d7f57b8d293176f521693b3f24c8c Mon Sep 17 00:00:00 2001
From: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com>
Date: Thu, 30 May 2024 17:23:56 +0100
Subject: [PATCH 06/10] Security best practices  - 10 points to consider -
 #5782 (#7113)

* adding top ten security best practices

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* changing nav order

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* adding to best practices

Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>

* adding to best practices

Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>

* adding to best practices

Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>

* adding bonus tip

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* updates to best practices

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* integrating Darshits suggestions for improvement and reviewdog fixes

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* review suggestions to grammer

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* review suggestions to grammer

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* review suggestions to grammer

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* review suggestions to grammer

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* review suggestions to grammer

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* reviewdog update

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* Apply suggestions from code review

Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Signed-off-by: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com>

* reviewdog updates

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>

* Update _security/configuration/best-practices.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com>

* Update best-practices.md

Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>

* Update best-practices.md

Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>

* Add editorial comment

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Update best-practices.md

Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>

* Update _security/configuration/best-practices.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>

* Update best-practices.md

Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>

* Update best-practices.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: leanne.laceybyrne@eliatra.com <leanne.laceybyrne@eliatra.com>
Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>
Signed-off-by: leanneeliatra <131779422+leanneeliatra@users.noreply.github.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: AntonEliatra <anton.rubin@eliatra.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
---
 _security/configuration/best-practices.md | 133 ++++++++++++++++++++++
 1 file changed, 133 insertions(+)
 create mode 100644 _security/configuration/best-practices.md

diff --git a/_security/configuration/best-practices.md b/_security/configuration/best-practices.md
new file mode 100644
index 0000000000..97457cdb4b
--- /dev/null
+++ b/_security/configuration/best-practices.md
@@ -0,0 +1,133 @@
+---
+layout: default
+title: Best practices
+parent: Configuration
+nav_order: 11
+---
+
+# Best practices for OpenSearch security
+
+Setting up security in OpenSearch is crucial for protecting your data. Here are 10 best practices that offer clear steps for keeping your system safe.
+
+## 1. Use your own PKI to set up SSL/TLS
+
+Although using your own public key infrastructure (PKI), such as [AWS Certificate Manager](https://docs.aws.amazon.com/crypto/latest/userguide/awspki-service-acm.html), requires more initial effort, a custom PKI provides you with the flexibility needed to set up SSL/TLS in the most secure and performant way.
+
+### Enable SSL/TLS for node- and REST-layer traffic
+
+SSL/TLS is enabled by default on the transport layer, which is used for node-to-node communication. SSL/TLS is disabled by default on the REST layer.
+
+The following setting is required in order to enable encryption on the REST layer: 
+
+```
+plugins.security.ssl.http.enabled: true
+```
+{% include copy.html %}
+
+
+For additional configuration options, such as specifying certificate paths, keys, and certificate authority files, refer to [Configuring TLS certificates]({{site.url}}{{site.baseurl}}/security/configuration/tls/).
+
+### Replace all demo certificates with your own PKI
+
+The certificates generated when initializing an OpenSearch cluster with `install_demo_configuration.sh` are not suitable for production. These should be replaced with your own certificates.
+
+You can generate custom certificates in a few different ways. One approach is to use OpenSSL, as described in [Generating self-signed certificates]({{site.url}}{{site.baseurl}}/security/configuration/generate-certificates/). Alternatively, the following tools can simplify the certificate creation process:
+
+- [SearchGuard TLS Tool](https://docs.search-guard.com/latest/offline-tls-tool)
+- [TLSTool by dylandreimerink](https://github.com/dylandreimerink/tlstool)
+
+## 2. Prefer client certificate authentication for API authentication
+
+Client certificate authentication offers a secure alternative to password authentication and is more suitable for machine-to-machine interactions. It also has low performance overhead because the authentication occurs on the TLS level. Nearly all client software, such as curl and client libraries, supports this authentication method.
+
+For detailed configuration instructions and additional information about client certificate authentication, see [Enabling client certificate authentication]({{site.url}}{{site.baseurl}}/security/authentication-backends/client-auth/#enabling-client-certificate-authentication).
+
+
+## 3. Prefer SSO using SAML or OpenID for OpenSearch Dashboards authentication
+
+Implementing single sign-on (SSO) with protocols like SAML or OpenID for OpenSearch Dashboards authentication enhances security by delegating credential management to a dedicated system.
+
+This approach minimizes direct interaction with passwords in OpenSearch, streamlines authentication processes, and prevents clutter in the internal user database. For more information, go to the [SAML section of the OpenSearch documentation]({{site.url}}{{site.baseurl}}/security/authentication-backends/saml/).
+
+## 4. Limit the number of roles assigned to a user
+
+Prioritizing fewer, more intricate user roles over numerous simplistic roles enhances security and simplifies administration.
+
+Additional best practices for role management include:
+
+1. Role granularity: Define roles based on specific job functions or access requirements to minimize unnecessary privileges.
+2. Regular role review: Regularly review and audit assigned roles to ensure alignment with organizational policies and access needs.
+
+For more information about roles, go to the documentation on [defining users and roles in OpenSearch]({{site.url}}{{site.baseurl}}/security/access-control/users-roles/).
+
+## 5. Verify DLS, FLS, and field masking
+
+If you have configured Document Level Security (DLS), Field Level Security (FLS), or field masking, make sure you double-check your role definitions, especially if a user is mapped to multiple roles. It is highly recommended that you test this by making a GET request to `_plugins/_security/authinfo`.
+
+The following resources provide detailed examples and additional configurations:
+
+ - [Document-level security]({{site.url}}{{site.baseurl}}/security/access-control/document-level-security/).
+ - [Field-level security]({{site.url}}{{site.baseurl}}/security/access-control/field-level-security/).
+ - [Field masking]({{site.url}}{{site.baseurl}}/security/access-control/field-masking/).
+
+## 6. Use only the essentials for the audit logging configuration
+
+Extensive audit logging can degrade system performance due to the following:
+
+- Each logged event adds to the processing load.
+- Audit logs can quickly grow in size, consuming significant disk space.
+
+To ensure optimal performance, disable unnecessary logging and be selective about which logs are used. If not strictly required by compliance regulations, consider turning off audit logging. If audit logging is essential for your cluster, configure it according to your compliance requirements.
+
+Whenever possible, adhere to these recommendations:
+
+- Set `audit.log_request_body` to `false`.
+- Set `audit.resolve_bulk_requests` to `false`.
+- Enable `compliance.write_log_diffs`.
+- Minimize entries for `compliance.read_watched_fields`.
+- Minimize entries for `compliance.write_watched_indices`.
+
+## 7. Consider disabling the private tenant
+
+In many cases, the use of private tenants is unnecessary, although this feature is enabled by default. As a result, every OpenSearch Dashboards user is provided with their own private tenant and a corresponding new index in which to save objects. This can lead to a large number of unnecessary indexes. Evaluate whether private tenants are needed in your cluster. If private tenants are not needed, disable the feature by adding the following configuration to the `config.yml` file:
+
+```yaml
+config:
+  dynamic:
+    kibana:
+      multitenancy_enabled: true
+      private_tenant_enabled: false
+```
+{% include copy.html %}
+
+## 8. Manage the configuration using `securityadmin.sh`
+
+Use `securityadmin.sh` to manage the configuration of your clusters. `securityadmin.sh` is a command-line tool provided by OpenSearch for managing security configurations. It allows administrators to efficiently manage security settings, including roles, role mappings, and other security-related configurations within an OpenSearch cluster.
+
+Using `securityadmin.sh` provides the following benefits:
+
+1. Consistency: By using `securityadmin.sh`, administrators can ensure consistency across security configurations within a cluster. This helps to maintain a standardized and secure environment.
+2. Automation: `securityadmin.sh` enables automation of security configuration tasks, making it easier to deploy and manage security settings across multiple nodes or clusters.
+3. Version control: Security configurations managed through `securityadmin.sh` can be version controlled using standard version control systems like Git. This facilitates tracking changes, auditing, and reverting to previous configurations.
+
+To avoid overwriting changes made through the OpenSearch Dashboards UI or the OpenSearch API, first back up the current configuration by running the `securityadmin.sh` tool with the `-backup` option. This ensures that all existing configurations are captured before you upload the modified configuration with `securityadmin.sh`.
+
+For more detailed information about using `securityadmin.sh` and managing OpenSearch security configurations, refer to the following resources:
+- [Applying changes to configuration files]({{site.url}}{{site.baseurl}}/security/configuration/security-admin/)
+- [Modifying YAML files]({{site.url}}{{site.baseurl}}/security/configuration/yaml/)
+
+## 9. Replace all default passwords
+
+When initializing OpenSearch with the demo configuration, many default passwords are provided for internal users in `internal_users.yml`, such as `admin`, `kibanaserver`, and `logstash`.
+
+You should change the passwords for these users to strong, complex passwords either at startup or as soon as possible once the cluster is running. Creating password configurations is a straightforward procedure, especially when using the scripts bundled with OpenSearch, like `hash.sh` or `hash.bat`, located in the `plugins/opensearch-security/tools` directory.
+
+The `kibanaserver` user is a crucial component that allows OpenSearch Dashboards to communicate with the OpenSearch cluster. By default, this user is preconfigured with a default password in the demo configuration. This should be replaced with a strong, unique password in the OpenSearch configuration, and the `opensearch_dashboards.yml` file should be updated to reflect this change.
+
+
+## 10. Getting help
+
+If you need additional help, you can do the following:
+
+- Create an issue on GitHub at [OpenSearch-project/security](https://github.com/opensearch-project/security/security) or [OpenSearch-project/OpenSearch](https://github.com/opensearch-project/OpenSearch/security).
+- Ask a question on the [OpenSearch forum](https://forum.opensearch.org/tag/cve).

From 7f51034305ba7c773504d4093bc07689e8e04d91 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Thu, 30 May 2024 13:55:29 -0400
Subject: [PATCH 07/10] Explain k in approximate k-NN (#7194)

* Explain k in approximate k-NN

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Additional info

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Delete engine row in table

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Add a clarification to the table

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Update _search-plugins/knn/approximate-knn.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

---------

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
---
 _search-plugins/knn/approximate-knn.md | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/_search-plugins/knn/approximate-knn.md b/_search-plugins/knn/approximate-knn.md
index 7d3e119349..39e9da7525 100644
--- a/_search-plugins/knn/approximate-knn.md
+++ b/_search-plugins/knn/approximate-knn.md
@@ -127,10 +127,25 @@ GET my-knn-index-1/_search
 }
 ```
 
-`k` is the number of neighbors the search of each graph will return. You must also include the `size` option, which
-indicates how many results the query actually returns. The plugin returns `k` amount of results for each shard
-(and each segment) and `size` amount of results for the entire query. The plugin supports a maximum `k` value of 10,000.
-Starting in OpenSearch 2.14, in addition to using the `k` variable, both the `min_score` and `max_distance` variables can be used for [radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/).
+### The number of returned results
+
+In the preceding query, `k` represents the number of neighbors returned by the search of each graph. You must also include the `size` option, indicating the final number of results that you want the query to return.  
+
+For the NMSLIB and Faiss engines, `k` represents the maximum number of documents returned for all segments of a shard. For the Lucene engine, `k` represents the number of documents returned for a shard. The maximum value of `k` is 10,000.
+
+For any engine, each shard returns `size` results to the coordinator node. Thus, the total number of results that the coordinator node receives is `size * number of shards`. After the coordinator node consolidates the results received from all nodes, the query returns the top `size` results.
+
+The following table provides examples of the number of results returned by various engines in several scenarios. For these examples, assume that the number of documents contained in the segments and shards is sufficient to return the number of results specified in the table.
+
+`size` 	| `k` | Number of primary shards | 	Number of segments per shard | Number of returned results, Faiss/NMSLIB | Number of returned results, Lucene
+10 |	1 |	1 |	4 |	4 | 1
+10 | 10 |	1 |	4 |	10 | 10
+10 |	1 |	2 |	4 |	8 | 2
+ 
+The number of results returned by Faiss/NMSLIB differs from the number of results returned by Lucene only when `k` is smaller than `size`. If `k` and `size` are equal, all engines return the same number of results. 
+
+Starting in OpenSearch 2.14, you can use `k`, `min_score`, or `max_distance` for [radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/).
+
 ### Building a k-NN index from a model
 
 For some of the algorithms that we support, the native library index needs to be trained before it can be used. It would be expensive to training every newly created segment, so, instead, we introduce the concept of a *model* that is used to initialize the native library index during segment creation. A *model* is created by calling the [Train API]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-a-model), passing in the source of training data as well as the method definition of the model. Once training is complete, the model will be serialized to a k-NN model system index. Then, during indexing, the model is pulled from this index to initialize the segments.

From 3009d16134d02835751d1e29a3bf080e2cd8a868 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Thu, 30 May 2024 13:56:07 -0400
Subject: [PATCH 08/10] Add version to PR template (#7276)

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
---
 .github/PULL_REQUEST_TEMPLATE.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index 7eccae7052..bbf3b8d035 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -4,6 +4,8 @@ _Describe what this change achieves._
 ### Issues Resolved
 _List any issues this PR will resolve, e.g. Closes [...]._
 
+### Version
+_List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all._
 
 ### Checklist
 - [ ] By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the [Developers Certificate of Origin](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

From 6732d2f9e15b787a3577b4a1644783b66552ed57 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Thu, 30 May 2024 14:01:12 -0400
Subject: [PATCH 09/10] Remove formatting from front matter in foreach
 processor (#7277)

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
---
 _ingest-pipelines/processors/foreach.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/_ingest-pipelines/processors/foreach.md b/_ingest-pipelines/processors/foreach.md
index 72a0ed1420..d0f962e618 100644
--- a/_ingest-pipelines/processors/foreach.md
+++ b/_ingest-pipelines/processors/foreach.md
@@ -1,11 +1,13 @@
 ---
 layout: default
-title: `foreach`
+title: Foreach
 parent: Ingest processors
 nav_order: 110
 ---
 
-# `foreach` processor
+<!-- vale off -->
+# Foreach processor
+<!-- vale on -->
 
 The `foreach` processor is used to iterate over a list of values in an input document and apply a transformation to each value. This can be useful for tasks like processing all the elements in an array consistently, such as converting all elements in a string to lowercase or uppercase.
 

From 9af765f6513279f456ab5c182a1d940c737a3bb1 Mon Sep 17 00:00:00 2001
From: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Date: Thu, 30 May 2024 15:07:26 -0400
Subject: [PATCH 10/10] Add table header to approximate k-NN (#7280)

Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
---
 _search-plugins/knn/approximate-knn.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/_search-plugins/knn/approximate-knn.md b/_search-plugins/knn/approximate-knn.md
index 39e9da7525..c0a9557728 100644
--- a/_search-plugins/knn/approximate-knn.md
+++ b/_search-plugins/knn/approximate-knn.md
@@ -138,6 +138,7 @@ For any engine, each shard returns `size` results to the coordinator node. Thus,
 The following table provides examples of the number of results returned by various engines in several scenarios. For these examples, assume that the number of documents contained in the segments and shards is sufficient to return the number of results specified in the table.
 
 `size` 	| `k` | Number of primary shards | 	Number of segments per shard | Number of returned results, Faiss/NMSLIB | Number of returned results, Lucene
+:--- | :--- | :--- | :--- | :--- | :---
 10 |	1 |	1 |	4 |	4 | 1
 10 | 10 |	1 |	4 |	10 | 10
 10 |	1 |	2 |	4 |	8 | 2
@@ -326,4 +327,4 @@ included in the distance function.
 With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as input. This is because the magnitude of
 such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests
 containing the zero vector will be rejected and a corresponding exception will be thrown.
-{: .note }
\ No newline at end of file
+{: .note }