Merge branch 'main' into patch-1

opensearch-project · Jul 30, 2024 · 5cc2512
2 parents 35c53ce + 8e03f53
commit 5cc2512

Showing 9 changed files with 229 additions and 70 deletions.
diff --git a/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt
@@ -86,6 +86,7 @@ RPM Package Manager
 Ruby
 Simple Schema for Observability
 Tableau
+Textract
 TorchScript
 Tribuo
 VisBuilder

diff --git a/_api-reference/document-apis/bulk.md b/_api-reference/document-apis/bulk.md
@@ -59,7 +59,7 @@ routing | String | Routes the request to the specified shard.
 timeout | Time | How long to wait for the request to return. Default `1m`.
 type | String | (Deprecated) The default document type for documents that don't specify a type. Default is `_doc`. We highly recommend ignoring this parameter and using a type of `_doc` for all indexes.
 wait_for_active_shards | String | Specifies the number of active shards that must be available before OpenSearch processes the bulk request. Default is 1 (only the primary shard). Set to `all` or a positive integer. Values greater than 1 require replicas. For example, if you specify a value of 3, the index must have two replicas distributed across two additional nodes for the request to succeed.
-batch_size | Integer | Specifies the number of documents to be batched and sent to an ingest pipeline to be processed together. Default is `1` (documents are ingested by an ingest pipeline one at a time). If the bulk request doesn't explicitly specify an ingest pipeline or the index doesn't have a default ingest pipeline, then this parameter is ignored. Only documents with `create`, `index`, or `update` actions can be grouped into batches.
+batch_size | Integer | **(Deprecated)** Specifies the number of documents to be batched and sent to an ingest pipeline to be processed together. Default is `2147483647` (documents are ingested by an ingest pipeline all at once). If the bulk request doesn't explicitly specify an ingest pipeline or the index doesn't have a default ingest pipeline, then this parameter is ignored. Only documents with `create`, `index`, or `update` actions can be grouped into batches.
 {% comment %}_source | List | asdf
 _source_excludes | list | asdf
 _source_includes | list | asdf{% endcomment %}

diff --git a/_ingest-pipelines/processors/sparse-encoding.md b/_ingest-pipelines/processors/sparse-encoding.md
@@ -41,6 +41,7 @@ The following table lists the required and optional parameters for the `sparse_e
 `field_map.<vector_field>`  | String | Required | The name of the vector field in which to store the generated vector embeddings.
 `description`  | String | Optional  | A brief description of the processor.  |
 `tag` | String | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
+`batch_size` | Integer | Optional | Specifies the number of documents to be batched and processed each time. Default is `1`. |
 
 ## Using the processor
 

diff --git a/_ingest-pipelines/processors/text-embedding.md b/_ingest-pipelines/processors/text-embedding.md
@@ -41,6 +41,7 @@ The following table lists the required and optional parameters for the `text_emb
 `field_map.<vector_field>`  | String | Required | The name of the vector field in which to store the generated text embeddings.
 `description`  | String | Optional  | A brief description of the processor.  |
 `tag` | String | Optional | An identifier tag for the processor. Useful for debugging to distinguish between processors of the same type. |
+`batch_size` | Integer | Optional | Specifies the number of documents to be batched and processed each time. Default is `1`. |
 
 ## Using the processor
 

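Both the `sparse_encoding` and `text_embedding` processors now accept a `batch_size` parameter. The following sketch illustrates the arithmetic. It assumes that the processor splits the documents it receives, for example, from one bulk request, into consecutive groups of at most `batch_size` documents. The document count and batch size are hypothetical values chosen for illustration.

```sh
#!/bin/sh
# Hypothetical inputs: 10 documents reach the pipeline in one bulk request,
# and the processor is configured with "batch_size": 4.
docs=10
batch_size=4

# Ceiling division: the number of batches the processor forms.
batches=$(( (docs + batch_size - 1) / batch_size ))

# The final batch holds the leftover documents (here, 2).
last_batch=$(( docs - (batches - 1) * batch_size ))

echo "batches=$batches last_batch=$last_batch"   # batches=3 last_batch=2
```

Under the same assumption, the default `batch_size` of `1` would process the same 10 documents in 10 separate batches.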
diff --git a/_install-and-configure/additional-plugins/index.md b/_install-and-configure/additional-plugins/index.md
@@ -0,0 +1,40 @@
+---
+layout: default
+title: Additional plugins
+parent: Installing plugins
+nav_order: 10
+---
+
+# Additional plugins
+
+In addition to the plugins included in the standard OpenSearch distribution, many more plugins are available. These additional plugins were built by OpenSearch developers or members of the OpenSearch community. Because many of them are not maintained in an OpenSearch GitHub repository, this list is not exhaustive. The following plugins are available in the [OpenSearch/plugins](https://github.com/opensearch-project/OpenSearch/tree/main/plugins) directory on GitHub. You can install them using any of the installation options, for example, by running `bin/opensearch-plugin install <plugin-name>`.
+
+
+| Plugin name | Earliest available version |
+| :--- | :--- |
+| analysis-icu | 1.0.0 |
+| analysis-kuromoji | 1.0.0 |
+| analysis-nori | 1.0.0 |
+| analysis-phonetic | 1.0.0 |
+| analysis-smartcn | 1.0.0 |
+| analysis-stempel | 1.0.0 |
+| analysis-ukrainian | 1.0.0 |
+| discovery-azure-classic | 1.0.0 |
+| discovery-ec2 | 1.0.0 |
+| discovery-gce | 1.0.0 |
+| ingest-attachment | 1.0.0 |
+| mapper-annotated-text | 1.0.0 |
+| mapper-murmur3 | 1.0.0 |
+| [`mapper-size`]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/mapper-size-plugin/) | 1.0.0 |
+| query-insights | 2.12.0 |
+| repository-azure | 1.0.0 |
+| repository-gcs | 1.0.0 |
+| repository-hdfs | 1.0.0 |
+| repository-s3 | 1.0.0 |
+| store-smb | 1.0.0 |
+| transport-nio | 1.0.0 |
+
+
+## Related articles
+- [Installing plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/)
+- [`mapper-size` plugin]({{site.url}}{{site.baseurl}}/install-and-configure/additional-plugins/mapper-size-plugin/)
diff --git a/_install-and-configure/additional-plugins/mapper-size-plugin.md b/_install-and-configure/additional-plugins/mapper-size-plugin.md
@@ -0,0 +1,100 @@
+---
+layout: default
+title: Mapper-size plugin
+parent: Installing plugins
+nav_order: 20
+
+---
+
+# Mapper-size plugin
+
+The `mapper-size` plugin enables the use of the `_size` field in OpenSearch indexes. The `_size` field stores the size, in bytes, of each document.
+
+## Installing the plugin
+
+You can install the `mapper-size` plugin using the following command:
+
+```sh
+./bin/opensearch-plugin install mapper-size
+```
+
+## Examples
+
+After starting up a cluster, you can create an index with size mapping enabled, index a document, and search for documents, as shown in the following examples.
+
+### Create an index with size mapping enabled
+
+```sh
+curl -XPUT "localhost:9200/example-index" -H "Content-Type: application/json" -d '{
+  "mappings": {
+    "_size": {
+      "enabled": true
+    },
+    "properties": {
+      "name": {
+        "type": "text"
+      },
+      "age": {
+        "type": "integer"
+      }
+    }
+  }
+}'
+```
+
+### Index a document
+
+```sh
+curl -XPOST "localhost:9200/example-index/_doc" -H "Content-Type: application/json" -d '{
+  "name": "John Doe",
+  "age": 30
+}'
+```
+
+### Query the index
+
+```sh
+curl -XGET "localhost:9200/example-index/_search" -H "Content-Type: application/json" -d '{
+  "query": {
+    "match_all": {}
+  },
+  "stored_fields": ["_size", "_source"]
+}'
+```
+
+### Query results
+
+In the following example, the `_size` field is included in the query results and shows the size, in bytes, of the indexed document:
+
+```json
+{
+  "took": 2,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1.0,
+    "hits": [
+      {
+        "_index": "example-index",
+        "_id": "Pctw0I8BLto8I5f_NLKK",
+        "_score": 1.0,
+        "_size": 37,
+        "_source": {
+          "name": "John Doe",
+          "age": 30
+        }
+      }
+    ]
+  }
+}
+```
+
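The `_size` value reflects the byte length of the document source as it was sent in the indexing request. The following sketch needs only a POSIX shell. It reproduces the request body from the "Index a document" example byte for byte, including its newlines and two-space indentation, which `curl -d` sends unchanged when the data is given as a literal string. It then counts the bytes. The compact variant is not part of the original examples and is added only for comparison.

```sh
#!/bin/sh
# Request body from the "Index a document" example, byte for byte.
doc='{
  "name": "John Doe",
  "age": 30
}'

# The same fields with no whitespace, for comparison only.
compact='{"name":"John Doe","age":30}'

# printf '%s' adds no trailing newline, and wc -c counts bytes.
# tr removes the padding that some wc implementations print.
size=$(printf '%s' "$doc" | wc -c | tr -d ' ')
compact_size=$(printf '%s' "$compact" | wc -c | tr -d ' ')

echo "example body: $size bytes"          # 37
echo "compact body: $compact_size bytes"  # 28
```

The first count matches the `_size` of `37` in the query results. Because `_size` measures raw bytes, formatting choices such as indentation change the stored value even when the field values are identical.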