Merge branch 'main' into 7507-collapse-search-results

opensearch-project · Jul 16, 2024 · 4a3effd
2 parents e9f73e0 + a7a7155
commit 4a3effd

Showing 18 changed files with 573 additions and 45 deletions.
diff --git a/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Products/accept.txt
@@ -76,7 +76,6 @@ Painless
 Peer Forwarder
 Performance Analyzer
 Piped Processing Language
-Point in Time
 Powershell
 Python
 PyTorch

diff --git a/.github/workflows/pr_checklist.yml b/.github/workflows/pr_checklist.yml
@@ -0,0 +1,43 @@
+name: PR Checklist
+
+on:
+  pull_request:
+    types: [opened]
+
+permissions:
+  pull-requests: write
+
+jobs:
+  add-checklist:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Comment PR with checklist
+        uses: peter-evans/create-or-update-comment@v3
+        with:
+          token: ${{ secrets.GITHUB_TOKEN }}
+          issue-number: ${{ github.event.pull_request.number }}
+          body: |
+            Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.
+            
+            Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a [maintainer](https://github.com/opensearch-project/documentation-website/blob/main/MAINTAINERS.md).
+
+            **When you're ready for doc review, tag the assignee of this PR**. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.
+
+      - name: Auto assign PR to repo owner
+        uses: actions/github-script@v6
+        with:
+          script: |
+            let assignee = context.payload.pull_request.user.login;
+            const prOwners = ['Naarcha-AWS', 'kolchfa-aws', 'vagimeli', 'natebower'];
+            
+            // Assign PRs from other authors to a default assignee.
+            if (!prOwners.includes(assignee)) {
+              assignee = 'hdhalter';
+            }
+
+            await github.rest.issues.addAssignees({
+              issue_number: context.issue.number,
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              assignees: [assignee]
+            });
diff --git a/.gitignore b/.gitignore
@@ -4,4 +4,5 @@ _site
 .DS_Store
 Gemfile.lock
 .idea
+*.iml
 .jekyll-cache
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -100,10 +100,10 @@ Follow these steps to set up your local copy of the repository:
 
 #### Troubleshooting
 
-If you encounter an error while trying to build the documentation website, find the error in the following troubleshooting list: 
+Try the following troubleshooting steps if you encounter an error when trying to build the documentation website:  
 
-- When running `rvm install 3.2` if you receive a `Error running '__rvm_make -j10'`, resolve this by running `rvm install 3.2.0 -C --with-openssl-dir=/opt/homebrew/opt/[email protected]` instead of `rvm install 3.2`.
-- If receive a `bundle install`: `An error occurred while installing posix-spawn (0.3.15), and Bundler cannot continue.` error when trying to run `bundle install`, resolve this by running `gem install posix-spawn -v 0.3.15 -- --with-cflags=\"-Wno-incompatible-function-pointer-types\"`. Then, run `bundle install`.
+- If you see the `Error running '__rvm_make -j10'` error when running `rvm install 3.2`, you can resolve it by running `rvm install 3.2.0 -C --with-openssl-dir=/opt/homebrew/opt/[email protected]` instead of `rvm install 3.2`.
+- If you see the `An error occurred while installing posix-spawn (0.3.15), and Bundler cannot continue.` error when running `bundle install`, you can resolve it by running `gem install posix-spawn -v 0.3.15 -- --with-cflags=\"-Wno-incompatible-function-pointer-types\"` and then running `bundle install` again.
 
 
 

diff --git a/_automating-configurations/api/create-workflow.md b/_automating-configurations/api/create-workflow.md
@@ -20,9 +20,9 @@ You can include placeholder expressions in the value of workflow step fields. Fo
 
 Once a workflow is created, provide its `workflow_id` to other APIs.
 
-The `POST` method creates a new workflow. The `PUT` method updates an existing workflow. 
+The `POST` method creates a new workflow. The `PUT` method updates an existing workflow. You can specify the `update_fields` parameter to update specific fields.
 
-You can only update a workflow if it has not yet been provisioned.
+You can only update a complete workflow if it has not yet been provisioned.
 {: .note}
 
 ## Path and HTTP methods
@@ -58,11 +58,26 @@ POST /_plugins/_flow_framework/workflow?validation=none
 ```
 {% include copy-curl.html %}
 
+You cannot update a full workflow once it has been provisioned, but you can update fields other than the `workflows` field, such as `name` and `description`:
+
+```json
+PUT /_plugins/_flow_framework/workflow/<workflow_id>?update_fields=true
+{
+  "name": "new-template-name",
+  "description": "A new description for the existing template"
+}
+```
+{% include copy-curl.html %}
+
+You cannot specify both the `provision` and `update_fields` parameters at the same time.
+{: .note}
+
 The following table lists the available query parameters. All query parameters are optional. User-provided parameters are only allowed if the `provision` parameter is set to `true`.
 
 | Parameter | Data type | Description |
 | :--- | :--- | :--- |
 | `provision` | Boolean | Whether to provision the workflow as part of the request. Default is `false`. |
+| `update_fields` | Boolean | Whether to update only the fields included in the request body. Default is `false`. |
 | `validation` | String | Whether to validate the workflow. Valid values are `all` (validate the template) and `none` (do not validate the template). Default is `all`. |
 | User-provided substitution expressions | String | Parameters matching substitution expressions in the template. Only allowed if `provision` is set to `true`. Optional. If `provision` is set to `false`, you can pass these parameters in the [Provision Workflow API query parameters]({{site.url}}{{site.baseurl}}/automating-configurations/api/provision-workflow/#query-parameters). |
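
As an illustration of how the query parameters in the preceding table interact, the following Python sketch builds the request line for a create or update call and rejects the combination that the note above disallows. The helper name `workflow_request` and its argument handling are hypothetical; only the endpoint path and the `provision`, `update_fields`, and `validation` parameter names come from the documentation in this diff.

```python
from urllib.parse import urlencode

BASE_PATH = "/_plugins/_flow_framework/workflow"


def workflow_request(workflow_id=None, provision=False, update_fields=False, validation="all"):
    """Return (method, path) for a create or update request.

    Hypothetical helper: only the path and parameter names come from the docs.
    """
    if provision and update_fields:
        raise ValueError("provision and update_fields cannot be combined")
    if update_fields and workflow_id is None:
        raise ValueError("update_fields requires an existing workflow_id")
    if validation not in ("all", "none"):
        raise ValueError("validation must be 'all' or 'none'")

    # Only send parameters that differ from the documented defaults.
    params = {}
    if provision:
        params["provision"] = "true"
    if update_fields:
        params["update_fields"] = "true"
    if validation != "all":
        params["validation"] = validation

    # POST creates a new workflow; PUT updates an existing one.
    method = "POST" if workflow_id is None else "PUT"
    path = BASE_PATH if workflow_id is None else f"{BASE_PATH}/{workflow_id}"
    query = urlencode(params)
    return method, f"{path}?{query}" if query else path
```

Checking the parameters on the client side is only a convenience in this sketch; the server remains the authority on which combinations are valid.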
 

diff --git a/_ingest-pipelines/processors/split.md b/_ingest-pipelines/processors/split.md
@@ -26,19 +26,18 @@ The following is the syntax for the `split` processor:
 
 The following table lists the required and optional parameters for the `split` processor.
 
-Parameter | Required/Optional | Description |
-|-----------|-----------|-----------|
-`field` | Required | The field containing the string to be split.
-`separator` | Required | The delimiter used to split the string. This can be a regular expression pattern.
-`preserve_field` | Optional | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, empty trailing fields are removed from the resulting array. Default is `false`.
-`target_field` | Optional | The field where the array of substrings is stored. If not specified, then the field is updated in-place.
-`ignore_missing` | Optional	| Specifies whether the processor should ignore documents that do not contain the specified 
-field. If set to `true`, then the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`. 
-`description` | Optional | A brief description of the processor.
-`if` | Optional | A condition for running the processor.
-`ignore_failure` | Optional | Specifies whether the processor continues execution even if it encounters an error. If set to `true`, then failures are ignored. Default is `false`.
-`on_failure` | Optional | A list of processors to run if the processor fails.
-`tag` | Optional | An identifier tag for the processor. Useful for debugging in order to distinguish between processors of the same type.
+Parameter  | Required/Optional  | Description 
+:--- | :--- | :--- 
+`field` | Required | The field containing the string to be split. 
+`separator` | Required | The delimiter used to split the string. This can be a regular expression pattern. 
+`preserve_field` | Optional | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, empty trailing fields are removed from the resulting array. Default is `false`. 
+`target_field` | Optional | The field where the array of substrings is stored. If not specified, then the field is updated in-place. 
+`ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified field. If set to `true`, then the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`. 
+`description` | Optional | A brief description of the processor. 
+`if` | Optional | A condition for running the processor. 
+`ignore_failure` | Optional | Specifies whether the processor continues execution even if it encounters an error. If set to `true`, then failures are ignored. Default is `false`. 
+`on_failure` | Optional | A list of processors to run if the processor fails. 
+`tag` | Optional | An identifier tag for the processor. Useful for debugging in order to distinguish between processors of the same type. 
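
The effect of the separator and of the trailing-field setting described in the preceding table can be sketched in Python. This is an approximation under stated assumptions, not the processor's implementation: the function name `split_field` and the argument `preserve_trailing_empty` are hypothetical, and Python's `re` module may treat some patterns differently from the regular expression engine that OpenSearch uses.

```python
import re


def split_field(value, separator, preserve_trailing_empty=False):
    """Split value on a regex separator, optionally keeping empty trailing items."""
    parts = re.split(separator, value)
    if not preserve_trailing_empty:
        # Drop empty strings from the end only; interior empties are kept.
        while parts and parts[-1] == "":
            parts.pop()
    return parts
```

For example, `split_field("a,b,,", ",")` drops the two empty trailing items, while passing `preserve_trailing_empty=True` keeps them.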
 
 ## Using the processor
 

diff --git a/_query-dsl/compound/hybrid.md b/_query-dsl/compound/hybrid.md
@@ -12,11 +12,7 @@ You can use a hybrid query to combine relevance scores from multiple queries int
 
 ## Example
 
-Before using a `hybrid` query, you must set up a machine learning (ML) model, ingest documents, and configure a search pipeline with a [`normalization-processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/).
-
-To learn how to set up an ML model, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
-
-Once you set up an ML model, learn how to use the `hybrid` query by following the steps in [Using hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/#using-hybrid-search).
+Learn how to use the `hybrid` query by following the steps in [Using hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/#using-hybrid-search).
 
 For a comprehensive example, follow the [Neural search tutorial]({{site.url}}{{site.baseurl}}/ml-commons-plugin/semantic-search#tutorial).
 

diff --git a/_query-dsl/geo-and-xy/geopolygon.md b/_query-dsl/geo-and-xy/geopolygon.md
@@ -0,0 +1,177 @@
+---
+layout: default
+title: Geopolygon
+parent: Geographic and xy queries
+grand_parent: Query DSL
+nav_order: 30
+---
+
+# Geopolygon query
+
+A geopolygon query returns documents containing geopoints that are within the specified polygon. A document containing multiple geopoints matches the query if at least one geopoint matches the query.
+
+A polygon is specified by a list of vertices in coordinate form. Unlike a polygon specified for a geoshape field, the polygon does not have to be closed (you do not need to repeat the first point as the last point). Though points do not have to follow either clockwise or counterclockwise order, it is recommended that you list them in one of these orders to ensure that the correct polygon is captured.
+
+The searched document field must be mapped as `geo_point`.
+{: .note}
+
+## Example
+
+Create a mapping with the `point` field mapped as `geo_point`:
+
+```json
+PUT /testindex1
+{
+  "mappings": {
+    "properties": {
+      "point": {
+        "type": "geo_point"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Index a geopoint, specifying its latitude and longitude:
+
+```json
+PUT testindex1/_doc/1
+{
+  "point": { 
+    "lat": 73.71,
+    "lon": 41.32
+  }
+}
+```
+{% include copy-curl.html %}
+
+Search for documents whose `point` objects are within the specified `geo_polygon`:
+
+```json
+GET /testindex1/_search
+{
+  "query": {
+    "bool": {
+      "must": {
+        "match_all": {}
+      },
+      "filter": {
+        "geo_polygon": {
+          "point": {
+            "points": [
+              { "lat": 74.5627, "lon": 41.8645 },
+              { "lat": 73.7562, "lon": 42.6526 },
+              { "lat": 73.3245, "lon": 41.6189 },
+              { "lat": 74.0060, "lon": 40.7128 }
+           ]
+          }
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+The polygon specified in the preceding request is the quadrilateral depicted in the following image. The matching document is within this quadrilateral. The coordinates of the quadrilateral vertices are specified in `(latitude, longitude)` format.
+
+![Search for points within the specified quadrilateral]({{site.url}}{{site.baseurl}}/images/geopolygon-query.png)
+
+The response contains the matching document:
+
+```json
+{
+  "took": 6,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "testindex1",
+        "_id": "1",
+        "_score": 1,
+        "_source": {
+          "point": {
+            "lat": 73.71,
+            "lon": 41.32
+          }
+        }
+      }
+    ]
+  }
+}
+```
+
+In the preceding search request, you specified the polygon vertices in clockwise order:
+
+```json
+"geo_polygon": {
+    "point": {
+    "points": [
+        { "lat": 74.5627, "lon": 41.8645 },
+        { "lat": 73.7562, "lon": 42.6526 },
+        { "lat": 73.3245, "lon": 41.6189 },
+        { "lat": 74.0060, "lon": 40.7128 }
+    ]
+    }
+}
+```
+
+Alternatively, you can specify the vertices in counterclockwise order:
+
+```json
+"geo_polygon": {
+    "point": {
+    "points": [
+        { "lat": 74.5627, "lon": 41.8645 },
+        { "lat": 74.0060, "lon": 40.7128 },
+        { "lat": 73.3245, "lon": 41.6189 },
+        { "lat": 73.7562, "lon": 42.6526 }
+    ]
+    }
+}
+```
+
+The resulting query response contains the same matching document.
+
+However, if you specify the vertices in the following order:
+
+```json
+"geo_polygon": {
+    "point": {
+    "points": [
+        { "lat": 74.5627, "lon": 41.8645 },
+        { "lat": 74.0060, "lon": 40.7128 },
+        { "lat": 73.7562, "lon": 42.6526 },
+        { "lat": 73.3245, "lon": 41.6189 }
+    ]
+    }
+}
+```
+
+The response returns no results.
+
+## Request fields
+
+Geopolygon queries accept the following fields.
+
+Field | Data type | Description
+:--- | :--- | :--- 
+`_name` | String | The name of the filter. Optional.
+`validation_method` | String | The validation method. Valid values are `IGNORE_MALFORMED` (accept geopoints with invalid coordinates), `COERCE` (try to coerce coordinates to valid values), and `STRICT` (return an error when coordinates are invalid). Optional. Default is `STRICT`.
+`ignore_unmapped` | Boolean | Specifies whether to ignore an unmapped field. If set to `true`, then the query does not return any documents that contain an unmapped field. If set to `false`, then an exception is thrown when the field is unmapped. Optional. Default is `false`.
+
+## Accepted formats
+
+You can specify the geopoint coordinates when indexing a document and searching for documents in any [format]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/geo-point#formats) accepted by the geopoint field type.  
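
To see why vertex order matters in the geopolygon example, the three orderings can be checked with a simple point-in-polygon test. The following Python sketch uses even-odd ray casting on a flat plane, treating latitude and longitude as plain coordinates. That is an assumption made for illustration and is not necessarily how OpenSearch evaluates the query; the function name `point_in_polygon` is hypothetical.

```python
def point_in_polygon(lat, lon, vertices):
    """Even-odd ray casting on a flat plane; vertices are (lat, lon) tuples."""
    inside = False
    n = len(vertices)
    for i in range(n):
        lat_i, lon_i = vertices[i]
        lat_j, lon_j = vertices[(i + 1) % n]  # wraps around to close the polygon
        # Does this edge straddle the point's latitude?
        if (lat_i > lat) != (lat_j > lat):
            # Longitude at which the edge crosses that latitude.
            cross_lon = lon_i + (lat - lat_i) / (lat_j - lat_i) * (lon_j - lon_i)
            if lon < cross_lon:
                inside = not inside
    return inside


point = (73.71, 41.32)
clockwise = [(74.5627, 41.8645), (73.7562, 42.6526), (73.3245, 41.6189), (74.0060, 40.7128)]
counterclockwise = [clockwise[0], clockwise[3], clockwise[2], clockwise[1]]
crossed = [clockwise[0], clockwise[3], clockwise[1], clockwise[2]]
```

Under these assumptions, the indexed point falls inside the first two orderings and outside the third, whose edges cross each other. This agrees with the responses described in the example.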
diff --git a/_query-dsl/geo-and-xy/index.md b/_query-dsl/geo-and-xy/index.md
@@ -30,7 +30,7 @@ OpenSearch provides the following geographic query types:
 
 - [**Geo-bounding box queries**]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/geo-and-xy/geo-bounding-box/): Return documents with geopoint field values that are within a bounding box. 
 - [**Geodistance queries**]({{site.url}}{{site.baseurl}}/query-dsl/geo-and-xy/geodistance/): Return documents with geopoints that are within a specified distance from the provided geopoint.
-- **Geopolygon queries**: Return documents with geopoints that are within a polygon.
+- [**Geopolygon queries**]({{site.url}}{{site.baseurl}}/query-dsl/geo-and-xy/geopolygon/): Return documents containing geopoints that are within a polygon.
 - **Geoshape queries**: Return documents that contain:
     - Geoshapes and geopoints that have one of four spatial relations to the provided shape: `INTERSECTS`, `DISJOINT`, `WITHIN`, or `CONTAINS`.
     - Geopoints that intersect the provided shape.
diff --git a/_search-plugins/hybrid-search.md b/_search-plugins/hybrid-search.md
@@ -12,7 +12,7 @@ Introduced 2.11
 Hybrid search combines keyword and neural search to improve search relevance. To implement hybrid search, you need to set up a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) that runs at search time. The search pipeline you'll configure intercepts search results at an intermediate stage and applies the [`normalization_processor`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/normalization-processor/) to them. The `normalization_processor` normalizes and combines the document scores from multiple query clauses, rescoring the documents according to the chosen normalization and combination techniques. 
 
 **PREREQUISITE**<br>
-Before using hybrid search, you must set up a text embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model).
+To follow this example, you must set up a text embedding model. For more information, see [Choosing a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/integrating-ml-models/#choosing-a-model). If you have already generated text embeddings, ingest the embeddings into an index and skip to [Step 4](#step-4-configure-a-search-pipeline).
 {: .note}
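
The normalize-then-combine step described above can be sketched in a few lines of Python. The sketch assumes min-max normalization and an arithmetic mean as the chosen techniques, and it assumes that a document missing from one query's results contributes a score of 0. The function names and sample scores are hypothetical, and the processor's actual behavior may differ in detail.

```python
def min_max(scores):
    """Rescale one query's scores to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so there is no range to rescale.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def arithmetic_mean(per_query_scores):
    """Average normalized scores across queries; a missing document counts as 0."""
    docs = set().union(*per_query_scores)
    n = len(per_query_scores)
    return {doc: sum(q.get(doc, 0.0) for q in per_query_scores) / n for doc in docs}


# Hypothetical raw scores from a keyword clause and a neural clause.
keyword = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
neural = {"d1": 0.2, "d2": 0.8}

combined = arithmetic_mean([min_max(keyword), min_max(neural)])
ranking = sorted(combined, key=combined.get, reverse=True)
```

Normalizing first keeps the keyword scores, which are on a larger scale, from dominating the neural scores when the two are averaged.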
 
 ## Using hybrid search