Merge branch 'main' into log-types
Naarcha-AWS authored Apr 3, 2024
2 parents a0f5f81 + 9ff5db3 commit 1bd4e66
Showing 21 changed files with 167 additions and 36 deletions.
27 changes: 27 additions & 0 deletions .github/workflows/encoding-check.yml
@@ -0,0 +1,27 @@
name: Encoding Checker

on: [pull_request]

jobs:
  encoding-checker:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Check for possible file that does not follow utf-8 encoding
        run: |
          set +e
          IFS=$(echo -en "\n\b")
          COUNTER=0
          for i in `find . -type f \( -name "*.txt" -o -name "*.md" -o -name "*.markdown" -o -name "*.html" \) | grep -vE "^./.git"`;
          do
            grep -axv '.*' "$i"
            if [ "$?" -eq 0 ]; then
              echo -e "######################\n$i\n######################"
              COUNTER=$(( COUNTER + 1 ))
            fi
          done
          if [ "$COUNTER" != 0 ]; then
            echo "Found files that is not following utf-8 encoding, exit 1"
            exit 1
          fi
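
The check in the workflow above relies on `grep -axv '.*'` printing any line that is not valid text in the runner's locale. As a rough illustration of the same idea, the following Python sketch flags files whose bytes do not decode as UTF-8. It is not part of the commit: the function names `is_valid_utf8` and `find_non_utf8_files` are hypothetical, and the sketch assumes that skipping only the `.git` directory is an acceptable stand-in for the workflow's `grep -vE "^./.git"` filter.

```python
from pathlib import Path

# File extensions mirrored from the workflow's `find` expression.
TEXT_SUFFIXES = {".txt", ".md", ".markdown", ".html"}


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_non_utf8_files(root: str) -> list[Path]:
    """Collect text-like files under root that are not valid UTF-8.

    Assumption: only the .git directory is skipped, which is narrower
    than the regular expression used in the workflow.
    """
    bad_files = []
    for path in sorted(Path(root).rglob("*")):
        if ".git" in path.parts:
            continue
        if path.is_file() and path.suffix in TEXT_SUFFIXES:
            if not is_valid_utf8(path.read_bytes()):
                bad_files.append(path)
    return bad_files
```

Calling `find_non_utf8_files(".")` from the repository root would return the offending paths, and a non-empty result would correspond to the `exit 1` branch of the workflow.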
1 change: 1 addition & 0 deletions _about/version-history.md
@@ -9,6 +9,7 @@ permalink: /version-history/

OpenSearch version | Release highlights | Release date
:--- | :--- | :---
[2.13.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.13.0.md) | Makes agents and tools and the OpenSearch Assistant Toolkit generally available. Introduces vector quantization within OpenSearch. Adds LLM guardrails and hybrid search with aggregations. Adds the Bloom filter skipping index for Apache Spark data sources, I/O-based admission control, and the ability to add an alerting cluster that manages all alerting tasks. For a full list of release highlights, see the Release Notes. | 2 April 2024
[2.12.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.12.0.md) | Makes concurrent segment search and conversational search generally available. Provides an experimental OpenSearch Assistant Toolkit, including agents and tools, workflow automation, and OpenSearch Assistant for OpenSearch Dashboards UI. Adds a new match-only text field, query insights to monitor top N queries, and k-NN search on nested fields. For a full list of release highlights, see the Release Notes. | 20 February 2024
[2.11.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.11.1.md) | Includes maintenance changes and bug fixes for cross-cluster replication, alerting, observability, OpenSearch Dashboards, index management, machine learning, security, and security analytics. For a full list of release highlights, see the Release Notes. | 30 November 2023
[2.11.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.11.0.md) | Adds multimodal and sparse neural search capability and the ability to take shallow snapshots that refer to data stored in remote-backed storage. Makes the search comparison tool generally available. Includes a simplified workflow to create threat detectors in Security Analytics and improved security in OpenSearch Dashboards. Experimental features include a new framework and toolset for distributed tracing and updates to conversational search. For a full list of release highlights, see the Release Notes. | 16 October 2023
8 changes: 4 additions & 4 deletions _config.yml
@@ -5,10 +5,10 @@ baseurl: "/docs/latest" # the subpath of your site, e.g. /blog
url: "https://opensearch.org" # the base hostname & protocol for your site, e.g. http://example.com
permalink: /:path/

opensearch_version: '2.12.0'
opensearch_dashboards_version: '2.12.0'
opensearch_major_minor_version: '2.12'
lucene_version: '9_9_2'
opensearch_version: '2.13.0'
opensearch_dashboards_version: '2.13.0'
opensearch_major_minor_version: '2.13'
lucene_version: '9_10_0'

# Build settings
markdown: kramdown
1 change: 1 addition & 0 deletions _data-prepper/pipelines/configuration/buffers/kafka.md
@@ -128,6 +128,7 @@ Option | Required | Type | Description
#### producer_properties

Use the following configuration options to configure a Kafka producer.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`max_request_size` | No | Integer | The maximum size of the request that the producer sends to Kafka. Default is 1 MB.
55 changes: 55 additions & 0 deletions _data-prepper/pipelines/configuration/processors/parse-xml.md
@@ -0,0 +1,55 @@
---
layout: default
title: parse_xml
parent: Processors
grand_parent: Pipelines
nav_order: 83
---

# parse_xml

The `parse_xml` processor parses XML data for an event.

## Configuration

You can configure the `parse_xml` processor with the following options.

| Option | Required | Type | Description |
| :--- | :--- | :--- | :--- |
| `source` | No | String | Specifies which `event` field to parse. |
| `destination` | No | String | The destination field of the parsed XML. Defaults to the root of the `event`. Cannot be `""`, `/`, or any white-space-only string because these are not valid `event` fields. |
| `pointer` | No | String | A JSON pointer to the field to be parsed. The value is null by default, meaning that the entire `source` is parsed. The `pointer` can access JSON array indexes as well. If the JSON pointer is invalid, then the entire `source` data is parsed into the outgoing `event` object. If the key that is pointed to already exists in the `event` object and the `destination` is the root, then the pointer uses the entire path of the key. |
| `parse_when` | No | String | Specifies under what conditions the processor should perform parsing. Default is no condition. Accepts a Data Prepper expression string following the [Data Prepper Expression Syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). |
| `tags_on_failure` | No | String | A list of strings that specify the tags to be set if the processor fails or an unknown exception occurs while parsing. |

## Usage

The following examples show how to use the `parse_xml` processor in your pipeline.

### Example: Minimum configuration

The following example shows the minimum configuration for the `parse_xml` processor:

```yaml
parse-xml-pipeline:
  source:
    stdin:
  processor:
    - parse_xml:
        source: "my_xml"
  sink:
    - stdout:
```
{% include copy.html %}

When the input event contains the following data:

```
{ "my_xml": "<Person><name>John Doe</name><age>30</age></Person>" }
```

The processor parses the event into the following output:

```
{ "name": "John Doe", "age": "30" }
```
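
To make the transformation in the preceding example concrete, the following Python sketch reproduces it for a flat, one-level XML document. It is only an illustration under stated assumptions: the helper name `parse_xml_field` is hypothetical, it does not reflect how Data Prepper implements the processor, and it ignores the `destination`, `pointer`, `parse_when`, and `tags_on_failure` options as well as nested elements and attributes.

```python
import json
import xml.etree.ElementTree as ET


def parse_xml_field(event: dict, source: str) -> dict:
    """Parse the XML string stored in event[source] into a flat dict.

    Assumption: the XML has a single root element whose children are
    simple text elements, as in the documentation example.
    """
    root = ET.fromstring(event[source])
    return {child.tag: child.text for child in root}


event = {"my_xml": "<Person><name>John Doe</name><age>30</age></Person>"}
print(json.dumps(parse_xml_field(event, "my_xml")))
# prints {"name": "John Doe", "age": "30"}
```

Note that `age` remains the string `"30"`, matching the documented output, because XML text content carries no type information.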
11 changes: 5 additions & 6 deletions _data-prepper/pipelines/configuration/sinks/opensearch.md
@@ -50,7 +50,6 @@ pipeline:

The following table describes options you can configure for the `opensearch` sink.

<!-- vale off -->
Option | Required | Type | Description
:--- | :--- |:---| :---
`hosts` | Yes | List | A list of OpenSearch hosts to write to, such as `["https://localhost:9200", "https://remote-cluster:9200"]`.
@@ -89,9 +88,9 @@ Option | Required | Type | Description
`normalize_index` | No | Boolean | If true, then the OpenSearch sink will try to create dynamic index names. Index names with format options specified in `${}` are valid according to the [index naming restrictions]({{site.url}}{{site.baseurl}}/api-reference/index-apis/create-index/#index-naming-restrictions). Any invalid characters will be removed. Default value is `false`.
`routing` | No | String | A string used as a hash for generating the `shard_id` for a document when it is stored in OpenSearch. Each incoming record is searched. When present, the string is used as the routing field for the document. When not present, the default routing mechanism (`document_id`) is used by OpenSearch when storing the document. Supports formatting with fields in events and [Data Prepper expressions]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/), such as `${/my_field}-test-${getMetadata(\"some_metadata_key\")}`.
`document_root_key` | No | String | The key in the event that will be used as the root in the document. The default is the root of the event. If the key does not exist, then the entire event is written as the document. If `document_root_key` is of a basic value type, such as a string or integer, then the document will have a structure of `{"data": <value of the document_root_key>}`.
`serverless` | No | Boolean | Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` sink is an Amazon OpenSearch Serverless collection. Default is `false`.
`serverless_options` | No | Object | The network configuration options available when the backend of the `opensearch` sink is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options).
<!-- vale on -->
`serverless` | No | Boolean | **Deprecated in Data Prepper 2.7. Use this option with the `aws` configuration instead.** Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` sink is an Amazon OpenSearch Serverless collection. Default is `false`.
`serverless_options` | No | Object | **Deprecated in Data Prepper 2.7. Use this option with the `aws` configuration instead.** The network configuration options available when the backend of the `opensearch` sink is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options).


## aws

@@ -101,8 +100,8 @@ Option | Required | Type | Description
`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon SQS and Amazon S3. Defaults to `null`, which will use [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
`sts_header_overrides` | No | Map | A map of header overrides that the IAM role assumes for the sink plugin.
`sts_external_id` | No | String | The external ID to attach to AssumeRole requests from AWS STS.
`serverless` | No | Boolean | **Deprecated in Data Prepper 2.7. Use this option with the `aws` configuration instead.** Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` sink is an Amazon OpenSearch Serverless collection. Default is `false`.
`serverless_options` | No | Object | **Deprecated in Data Prepper 2.7. Use this option with the `aws` configuration instead.** The network configuration options available when the backend of the `opensearch` sink is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options).
`serverless` | No | Boolean | Determines whether the OpenSearch backend is Amazon OpenSearch Serverless. Set this value to `true` when the destination for the `opensearch` sink is an Amazon OpenSearch Serverless collection. Default is `false`.
`serverless_options` | No | Object | The network configuration options available when the backend of the `opensearch` sink is set to Amazon OpenSearch Serverless. For more information, see [Serverless options](#serverless-options).

<!-- vale off -->
## actions
7 changes: 4 additions & 3 deletions _data/versions.json
@@ -1,10 +1,11 @@
{
"current": "2.12",
"current": "2.13",
"all": [
"2.12",
"2.13",
"1.3"
],
"archived": [
"2.12",
"2.11",
"2.10",
"2.9",
@@ -21,7 +22,7 @@
"1.1",
"1.0"
],
"latest": "2.12"
"latest": "2.13"
}


6 changes: 5 additions & 1 deletion _query-dsl/term/exists.md
@@ -146,4 +146,8 @@ The response contains the matching document:

## Parameters

The query accepts the name of the field (`<field>`) as a top-level parameter.
The query accepts the name of the field (`<field>`) as a top-level parameter.

Parameter | Data type | Description
:--- | :--- | :---
`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0.
5 changes: 3 additions & 2 deletions _query-dsl/term/fuzzy.md
@@ -67,7 +67,7 @@ GET _search
"fuzzy": {
"<field>": {
"value": "sample",
...
...
}
}
}
@@ -80,11 +80,12 @@ The `<field>` accepts the following parameters. All parameters except `value` ar
Parameter | Data type | Description
:--- | :--- | :---
`value` | String | The term to search for in the field specified in `<field>`.
`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0.
`fuzziness` | `AUTO`, `0`, or a positive integer | The number of character edits (insert, delete, substitute) needed to change one word to another when determining whether a term matched a value. For example, the distance between `wined` and `wind` is 1. The default, `AUTO`, chooses a value based on the length of each term and is a good choice for most use cases.
`max_expansions` | Positive integer | The maximum number of terms to which the query can expand. Fuzzy queries “expand to” a number of matching terms that are within the distance specified in `fuzziness`. Then OpenSearch tries to match those terms. Default is `50`.
`prefix_length` | Non-negative integer | The number of leading characters that are not considered in fuzziness. Default is `0`.
`rewrite` | String | Determines how OpenSearch rewrites and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`.
`transpositions` | Boolean | Specifies whether to allow transpositions of two adjacent characters (`ab` to `ba`) as edits. Default is `true`.
`transpositions` | Boolean | Specifies whether to allow transpositions of two adjacent characters (`ab` to `ba`) as edits. Default is `true`.

Specifying a large value in `max_expansions` can lead to poor performance, especially if `prefix_length` is set to `0`, because of the large number of variations of the word that OpenSearch tries to match.
{: .warning}
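
The `fuzziness` description above counts character edits between two terms. As a minimal sketch of that idea, the following Python function computes the plain Levenshtein distance (insertions, deletions, and substitutions only). The function name `edit_distance` is hypothetical, and the sketch deliberately does not model the `transpositions` option, so swapping two adjacent characters costs 2 here, whereas with `transpositions` enabled it is treated as a single edit.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between a and b (no transpositions)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


print(edit_distance("wined", "wind"))  # prints 1
print(edit_distance("ab", "ba"))       # prints 2
```

The first result matches the `wined` and `wind` example in the table. The second shows why transpositions matter: without them, `ab` to `ba` requires two edits.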
1 change: 1 addition & 0 deletions _query-dsl/term/ids.md
@@ -32,3 +32,4 @@ The query accepts the following parameter.
Parameter | Data type | Description
:--- | :--- | :---
`values` | Array of strings | The document IDs to search for. Required.
`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0.
5 changes: 3 additions & 2 deletions _query-dsl/term/prefix.md
@@ -50,7 +50,7 @@ GET _search
"prefix": {
"<field>": {
"value": "sample",
...
...
}
}
}
@@ -63,8 +63,9 @@ The `<field>` accepts the following parameters. All parameters except `value` ar
Parameter | Data type | Description
:--- | :--- | :---
`value` | String | The term to search for in the field specified in `<field>`.
`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0.
`case_insensitive` | Boolean | If `true`, allows case-insensitive matching of the value with the indexed field values. Default is `false` (case sensitivity is determined by the field's mapping).
`rewrite` | String | Determines how OpenSearch rewrites and scores multi-term queries. Valid values are `constant_score`, `scoring_boolean`, `constant_score_boolean`, `top_terms_N`, `top_terms_boost_N`, and `top_terms_blended_freqs_N`. Default is `constant_score`.

If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, prefix queries are not run. If `index_prefixes` is enabled, the `search.allow_expensive_queries` setting is ignored and an optimized query is built and run.
{: .important}
{: .important}
14 changes: 7 additions & 7 deletions _query-dsl/term/range.md
@@ -90,7 +90,7 @@ OpenSearch populates missing date components with the following values:
- `SECOND_OF_MINUTE`: `59`
- `NANO_OF_SECOND`: `999_999_999`

If the year is missing, it is not populated.
If the year is missing, it is not populated.

For example, consider the following request that specifies only the year in the start date:

@@ -131,7 +131,7 @@ GET products/_search
```
{% include copy-curl.html %}

In the preceding example, `2019/01/01` is the anchor date (the starting point) for the date math. After the two pipe characters (`||`), you are specifying a mathematical expression relative to the anchor date. In this example, you are subtracting 1 year (`-1y`) and 1 day (`-1d`).
In the preceding example, `2019/01/01` is the anchor date (the starting point) for the date math. After the two pipe characters (`||`), you are specifying a mathematical expression relative to the anchor date. In this example, you are subtracting 1 year (`-1y`) and 1 day (`-1d`).
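
The arithmetic described in the preceding paragraph can be checked with a short Python sketch. This is an illustration of the calendar math only, not of how OpenSearch evaluates date math expressions. The helper `subtract_years` is hypothetical, and its handling of February 29 (clamping to February 28) is an assumption made so that the function is total.

```python
from datetime import date, timedelta


def subtract_years(d: date, years: int) -> date:
    """Subtract whole calendar years from a date."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # Assumption: Feb 29 clamps to Feb 28 when the target year is not a leap year.
        return d.replace(year=d.year - years, day=28)


anchor = date(2019, 1, 1)                               # the anchor date 2019/01/01
result = subtract_years(anchor, 1) - timedelta(days=1)  # apply -1y, then -1d
print(result.isoformat())  # prints 2017-12-31
```

Subtracting 1 year gives `2018-01-01`, and subtracting 1 more day gives `2017-12-31`.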

You can also round off dates by adding a forward slash to the date or time unit.

@@ -175,16 +175,16 @@ GET /products/_search
"query": {
"range": {
"created": {
"time_zone": "-04:00",
"gte": "2022-04-17T06:00:00"
"time_zone": "-04:00",
"gte": "2022-04-17T06:00:00"
}
}
}
}
```
{% include copy-curl.html %}

The `gte` parameter in the preceding query is converted to `2022-04-17T10:00:00 UTC`, which is the UTC equivalent of `2022-04-17T06:00:00-04:00`.
The `gte` parameter in the preceding query is converted to `2022-04-17T10:00:00 UTC`, which is the UTC equivalent of `2022-04-17T06:00:00-04:00`.

The `time_zone` parameter does not affect the `now` value because `now` always corresponds to the current system time in UTC.
{: .note}
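
The conversion described above can be verified with Python's standard library. The following sketch is illustrative only: the function name `to_utc` is hypothetical, and it accepts a whole-hour UTC offset rather than the full set of `time_zone` values (such as IANA time zone IDs).

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_timestamp: str, utc_offset_hours: int) -> datetime:
    """Interpret a naive ISO 8601 timestamp at a fixed UTC offset and convert it to UTC."""
    offset = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.fromisoformat(local_timestamp).replace(tzinfo=offset)
    return local.astimezone(timezone.utc)


# "time_zone": "-04:00" with "gte": "2022-04-17T06:00:00"
print(to_utc("2022-04-17T06:00:00", -4).isoformat())
# prints 2022-04-17T10:00:00+00:00
```

Because the offset is 4 hours behind UTC, the UTC value is 4 hours later than the local value.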
@@ -200,7 +200,7 @@ GET _search
"range": {
"<field>": {
"gt": 10,
...
...
}
}
}
Expand All @@ -215,7 +215,7 @@ Parameter | Data type | Description
:--- | :--- | :---
`format` | String | A [format]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/#formats) for dates in this query. Default is the field's mapped format.
`relation` | String | Indicates how the range query matches values for [`range`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/range/) fields. Valid values are:<br> - `INTERSECTS` (default): Matches documents whose `range` field value intersects the range provided in the query. <br> - `CONTAINS`: Matches documents whose `range` field value contains the entire range provided in the query. <br> - `WITHIN`: Matches documents whose `range` field value is entirely within the range provided in the query.
`boost` | Floating-point | Boosts the query by the given multiplier. Useful for searches that contain more than one query. Values in the [0, 1) range decrease relevance, and values greater than 1 increase relevance. Default is `1`.
`boost` | Floating-point | A floating-point value that specifies the weight of this field toward the relevance score. Values above 1.0 increase the field’s relevance. Values between 0.0 and 1.0 decrease the field’s relevance. Default is 1.0.
`time_zone` | String | The time zone used to convert [`date`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/date/) values to UTC in the query. Valid values are ISO 8601 [UTC offsets](https://en.wikipedia.org/wiki/List_of_UTC_offsets) and [IANA time zone IDs](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). For more information, see [Time zone](#time-zone).

If [`search.allow_expensive_queries`]({{site.url}}{{site.baseurl}}/query-dsl/index/#expensive-queries) is set to `false`, range queries on [`text`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/text/) and [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) fields are not run.