Merge branch 'main' into main
vagimeli authored Jun 6, 2024
2 parents f8ef118 + 6e2f1d4 commit d498755
Showing 49 changed files with 5,323 additions and 118 deletions.
2 changes: 2 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -4,6 +4,8 @@ _Describe what this change achieves._
### Issues Resolved
_List any issues this PR will resolve, e.g. Closes [...]._

### Version
_List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all._

### Checklist
- [ ] By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the [Developers Certificate of Origin](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).
1 change: 1 addition & 0 deletions .github/vale/styles/Vocab/OpenSearch/Products/accept.txt
@@ -8,6 +8,7 @@ Amazon SageMaker
Ansible
Auditbeat
AWS Cloud
Cohere Command
Cognito
Dashboards Query Language
Data Prepper
80 changes: 41 additions & 39 deletions STYLE_GUIDE.md

Large diffs are not rendered by default.

76 changes: 38 additions & 38 deletions TERMS.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _about/version-history.md
@@ -29,6 +29,7 @@ OpenSearch version | Release highlights | Release date
[2.0.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.1.md) | Includes bug fixes and maintenance updates for Alerting and Anomaly Detection. | 16 June 2022
[2.0.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0.md) | Includes document-level monitors for alerting, OpenSearch Notifications plugins, and Geo Map Tiles in OpenSearch Dashboards. Also adds support for Lucene 9 and bug fixes for all OpenSearch plugins. For a full list of release highlights, see the Release Notes. | 26 May 2022
[2.0.0-rc1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0-rc1.md) | The Release Candidate for 2.0.0. This version allows you to preview the upcoming 2.0.0 release before the GA release. The preview release adds document-level alerting, support for Lucene 9, and the ability to use term lookup queries in document level security. | 03 May 2022
[1.3.17](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.17.md) | Includes maintenance updates for OpenSearch security and OpenSearch Dashboards security. | 06 June 2024
[1.3.16](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.16.md) | Includes bug fixes and maintenance updates for OpenSearch security, index management, performance analyzer, and reporting. | 23 April 2024
[1.3.15](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.15.md) | Includes bug fixes and maintenance updates for cross-cluster replication, SQL, OpenSearch Dashboards reporting, and alerting. | 05 March 2024
[1.3.14](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.14.md) | Includes bug fixes and maintenance updates for OpenSearch security and OpenSearch Dashboards security. | 12 December 2023
158 changes: 158 additions & 0 deletions _aggregations/metric/median-absolute-deviation.md
@@ -0,0 +1,158 @@
---
layout: default
title: Median absolute deviation
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 65
redirect_from:
- /query-dsl/aggregations/metric/median-absolute-deviation/
---

# Median absolute deviation aggregations

The `median_absolute_deviation` metric is a single-value metric aggregation that returns the median absolute deviation of a numeric field's values. Median absolute deviation is a statistical measure of data variability. Because it measures dispersion from the median, it is a robust measure of variability that is less affected by outliers in a dataset.

Median absolute deviation is calculated as follows:<br>
median_absolute_deviation = median(|X<sub>i</sub> - median(X)|)
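
For example, for the values {1, 2, 3, 4, 9}, the median is 3 and the absolute deviations from the median are {2, 1, 0, 1, 6}, so the median absolute deviation (the median of those deviations) is 1.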

The following example calculates the median absolute deviation of the `DistanceMiles` field in the sample dataset `opensearch_dashboards_sample_data_flights`:


```json
GET opensearch_dashboards_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "median_absolute_deviation_DistanceMiles": {
      "median_absolute_deviation": {
        "field": "DistanceMiles"
      }
    }
  }
}
```
{% include copy-curl.html %}

#### Example response

```json
{
  "took": 35,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "median_absolute_deviation_DistanceMiles": {
      "value": 1829.8993624441966
    }
  }
}
```

### Missing

By default, if a field is missing or has a null value in a document, it is ignored during computation. However, you can specify a value to be used for those missing or null fields by using the `missing` parameter, as shown in the following request:

```json
GET opensearch_dashboards_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "median_absolute_deviation_distanceMiles": {
      "median_absolute_deviation": {
        "field": "DistanceMiles",
        "missing": 1000
      }
    }
  }
}
```
{% include copy-curl.html %}

#### Example response

```json
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "median_absolute_deviation_distanceMiles": {
      "value": 1829.6443646143355
    }
  }
}
```

### Compression

The median absolute deviation is calculated using the [t-digest](https://github.com/tdunning/t-digest/tree/main) data structure, which balances performance and estimation accuracy through the `compression` parameter (default value: `1000`). Lower `compression` values improve performance but may reduce estimation accuracy; higher values improve accuracy at the cost of greater computational overhead. The following request sets `compression` to `10`:

```json
GET opensearch_dashboards_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "median_absolute_deviation_DistanceMiles": {
      "median_absolute_deviation": {
        "field": "DistanceMiles",
        "compression": 10
      }
    }
  }
}
```
{% include copy-curl.html %}

#### Example response

```json
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "median_absolute_deviation_DistanceMiles": {
      "value": 1836.265614211182
    }
  }
}
```
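
Note that with `compression` set to `10`, the returned estimate (`1836.27`) differs from the default-compression estimate (`1829.90`) in the first example, illustrating the accuracy cost of lower `compression` values.
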
23 changes: 23 additions & 0 deletions _api-reference/nodes-apis/nodes-stats.md
@@ -53,6 +53,8 @@ script_cache | Statistics about script cache.
indexing_pressure | Statistics about the node's indexing pressure.
shard_indexing_pressure | Statistics about shard indexing pressure.
search_backpressure | Statistics related to search backpressure.
cluster_manager_throttling | Statistics related to throttled tasks on the cluster manager node.
weighted_routing | Statistics relevant to weighted round robin requests.
resource_usage_stats | Node-level resource usage statistics, such as CPU and JVM memory.
admission_control | Statistics about admission control.
caches | Statistics about caches.
@@ -832,6 +834,8 @@ http.total_opened | Integer | The total number of HTTP connections the node has opened.
[indexing_pressure](#indexing_pressure) | Object | Statistics related to the node's indexing pressure.
[shard_indexing_pressure](#shard_indexing_pressure) | Object | Statistics related to indexing pressure at the shard level.
[search_backpressure]({{site.url}}{{site.baseurl}}/opensearch/search-backpressure#search-backpressure-stats-api) | Object | Statistics related to search backpressure.
[cluster_manager_throttling](#cluster_manager_throttling) | Object | Statistics related to throttled tasks on the cluster manager node.
[weighted_routing](#weighted_routing) | Object | Statistics relevant to weighted round robin requests.
[resource_usage_stats](#resource_usage_stats) | Object | Statistics related to resource usage for the node.
[admission_control](#admission_control) | Object | Statistics related to admission control for the node.
[caches](#caches) | Object | Statistics related to caches on the node.
@@ -1282,6 +1286,25 @@ total_rejections_breakup_shadow_mode.throughput_degradation_limits | Integer | T
enabled | Boolean | Specifies whether the shard indexing pressure feature is turned on for the node.
enforced | Boolean | If true, the shard indexing pressure runs in enforced mode (there are rejections). If false, the shard indexing pressure runs in shadow mode (there are no rejections, but statistics are recorded and can be retrieved in the `total_rejections_breakup_shadow_mode` object). Only applicable if shard indexing pressure is enabled.

### `cluster_manager_throttling`

The `cluster_manager_throttling` object contains statistics about throttled tasks on the cluster manager node. It is populated only for the node that is currently elected as the cluster manager.

Field | Field type | Description
:--- | :--- | :---
stats | Object | Statistics about throttled tasks on the cluster manager node.
stats.total_throttled_tasks | Long | The total number of throttled tasks.
stats.throttled_tasks_per_task_type | Object | A breakdown of statistics by individual task type, specified as key-value pairs. The keys are individual task types, and their values represent the number of requests that were throttled.
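
As an illustration, the `cluster_manager_throttling` section of a response might look like the following hypothetical fragment (the task type and counts are invented for this example):

```json
"cluster_manager_throttling": {
  "stats": {
    "total_throttled_tasks": 18,
    "throttled_tasks_per_task_type": {
      "put-mapping": 18
    }
  }
}
```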

### `weighted_routing`

The `weighted_routing` object contains statistics about weighted round robin requests. Specifically, it contains a counter of the number of times this node has served a request while its routing weight was set to zero.

Field | Field type | Description
:--- | :--- | :---
stats | Object | Statistics about weighted routing.
fail_open_count | Integer | The number of times a shard on this node has served a request while the routing weight for the node was set to zero.
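
As an illustration, the `weighted_routing` section of a response might look like the following hypothetical fragment (the count is invented for this example):

```json
"weighted_routing": {
  "stats": {
    "fail_open_count": 3
  }
}
```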

### `resource_usage_stats`

The `resource_usage_stats` object contains the resource usage statistics. Each entry is specified by the node ID and has the following properties.
2 changes: 1 addition & 1 deletion _benchmark/quickstart.md
@@ -18,7 +18,7 @@ To perform the Quickstart steps, you'll need to fulfill the following prerequisites

## Set up an OpenSearch cluster

If you don't already have an active OpenSearch cluster, you can launch a new OpenSearch cluster to use with OpenSerch Benchmark.
If you don't already have an active OpenSearch cluster, you can launch a new OpenSearch cluster to use with OpenSearch Benchmark.

- Using **Docker Compose**. For instructions on how to use Docker Compose, see [OpenSearch Quickstart]({{site.url}}{{site.baseurl}}/quickstart/).
- Using **Tar**. For instructions on how to install OpenSearch with Tar, see [Installing OpenSearch > Tarball]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/tar#step-1-download-and-unpack-opensearch).
1 change: 1 addition & 0 deletions _dashboards/dashboards-assistant/index.md
@@ -122,3 +122,4 @@ The following screenshot shows a saved conversation, along with actions you can

- [Getting started guide for OpenSearch Assistant in OpenSearch Dashboards](https://github.com/opensearch-project/dashboards-assistant/blob/main/GETTING_STARTED_GUIDE.md)
- [OpenSearch Assistant configuration through the REST API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/opensearch-assistant/)
- [Build your own chatbot]({{site.url}}{{site.baseurl}}/ml-commons-plugin/tutorials/build-chatbot/)
@@ -45,3 +45,6 @@ The [`newline` codec]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/config

[Apache Avro] helps streamline streaming data pipelines. It is most efficient when used with the [`avro` codec]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/s3#avro-codec) inside an `s3` sink.

## `event_json`

The `event_json` output codec converts event data and metadata into JSON format to send to a sink, such as an S3 sink. The `event_json` input codec reads the event and its metadata to create an event in Data Prepper.
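
As a minimal sketch, an `s3` sink entry using the `event_json` codec might look like the following (the bucket name and region are placeholders, and other required `s3` sink settings are omitted):

```yaml
sink:
  - s3:
      bucket: example-bucket   # placeholder bucket name
      aws:
        region: us-east-1      # placeholder AWS Region
      codec:
        event_json: {}         # write each event and its metadata as JSON
```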
@@ -20,6 +20,7 @@ Option | Required | Type | Description
identification_keys | Yes | List | An unordered list by which to group events. Events with the same values as these keys are put into the same group. If an event does not contain one of the `identification_keys`, then the value of that key is considered to be equal to `null`. At least one identification_key is required (for example, `["sourceIp", "destinationIp", "port"]`).
action | Yes | AggregateAction | The action to be performed on each group. One of the [available aggregate actions](#available-aggregate-actions) must be provided, or you can create custom aggregate actions. `remove_duplicates` and `put_all` are the available actions. For more information, see [Creating New Aggregate Actions](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#creating-new-aggregate-actions).
group_duration | No | String | The amount of time that a group should exist before it is concluded automatically. Supports ISO_8601 notation strings ("PT20.345S", "PT15M", etc.) as well as simple notation for seconds (`"60s"`) and milliseconds (`"1500ms"`). Default value is `180s`.
local_mode | No | Boolean | When `local_mode` is set to `true`, the aggregation is performed locally on each Data Prepper node instead of forwarding events to a specific node based on the `identification_keys` using a hash function. Default is `false`.
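
As an illustration, an `aggregate` processor entry that aggregates locally on each Data Prepper node might look like the following sketch (the keys and action are illustrative):

```yaml
processor:
  - aggregate:
      identification_keys: ["sourceIp", "destinationIp"]  # group events that share these values
      action:
        put_all:               # combine each group's events into a single event
      group_duration: "60s"    # conclude a group after 60 seconds
      local_mode: true         # aggregate on this node instead of forwarding by hash
```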

## Available aggregate actions

@@ -176,4 +177,4 @@ The `aggregate` processor includes the following custom metrics.

**Gauge**

* `currentAggregateGroups`: The current number of groups. This gauge decreases when a group concludes and increases when an event initiates the creation of a new group.
* `currentAggregateGroups`: This gauge represents the current number of active aggregate groups. It decreases when an aggregate group is completed and its results are emitted and increases when a new event initiates the creation of a new aggregate group.
@@ -33,6 +33,8 @@ You can use the `key_value` processor to parse the specified field into key-value pairs.
| recursive | Specifies whether to recursively obtain additional key-value pairs from values. The extra key-value pairs will be stored as sub-keys of the root key. Default is `false`. The levels of recursive parsing must be defined by different brackets for each level: `[]`, `()`, and `<>`, in this order. Any other configurations specified will only be applied to the outmost keys. <br />When `recursive` is `true`: <br /> `remove_brackets` cannot also be `true`;<br />`skip_duplicate_values` will always be `true`; <br />`whitespace` will always be `"strict"`. | If `recursive` is true, `{"item1=[item1-subitem1=item1-subitem1-value&item1-subitem2=(item1-subitem2-subitem2A=item1-subitem2-subitem2A-value&item1-subitem2-subitem2B=item1-subitem2-subitem2B-value)]&item2=item2-value"}` will parse into `{"item1": {"item1-subitem1": "item1-subitem1-value", "item1-subitem2" {"item1-subitem2-subitem2A": "item1-subitem2-subitem2A-value", "item1-subitem2-subitem2B": "item1-subitem2-subitem2B-value"}}}`. |
| overwrite_if_destination_exists | Specifies whether to overwrite existing fields if there are key conflicts when writing parsed fields to the event. Default is `true`. | If `overwrite_if_destination_exists` is `true` and destination is `null`, `{"key1": "old_value", "message": "key1=new_value"}` will parse into `{"key1": "new_value", "message": "key1=new_value"}`. |
| tags_on_failure | When a `kv` operation causes a runtime exception within the processor, the operation is safely stopped without crashing the processor, and the event is tagged with the provided tags. | If `tags_on_failure` is set to `["keyvalueprocessor_failure"]`, `{"tags": ["keyvalueprocessor_failure"]}` will be added to the event's metadata in the event of a runtime exception. |
| value_grouping | Specifies whether to group values using predefined value grouping delimiters: `{...}`, `[...]`, `<...>`, `(...)`, `"..."`, `'...'`, `http://... (space)`, and `https:// (space)`. If this flag is enabled, then the content between the delimiters is considered to be one entity and is not parsed for key-value pairs. Default is `false`. | If `value_grouping` is `true`, then `{"key1=[a=b,c=d]&key2=value2"}` parses to `{"key1": "[a=b,c=d]", "key2": "value2"}`. |
| drop_keys_with_no_value | Specifies whether keys should be dropped if they have a null value. Default is `false`. | If `drop_keys_with_no_value` is set to `true`, then `{"key1=value1&key2"}` parses to `{"key1": "value1"}`. |
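
As an illustration, a `key_value` processor entry combining these two options might look like the following sketch (the `source` field name is illustrative):

```yaml
processor:
  - key_value:
      source: "message"               # field containing the raw key-value text
      value_grouping: true            # keep bracketed or quoted content together as one value
      drop_keys_with_no_value: true   # drop keys that parse to a null value
```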



@@ -42,4 +44,4 @@ Content will be added to this section.
## Metrics
Content will be added to this section. --->
18 changes: 18 additions & 0 deletions _data-prepper/pipelines/configuration/processors/write_json.md
@@ -0,0 +1,18 @@
---
layout: default
title: write_json
parent: Processors
grand_parent: Pipelines
nav_order: 56
---

# write_json


The `write_json` processor converts an object in an event into a JSON string. You can customize the processor to choose the source and target field names.

| Option | Description | Example |
| :--- | :--- | :--- |
| source | Mandatory field that specifies the name of the field in the event containing the message or object to be parsed. | If `source` is set to `"message"` and the input is `{"message": {"key1":"value1", "key2":{"key3":"value3"}}}`, then the `write_json` processor generates `{"message": "{\"key1\":\"value1\", \"key2\":{\"key3\":\"value3\"}}"}`. |
| target | An optional field that specifies the name of the field in which the resulting JSON string should be stored. If `target` is not specified, then the `source` field is used. | |
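
As an illustration, a `write_json` processor entry might look like the following sketch (the field names are illustrative):

```yaml
processor:
  - write_json:
      source: "message"        # event field holding the object to serialize
      target: "message_json"   # store the JSON string here instead of overwriting "message"
```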

2 changes: 1 addition & 1 deletion _data-prepper/pipelines/configuration/sources/s3.md
@@ -104,7 +104,7 @@ Option | Required | Type | Description
`s3_select` | No | [s3_select](#s3_select) | The Amazon S3 Select configuration.
`scan` | No | [scan](#scan) | The S3 scan configuration.
`delete_s3_objects_on_read` | No | Boolean | When `true`, the S3 scan attempts to delete S3 objects after all events from the S3 object are successfully acknowledged by all sinks. `acknowledgments` should be enabled when deleting S3 objects. Default is `false`.
`workers` | No | Integer | Configures the number of worker threads that the source uses to read data from S3. Leaving this value at the default unless your S3 objects are less than 1MB. Performance may decrease for larger S3 objects. This setting only affects SQS-based sources. Default is `1`.
`workers` | No | Integer | Configures the number of worker threads that the source uses to read data from S3. Leave this value as the default unless your S3 objects are less than 1 MB in size. Performance may decrease for larger S3 objects. This setting affects SQS-based sources and S3-Scan sources. Default is `1`.



19 changes: 18 additions & 1 deletion _data-prepper/pipelines/expression-syntax.md
@@ -114,7 +114,23 @@ null != <JSON Pointer>
null != /response
```

#### Conditional expression
## Type check operator

The type check operator tests whether the value at a JSON Pointer is of a certain data type.

### Syntax
```
<JSON Pointer> typeof <DataType>
```
Supported data types are `integer`, `long`, `boolean`, `double`, `string`, `map`, and `array`.

#### Example
```
/response typeof integer
/message typeof string
```

### Conditional expression

A conditional expression is used to chain together multiple expressions and/or values.

@@ -218,6 +234,7 @@ White space is **required** surrounding set initializers, priority expressions,
| `==`, `!=` | Equality operators | No | `/status == 200`<br>`/status_code==200` | |
| `and`, `or`, `not` | Conditional operators | Yes | `/a<300 and /b>200` | `/b<300and/b>200` |
| `,` | Set value delimiter | No | `/a in {200, 202}`<br>`/a in {200,202}`<br>`/a in {200 , 202}` | `/a in {200,}` |
| `typeof` | Type check operator | Yes | `/a typeof integer`<br>`/a typeof long`<br>`/a typeof string`<br> `/a typeof double`<br> `/a typeof boolean`<br>`/a typeof map`<br>`/a typeof array` |`/a typeof /b`<br>`/a typeof 2` |


## Functions