Add boxplot documentation to close aggregations content gaps #7168
@@ -0,0 +1,150 @@
---
layout: default
title: Boxplot
parent: Metric aggregations
grand_parent: Aggregations
nav_order: 15
---

# Boxplot
Check failure on line 9 in _aggregations/metric/boxplot.md GitHub Actions / vale[vale] _aggregations/metric/boxplot.md#L9

A boxplot aggregation calculates the statistical distribution of a numeric field. It provides a summary of the data consisting of the following key statistics: minimum value, first quartile (`q1`), median (`q2`), third quartile (`q3`), and maximum value.
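
To make these five statistics concrete, the following Python sketch computes them exactly for a small, made-up sample using only the standard library. It illustrates what the summary contains, not how OpenSearch computes it. The sample values and the `five_number_summary` helper are assumptions for illustration, and the `inclusive` quantile method is only one of several common quartile conventions.

```python
import statistics


def five_number_summary(values):
    """Return min, Q1, median (Q2), Q3, and max for a list of numbers."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {"min": ordered[0], "q1": q1, "q2": q2, "q3": q3, "max": ordered[-1]}


# Hypothetical load times in milliseconds (not real data).
sample = [100, 500, 1000, 2000, 5000]
summary = five_number_summary(sample)
print(summary)
```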

## Syntax

The basic syntax for the boxplot aggregation is as follows:

```json
{
  "aggs": {
    "boxplot_agg_name": {
      "boxplot": {
        "field": "numeric_field"
      }
    }
  }
}
```
{% include copy-curl.html %}

Replace `boxplot_agg_name` with a descriptive name for your aggregation and `numeric_field` with the name of the numeric field you want to analyze.

## Example use case

Suppose you have an index of website load times and want to analyze their distribution. The following request runs a boxplot aggregation on the `load_time_ms` field:

```json
GET website_logs/_search
{
  "size": 0,
  "aggs": {
    "load_time_boxplot": {
      "boxplot": {
        "field": "load_time_ms"
      }
    }
  }
}
```
{% include copy-curl.html %}

This query returns a response similar to the following:

```json
{
  "aggregations": {
    "load_time_boxplot": {
      "min": 100.0,
      "max": 5000.0,
      "q1": 500.0,
      "q2": 1000.0,
      "q3": 2000.0
    }
  }
}
```
{% include copy-curl.html %}
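
One common way to read such a response is to derive the interquartile range (IQR) and Tukey-style outlier fences from `q1` and `q3`. The following Python sketch does this on the client side for the example response. The `tukey_fences` helper and the conventional multiplier `k=1.5` are assumptions for illustration and are not part of the aggregation itself.

```python
# Response body copied from the example response above.
response = {
    "aggregations": {
        "load_time_boxplot": {
            "min": 100.0,
            "max": 5000.0,
            "q1": 500.0,
            "q2": 1000.0,
            "q3": 2000.0,
        }
    }
}


def tukey_fences(boxplot, k=1.5):
    """Return (iqr, lower_fence, upper_fence) for a boxplot result.

    Hypothetical client-side helper; k=1.5 is the conventional multiplier.
    """
    iqr = boxplot["q3"] - boxplot["q1"]
    return iqr, boxplot["q1"] - k * iqr, boxplot["q3"] + k * iqr


box = response["aggregations"]["load_time_boxplot"]
iqr, lower_fence, upper_fence = tukey_fences(box)

# A maximum above the upper fence suggests at least one unusually slow load.
has_high_outliers = box["max"] > upper_fence
print(iqr, lower_fence, upper_fence, has_high_outliers)
```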

## Advanced options

The boxplot aggregation in OpenSearch offers several advanced options to customize its behavior:

- Scripting: You can use scripts to transform or calculate values on the fly, allowing for more complex data processing.
- Compression: By adjusting the `compression` parameter, you can control the trade-off between memory usage and approximation accuracy.
- Missing value handling: You can specify how to treat documents with missing values in the target field.

These options let you handle more complex scenarios and tailor the analysis to your specific requirements.
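
If you build requests programmatically, the options above map onto keys inside the `boxplot` object. The following Python sketch assembles a request body from optional arguments. The `build_boxplot_query` helper is hypothetical and uses plain dictionaries rather than any client library; only the `field`, `script`, `compression`, and `missing` keys shown in this document are emitted.

```python
import json


def build_boxplot_query(name, field=None, script=None, compression=None, missing=None):
    """Build a search body containing a single boxplot aggregation.

    Hypothetical helper for illustration; it only arranges the keys
    described in this document into a request body.
    """
    if field is None and script is None:
        raise ValueError("Provide a field, a script, or both")

    boxplot = {}
    if field is not None:
        boxplot["field"] = field
    if script is not None:
        boxplot["script"] = {"source": script}
    if compression is not None:
        boxplot["compression"] = compression
    if missing is not None:
        boxplot["missing"] = missing

    # "size": 0 suppresses search hits so only aggregation results are returned.
    return {"size": 0, "aggs": {name: {"boxplot": boxplot}}}


body = build_boxplot_query("load_time_boxplot", field="load_time_ms", compression=5000)
print(json.dumps(body, indent=2))
```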

### Scripting

Use the `script` parameter to calculate or transform values on the fly, for example, to aggregate the square root of a numeric field instead of its raw value.

#### Example request

```json
GET website_logs/_search
{
  "size": 0,
  "aggs": {
    "load_time_boxplot": {
      "boxplot": {
        "script": {
          "source": "Math.sqrt(doc['load_time_ms'].value)"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

### Compression

The `compression` parameter controls the trade-off between memory usage and accuracy for the boxplot calculation. A higher value provides better accuracy at the cost of higher memory usage, while a lower value reduces memory usage but produces coarser approximations. The default value is `3000`.

#### Example request

```json
GET website_logs/_search
{
  "size": 0,
  "aggs": {
    "load_time_boxplot": {
      "boxplot": {
        "field": "load_time_ms",
        "compression": 5000
      }
    }
  }
}
```
{% include copy-curl.html %}

### Missing value handling

By default, documents with no value in the target field are ignored. However, you can specify how to handle them using the `missing` parameter:

- `missing`: Treat missing values as if they were specified explicitly.
- `missing_inv`: Treat missing values as if they were infinite values.
- `missing_neg_value`: Treat missing values as if they had a specified negative value.
- `missing_pos_value`: Treat missing values as if they had a specified positive value.

#### Example request

```json
GET website_logs/_search
{
  "size": 0,
  "aggs": {
    "load_time_boxplot": {
      "boxplot": {
        "field": "load_time_ms",
        "missing": 0
      }
    }
  }
}
```
{% include copy-curl.html %}

In this example, documents with no value in the `load_time_ms` field are treated as if the value were `0`.
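
Substituting a value for missing fields changes the resulting statistics, which is worth checking before choosing a default. The following Python sketch compares the two behaviors on a small, made-up sample, with `None` standing in for a document that has no `load_time_ms` value. It is a client-side illustration under those assumptions, not a reproduction of how OpenSearch evaluates the aggregation.

```python
import statistics

# Hypothetical load times; None marks a document without the field.
load_times = [200, 400, None, 800]

# Default behavior: documents without a value are skipped.
ignored = [v for v in load_times if v is not None]

# Equivalent of "missing": 0, where absent values are counted as 0.
as_zero = [0 if v is None else v for v in load_times]

ignored_stats = (min(ignored), statistics.median(ignored))
as_zero_stats = (min(as_zero), statistics.median(as_zero))
print(ignored_stats, as_zero_stats)
```

In this sample, counting missing values as `0` lowers both the minimum and the median, so a substitute of `0` can skew a latency distribution toward unrealistically fast values.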
Developer: Please verify that the use case examples are relevant to the user and are accurate. If any changes are needed, please provide an updated example request. Thank you.