Signed-off-by: Bharathwaj G <[email protected]>
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,148 @@ | ||
--- | ||
layout: default | ||
title: Star Tree | ||
nav_order: 61 | ||
has_children: false | ||
parent: Supported field types | ||
redirect_from: | ||
- /opensearch/supported-field-types/star-tree/ | ||
- /field-types/star-tree/ | ||
--- | ||
# Star Tree field type | ||
|
||
|
||
This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/). | |
{: .warning} | ||
|
||
A star-tree index is a multi-field index that improves the performance of aggregations. | |
Once you configure a star-tree index in the index mapping by specifying its dimensions and metrics, OpenSearch creates the star-tree index and maintains it in real time within segments as data is ingested. | |
|
||
OpenSearch will automatically use the star-tree index to optimize aggregations based on the input query and star-tree configuration. No changes are required in the query syntax or requests. | ||
|
||
For more information, see [Star Tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/). | |
|
||
## Prerequisites | ||
|
||
Before using a star-tree field, make sure that the following prerequisites are satisfied: | |
|
||
- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/). | |
- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings). | ||
- Set the `index.composite_index` index setting to `true` during index creation. | ||
- Ensure that `doc_values` is enabled for the dimension and metric fields used in your star-tree mapping. | |
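
As a hedged sketch, the cluster-level setting from the preceding list could be applied through the cluster settings API. This assumes that `indices.composite_index.star_tree.enabled` can be updated dynamically in your version; if it cannot, configure it in `opensearch.yml` instead. The experimental feature flag is configured separately, as described on the linked page.

```json
PUT _cluster/settings
{
  "persistent": {
    "indices.composite_index.star_tree.enabled": true
  }
}
```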
|
||
|
||
## Examples | ||
|
||
The following example shows how to use a star-tree index. | |
|
||
### Star Tree index mapping | ||
|
||
|
||
Define the star-tree mapping in the `composite` section of `mappings`. <br/> | |
To compute metric aggregations on the `request_size` and `latency` fields for queries that filter on the `port` and `status` fields, configure the following mapping: | |
|
||
```json | ||
PUT logs | ||
{ | ||
"settings": { | ||
"index.number_of_shards": 1, | ||
"index.number_of_replicas": 0, | ||
"index.composite_index": true | ||
}, | ||
"mappings": { | ||
"composite": { | ||
"startree1": { | ||
"type": "star_tree", | ||
"config": { | ||
"max_leaf_docs": 10000, | ||
"skip_star_node_creation_for_dimensions": [ | ||
"port" | ||
], | ||
"ordered_dimensions": [ | ||
{ | ||
"name": "status" | ||
}, | ||
{ | ||
"name": "port" | ||
} | ||
], | ||
"metrics": [ | ||
{ | ||
"name": "request_size", | ||
"stats": [ | ||
"sum", | ||
"value_count", | ||
"min", | ||
"max" | ||
] | |
}, | |
{ | |
"name": "latency", | |
"stats": [ | ||
"sum", | ||
"value_count", | ||
"min", | ||
"max" | ||
] | ||
} | ||
] | ||
} | ||
} | ||
}, | ||
"properties": { | ||
"status": { | ||
"type": "integer" | ||
}, | ||
"port": { | ||
"type": "integer" | ||
}, | ||
"request_size": { | ||
"type": "integer" | ||
}, | ||
"latency": { | ||
"type": "scaled_float", | ||
"scaling_factor": 10 | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
In the preceding example, OpenSearch creates a star-tree index for `startree1`. Currently, only one star-tree index can be created per index; support for multiple star-tree indexes is planned for a future release. <br/> | |
|
||
|
||
## Star Tree mapping parameters | ||
|
||
Specify the star-tree configuration in the `config` section. All parameters are final and cannot be modified without reindexing documents. | |
|
||
### Ordered dimensions | ||
The `ordered_dimensions` are the fields by which metrics are aggregated in the star-tree index. OpenSearch uses the star-tree index to optimize a query only if all the fields in the query are part of `ordered_dimensions`. This property is required in the star-tree configuration. | |
- The order of dimensions matters. Define the dimensions in order from the highest cardinality to the lowest cardinality for efficient storage and query pruning. | |
- Avoid using high-cardinality fields as dimensions because they adversely affect storage space, indexing throughput, and query performance. | |
- Currently, the supported fields for `ordered_dimensions` are the [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. | |
- Support for other field types, such as `keyword` and `ip`, is planned for upcoming releases. | |
- A minimum of `2` and a maximum of `10` dimensions are supported per star-tree index. | |
|
||
|
||
#### Properties | ||
|
||
| Parameter | Required/Optional | Description | | ||
|:---------------------| :--- |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| `name` | Required | The name of the field. The field must also be present in `properties` in the index `mappings`, and `doc_values` must be enabled for it. | |
|
||
### Metrics | ||
Configure the fields on which you need to perform aggregations. This property is required in the star-tree configuration. | |
- Currently, the supported fields for `metrics` are the [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. | |
- Supported metric aggregations include `min`, `max`, `sum`, `avg`, and `value_count`. | |
- `avg` is a derived metric: it is not indexed but is computed at query time from `sum` and `value_count`. The remaining metrics are base metrics, which are indexed. | |
- Up to `100` base metrics are supported per star-tree index. | |
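
As an illustrative fragment (part of the star-tree `config`, not a complete request), the following `metrics` section omits `stats` for `request_size`, which, per the defaults described in the properties table, should index only `sum` and `value_count` for that field. The `latency` entry lists its `stats` explicitly. The field names are taken from the example mapping; treat the behavior of the defaults as described by this page rather than verified here.

```json
{
  "metrics": [
    {
      "name": "request_size"
    },
    {
      "name": "latency",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    }
  ]
}
```

Because `avg` is derived from `sum` and `value_count`, both fields in this sketch would still be eligible for average aggregations.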
|
||
#### Properties | ||
|
||
| Parameter | Required/Optional | Description | | ||
|:---------------------|:------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| `name` | Required | The name of the field. The field must also be present in `properties` in the index `mappings`, and `doc_values` must be enabled for it. | |
| `stats` | Optional | A list of metric aggregations computed for each field. You can choose among `min`, `max`, `sum`, `avg`, and `value_count`.<br/>Defaults are `sum` and `value_count`.<br/>`avg` is a derived metric that is automatically supported in queries if `sum` and `value_count` are present in the metric `stats`. | |
|
||
### Star Tree configuration parameters | ||
|
||
The following additional optional parameters can be configured for a star-tree index. | |
|
||
| Parameter | Required/Optional | Description | | ||
|:----------------|:------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | ||
| `max_leaf_docs` | Optional | The maximum number of star-tree documents that a leaf node can point to. When this number is exceeded, the node is split on the next dimension. Default is `10000`. A lower value results in larger storage size but faster query performance; a higher value results in smaller storage size but slower query performance. | |
| `skip_star_node_creation_for_dimensions` | Optional | A list of dimensions for which the star-tree index skips star node creation. Listing a dimension here can reduce storage size at the expense of query performance. By default, star node creation is not skipped for any dimension. | |
|
||
## Supported queries and aggregations | ||
For more information about supported queries and aggregations, see [Supported query and aggregations for Star Tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#supported-query-and-aggregations). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,175 @@ | ||
--- | ||
layout: default | ||
title: Star Tree index | ||
parent: Improving search performance | ||
nav_order: 54 | ||
--- | ||
|
||
# Star Tree index | ||
|
||
|
||
This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/). | |
{: .warning} | ||
|
||
A star-tree index is a multi-field index that improves the performance of aggregations. | |
|
||
OpenSearch will use the star-tree index to optimize aggregations based on the input query and star-tree configuration. No changes are required in the query syntax or requests. | ||
|
||
## Star Tree index structure | ||
|
||
|
||
<img src="{{site.url}}{{site.baseurl}}/images/star-tree-index.png" alt="A Star Tree index containing two dimensions and two metrics" width="700"> | ||
|
||
As shown in the preceding figure, the star-tree index structure consists of two main parts: the star tree itself and the sorted, aggregated star-tree documents, which are backed by doc-values indexes. | |
|
||
|
||
Each node in the star tree points to a range of star-tree documents. | |
A node is split further into child nodes based on the `max_leaf_docs` configuration. | |
The number of documents that a leaf node points to is less than or equal to `max_leaf_docs`. This ensures that at most `max_leaf_docs` documents are traversed to compute an aggregated value, which provides predictable latency. | |
|
||
Special nodes called star nodes (`*`) help skip non-competitive nodes and fetch aggregated documents wherever applicable at query time. | |
|
||
The figure contains three examples that illustrate star-tree traversal at query time: | |
- Compute the average request size with a terms query in which `port` equals 8443 and `status` equals 200. (Support for the terms query will be added in an upcoming release.) | |
- Compute the count of requests with a term query in which `status` equals 200. The query traverses through the `*` node of the `port` dimension because `port` is not part of the query. | |
- Compute the average request size with a term query in which `port` equals 5600. The query traverses through the `*` node of the `status` dimension because `status` is not part of the query. | |
|
||
<br/>The second and third examples use star nodes. | |
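
As a rough sketch, the second example (the count of requests where `status` equals 200) might be expressed as the following request against the `logs` index from the example mapping. Counting values of `request_size` is an assumption made here for illustration; any metric field that has `value_count` in its `stats` could serve the same purpose.

```json
POST /logs/_search
{
  "query": {
    "term": {
      "status": 200
    }
  },
  "aggs": {
    "request_count": {
      "value_count": {
        "field": "request_size"
      }
    }
  }
}
```

Because `port` does not appear in the query, this is the kind of request that would be resolved through the `*` node of the `port` dimension.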
|
||
|
||
## When to use Star Tree index | ||
|
||
You can use a star-tree index to perform faster aggregations with a constant upper bound on query latency. | |
- A star-tree index natively supports multi-field aggregations. | |
- A star-tree index is created in real time as part of regular indexing, so its data is always up to date with the live data. | |
- A star-tree index consolidates data, which makes it storage efficient. This results in efficient paging and a fraction of the I/O utilization for search queries. | |
|
||
## Considerations | ||
- Ideally, use a star-tree index with append-only indexes because updates and deletes are not accounted for in the star-tree index. | |
- A star-tree index is used for an aggregation query only if the fields in the query are a subset of the dimensions and metrics in the star-tree configuration. | |
- Once a star-tree index is enabled for an index, it currently cannot be disabled. To remove the star-tree index, you must reindex without the star-tree mapping. | |
- Changing the star-tree configuration also requires a reindex operation. | |
- [Multi-values/array values]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/#arrays) are not supported. | |
- Only [limited queries and aggregations](#supported-query-and-aggregations) are supported; support for more is planned for future releases. | |
- The cardinality of the dimensions should not be very high (as with `_id` fields). Otherwise, it leads to storage explosion and higher query latencies. | |
|
||
## Enabling Star Tree index | ||
- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/). | |
- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings). | ||
- Set the `index.composite_index` index setting to `true` during index creation. | ||
|
||
## Examples | ||
|
||
The following example shows how to use a star-tree index. | |
|
||
### Defining Star Tree index in mappings | ||
|
||
Define the star-tree configuration in the index mappings when creating an index. <br/> | |
To create a star-tree index that precomputes aggregations on the `request_size` and `latency` fields for all combinations of values in the `port` and `status` fields indexed in the `logs` index, configure the following mapping: | |
|
||
```json | ||
PUT logs | ||
{ | ||
"settings": { | ||
"index.number_of_shards": 1, | ||
"index.number_of_replicas": 0, | ||
"index.composite_index": true | ||
}, | ||
"mappings": { | ||
"composite": { | ||
"startree1": { | ||
"type": "star_tree", | ||
"config": { | ||
"ordered_dimensions": [ | ||
{ | ||
"name": "status" | ||
}, | ||
{ | ||
"name": "port" | ||
} | ||
], | ||
"metrics": [ | ||
{ | ||
"name": "request_size", | ||
"stats": [ | ||
"sum", | ||
"value_count", | ||
"min", | ||
"max" | ||
] | |
}, | |
{ | |
"name": "latency", | |
"stats": [ | ||
"sum", | ||
"value_count", | ||
"min", | ||
"max" | ||
] | ||
} | ||
] | ||
} | ||
} | ||
}, | ||
"properties": { | ||
"status": { | ||
"type": "integer" | ||
}, | ||
"port": { | ||
"type": "integer" | ||
}, | ||
"request_size": { | ||
"type": "integer" | ||
}, | ||
"latency": { | ||
"type": "scaled_float", | ||
"scaling_factor": 10 | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
|
||
For detailed information about star-tree index mapping and parameters, see [Star Tree field type]({{site.url}}{{site.baseurl}}/field-types/star-tree/). | |
|
||
## Supported query and aggregations | ||
|
||
A star-tree index can be used to optimize aggregations for a selected set of queries; support for more queries is planned for upcoming releases. | |
|
||
### Supported queries | ||
Ensure that the following is true of the star-tree index mapping: | |
- The fields present in the query must be part of `ordered_dimensions` in the star-tree configuration. | |
|
||
The following queries are supported when supported aggregations are specified: <br/> | |
|
||
- [Term query](https://opensearch.org/docs/latest/query-dsl/term/term/) | ||
- [Match all docs query](https://opensearch.org/docs/latest/query-dsl/match-all/) | ||
|
||
### Supported aggregations | ||
Ensure that the following is true of the star-tree index mapping: | |
- The fields present in the aggregation must be part of `metrics` in the star-tree configuration. | |
- The metric aggregation type must be part of the `stats` parameter. | |
|
||
The following metric aggregations are supported: | |
- [Sum](https://opensearch.org/docs/latest/aggregations/metric/sum/) | ||
- [Minimum](https://opensearch.org/docs/latest/aggregations/metric/minimum/) | ||
- [Maximum](https://opensearch.org/docs/latest/aggregations/metric/maximum/) | ||
- [Value count](https://opensearch.org/docs/latest/aggregations/metric/value-count/) | ||
- [Average](https://opensearch.org/docs/latest/aggregations/metric/average/) | ||
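
For illustration, an average aggregation combined with a match all query might look like the following request. It assumes the `logs` index from the example mapping, in which `latency` is configured with both `sum` and `value_count`, so that the derived average can be computed. The aggregation name `avg_latency` is arbitrary.

```json
POST /logs/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "avg_latency": {
      "avg": {
        "field": "latency"
      }
    }
  }
}
```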
|
||
### Examples | ||
To get the sum of `request_size` for all error logs with `status=500`, using the [example mapping](#defining-star-tree-index-in-mappings), send the following request: | |
```json | ||
POST /logs/_search | ||
{ | ||
"query": { | ||
"term": { | ||
"status": "500" | ||
} | ||
}, | ||
"aggs": { | ||
"sum_request_size": { | ||
"sum": { | ||
"field": "request_size" | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
|
||
OpenSearch optimizes this query automatically by using the star-tree index. | |
|
||
To run queries without using the star-tree index, set the `indices.composite_index.star_tree.enabled` setting to `false`. |
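
One possible way to do this, assuming the setting can be changed dynamically in your version, is through the cluster settings API. If the setting turns out to be static in your deployment, change it in `opensearch.yml` and restart the nodes instead.

```json
PUT _cluster/settings
{
  "persistent": {
    "indices.composite_index.star_tree.enabled": false
  }
}
```

Running the same search request before and after this change is a simple way to compare results and latency with and without the star-tree index.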