Commit
Adding documentation for star tree index feature
Signed-off-by: Bharathwaj G <[email protected]>
1 parent 9bad864, commit dee4bee
Showing 2 changed files with 203 additions and 1 deletion.
@@ -0,0 +1,202 @@
---
layout: default
title: star-tree
nav_order: 61
has_children: false
parent: Supported field types
redirect_from:
  - /opensearch/supported-field-types/star-tree/
  - /field-types/star-tree/
---
# Star tree field type

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
{: .warning}

A star-tree index is a multi-field index that improves the performance of aggregations. After you configure a star-tree index in the index mapping by specifying its dimensions and metrics, OpenSearch creates the star-tree index and maintains it in real time within segments as data is ingested.

OpenSearch automatically uses the star-tree index to optimize aggregations based on the input query and the star-tree configuration. No changes to query syntax or requests are required.

## When to use star tree
Currently, the performance of aggregations scales linearly with the number of documents. This applies to all aggregation queries that must visit doc values to retrieve results.

A star-tree index provides predictable latency for queries regardless of the number of underlying documents because it stores precomputed aggregations.
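
To make the idea of precomputed aggregations concrete, the following is a minimal Python sketch. It is not how OpenSearch implements the star-tree index. It assumes a toy list of log documents that reuse the `status`, `port`, and `request_size` field names from the examples in this document, and the function names `build_preaggregated` and `sum_for_status` are hypothetical. The point it illustrates is that a query can be answered from one record per dimension combination instead of from every document.

```python
from collections import defaultdict

# Toy documents; the field names mirror the example mapping in this document.
docs = [
    {"status": 200, "port": 443, "request_size": 100},
    {"status": 200, "port": 443, "request_size": 150},
    {"status": 500, "port": 443, "request_size": 300},
    {"status": 500, "port": 80, "request_size": 50},
    {"status": 500, "port": 80, "request_size": 25},
]


def build_preaggregated(documents):
    """Collapse documents into one record per (status, port) combination."""
    table = defaultdict(lambda: {"sum": 0, "value_count": 0})
    for doc in documents:
        entry = table[(doc["status"], doc["port"])]
        entry["sum"] += doc["request_size"]
        entry["value_count"] += 1
    return dict(table)


def sum_for_status(table, status):
    """Answer a sum query from the precomputed records only."""
    return sum(e["sum"] for (s, _port), e in table.items() if s == status)


preaggregated = build_preaggregated(docs)
print(len(preaggregated))                  # number of precomputed records
print(sum_for_status(preaggregated, 500))  # sum of request_size for status 500
```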

Star tree works well for append-only use cases, such as time-series data and data streams.

A star-tree index consolidates data and is therefore storage efficient, which allows efficient paging and requires only a fraction of the I/O for search queries. However, indexing performance is affected, depending on the cardinality of the dimensions and the number of metric fields.

## Prerequisites

Before using a star-tree field, be sure to satisfy the following prerequisites:

- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [Configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
- **Enable `doc_values`**: Ensure that doc values are enabled for the dimension and metric fields used in your star-tree mapping.
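
The `doc_values` prerequisite can be checked before a mapping is submitted. The following is a minimal Python sketch of such a check, assuming a mapping laid out like the one in the Examples section of this document. The helper name `fields_missing_doc_values` is hypothetical and is not part of any OpenSearch client. The sketch also assumes that `doc_values` counts as enabled unless a field explicitly sets it to `false`.

```python
def fields_missing_doc_values(mappings, star_tree_name):
    """Return star-tree fields that are undefined or have doc_values disabled."""
    config = mappings["composite"][star_tree_name]["config"]
    names = [d["name"] for d in config.get("ordered_dimensions", [])]
    names += [m["name"] for m in config.get("metrics", [])]
    properties = mappings.get("properties", {})
    problems = []
    for name in names:
        field = properties.get(name)
        # Assumption: doc_values is treated as enabled unless explicitly false.
        if field is None or field.get("doc_values", True) is False:
            problems.append(name)
    return problems


# Hypothetical mapping that follows the layout of the example in this document.
example_mappings = {
    "composite": {
        "startree1": {
            "type": "star_tree",
            "config": {
                "ordered_dimensions": [{"name": "status"}, {"name": "port"}],
                "metrics": [{"name": "request_size"}, {"name": "latency"}],
            },
        }
    },
    "properties": {
        "status": {"type": "integer"},
        "port": {"type": "integer"},
        "request_size": {"type": "integer"},
        # doc_values is disabled here on purpose so that the check reports it.
        "latency": {"type": "scaled_float", "scaling_factor": 10, "doc_values": False},
    },
}

print(fields_missing_doc_values(example_mappings, "startree1"))
```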

## Limitations

Currently, star-tree indexes have the following limitations:

- Document deletions and updates are not accounted for in the star-tree index, so query results will be inaccurate for segments that contain deleted documents.
- After a star-tree index is enabled for an index, you currently cannot disable it. To remove the star-tree index, you must reindex without the star-tree mapping.
- Multi-valued fields are not supported.
- Only a limited set of queries and aggregations is supported; support for more is planned for future releases.

## Examples

The following examples show how to use a star-tree index.

### Defining star tree index in mappings

Define the star-tree mapping in the new `composite` section under `mappings`.

To create a star-tree index that precomputes aggregations for the `request_size` and `latency` fields for all combinations of values in the `port` and `status` fields indexed in the `logs` index, configure the following mapping:

```json
PUT logs
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true
  },
  "mappings": {
    "composite": {
      "startree1": {
        "type": "star_tree",
        "config": {
          "max_leaf_docs": 10000,
          "skip_star_node_creation_for_dimensions": [
            "port"
          ],
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            }
          ],
          "metrics": [
            {
              "name": "request_size",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "request_size": {
        "type": "integer"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}
```

## Star tree mapping parameters

You must specify the star-tree configuration in the `config` section. All parameters are final and cannot be modified without reindexing documents.

### Ordered dimensions

The `ordered_dimensions` are the fields by which metrics are aggregated in the star-tree index. OpenSearch uses the star-tree index to optimize a query only if all fields in the query are part of `ordered_dimensions`. This property is required in the star-tree configuration.

- The order of dimensions matters. Define the dimensions in order from highest cardinality to lowest cardinality for efficient storage and query pruning.
- Avoid using high-cardinality fields as dimensions because they adversely affect storage space, indexing throughput, and query performance.
- Currently, the supported field types for `ordered_dimensions` are the [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`.
- Support for other field types, such as `keyword` and `ip`, is planned for upcoming releases.
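
The ordering guidance above can be sketched in a few lines of Python. This is only an illustration: the helper `order_dimensions_by_cardinality` is hypothetical, and it assumes that a small in-memory sample of documents is representative of the real data. It counts the distinct values of each candidate field and emits an `ordered_dimensions` list sorted from highest to lowest cardinality.

```python
def order_dimensions_by_cardinality(documents, candidate_fields):
    """Order candidate dimensions from highest to lowest distinct-value count."""
    cardinality = {
        field: len({doc[field] for doc in documents if field in doc})
        for field in candidate_fields
    }
    ordered = sorted(candidate_fields, key=lambda f: cardinality[f], reverse=True)
    return [{"name": field} for field in ordered], cardinality


# Toy sample; real cardinalities should be measured on representative data.
sample_docs = [
    {"status": 200, "port": 443},
    {"status": 404, "port": 443},
    {"status": 500, "port": 80},
    {"status": 200, "port": 80},
]

ordered_dimensions, cardinality = order_dimensions_by_cardinality(
    sample_docs, ["port", "status"]
)
print(cardinality)
print(ordered_dimensions)
```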

#### Properties

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field must also be defined in `properties` in the index mapping, and `doc_values` must be enabled for it. |

### Metrics

The `metrics` are the fields for which you need to perform aggregations. This property is required in the star-tree configuration.

- Currently, the supported field types for `metrics` are the [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`.
- Supported metric aggregations are `min`, `max`, `sum`, `avg`, and `value_count`.

#### Properties

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field must also be defined in `properties` in the index mapping, and `doc_values` must be enabled for it. |
| `stats` | Optional | A list of metric aggregations computed for each field. You can choose from `min`, `max`, `sum`, `avg`, and `value_count`.<br/>The defaults are `sum` and `value_count`.<br/>`avg` is a derived metric that is automatically supported in queries if `sum` and `value_count` are present in the metric `stats`. |
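
The reason `avg` can be derived rather than stored is that an average over any set of precomputed entries follows from their `sum` and `value_count` values. The following Python sketch illustrates this with made-up numbers; the function name `derive_avg` and the entry layout are assumptions for illustration only.

```python
def derive_avg(precomputed_entries):
    """Combine per-entry sum and value_count into one overall average."""
    total = sum(entry["sum"] for entry in precomputed_entries)
    count = sum(entry["value_count"] for entry in precomputed_entries)
    return total / count if count else None


# Two hypothetical precomputed entries, for example one per port value.
entries = [
    {"sum": 300.0, "value_count": 1},
    {"sum": 75.0, "value_count": 2},
]

# Averaging the per-entry averages (300.0 and 37.5) would give a different,
# incorrect result, which is why sum and value_count are the stored stats.
print(derive_avg(entries))  # 125.0
```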

### Star tree configuration parameters

The following additional optional parameters can be configured for a star-tree index.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `max_leaf_docs` | Optional | The maximum number of star-tree documents that a leaf node can point to. When this number is exceeded, the node is split on the next dimension. Default is `10000`. Lowering the value results in larger storage size but faster query performance; raising it has the opposite effect. |
| `skip_star_node_creation_for_dimensions` | Optional | A list of dimensions for which the star-tree index skips star node creation. Skipping star node creation for a dimension can reduce storage size at the expense of query performance. By default, no dimensions are skipped. |
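
The effect of `max_leaf_docs` can be illustrated with a simplified model. The following Python sketch is an assumption-laden toy, not the actual OpenSearch data structure: it ignores star nodes entirely, and the function `count_leaves` is hypothetical. It splits a node on the next dimension only while the node holds more than `max_leaf_docs` records, then counts the resulting leaves. A lower limit produces more nodes (more storage), but each leaf covers fewer records.

```python
from collections import defaultdict
from itertools import product


def count_leaves(records, dimensions, max_leaf_docs):
    """Count leaves when a node is split on the next dimension only if it
    holds more than max_leaf_docs records and dimensions remain."""
    if len(records) <= max_leaf_docs or not dimensions:
        return 1
    groups = defaultdict(list)
    for record in records:
        groups[record[dimensions[0]]].append(record)
    return sum(
        count_leaves(group, dimensions[1:], max_leaf_docs)
        for group in groups.values()
    )


# 4 status values x 3 port values = 12 toy star-tree records.
records = [
    {"status": s, "port": p}
    for s, p in product([200, 404, 500, 503], [80, 443, 9200])
]

for limit in (100, 4, 1):
    print(limit, count_leaves(records, ["status", "port"], limit))
```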

## Supported queries and aggregations

Star-tree indexes are used to optimize aggregations for a selected set of queries; support for more is planned for upcoming releases.

### Supported queries

The fields present in the query must also be present in `ordered_dimensions` in the star-tree configuration.

The following queries are supported when supported aggregations are specified:

- [Term query](https://opensearch.org/docs/latest/query-dsl/term/term/)
- [Match all docs query](https://opensearch.org/docs/latest/query-dsl/match-all/)

### Supported aggregations

The fields present in the aggregation must also be present in `metrics` in the star-tree configuration, and the aggregation must be listed in the `stats` parameter for that field.

The following metric aggregations are supported:

- SUM
- MIN
- MAX
- VALUE COUNT
- AVG
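
The rules in this section can be summarized as a small eligibility check. The following Python sketch encodes only what this document states (supported query types, query fields in `ordered_dimensions`, aggregation field in `metrics`, aggregation in `stats`, and `avg` derived from `sum` and `value_count`). The function `can_use_star_tree` and its arguments are hypothetical; OpenSearch makes this decision internally and may apply additional conditions.

```python
SUPPORTED_QUERIES = {"term", "match_all"}
DEFAULT_STATS = ["sum", "value_count"]  # documented defaults for `stats`


def can_use_star_tree(config, query_type, query_fields, agg_type, agg_field):
    """Apply the documented rules to one query and one metric aggregation."""
    if query_type not in SUPPORTED_QUERIES:
        return False
    dimensions = {d["name"] for d in config["ordered_dimensions"]}
    if not set(query_fields) <= dimensions:
        return False
    for metric in config["metrics"]:
        if metric["name"] == agg_field:
            stats = set(metric.get("stats", DEFAULT_STATS))
            if agg_type == "avg":
                # avg is derived, so it needs both sum and value_count.
                return {"sum", "value_count"} <= stats
            return agg_type in stats
    return False


# Star-tree config taken from the example mapping in this document.
config = {
    "ordered_dimensions": [{"name": "status"}, {"name": "port"}],
    "metrics": [
        {"name": "request_size", "stats": ["sum", "value_count", "min", "max"]},
        {"name": "latency", "stats": ["sum", "value_count", "min", "max"]},
    ],
}

print(can_use_star_tree(config, "term", ["status"], "sum", "request_size"))
print(can_use_star_tree(config, "term", ["host"], "sum", "request_size"))
```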

### Examples

To get the sum of `request_size` for all error logs with `status=500` using the example mapping, send the following request:

```json
POST /logs/_search
{
  "query": {
    "term": {
      "status": "500"
    }
  },
  "aggs": {
    "sum_request_size": {
      "sum": {
        "field": "request_size"
      }
    }
  }
}
```

OpenSearch optimizes this query automatically by using the star-tree index.