From 1a7be5d6ec4cfb87cbb6b79b13be5272963c7899 Mon Sep 17 00:00:00 2001
From: Bharathwaj G
Date: Tue, 22 Oct 2024 20:47:17 +0530
Subject: [PATCH] Adding documentation for star tree index feature

Signed-off-by: Bharathwaj G
---
 _field-types/supported-field-types/index.md |   2 +-
 .../supported-field-types/star-tree.md      | 202 ++++++++++++++++++
 2 files changed, 203 insertions(+), 1 deletion(-)
 create mode 100644 _field-types/supported-field-types/star-tree.md

diff --git a/_field-types/supported-field-types/index.md b/_field-types/supported-field-types/index.md
index a43da396d52..c2251683d65 100644
--- a/_field-types/supported-field-types/index.md
+++ b/_field-types/supported-field-types/index.md
@@ -30,7 +30,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/):
 k-NN vector | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/): Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
 Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
 Derived | [`derived`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/derived/): Creates new fields dynamically by executing scripts on existing fields.
-
+Star tree | [`star_tree`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/): Creates a materialized view by precomputing aggregations at indexing time, based on a user-provided configuration, to accelerate aggregation queries.
 ## Arrays
 
 There is no dedicated array field type in OpenSearch. Instead, you can pass an array of values into any field. All values in the array must have the same field type.
diff --git a/_field-types/supported-field-types/star-tree.md b/_field-types/supported-field-types/star-tree.md
new file mode 100644
index 00000000000..ab413658efe
--- /dev/null
+++ b/_field-types/supported-field-types/star-tree.md
@@ -0,0 +1,202 @@
+---
+layout: default
+title: star-tree
+nav_order: 61
+has_children: false
+parent: Supported field types
+redirect_from:
+  - /opensearch/supported-field-types/star-tree/
+  - /field-types/star-tree/
+---
+# star-tree field type
+
+This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
+{: .warning}
+
+A star-tree index is a multi-field index that improves the performance of aggregations.
+Once you configure a star-tree index as part of the index mapping by specifying its dimensions and metrics, OpenSearch creates and maintains the star-tree index in real time within segments as data is ingested.
+
+OpenSearch automatically uses the star-tree index to optimize aggregations when the query is compatible with the star-tree configuration. No changes to the query syntax or requests are required.
+
+## When to use a star-tree index
+Without a star-tree index, aggregation performance scales linearly with the number of documents. This applies to all aggregation queries that must visit doc values to compute their results.
+
+Because a star-tree index stores precomputed aggregations, it provides predictable latency for supported queries, regardless of the number of underlying documents.
+
+Star-tree indexes work well for append-only use cases, such as time-series data and data streams.
+
+A star-tree index consolidates data, so it is storage efficient, which enables efficient paging and requires only a fraction of the I/O for search queries. However, building the index affects indexing performance, depending on the cardinality of the dimensions and the number of metric fields.
+
+## Prerequisites
+
+Before using a star-tree field, be sure to satisfy the following prerequisites:
+
+- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
+- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [Configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
+- **Enable `doc_values`**: Ensure that `doc_values` is enabled for the dimension and metric fields used in your star-tree mapping.
+
+## Limitations
+
+Currently, star-tree indexes have the following limitations:
+
+- Document deletions and updates are not accounted for in the star-tree index, so query results are inaccurate for segments that contain deleted documents.
+- Once a star-tree index is enabled for an index, you cannot disable it. To remove the star-tree index, you must reindex your data without the star-tree mapping.
+- Multi-valued fields are not supported.
+- Only a limited set of queries and aggregations is supported. Support for more is planned for future releases.
+
+## Examples
+
+The following examples show how to use a star-tree index.
+
+### Defining a star-tree index in mappings
+
+Define the star-tree mapping under the new `composite` section in `mappings`.
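+
+As a minimal sketch, the following request sets only the two required parameters, `ordered_dimensions` and `metrics`, so all optional parameters keep their defaults. It reuses the `status`, `port`, and `request_size` fields and the `index.composite_index` setting from the `logs` example; the index name `my-index` and the star-tree name `my_star_tree` are hypothetical.
+
+```json
+PUT my-index
+{
+  "settings": {
+    "index.composite_index": true
+  },
+  "mappings": {
+    "composite": {
+      "my_star_tree": {
+        "type": "star_tree",
+        "config": {
+          "ordered_dimensions": [
+            { "name": "status" },
+            { "name": "port" }
+          ],
+          "metrics": [
+            { "name": "request_size" }
+          ]
+        }
+      }
+    },
+    "properties": {
+      "status": { "type": "integer" },
+      "port": { "type": "integer" },
+      "request_size": { "type": "integer" }
+    }
+  }
+}
+```
+
+Because `stats` is omitted, `request_size` should receive the default `sum` and `value_count` stats, which are also what `avg` aggregations are derived from.
+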
+To create a star-tree index that precomputes aggregations on the `request_size` and `latency` fields for all combinations of values in the `port` and `status` fields of the `logs` index, configure the following mapping:
+
+```json
+PUT logs
+{
+  "settings": {
+    "index.number_of_shards": 1,
+    "index.number_of_replicas": 0,
+    "index.composite_index": true
+  },
+  "mappings": {
+    "composite": {
+      "startree1": {
+        "type": "star_tree",
+        "config": {
+          "max_leaf_docs": 10000,
+          "skip_star_node_creation_for_dimensions": [
+            "port"
+          ],
+          "ordered_dimensions": [
+            {
+              "name": "status"
+            },
+            {
+              "name": "port"
+            }
+          ],
+          "metrics": [
+            {
+              "name": "request_size",
+              "stats": [
+                "sum",
+                "value_count",
+                "min",
+                "max"
+              ]
+            },
+            {
+              "name": "latency",
+              "stats": [
+                "sum",
+                "value_count",
+                "min",
+                "max"
+              ]
+            }
+          ]
+        }
+      }
+    },
+    "properties": {
+      "status": {
+        "type": "integer"
+      },
+      "port": {
+        "type": "integer"
+      },
+      "request_size": {
+        "type": "integer"
+      },
+      "latency": {
+        "type": "scaled_float",
+        "scaling_factor": 10
+      }
+    }
+  }
+}
+```
+
+## Star-tree mapping parameters
+Specify the star-tree configuration under the `config` section. All parameters are final and cannot be modified without reindexing documents.
+
+### ordered_dimensions
+The `ordered_dimensions` parameter lists the fields by which metrics are aggregated in the star-tree index. The star-tree index is used to optimize a query only if all fields in the query are part of `ordered_dimensions`. This parameter is required.
+
+- The order of dimensions matters. For efficient storage and query pruning, order the dimensions from highest to lowest cardinality.
+- Avoid using high-cardinality fields as dimensions because they adversely affect storage space, indexing throughput, and query performance.
+- Currently, `ordered_dimensions` supports [numeric field types]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/), except for `unsigned_long`.
+  - Support for other field types, such as `keyword` and `ip`, is planned for future releases.
+
+#### Properties
+
+| Parameter | Required/Optional | Description |
+|:---|:---|:---|
+| `name` | Required | The name of the field. The field must also be defined in `properties` in the index mapping, and `doc_values` must be enabled for it. |
+
+### metrics
+The `metrics` parameter lists the fields for which aggregations are precomputed. This parameter is required.
+
+- Currently, `metrics` supports [numeric field types]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/), except for `unsigned_long`.
+- Supported metric aggregations are `min`, `max`, `sum`, `avg`, and `value_count`.
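+
+For example, the following `metrics` fragment (an illustrative excerpt of the `config` section, not a complete request) omits `stats` for `request_size`, so that field should use the default `sum` and `value_count` stats, and requests only `min` and `max` for `latency`:
+
+```json
+{
+  "metrics": [
+    {
+      "name": "request_size"
+    },
+    {
+      "name": "latency",
+      "stats": [
+        "min",
+        "max"
+      ]
+    }
+  ]
+}
+```
+
+With this configuration, an `avg` aggregation on `latency` could not be served by the star-tree index, because `avg` is derived from `sum` and `value_count`, neither of which is precomputed for that field.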
+
+#### Properties
+
+| Parameter | Required/Optional | Description |
+|:---|:---|:---|
+| `name` | Required | The name of the field. The field must also be defined in `properties` in the index mapping, and `doc_values` must be enabled for it. |
+| `stats` | Optional | A list of metric aggregations to precompute for the field. Valid values are `min`, `max`, `sum`, `avg`, and `value_count`. Defaults to `sum` and `value_count`. `avg` is a derived metric: queries can use it automatically if both `sum` and `value_count` are included in `stats`. |
+
+### Star-tree configuration parameters
+The following additional optional parameters can be configured for a star-tree index.
+
+| Parameter | Required/Optional | Description |
+|:---|:---|:---|
+| `max_leaf_docs` | Optional | The maximum number of star-tree documents that a leaf node can point to. When this limit is exceeded, the node is split on the next dimension. Default is `10000`. Lowering the value increases storage size but improves query performance; raising it has the opposite effect. |
+| `skip_star_node_creation_for_dimensions` | Optional | A list of dimensions for which the star-tree index does not create a star node. Skipping star node creation can reduce storage size at the expense of query performance. By default, star nodes are created for all dimensions. |
+
+## Supported queries and aggregations
+
+A star-tree index optimizes aggregations for a selected set of queries. Support for more queries is planned for future releases.
+
+### Supported queries
+All fields in the query must be included in `ordered_dimensions` in the star-tree configuration.
+
+The following queries are supported when used with a supported aggregation:
+
+- [Term query]({{site.url}}{{site.baseurl}}/query-dsl/term/term/)
+- [Match all docs query]({{site.url}}{{site.baseurl}}/query-dsl/match-all/)
+
+### Supported aggregations
+All fields in the aggregation must be included in `metrics` in the star-tree configuration, and the aggregation type must be listed in that field's `stats` parameter (for `avg`, both `sum` and `value_count` must be listed).
+
+The following metric aggregations are supported:
+
+- `sum`
+- `min`
+- `max`
+- `value_count`
+- `avg`
+
+### Examples
+The following request uses the example mapping to compute the sum of `request_size` for all error logs with a `status` of `500`:
+
+```json
+POST /logs/_search
+{
+  "query": {
+    "term": {
+      "status": "500"
+    }
+  },
+  "aggs": {
+    "sum_request_size": {
+      "sum": {
+        "field": "request_size"
+      }
+    }
+  }
+}
+```
+
+OpenSearch automatically uses the star-tree index to optimize this query.
\ No newline at end of file