Adding documentation for star tree index feature
Signed-off-by: Bharathwaj G <[email protected]>
bharath-techie committed Oct 22, 2024
1 parent 9bad864 commit dee4bee
Showing 2 changed files with 203 additions and 1 deletion.
2 changes: 1 addition & 1 deletion _field-types/supported-field-types/index.md
@@ -30,7 +30,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/):
k-NN vector | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/): Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
Derived | [`derived`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/derived/): Creates new fields dynamically by executing scripts on existing fields.

Star-tree | [`star_tree`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/): Creates materialized views by precomputing aggregations during indexing, based on a user-provided configuration, to accelerate aggregation queries.
## Arrays

There is no dedicated array field type in OpenSearch. Instead, you can pass an array of values into any field. All values in the array must have the same field type.
202 changes: 202 additions & 0 deletions _field-types/supported-field-types/star-tree.md
@@ -0,0 +1,202 @@
---
layout: default
title: star-tree
nav_order: 61
has_children: false
parent: Supported field types
redirect_from:
- /opensearch/supported-field-types/star-tree/
- /field-types/star-tree/
---
# Star tree field type

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
{: .warning}

A star-tree index is a multi-field index that improves the performance of aggregations.
Once you configure a star-tree index in the index mapping by specifying its dimensions and metrics, the index is created and maintained in real time within segments as data is ingested.

OpenSearch automatically uses the star-tree index to optimize aggregations, based on the input query and the star-tree configuration. No changes to query syntax or requests are required.

## When to use star tree
Without a star-tree index, the latency of aggregations scales linearly with the number of documents. This applies to every aggregation query that must visit doc values to compute its results.

A star-tree index stores precomputed aggregations, so it provides predictable latency for queries regardless of the number of underlying documents.

A star-tree index works well for append-only use cases, such as time-series data and data streams.

A star-tree index consolidates data, which makes it storage efficient: search queries page through less data and use only a fraction of the I/O. However, it affects indexing performance, depending on the cardinality of the dimensions and the number of metric fields.

## Prerequisites

Before using a star-tree field, make sure that the following prerequisites are satisfied:

- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
- **Enable `doc_values`**: Ensure that doc values are enabled for the dimension and metric fields used in your star-tree mapping.
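
Doc values are enabled by default for numeric fields, so the third prerequisite often needs no action. As a minimal sketch, the parameter can also be stated explicitly in the field mappings. The index name `my-index` and the field names below are illustrative only and are not part of the star-tree feature itself:

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "status": {
        "type": "integer",
        "doc_values": true
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10,
        "doc_values": true
      }
    }
  }
}
```

If `doc_values` was set to `false` on a field, that field would not satisfy this prerequisite.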

## Limitations

Currently, the star-tree index has the following limitations:

- Document deletions and updates are not accounted for in the star-tree index, so query results will be inaccurate for segments with deleted documents.
- Once a star-tree index is enabled for an index, you cannot disable it. To remove the star-tree index, you must reindex without the star-tree mapping.
- Multi-valued fields are not supported.
- Only a limited set of queries and aggregations is supported, with support for more planned for future releases.


## Examples

The following examples show how to use a star-tree index.

### Defining star tree index in mappings

Define the star-tree mapping in the new `composite` section under `mappings`. <br/>
To create a star-tree index that precomputes aggregations for the `request_size` and `latency` fields across all combinations of values in the `port` and `status` fields indexed in the `logs` index, configure the following mapping:

```json
PUT logs
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true
  },
  "mappings": {
    "composite": {
      "startree1": {
        "type": "star_tree",
        "config": {
          "max_leaf_docs": 10000,
          "skip_star_node_creation_for_dimensions": [
            "port"
          ],
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            }
          ],
          "metrics": [
            {
              "name": "request_size",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "request_size": {
        "type": "integer"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}
```

## Star tree mapping parameters
You must specify the star-tree configuration in the `config` section. All parameters are final and cannot be modified without reindexing documents.

### Ordered dimensions
The `ordered_dimensions` are the fields by which metrics are aggregated in the star-tree index. The star-tree index is used for query optimization only if all the fields in the query are part of `ordered_dimensions`. This property is required in the star-tree configuration.
- The order of dimensions matters. Define the dimensions from the highest cardinality to the lowest cardinality for efficient storage and query pruning.
- Avoid using high-cardinality fields as dimensions because they adversely affect storage space, indexing throughput, and query performance.
- Currently, the supported fields for `ordered_dimensions` are the [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`.
- Support for other field types, such as `keyword` and `ip`, is planned for upcoming releases.

#### Properties

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field must also be present in `properties` in the index `mapping`, and `doc_values` must be enabled for it. |

### Metrics
The `metrics` are the fields on which you need to perform aggregations. This property is required in the star-tree configuration.
- Currently, the supported fields for `metrics` are the [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`.
- Supported metric aggregations include `min`, `max`, `sum`, `avg`, and `value_count`.

#### Properties

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field must also be present in `properties` in the index `mapping`, and `doc_values` must be enabled for it. |
| `stats` | Optional | The list of metric aggregations computed for the field. You can choose from `min`, `max`, `sum`, `avg`, and `value_count`.<br/>Defaults are `sum` and `value_count`.<br/>`avg` is a derived metric that is automatically supported in queries if both `sum` and `value_count` are present in the metric `stats`. |

### Star tree configuration parameters
The following additional optional parameters can be configured for the star-tree index.

| Parameter | Required/Optional | Description |
|:----------------|:------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `max_leaf_docs` | Optional | The maximum number of star-tree documents that a leaf node can point to. When this number is exceeded, the node is split on the next dimension. Default is `10000`. Lowering the value increases storage size but improves query performance. Raising it has the opposite effect. |
| `skip_star_node_creation_for_dimensions` | Optional | A list of dimensions for which the star-tree index skips creating a star node. Skipping star nodes can reduce storage size at the expense of query performance. By default, star node creation is not skipped for any dimension. |

## Supported query and aggregations

The star-tree index is used to optimize aggregations for a selected set of queries, with support for more planned for upcoming releases.

### Supported queries
The fields present in the query must also be present in `ordered_dimensions` in the star-tree configuration.

The following queries are supported when supported aggregations are specified:

- [Term query](https://opensearch.org/docs/latest/query-dsl/term/term/)
- [Match all docs query](https://opensearch.org/docs/latest/query-dsl/match-all/)
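
For instance, a match-all request of the following shape combines a supported query with a supported aggregation. This is a sketch that assumes the `logs` index and star-tree mapping from the Examples section of this page; the aggregation name `max_latency` is illustrative:

```json
POST /logs/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "max_latency": {
      "max": {
        "field": "latency"
      }
    }
  }
}
```

Here `latency` is listed under `metrics` with `max` among its `stats`, which is what makes the aggregation eligible.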

### Supported aggregations
The fields present in the aggregation must also be present in `metrics` in the star-tree configuration.
In addition, the aggregation must be listed in the `stats` parameter for that field.

The following metric aggregations are supported:
- SUM
- MIN
- MAX
- VALUE COUNT
- AVG

### Examples
To get the sum of `request_size` for all error logs with `status=500`, using the example mapping, send the following request:
```json
POST /logs/_search
{
  "query": {
    "term": {
      "status": "500"
    }
  },
  "aggs": {
    "sum_request_size": {
      "sum": {
        "field": "request_size"
      }
    }
  }
}
```

This query is optimized automatically because OpenSearch uses the star-tree index to resolve it.
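
Because `avg` is derived from `sum` and `value_count`, an average can be requested in the same way. The following is a sketch that assumes the example `logs` mapping, in which `latency` has both `sum` and `value_count` configured; the aggregation name `avg_latency` is illustrative:

```json
POST /logs/_search
{
  "query": {
    "term": {
      "status": "500"
    }
  },
  "aggs": {
    "avg_latency": {
      "avg": {
        "field": "latency"
      }
    }
  }
}
```

If either `sum` or `value_count` were missing from the `stats` of `latency`, this aggregation would not meet the conditions described for `avg`.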
