Adding documentation for star tree index feature
Signed-off-by: Bharathwaj G <[email protected]>
bharath-techie committed Oct 23, 2024
1 parent ddcb206 commit 26ff94f
Showing 5 changed files with 327 additions and 2 deletions.
2 changes: 1 addition & 1 deletion _field-types/supported-field-types/index.md
@@ -30,7 +30,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/):
k-NN vector | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/): Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
Derived | [`derived`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/derived/): Creates new fields dynamically by executing scripts on existing fields.
Star Tree | [`star_tree`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/): Creates materialized views by precomputing aggregations during indexing, based on a user-provided configuration, to accelerate aggregation performance.

## Arrays

There is no dedicated array field type in OpenSearch. Instead, you can pass an array of values into any field. All values in the array must have the same field type.
148 changes: 148 additions & 0 deletions _field-types/supported-field-types/star-tree.md
@@ -0,0 +1,148 @@
---
layout: default
title: Star Tree
nav_order: 61
has_children: false
parent: Supported field types
redirect_from:
- /opensearch/supported-field-types/star-tree/
- /field-types/star-tree/
---
# Star tree field type

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
{: .warning}

A star-tree index is a multi-field index that improves the performance of aggregations.
Once you configure a star-tree index in the index mapping by specifying its dimensions and metrics, OpenSearch creates the star-tree index and maintains it in real time within segments as data is ingested.

OpenSearch automatically uses the star-tree index to optimize aggregations based on the input query and the star-tree configuration. No changes to the query syntax or requests are required.

For more information, see [Star Tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).

## Prerequisites

Before using a star-tree field, be sure to satisfy the following prerequisites:

- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
- Set the `index.composite_index` index setting to `true` during index creation.
- Ensure that `doc_values` is enabled for the dimension and metric fields used in your star-tree mapping.
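
As a minimal sketch of the last prerequisite, the following mapping sets `doc_values` explicitly on two fields used in the examples on this page. Numeric fields have `doc_values` enabled by default in OpenSearch, so the explicit setting is redundant and is shown only for clarity. The index name `doc-values-example` is hypothetical.

```json
PUT doc-values-example
{
  "mappings": {
    "properties": {
      "status": {
        "type": "integer",
        "doc_values": true
      },
      "request_size": {
        "type": "integer",
        "doc_values": true
      }
    }
  }
}
```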


## Examples

The following examples show how to use a star-tree index.

### Star tree index mapping

Define the star-tree mapping in the `composite` section of `mappings`. <br/>
To compute metric aggregations on the `request_size` and `latency` fields for queries on the `port` and `status` fields, configure the following mapping:

```json
PUT logs
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true
  },
  "mappings": {
    "composite": {
      "startree1": {
        "type": "star_tree",
        "config": {
          "max_leaf_docs": 10000,
          "skip_star_node_creation_for_dimensions": [
            "port"
          ],
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            }
          ],
          "metrics": [
            {
              "name": "request_size",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "request_size": {
        "type": "integer"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}
```
In the preceding example, OpenSearch creates a star-tree index for `startree1`. Currently, only one star-tree index can be created per index. Support for multiple star-tree indexes is planned for a future release.

## Star tree mapping parameters
Specify the star-tree configuration in the `config` section. All parameters are final and cannot be modified without reindexing documents.

### Ordered dimensions
The `ordered_dimensions` are the fields by which metrics are aggregated in the star-tree index. OpenSearch uses the star-tree index to optimize a query only if all the fields in the query are part of `ordered_dimensions`. This property is required in the star-tree configuration.
- The order of dimensions matters. You must define the dimensions in order from the highest cardinality to the lowest cardinality for efficient storage and query pruning.
- Avoid using high-cardinality fields as dimensions because they adversely affect storage space, indexing throughput, and query performance.
- Currently, `ordered_dimensions` supports [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/) with the exception of `unsigned_long`.
- Support for other field types, such as `keyword` and `ip`, is planned for upcoming releases.
- A star-tree index supports a minimum of `2` and a maximum of `10` dimensions.

#### Properties

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field must also be present in `properties` in the index mapping, and `doc_values` must be enabled for it. |

### Metrics
Configure the fields on which you need to perform aggregations. This property is required in the star-tree configuration.
- Currently, `metrics` supports [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/) with the exception of `unsigned_long`.
- Supported metric aggregations are `min`, `max`, `sum`, `avg`, and `value_count`.
- `avg` is a derived metric: it is not indexed but is computed at query time from `sum` and `value_count`. The remaining metrics are base metrics, which are indexed.
- Up to `100` base metrics are supported per star-tree index.

#### Properties

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field must also be present in `properties` in the index mapping, and `doc_values` must be enabled for it. |
| `stats` | Optional | The list of metric aggregations computed for the field. You can choose between `min`, `max`, `sum`, `avg`, and `value_count`.<br/>The defaults are `sum` and `value_count`.<br/>`avg` is a derived metric that is automatically supported in queries if `sum` and `value_count` are present in `stats`. |
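
For illustration, the following `metrics` fragment (a part of `config`, not a complete request) omits `stats` for `request_size`, which, according to the preceding table, should fall back to the defaults `sum` and `value_count`. For `latency`, it lists only `min` and `max`, so an `avg` aggregation on that field would presumably not be served from the star-tree index. The field names are borrowed from the example mapping.

```json
{
  "metrics": [
    {
      "name": "request_size"
    },
    {
      "name": "latency",
      "stats": [
        "min",
        "max"
      ]
    }
  ]
}
```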

### Star tree configuration parameters
The following optional parameters can also be configured for a star-tree index.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `max_leaf_docs` | Optional | The maximum number of star-tree documents that a leaf node can point to. When this number is exceeded, the node is split on the next dimension. Default is `10000`. A lower value results in a larger storage size but faster query performance. A higher value has the opposite effect. |
| `skip_star_node_creation_for_dimensions` | Optional | A list of dimensions for which the star-tree index skips star node creation. Skipping star node creation for a dimension can reduce storage size at the expense of query performance. By default, star node creation is not skipped for any dimension. |

## Supported queries and aggregations
For more details on supported queries and aggregations, see [supported query and aggregations for Star Tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#supported-query-and-aggregations).
4 changes: 3 additions & 1 deletion _search-plugins/improving-search-performance.md
@@ -11,4 +11,6 @@ OpenSearch offers several ways to improve search performance:

- Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/).

- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).
- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).

- Improve the performance of aggregations using a [Star Tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).
175 changes: 175 additions & 0 deletions _search-plugins/star-tree-index.md
@@ -0,0 +1,175 @@
---
layout: default
title: Star Tree index
parent: Improving search performance
nav_order: 54
---

# Star tree index

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
{: .warning}

A star-tree index is a multi-field index that improves the performance of aggregations.

OpenSearch uses the star-tree index to optimize aggregations based on the input query and the star-tree configuration. No changes to the query syntax or requests are required.

## Star tree index structure

<img src="{{site.url}}{{site.baseurl}}/images/star-tree-index.png" alt="A Star Tree index containing two dimensions and two metrics" width="700">

As shown in the preceding figure, the star-tree index structure consists of two main parts: the star tree itself and the sorted and aggregated star-tree documents, which are backed by doc values indexes.

Each node in the star tree points to a range of star-tree documents.
A node is further split into child nodes based on the `max_leaf_docs` configuration.
The number of documents that a leaf node points to is less than or equal to `max_leaf_docs`. This ensures that at most `max_leaf_docs` documents are traversed to compute an aggregated value, which provides predictable latency.

Special nodes called star nodes (`*`) help skip non-competitive nodes and fetch aggregated documents, where applicable, at query time.

The figure contains three examples that explain star-tree traversal at query time:
- Compute the average request size with a terms query where `port` equals `8443` and `status` equals `200`. (Support for terms queries will be added in an upcoming release.)
- Compute the count of requests with a term query where `status` equals `200`. The query traverses the `*` node of the `port` dimension because `port` is not part of the query.
- Compute the average request size with a term query where `port` equals `5600`. The query traverses the `*` node of the `status` dimension because `status` is not part of the query.

The second and third examples use star nodes.
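
The second example could be expressed as the following request. This is a sketch that assumes the `logs` index and the field names from the mapping example later on this page. Counting requests with a `value_count` aggregation on `request_size` is an assumption about how the count is obtained, the aggregation name `request_count` is arbitrary, and `"size": 0` is included only to omit search hits from the response.

```json
POST /logs/_search
{
  "size": 0,
  "query": {
    "term": {
      "status": 200
    }
  },
  "aggs": {
    "request_count": {
      "value_count": {
        "field": "request_size"
      }
    }
  }
}
```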


## When to use star tree index
You can use a star-tree index to perform faster aggregations with a constant upper bound on query latency.
- A star-tree index natively supports multi-field aggregations.
- A star-tree index is created in real time as part of regular indexing, so its data is always up to date with the live data.
- A star-tree index consolidates data, which makes it storage efficient and results in efficient paging and a fraction of the I/O utilization for search queries.

## Considerations
- A star-tree index should ideally be used with append-only indexes because updates and deletes are not accounted for in the star-tree index.
- OpenSearch uses the star-tree index for an aggregation query only if the fields in the query are a subset of the dimensions and metrics in the star-tree configuration.
- Once a star-tree index is enabled for an index, you currently cannot disable it. To remove the star-tree index, you must reindex without the star-tree mapping.
- Changing the star-tree configuration also requires a reindex operation.
- [Multi-values/array values]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/index/#arrays) are not supported.
- Only [limited queries and aggregations](#supported-query-and-aggregations) are supported. Support for more is planned for future releases.
- The cardinality of the dimensions should not be very high (as with `_id` fields). Otherwise, it leads to storage explosion and higher query latency.

## Enabling star tree index
- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
- Set the `index.composite_index` index setting to `true` during index creation.

## Examples

The following examples show how to use a star-tree index.

### Defining star tree index in mappings

Define the star-tree configuration in the index mappings when creating an index. <br/>
To create a star-tree index that precomputes aggregations on the `request_size` and `latency` fields for all combinations of values in the `port` and `status` fields indexed in the `logs` index, configure the following mapping:

```json
PUT logs
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true
  },
  "mappings": {
    "composite": {
      "startree1": {
        "type": "star_tree",
        "config": {
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            }
          ],
          "metrics": [
            {
              "name": "request_size",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "request_size": {
        "type": "integer"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}
```

For detailed information about the star-tree index mapping and parameters, see [Star Tree field type]({{site.url}}{{site.baseurl}}/field-types/star-tree/).

## Supported query and aggregations

A star-tree index can be used to optimize aggregations for a selected set of queries. Support for more queries is planned for upcoming releases.

### Supported queries
Ensure the following in the star-tree index mapping:
- The fields present in the query must be part of `ordered_dimensions` in the star-tree configuration.

The following queries are supported when supported aggregations are specified: <br/>

- [Term query](https://opensearch.org/docs/latest/query-dsl/term/term/)
- [Match all docs query](https://opensearch.org/docs/latest/query-dsl/match-all/)

### Supported aggregations
Ensure the following in the star-tree index mapping:
- The fields present in the aggregation must be part of `metrics` in the star-tree configuration.
- The metric aggregation type must be part of the `stats` parameter.

The following metric aggregations are supported:
- [Sum](https://opensearch.org/docs/latest/aggregations/metric/sum/)
- [Minimum](https://opensearch.org/docs/latest/aggregations/metric/minimum/)
- [Maximum](https://opensearch.org/docs/latest/aggregations/metric/maximum/)
- [Value count](https://opensearch.org/docs/latest/aggregations/metric/value-count/)
- [Average](https://opensearch.org/docs/latest/aggregations/metric/average/)
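
As a sketch, the following request combines a match all query with an `avg` aggregation, assuming the `logs` index and the [example mapping](#defining-star-tree-index-in-mappings). Because `avg` is derived from `sum` and `value_count`, both must be listed in `stats` for `latency`, as they are in that mapping. The aggregation name `avg_latency` is arbitrary, and `"size": 0` is included only to omit search hits from the response.

```json
POST /logs/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "avg_latency": {
      "avg": {
        "field": "latency"
      }
    }
  }
}
```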

### Examples
To get the sum of `request_size` for all error logs with `status=500`, using the [example mapping](#defining-star-tree-index-in-mappings), send the following request:
```json
POST /logs/_search
{
  "query": {
    "term": {
      "status": "500"
    }
  },
  "aggs": {
    "sum_request_size": {
      "sum": {
        "field": "request_size"
      }
    }
  }
}
```

OpenSearch optimizes this query automatically by using the star-tree index.

To run queries without using the star-tree index, set the `indices.composite_index.star_tree.enabled` setting to `false`.
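
One way to change this setting, assuming it can be updated dynamically in your version (this page does not confirm that), is through the cluster settings API. If the setting is static in your version, change it in `opensearch.yml` instead.

```json
PUT /_cluster/settings
{
  "persistent": {
    "indices.composite_index.star_tree.enabled": false
  }
}
```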
Binary file added images/star-tree-index.png