Add documentation for star tree index feature #8598

Merged
merged 34 commits into from
Nov 1, 2024
Changes from 5 commits
95a47ac
Adding documentation for star tree index feature
bharath-techie Oct 22, 2024
78b4c41
addressing comments
bharath-techie Oct 23, 2024
0e0483a
addressing comments
bharath-techie Oct 23, 2024
dcf47cf
fixes and addressing comments
bharath-techie Oct 23, 2024
8ecd473
addressing comments
bharath-techie Oct 23, 2024
d8357ae
addressing comments
bharath-techie Oct 24, 2024
ffdc6dc
addressing comments
bharath-techie Oct 24, 2024
b3b5783
fixing json
bharath-techie Oct 24, 2024
05edca0
fixing json
bharath-techie Oct 24, 2024
69387a2
Merge branch 'main' into startree
Naarcha-AWS Oct 28, 2024
06848eb
addressing comments
bharath-techie Oct 29, 2024
5f51c3a
addressing comments
bharath-techie Oct 29, 2024
e5cf72d
Merge branch 'main' into startree
Naarcha-AWS Oct 30, 2024
47de351
Merge branch 'main' into startree
Naarcha-AWS Oct 31, 2024
759a258
Add edits for star tree field page
Naarcha-AWS Oct 31, 2024
db0e127
Add index edit
Naarcha-AWS Oct 31, 2024
f4d3a79
Update improving-search-performance.md
Naarcha-AWS Oct 31, 2024
b4205dd
Update star-tree-index.md
Naarcha-AWS Oct 31, 2024
4aea8bf
Update star-tree.md
Naarcha-AWS Oct 31, 2024
1dd9302
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
f7ef88f
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
37c6f11
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
704212a
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
fe891e6
Update _field-types/supported-field-types/star-tree.md
Naarcha-AWS Nov 1, 2024
01d1eef
Merge branch 'main' into startree
Naarcha-AWS Nov 1, 2024
521fbb0
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
6a5d89e
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
d249946
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
6ce9d22
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
c0c5ec0
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
e8bdea5
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
3e372f7
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
19eaad0
Update star-tree-index.md
Naarcha-AWS Nov 1, 2024
f98e02d
Merge branch 'main' into startree
Naarcha-AWS Nov 1, 2024
1 change: 1 addition & 0 deletions _field-types/supported-field-types/index.md
@@ -30,6 +30,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/):
k-NN vector | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/): Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
Derived | [`derived`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/derived/): Creates new fields dynamically by executing scripts on existing fields.
Star Tree | [`star_tree`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/): Precomputes aggregations and stores them in a star-tree index to accelerate the performance of aggregation queries.

## Arrays

176 changes: 176 additions & 0 deletions _field-types/supported-field-types/star-tree.md
@@ -0,0 +1,176 @@
---
layout: default
title: Star Tree
nav_order: 61
has_children: false
parent: Supported field types
redirect_from:
- /opensearch/supported-field-types/star-tree/
- /field-types/star-tree/
---
# Star tree field type

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
{: .warning}

Star Tree Index (STIX) pre-computes aggregations to accelerate the performance of aggregation queries.
Contributor

thinking if there is way to avoid aggregations twice in the same sentence

Contributor Author

We can wait for doc team review/feedback on this

Once you configure a star-tree index as part of an index mapping, it is created and maintained in real time within segments as data is ingested.

OpenSearch will automatically use the star-tree index to optimize aggregations based on the input query and star-tree configuration. No changes are required in the query syntax or requests.

For more information, see [Star Tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).

## Prerequisites

Before using a star-tree field, be sure to satisfy the following prerequisites:

- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
- Set the `index.composite_index` index setting to `true` during index creation.
- Enable `doc_values`: Ensure that `doc_values` is enabled for the [dimensions](#ordered-dimensions) and [metrics](#metrics) fields used in your star-tree mapping.

## Examples

The following examples show how to use star-tree index.

### Star tree index mapping

Define the star-tree mapping in the new `composite` section of `mappings`. <br/>
To compute metric aggregations for the `request_size` and `latency` fields with queries on the `port` and `status` fields, configure the following mappings:

```json
PUT logs
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true
  },
  "mappings": {
    "composite": {
      "startree1": {
        "type": "star_tree",
        "config": {
          "max_leaf_docs": 10000,
          "skip_star_node_creation_for_dimensions": [
            "port"
          ],
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            }
          ],
          "metrics": [
            {
              "name": "request_size",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "request_size": {
        "type": "integer"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}
```
In the preceding example, a star-tree index is created for `startree1`. Currently, only one star-tree index can be created per index, with support for multiple star-tree indexes coming in a future release. <br/>
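
As a rough sketch of how this mapping might be exercised, the following requests index one document into `logs` and then run a `sum` aggregation on `request_size` filtered by `status`. The document values and the aggregation name `sum_request_size` are illustrative assumptions, not part of the original example. Whether a particular request is actually resolved using the star-tree index depends on the supported queries and aggregations described on the star-tree index page.

```json
POST logs/_doc
{
  "status": 200,
  "port": 8080,
  "request_size": 1024,
  "latency": 12.5
}
```

```json
POST logs/_search
{
  "size": 0,
  "query": {
    "term": {
      "status": 200
    }
  },
  "aggs": {
    "sum_request_size": {
      "sum": {
        "field": "request_size"
      }
    }
  }
}
```

The search request uses ordinary query DSL. Nothing in it refers to `startree1`, in keeping with the statement that no changes to the query syntax are required.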


## Star tree mapping parameters
Specify the star-tree configuration in the `config` section. All parameters are final and cannot be modified without reindexing documents.

### Ordered dimensions
The `ordered_dimensions` are the fields by which metrics are aggregated in the star-tree index. The star-tree index is used for query optimization only if all the fields in the query are part of `ordered_dimensions`. This property is required in the star-tree configuration.

- The order of dimensions matters. Define the dimensions from the highest cardinality to the lowest cardinality for efficient storage and query pruning.
- Avoid using high-cardinality fields as dimensions because they adversely affect storage space, indexing throughput, and query performance.
- Currently, `ordered_dimensions` supports all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/) except `unsigned_long`.
- Support for other field types, such as `keyword` and `ip`, is coming in upcoming releases.
- A minimum of `2` and a maximum of `10` dimensions are supported per star-tree index.

#### Properties

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field must also be present in `properties` as part of the index `mapping`, and `doc_values` must be enabled for it. |

### Metrics
Configure the fields for which you need to perform aggregations. This property is required in the star-tree configuration.

- Currently, `metrics` supports all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/) except `unsigned_long`.
- Supported metric aggregations include `Min`, `Max`, `Sum`, `Avg` and `Value_count`.
Collaborator

Suggested change
- Supported metric aggregations include `Min`, `Max`, `Sum`, `Avg` and `Value_count`.
- Supported metric aggregations include `Min`, `Max`, `Sum`, `Avg`, and `Value_count`.

- `Avg` is a derived metric based on `Sum` and `Value_count`. It is not indexed and is computed at query time. The remaining metrics are base metrics, which are indexed.
- A maximum of `100` base metrics is supported per star-tree index.

For example, if you specify `Min`, `Max`, `Sum`, and `Value_count` for every field in the `metrics` configuration, you can provide up to 25 fields (4 base metrics × 25 fields = 100), as shown in the following example:

```json
{
  "metrics": [
    {
      "name": "field1",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    },
    ...,
    ...,
    {
      "name": "field25",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    }
  ]
}
```


#### Properties

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field must also be present in `properties` as part of the index `mapping`, and `doc_values` must be enabled for it. |
| `stats` | Optional | A list of metric aggregations computed for each field. You can choose between `Min`, `Max`, `Sum`, `Avg`, and `Value_count`.<br/>The defaults are `Sum` and `Value_count`.<br/>`Avg` is a derived metric that is automatically supported in queries if `sum` and `value_count` are present in the metric `stats`. |
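
To illustrate the derived `Avg` metric, the following sketch assumes the `logs` index from the earlier mapping example, where `latency` is configured with both `sum` and `value_count`. The aggregation name `avg_latency` is hypothetical. Under that assumption, an `avg` aggregation such as this one could be answered from the stored sums and counts rather than from a separately indexed average:

```json
POST logs/_search
{
  "size": 0,
  "aggs": {
    "avg_latency": {
      "avg": {
        "field": "latency"
      }
    }
  }
}
```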

### Star tree configuration parameters
Collaborator

Suggested change
### Star tree configuration parameters
### Star-tree configuration parameters

The following are additional optional parameters that can be configured alongside the star-tree index. These parameters are final and cannot be modified after index creation.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `max_leaf_docs` | Optional | The maximum number of star-tree documents that a leaf node can point to. When this number is exceeded, the node is split on the next dimension. Default is `10000`. Lowering the value results in a larger storage size but faster query performance. Raising it has the opposite effect. For more information, see [star tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |
| `skip_star_node_creation_for_dimensions` | Optional | A list of dimensions for which the star-tree index skips star node creation. Skipping star nodes for a dimension can reduce storage size at the expense of query performance. By default, star nodes are not skipped for any dimension. For more information about star nodes, see [star tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |

## Supported queries and aggregations
For more information about supported queries and aggregations, see [supported query and aggregations for Star Tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#supported-query-and-aggregations).
4 changes: 3 additions & 1 deletion _search-plugins/improving-search-performance.md
@@ -11,4 +11,6 @@ OpenSearch offers several ways to improve search performance:

- Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/).

- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).
- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).

- Improve the performance of aggregations using a [Star Tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).