Add documentation for star tree index feature #8598

Merged
merged 34 commits into from
Nov 1, 2024
Changes from 14 commits
Commits
95a47ac
Adding documentation for star tree index feature
bharath-techie Oct 22, 2024
78b4c41
addressing comments
bharath-techie Oct 23, 2024
0e0483a
addressing comments
bharath-techie Oct 23, 2024
dcf47cf
fixes and addressing comments
bharath-techie Oct 23, 2024
8ecd473
addressing comments
bharath-techie Oct 23, 2024
d8357ae
addressing comments
bharath-techie Oct 24, 2024
ffdc6dc
addressing comments
bharath-techie Oct 24, 2024
b3b5783
fixing json
bharath-techie Oct 24, 2024
05edca0
fixing json
bharath-techie Oct 24, 2024
69387a2
Merge branch 'main' into startree
Naarcha-AWS Oct 28, 2024
06848eb
addressing comments
bharath-techie Oct 29, 2024
5f51c3a
addressing comments
bharath-techie Oct 29, 2024
e5cf72d
Merge branch 'main' into startree
Naarcha-AWS Oct 30, 2024
47de351
Merge branch 'main' into startree
Naarcha-AWS Oct 31, 2024
759a258
Add edits for star tree field page
Naarcha-AWS Oct 31, 2024
db0e127
Add index edit
Naarcha-AWS Oct 31, 2024
f4d3a79
Update improving-search-performance.md
Naarcha-AWS Oct 31, 2024
b4205dd
Update star-tree-index.md
Naarcha-AWS Oct 31, 2024
4aea8bf
Update star-tree.md
Naarcha-AWS Oct 31, 2024
1dd9302
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
f7ef88f
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
37c6f11
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
704212a
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
fe891e6
Update _field-types/supported-field-types/star-tree.md
Naarcha-AWS Nov 1, 2024
01d1eef
Merge branch 'main' into startree
Naarcha-AWS Nov 1, 2024
521fbb0
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
6a5d89e
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
d249946
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
6ce9d22
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
c0c5ec0
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
e8bdea5
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
3e372f7
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
19eaad0
Update star-tree-index.md
Naarcha-AWS Nov 1, 2024
f98e02d
Merge branch 'main' into startree
Naarcha-AWS Nov 1, 2024
1 change: 1 addition & 0 deletions _field-types/supported-field-types/index.md
Expand Up @@ -30,6 +30,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/):
k-NN vector | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/): Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
Derived | [`derived`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/derived/): Creates new fields dynamically by executing scripts on existing fields.
Star Tree | [`star_tree`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/): Precomputes aggregations and stores them in a Star Tree index to accelerate aggregation queries.

## Arrays

176 changes: 176 additions & 0 deletions _field-types/supported-field-types/star-tree.md
@@ -0,0 +1,176 @@
---
layout: default
title: Star Tree
nav_order: 61
has_children: false
parent: Supported field types
redirect_from:
- /opensearch/supported-field-types/star-tree/
- /field-types/star-tree/
---
# Star tree field type

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
{: .warning}

A Star Tree index (STIX) precomputes aggregations to accelerate aggregation queries. <br/>
If a STIX is configured as part of the index mapping, it is created and maintained in real time as data is ingested.<br/>
OpenSearch automatically uses the STIX to optimize aggregations if the queried fields are STIX dimension fields and the aggregations are on STIX metric fields. No changes are required to the query syntax or the request parameters.

For more information, see [Star Tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).
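
The eligibility rule described above can be sketched as a small check. This is a simplified illustration of the stated rule, not OpenSearch's actual resolution logic; the function name `can_use_star_tree` and its arguments are hypothetical.

```python
def can_use_star_tree(query_fields, agg_fields, dimensions, metrics):
    """Simplified version of the stated rule (hypothetical helper).

    The star tree is a candidate only when every queried field is a
    configured dimension and every aggregated field is a configured metric.
    """
    return set(query_fields) <= set(dimensions) and set(agg_fields) <= set(metrics)


# Dimensions and metrics as they might be configured in a star tree mapping.
dimensions = ["status", "port"]
metrics = ["request_size", "latency"]

# Filter on a dimension and aggregate a metric: eligible.
print(can_use_star_tree(["status"], ["request_size"], dimensions, metrics))  # True
# Filter on a field that is not a dimension: not eligible.
print(can_use_star_tree(["client_ip"], ["latency"], dimensions, metrics))  # False
```

The field name `client_ip` is made up for the example; any field that is not listed as a dimension would behave the same way in this sketch.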

## Prerequisites

To use Star Tree on your index, the following prerequisites must be satisfied:

- Set the feature flag `opensearch.experimental.feature.composite_index.star_tree.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
- Set the `indices.composite_index.star_tree.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
- Set the `index.composite_index` index setting to `true` during index creation.
- Enable `doc_values`: Ensure that `doc_values` is enabled for the [dimensions](#ordered-dimensions) and [metrics](#metrics) fields used in your Star Tree mapping.
- Use STIX only on indexes whose data is not updated or deleted, because updates and deletes are not accounted for in the STIX.

## Examples

The following examples show how to use a Star Tree index.

### Star tree index mapping

Define the Star Tree mapping in the new `composite` section in `mappings`. <br/>
To compute metric aggregations for the `request_size` and `latency` fields with queries on the `port` and `status` fields, configure the following mappings:

```json
PUT logs
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true
  },
  "mappings": {
    "composite": {
      "request_aggs": {
        "type": "star_tree",
        "config": {
          "max_leaf_docs": 10000,
          "skip_star_node_creation_for_dimensions": [
            "port"
          ],
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            }
          ],
          "metrics": [
            {
              "name": "request_size",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "request_size": {
        "type": "integer"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}
```
In the preceding example, OpenSearch creates a STIX for `request_aggs`. Currently, only one STIX can be created per index. Support for multiple Star Trees is a work in progress and will be released in the future.
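
To make the idea of precomputed aggregations concrete, the following toy model groups documents by their dimension values and keeps running statistics per metric field. It is only a sketch of the concept: the function name `precompute` and the sample documents are made up, and the actual index additionally builds a tree with star nodes, which this sketch omits.

```python
from collections import defaultdict


def precompute(docs, dimensions, metric_fields):
    """Group documents by dimension values and keep running stats per metric.

    Toy model of pre-aggregation only (hypothetical helper); it does not
    reproduce how OpenSearch stores or traverses a star tree.
    """
    groups = defaultdict(dict)
    for doc in docs:
        key = tuple(doc[d] for d in dimensions)
        for field in metric_fields:
            value = doc[field]
            stats = groups[key].setdefault(
                field, {"sum": 0, "value_count": 0, "min": value, "max": value}
            )
            stats["sum"] += value
            stats["value_count"] += 1
            stats["min"] = min(stats["min"], value)
            stats["max"] = max(stats["max"], value)
    return dict(groups)


docs = [  # made-up sample data
    {"status": 200, "port": 443, "request_size": 100, "latency": 1.5},
    {"status": 200, "port": 443, "request_size": 300, "latency": 2.5},
    {"status": 500, "port": 80, "request_size": 50, "latency": 9.0},
]
table = precompute(docs, ["status", "port"], ["request_size", "latency"])
print(table[(200, 443)]["request_size"])
# {'sum': 400, 'value_count': 2, 'min': 100, 'max': 300}
```

In this model, an aggregation over `request_size` filtered by `status` and `port` reads one precomputed entry instead of scanning every matching document.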

## Star tree mapping parameters
Specify the Star Tree configuration in the `config` section. All parameters are final and cannot be modified without reindexing documents.

### Ordered dimensions
The `ordered_dimensions` are the fields by which metrics are aggregated in a STIX. A STIX is used for querying only if all the fields in the query are part of the `ordered_dimensions`.
- The order of dimensions matters. You can define the dimensions ordered from the highest cardinality to the lowest cardinality for efficient storage and query pruning.
- Avoid using high-cardinality fields as dimensions because they adversely affect storage space, indexing throughput, and query performance.
- Currently, the supported fields for `ordered_dimensions` are of [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long` (see [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/15231)).
- Support for other field types, such as `keyword` and `ip`, is a work in progress and will be released in future versions (see [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/16232)).
- A minimum of `2` and a maximum of `10` dimensions are supported per Star Tree index.
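
The ordering and count guidance in the preceding list can be sketched as a small helper that builds the `ordered_dimensions` array. The helper name `order_dimensions` is hypothetical, and the cardinality estimates are made-up inputs that you would need to obtain from your own data.

```python
def order_dimensions(cardinalities, min_dims=2, max_dims=10):
    """Return dimension entries ordered from highest to lowest cardinality.

    `cardinalities` maps a field name to an estimated distinct-value count
    (assumed input). The 2 and 10 limits are the ones stated in this section.
    """
    if not min_dims <= len(cardinalities) <= max_dims:
        raise ValueError(
            f"expected {min_dims} to {max_dims} dimensions, got {len(cardinalities)}"
        )
    ordered = sorted(cardinalities, key=cardinalities.get, reverse=True)
    return [{"name": name} for name in ordered]


# Made-up estimates: 50 distinct status values, 10 distinct ports.
print(order_dimensions({"port": 10, "status": 50}))
# [{'name': 'status'}, {'name': 'port'}]
```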

#### Properties

| Parameter | Required/Optional | Description |
|:---------------------| :--- |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `name` | Required | The name of the field. The field must also be present in `properties` in the index `mapping`, and `doc_values` must be enabled for it. |

### Metrics
Configure the fields on which you need to perform aggregations. This is a required property of the Star Tree configuration.
- Currently, the supported fields for `metrics` are of [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long` (see [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/15231)).
- Supported metric aggregations include `Min`, `Max`, `Sum`, `Avg`, and `Value_count`.
- `Avg` is a derived metric based on `Sum` and `Value_count`. It is not indexed but is derived at query time. The remaining base metrics are indexed.
- A maximum of `100` base metrics are supported per Star Tree index.

For example, if `Min`, `Max`, `Sum`, and `Value_count` are defined as `metrics` for each field, then up to 25 such fields can be configured, as shown in the following example:

```json
{
  "metrics": [
    {
      "name": "field1",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    },
    ...,
    ...,
    {
      "name": "field25",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    }
  ]
}
```
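
The arithmetic behind this limit (25 fields with 4 stats each gives 100 base metrics) can be checked with a short sketch. The helper `count_base_metrics` is hypothetical, and it assumes that `avg` contributes no base metric of its own because it is derived from `sum` and `value_count`.

```python
DEFAULT_STATS = ("sum", "value_count")  # default stats stated in this section
MAX_BASE_METRICS = 100  # limit stated in this section


def count_base_metrics(metrics):
    """Count indexed (base) metrics across all metric fields.

    Assumption: `avg` is derived at query time, so it is not counted.
    """
    total = 0
    for metric in metrics:
        stats = metric.get("stats", DEFAULT_STATS)
        total += len({s for s in stats if s != "avg"})
    return total


# 25 fields, each with the four base stats from the example.
metrics = [
    {"name": f"field{i}", "stats": ["sum", "value_count", "min", "max"]}
    for i in range(1, 26)
]
print(count_base_metrics(metrics))  # 100
print(count_base_metrics(metrics) <= MAX_BASE_METRICS)  # True
```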


#### Properties

| Parameter | Required/Optional | Description |
|:---------------------|:------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `name` | Required | The name of the field. The field must also be present in `properties` in the index `mapping`, and `doc_values` must be enabled for it. |
| `stats` | Optional | The list of metric aggregations computed for each field. You can choose from `Min`, `Max`, `Sum`, `Avg`, and `Value_count`.<br/>Defaults are `Sum` and `Value_count`.<br/>`Avg` is a derived metric stat that is automatically supported in queries if `Sum` and `Value_count` are present in the metric `stats`. |
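
Because `Avg` is derived from `Sum` and `Value_count`, an average over several pre-aggregated entries is the total sum divided by the total count rather than the mean of per-entry averages. The following sketch illustrates that calculation; the helper name `derive_avg` and the sample values are made up.

```python
def derive_avg(groups):
    """Combine pre-aggregated entries into one average at query time.

    Each entry carries a precomputed `sum` and `value_count`
    (hypothetical helper, illustrative data layout).
    """
    total_sum = sum(g["sum"] for g in groups)
    total_count = sum(g["value_count"] for g in groups)
    return total_sum / total_count if total_count else None


groups = [  # made-up pre-aggregated values
    {"sum": 400, "value_count": 2},
    {"sum": 50, "value_count": 1},
]
print(derive_avg(groups))  # 150.0
```

Averaging the two per-entry averages (200 and 50) would give 125, which is why both `Sum` and `Value_count` need to be present for `Avg` to be supported.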

### Star tree configuration parameters

The following optional parameters can be configured alongside a Star Tree index. These parameters are final and cannot be modified after index creation.

| Parameter | Required/Optional | Description |
|:----------------|:------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `max_leaf_docs` | Optional | The maximum number of Star Tree documents that a leaf node can point to. When this number is exceeded, the node is split on the next dimension. Default is `10000`. A lower value results in a larger storage size but faster query performance; a higher value has the opposite effect. For more information, see [Star Tree index structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |
| `skip_star_node_creation_for_dimensions` | Optional | A list of dimensions for which Star Tree skips creating star nodes. Adding a dimension to this list can reduce storage size at the expense of query performance. By default, star nodes are created for all dimensions. For more information about star nodes, see [Star Tree index structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |

## Supported queries and aggregations
For more information about supported queries and aggregations, see [Supported queries and aggregations for Star Tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#supported-query-and-aggregations).
4 changes: 3 additions & 1 deletion _search-plugins/improving-search-performance.md
Expand Up @@ -11,4 +11,6 @@ OpenSearch offers several ways to improve search performance:

- Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/).

- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).
- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).

- Improve the performance of aggregations using a [Star Tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).