Add documentation for star tree index feature #8598

Merged
merged 34 commits into from
Nov 1, 2024
Commits
34 commits
95a47ac
Adding documentation for star tree index feature
bharath-techie Oct 22, 2024
78b4c41
addressing comments
bharath-techie Oct 23, 2024
0e0483a
addressing comments
bharath-techie Oct 23, 2024
dcf47cf
fixes and addressing comments
bharath-techie Oct 23, 2024
8ecd473
addressing comments
bharath-techie Oct 23, 2024
d8357ae
addressing comments
bharath-techie Oct 24, 2024
ffdc6dc
addressing comments
bharath-techie Oct 24, 2024
b3b5783
fixing json
bharath-techie Oct 24, 2024
05edca0
fixing json
bharath-techie Oct 24, 2024
69387a2
Merge branch 'main' into startree
Naarcha-AWS Oct 28, 2024
06848eb
addressing comments
bharath-techie Oct 29, 2024
5f51c3a
addressing comments
bharath-techie Oct 29, 2024
e5cf72d
Merge branch 'main' into startree
Naarcha-AWS Oct 30, 2024
47de351
Merge branch 'main' into startree
Naarcha-AWS Oct 31, 2024
759a258
Add edits for star tree field page
Naarcha-AWS Oct 31, 2024
db0e127
Add index edit
Naarcha-AWS Oct 31, 2024
f4d3a79
Update improving-search-performance.md
Naarcha-AWS Oct 31, 2024
b4205dd
Update star-tree-index.md
Naarcha-AWS Oct 31, 2024
4aea8bf
Update star-tree.md
Naarcha-AWS Oct 31, 2024
1dd9302
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
f7ef88f
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
37c6f11
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
704212a
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
fe891e6
Update _field-types/supported-field-types/star-tree.md
Naarcha-AWS Nov 1, 2024
01d1eef
Merge branch 'main' into startree
Naarcha-AWS Nov 1, 2024
521fbb0
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
6a5d89e
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
d249946
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
6ce9d22
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
c0c5ec0
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
e8bdea5
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
3e372f7
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
19eaad0
Update star-tree-index.md
Naarcha-AWS Nov 1, 2024
f98e02d
Merge branch 'main' into startree
Naarcha-AWS Nov 1, 2024
1 change: 1 addition & 0 deletions _field-types/supported-field-types/index.md
@@ -30,6 +30,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/):
k-NN vector | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/): Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
Derived | [`derived`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/derived/): Creates new fields dynamically by executing scripts on existing fields.
Star-tree | [`star_tree`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/): Pre-computes aggregations and stores them in a [star-tree index](https://docs.pinot.apache.org/basics/indexing/star-tree-index), accelerating the performance of aggregation queries.

## Arrays

199 changes: 199 additions & 0 deletions _field-types/supported-field-types/star-tree.md
@@ -0,0 +1,199 @@
---
layout: default
title: Star-tree
nav_order: 61
parent: Supported field types
---

# Star-tree field type

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
{: .warning}

[Star-tree index](https://docs.pinot.apache.org/basics/indexing/star-tree-index) pre-computes aggregations, accelerating the performance of aggregation queries.
If a star-tree index is configured as part of an index mapping, the star-tree index is created and maintained as data is ingested in real time.

OpenSearch automatically uses the star-tree index to optimize aggregations if the queried fields are part of the star-tree index's dimension fields and the aggregations are on its metric fields. No changes are required in the query syntax or the request parameters.

For more information, see [Star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).

## Prerequisites

To use the star-tree index, follow the instructions in [Enabling the star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index#enabling-the-star-tree-index).

## Limitations

The star-tree index feature has the following limitations:

- A star-tree index should only be used on indexes whose data is not updated or deleted, because updates and deletions are not accounted for in the star-tree index.
- Currently, only one star-tree index can be created per index. Support for multiple star-tree indexes will be added in a future release.

## Examples

The following examples show how to use a star-tree index.

### Star-tree index mapping

Define the star-tree mapping under the `composite` section in `mappings`.

The following example creates a star-tree index named `request_aggs`. To precompute metric aggregations on the `request_size` and `latency` fields for queries on the `port` and `status` fields, configure the following mappings:

```json
PUT logs
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true
  },
  "mappings": {
    "composite": {
      "request_aggs": {
        "type": "star_tree",
        "config": {
          "max_leaf_docs": 10000,
          "skip_star_node_creation_for_dimensions": [
            "port"
          ],
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            }
          ],
          "metrics": [
            {
              "name": "request_size",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "request_size": {
        "type": "integer"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}
```
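
With this mapping in place, no special query syntax is needed. As an illustrative sketch, the following search filters on the `status` dimension with a `term` query and computes a `max` aggregation on the `latency` metric, which matches the pattern that the star-tree index is designed to accelerate. The aggregation name `max_latency` is arbitrary, and setting `size` to `0` (return only aggregation results, no hits) is included as an assumption for an aggregation-only request; see the star-tree index documentation for the exact conditions under which the index is used.

```json
POST logs/_search
{
  "size": 0,
  "query": {
    "term": {
      "status": 200
    }
  },
  "aggs": {
    "max_latency": {
      "max": {
        "field": "latency"
      }
    }
  }
}
```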



## Star-tree mapping parameters

Specify any star-tree configuration mapping options under the `config` section. All parameters are final and cannot be modified without reindexing documents.

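The star-tree `config` section supports the following properties.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `ordered_dimensions` | Required | The fields on which metrics are aggregated. See [Ordered dimensions](#ordered-dimensions). |
| `metrics` | Required | The fields and metric aggregations to precompute. See [Metrics](#metrics). |
| `max_leaf_docs` | Optional | See [Star-tree configuration parameters](#star-tree-configuration-parameters). |
| `skip_star_node_creation_for_dimensions` | Optional | See [Star-tree configuration parameters](#star-tree-configuration-parameters). |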

### Ordered dimensions

The `ordered_dimensions` parameter specifies the fields on which metrics are aggregated in a star-tree index. The star-tree index is used for a query only if all of the fields in the query are part of `ordered_dimensions`.
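
To illustrate this rule, consider the `logs` mapping example above (used here only as an assumed setup). The following hypothetical search filters on `request_size`, which is configured as a metric rather than as one of the `ordered_dimensions`, so according to the rule the star-tree index would not be picked for it. The aggregation still returns correct results; it is simply computed without the star-tree index. The value `1024` and the aggregation name `total_latency` are arbitrary.

```json
POST logs/_search
{
  "size": 0,
  "query": {
    "term": {
      "request_size": 1024
    }
  },
  "aggs": {
    "total_latency": {
      "sum": {
        "field": "latency"
      }
    }
  }
}
```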

When using the `ordered_dimensions` parameter, follow these best practices:

- The order of dimensions matters. You can define the dimensions ordered from the highest cardinality to the lowest cardinality for efficient storage and query pruning.

- Avoid using high-cardinality fields as dimensions. High-cardinality fields adversely affect storage space, indexing throughput, and query performance.

- Currently, fields supported by the `ordered_dimensions` parameter are all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231).

- Support for other field types, such as `keyword` and `ip`, will be added in future versions. For more information, see [GitHub issue #16232](https://github.com/opensearch-project/OpenSearch/issues/16232).

- A minimum of `2` and a maximum of `10` dimensions are supported per star-tree index.


The `ordered_dimensions` parameter supports the following property.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that `doc_values` is enabled for any associated fields. |
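
Numeric fields have `doc_values` enabled by default, so no extra setting is usually needed. The following `properties` fragment, based on the dimension fields from the mapping example, sets `doc_values` explicitly only to make the requirement visible; a dimension field mapped with `"doc_values": false` would not meet it.

```json
"properties": {
  "status": {
    "type": "integer",
    "doc_values": true
  },
  "port": {
    "type": "integer",
    "doc_values": true
  }
}
```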


### Metrics

Configure any metric fields on which you need to perform aggregations. The `metrics` parameter is required as part of a star-tree configuration.

When using `metrics`, follow these best practices:

- Currently, fields supported by `metrics` are all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231).

- Supported metric aggregations include `Min`, `Max`, `Sum`, `Avg`, and `Value_count`.

- `Avg` is a derived metric that is computed from `Sum` and `Value_count` at query time; it is not indexed. The remaining base metrics are indexed.

- A maximum of `100` base metrics are supported per star-tree index.

If `Min`, `Max`, `Sum`, and `Value_count` are defined as `metrics` for each field, then up to 25 such fields can be configured (25 fields × 4 base metrics = 100), as shown in the following example:

```json
{
  "metrics": [
    {
      "name": "field1",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    },
    ...,
    {
      "name": "field25",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    }
  ]
}
```


#### Properties

The `metrics` parameter supports the following properties.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that `doc_values` is enabled for any associated fields. |
| `stats` | Optional | A list of metric aggregations computed for each field. You can choose from `Min`, `Max`, `Sum`, `Avg`, and `Value_count`.<br/>Default is `Sum` and `Value_count`.<br/>`Avg` is a derived metric statistic that is automatically supported in queries if `Sum` and `Value_count` are present in the metric's `stats`. |
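
As a sketch of the default behavior, the following hypothetical `metrics` entry omits `stats`, so according to the table it precomputes only `Sum` and `Value_count` for `request_size`:

```json
"metrics": [
  {
    "name": "request_size"
  }
]
```

Because both base metrics are present, an `avg` aggregation on `request_size` can be derived from them at query time. For example, assuming the `logs` index from the mapping example (the aggregation name `avg_request_size` is arbitrary):

```json
POST logs/_search
{
  "size": 0,
  "aggs": {
    "avg_request_size": {
      "avg": {
        "field": "request_size"
      }
    }
  }
}
```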

### Star-tree configuration parameters

The following parameters are optional and cannot be modified after index creation.

| Parameter | Description |
| :--- | :--- |
| `max_leaf_docs` | The maximum number of star-tree documents that a leaf node can point to. After the maximum number of documents is reached, the nodes are split on the next dimension. Default is `10000`. A lower value uses more storage but results in faster query performance. Conversely, a higher value uses less storage but results in slower query performance. For more information, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |
| `skip_star_node_creation_for_dimensions` | A list of dimensions for which the star-tree index skips star node creation. Skipping star nodes reduces storage size at the expense of query performance. By default, no dimensions are skipped. For more information about star nodes, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |
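
As a hypothetical illustration of these trade-offs, the following `config` fragment lowers `max_leaf_docs` below its default of `10000` to favor query performance over storage, and it omits `skip_star_node_creation_for_dimensions`, so no dimension is listed for skipping star node creation. The value `1000` is an arbitrary example rather than a tuned recommendation, and the dimension and metric fields are taken from the mapping example.

```json
"config": {
  "max_leaf_docs": 1000,
  "ordered_dimensions": [
    {
      "name": "status"
    },
    {
      "name": "port"
    }
  ],
  "metrics": [
    {
      "name": "latency",
      "stats": [
        "sum",
        "value_count"
      ]
    }
  ]
}
```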

## Supported queries and aggregations

For more information about supported queries and aggregations, see [Supported queries and aggregations for a star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#supported-queries-and-aggregations).


4 changes: 3 additions & 1 deletion _search-plugins/improving-search-performance.md
@@ -11,4 +11,6 @@ OpenSearch offers several ways to improve search performance:

- Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/).

- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).
- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).


- Improve aggregation performance using a [star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).