Add documentation for star tree index feature (opensearch-project#8598)
* Adding documentation for star tree index feature

Signed-off-by: Bharathwaj G <[email protected]>

* addressing comments

Signed-off-by: Bharathwaj G <[email protected]>

* addressing comments

Signed-off-by: Bharathwaj G <[email protected]>

* fixes and addressing comments

Signed-off-by: Bharathwaj G <[email protected]>

* addressing comments

Signed-off-by: Bharathwaj G <[email protected]>

* addressing comments

Signed-off-by: Bharathwaj G <[email protected]>

* addressing comments

Signed-off-by: Bharathwaj G <[email protected]>

* fixing json

Signed-off-by: Bharathwaj G <[email protected]>

* fixing json

Signed-off-by: Bharathwaj G <[email protected]>

* addressing comments

Signed-off-by: Bharathwaj G <[email protected]>

* addressing comments

Signed-off-by: Bharathwaj G <[email protected]>

* Add edits for star tree field page

Signed-off-by: Naarcha-AWS <[email protected]>

* Add index edit

Signed-off-by: Naarcha-AWS <[email protected]>

* Update improving-search-performance.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update star-tree-index.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update star-tree.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _field-types/supported-field-types/star-tree.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Update star-tree-index.md

Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Bharathwaj G <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Naarcha-AWS <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Eric Pugh <[email protected]>
3 people authored and epugh committed Nov 23, 2024
1 parent 11e4e58 commit 3a6a961
Showing 5 changed files with 393 additions and 1 deletion.
1 change: 1 addition & 0 deletions _field-types/supported-field-types/index.md
@@ -30,6 +30,7 @@ IP | [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/):
k-NN vector | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/): Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
Derived | [`derived`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/derived/): Creates new fields dynamically by executing scripts on existing fields.
Star-tree | [`star_tree`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/): Precomputes aggregations and stores them in a [star-tree index](https://docs.pinot.apache.org/basics/indexing/star-tree-index), accelerating the performance of aggregation queries.

## Arrays

199 changes: 199 additions & 0 deletions _field-types/supported-field-types/star-tree.md
@@ -0,0 +1,199 @@
---
layout: default
title: Star-tree
nav_order: 61
parent: Supported field types
---

# Star-tree field type

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
{: .warning}

A [star-tree index](https://docs.pinot.apache.org/basics/indexing/star-tree-index) precomputes aggregations, accelerating the performance of aggregation queries.
If a star-tree index is configured as part of an index mapping, the star-tree index is created and maintained as data is ingested in real time.

OpenSearch will automatically use the star-tree index to optimize aggregations if the queried fields are part of star-tree index dimension fields and the aggregations are on star-tree index metric fields. No changes are required in the query syntax or the request parameters.
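
The selection rule above can be read as a pair of subset checks. The following Python sketch only illustrates that rule as stated here; the function name and its arguments are hypothetical, and it does not reproduce OpenSearch's internal query resolution, which also depends on the supported query and aggregation types.

```python
def can_use_star_tree(query_fields, agg_fields, dimensions, metrics):
    """Return True when every queried field is a star-tree dimension
    and every aggregated field is a star-tree metric.

    Hypothetical helper: a simplification of the documented rule,
    not OpenSearch's actual resolution logic.
    """
    return set(query_fields) <= set(dimensions) and set(agg_fields) <= set(metrics)


# Dimension and metric names taken from the `request_aggs` example on this page.
dimensions = ["status", "port"]
metrics = ["request_size", "latency"]

# Query on a dimension, aggregation on a metric: eligible.
print(can_use_star_tree(["status"], ["request_size"], dimensions, metrics))  # True

# `client_ip` is a made-up field that is not a dimension: not eligible.
print(can_use_star_tree(["client_ip"], ["latency"], dimensions, metrics))  # False
```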

For more information, see [Star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).

## Prerequisites

To use a star-tree index, follow the instructions in [Enabling a star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index#enabling-a-star-tree-index).

## Limitations

The star-tree index feature has the following limitations:

- A star-tree index should only be enabled on indexes whose data is not updated or deleted because standard updates and deletions are not accounted for in a star-tree index.
- Currently, only `one` star-tree index can be created per index. Support for multiple star-trees will be added in a future version.

## Examples

The following examples show how to use a star-tree index.

### Star-tree index mappings

Define star-tree index mappings in the `composite` section in `mappings`.

The following example API request creates a star-tree index named `request_aggs`. To compute metric aggregations for the `request_size` and `latency` fields with queries on the `port` and `status` fields, configure the following mappings:

```json
PUT logs
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true
  },
  "mappings": {
    "composite": {
      "request_aggs": {
        "type": "star_tree",
        "config": {
          "max_leaf_docs": 10000,
          "skip_star_node_creation_for_dimensions": [
            "port"
          ],
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            }
          ],
          "metrics": [
            {
              "name": "request_size",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "request_size": {
        "type": "integer"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}
```
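
As a rough illustration of a request that fits this mapping, the following Python sketch builds a search body that filters on the `status` dimension and sums the `request_size` metric. The aggregation name `total_request_size` and the status value `200` are arbitrary choices for the example. Whether the star-tree index is actually used depends on the queries and aggregations supported by your OpenSearch version.

```python
import json

# Search body for the `logs` index: a term query on the `status` dimension
# and a `sum` aggregation on the `request_size` metric. Setting `size` to 0
# returns aggregation results only, without document hits.
search_body = {
    "size": 0,
    "query": {"term": {"status": 200}},
    "aggs": {"total_request_size": {"sum": {"field": "request_size"}}},
}

# Print the JSON that would be sent as the body of `POST logs/_search`.
print(json.dumps(search_body, indent=2))
```

No star-tree-specific parameter appears in the request, which matches the statement that the query syntax does not change.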



## Star-tree mapping parameters

Specify any star-tree configuration mapping options in the `config` section. Parameters cannot be modified without reindexing documents.

The star-tree `config` section supports the following property.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. |

### Ordered dimensions

The `ordered_dimensions` parameter contains fields based on which metrics will be aggregated in a star-tree index. The star-tree index will be selected for querying only if all the fields in the query are part of the `ordered_dimensions`.

When using the `ordered_dimensions` parameter, follow these best practices:

- The order of dimensions matters. Order the dimensions from the highest cardinality to the lowest cardinality for efficient storage and query pruning.
- Avoid using high-cardinality fields as dimensions. High-cardinality fields adversely affect storage space, indexing throughput, and query performance.
- Currently, fields supported by the `ordered_dimensions` parameter are all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231).
- Support for other field types, such as `keyword` and `ip`, will be added in future versions. For more information, see [GitHub issue #16232](https://github.com/opensearch-project/OpenSearch/issues/16232).
- A minimum of `2` and a maximum of `10` dimensions are supported per star-tree index.
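
The ordering and count rules in this list can be sketched as a small helper that produces an `ordered_dimensions` array. This is a minimal sketch, assuming you already have cardinality estimates for each candidate field; the function name and the field names `field_a`, `field_b`, and `field_c` are hypothetical.

```python
def order_dimensions(cardinalities, min_dims=2, max_dims=10):
    """Build an `ordered_dimensions` list sorted from highest to lowest cardinality.

    `cardinalities` maps a field name to its estimated number of distinct values.
    Hypothetical helper; the 2 to 10 limits come from the list above.
    """
    if not min_dims <= len(cardinalities) <= max_dims:
        raise ValueError(
            f"expected {min_dims} to {max_dims} dimensions, got {len(cardinalities)}"
        )
    # Sort field names by their cardinality estimate, largest first.
    ranked = sorted(cardinalities, key=cardinalities.get, reverse=True)
    return [{"name": field} for field in ranked]


# Illustrative cardinality estimates, not measured values.
print(order_dimensions({"field_a": 20, "field_b": 500, "field_c": 3}))
# [{'name': 'field_b'}, {'name': 'field_a'}, {'name': 'field_c'}]
```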

The `ordered_dimensions` parameter supports the following property.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. |


### Metrics

Configure any metric fields on which you need to perform aggregations. `Metrics` are required as part of a star-tree configuration.

When using `metrics`, follow these best practices:

- Currently, fields supported by `metrics` are all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231).
- Supported metric aggregations include `Min`, `Max`, `Sum`, `Avg`, and `Value_count`.
- `Avg` is a derived metric based on `Sum` and `Value_count`. It is not indexed and is instead calculated when a query is run. The remaining base metrics are indexed.
- A maximum of `100` base metrics are supported per star-tree index.

If `Min`, `Max`, `Sum`, and `Value_count` are defined as `metrics` for each field, then up to 25 such fields can be configured, as shown in the following example:

```json
{
  "metrics": [
    {
      "name": "field1",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    },
    ...,
    ...,
    {
      "name": "field25",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    }
  ]
}
```
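
The figure of 25 fields follows from dividing the base-metric limit by the number of base metrics per field. The following Python sketch works through that arithmetic and shows how `Avg` can be derived from `Sum` and `Value_count`; the function names and the sample values are illustrative only.

```python
MAX_BASE_METRICS = 100  # limit per star-tree index, as stated above


def max_metric_fields(stats_per_field):
    """Number of fields that fit within the base-metric limit."""
    return MAX_BASE_METRICS // stats_per_field


def derived_avg(sum_value, value_count):
    """Compute `avg` from the `sum` and `value_count` base metrics."""
    return sum_value / value_count if value_count else None


# Four base metrics per field (sum, value_count, min, max).
print(max_metric_fields(4))  # 25

# Made-up totals: a sum of 1200 over 8 values.
print(derived_avg(1200, 8))  # 150.0
```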


#### Properties

The `metrics` parameter supports the following properties.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. |
| `stats` | Optional | A list of metric aggregations computed for each field. You can choose between `Min`, `Max`, `Sum`, `Avg`, and `Value_count`.<br/>Default is `Sum` and `Value_count`.<br/>`Avg` is a derived metric statistic that will automatically be supported in queries if `Sum` and `Value_count` are present as part of metric `stats`. |

### Star-tree configuration parameters

The following parameters are optional and cannot be modified following index creation.

| Parameter | Description |
| :--- | :--- |
| `max_leaf_docs` | The maximum number of star-tree documents that a leaf node can point to. After the maximum number of documents is reached, the nodes will be split based on the value of the next dimension. Default is `10000`. A lower value will use more storage but result in faster query performance. Inversely, a higher value will use less storage but result in slower query performance. For more information, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |
| `skip_star_node_creation_for_dimensions` | A list of dimensions for which a star-tree index will skip star node creation. Skipping star node creation for a dimension reduces storage size at the expense of query performance. By default, the list is empty and star nodes are created for all dimensions. For more information about star nodes, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |

## Supported queries and aggregations

For more information about supported queries and aggregations, see [Supported queries and aggregations for a star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#supported-queries-and-aggregations).

4 changes: 3 additions & 1 deletion _search-plugins/improving-search-performance.md
@@ -11,4 +11,6 @@ OpenSearch offers several ways to improve search performance:

- Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/).

- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).
- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).

- Improve aggregation performance using a [star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).