Added more links and command
Signed-off-by: Fanit Kolchina <[email protected]>
kolchfa-aws committed Dec 16, 2024
1 parent d5d8758 commit e84acb4
Showing 1 changed file with 45 additions and 32 deletions.
77 changes: 45 additions & 32 deletions _dashboards/management/scheduled-query-acceleration.md
@@ -12,7 +12,7 @@ Introduced 2.17

Scheduled Query Acceleration (SQA) is designed to optimize direct queries from OpenSearch to Amazon Simple Storage Service (Amazon S3). It addresses issues often faced when managing and refreshing indexes, views, and data in an automated way.

Query acceleration is facilitated by secondary indexes like skipping indexes, covering indexes, or materialized views. These indexes store either metadata or actual data from Amazon S3 in optimized formats. When queries run, they use these indexes instead of directly querying S3.
Query acceleration is facilitated by secondary indexes like skipping indexes, covering indexes, or materialized views. When queries run, they use these indexes instead of directly querying S3.

The secondary indexes need to be refreshed periodically to stay current with the Amazon S3 data. This refresh can be scheduled using an internal scheduler (within Spark) or an external scheduler.

@@ -40,43 +40,47 @@ Before configuring SQA, verify that the following requirements are met:

- Ensure you're running OpenSearch version 2.17 or later.
- Ensure you have the SQL plugin installed. The SQL plugin is part of most OpenSearch distributions. For more information, see [Installing plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/).
- Ensure you have configured Amazon S3 and Amazon EMR Serverless.
- Ensure you have configured Amazon S3 and Amazon EMR Serverless (needed for access to Apache Spark).

## Configuring SQA

To configure SQA, perform the following steps.

### Step 1: Configure the OpenSearch cluster settings

Configure the following cluster settings:

- **Enable asynchronous query execution**: Set `plugins.query.executionengine.async_query.enabled` to `true` (default value):

```json
PUT /_cluster/settings
{
  "transient": {
    "plugins.query.executionengine.async_query.enabled": "true"
  }
}
```
{% include copy-curl.html %}

For more information, see [Settings](https://github.com/opensearch-project/sql/blob/main/docs/user/admin/settings.rst#pluginsqueryexecutionengineasync_queryenabled).

- **Configure the external scheduler interval for asynchronous queries**: This setting defines how often the external scheduler checks for tasks, which lets you customize the refresh frequency. There is no default value for this setting, so you must explicitly configure it. Adjusting the interval based on workload can help optimize resource usage and manage costs:

```json
PUT /_cluster/settings
{
  "transient": {
    "plugins.query.executionengine.async_query.external_scheduler.interval": "10 minutes"
  }
}
```
{% include copy-curl.html %}

For more information, see [Settings](https://github.com/opensearch-project/sql/blob/main/docs/user/admin/settings.rst#pluginsqueryexecutionengineasync_queryexternal_schedulerinterval).
Configure the following cluster settings.

#### Enable asynchronous query execution

Set `plugins.query.executionengine.async_query.enabled` to `true` (default value):

```json
PUT /_cluster/settings
{
  "transient": {
    "plugins.query.executionengine.async_query.enabled": "true"
  }
}
```
{% include copy-curl.html %}

For more information, see [Settings](https://github.com/opensearch-project/sql/blob/main/docs/user/admin/settings.rst#pluginsqueryexecutionengineasync_queryenabled).

#### Configure the external scheduler interval for asynchronous queries

This setting defines how often the external scheduler checks for tasks, which lets you customize the refresh frequency. There is no default value for this setting, so you must explicitly configure it. Adjusting the interval based on workload can help optimize resource usage and manage costs:

```json
PUT /_cluster/settings
{
  "transient": {
    "plugins.query.executionengine.async_query.external_scheduler.interval": "10 minutes"
  }
}
```
{% include copy-curl.html %}

For more information, see [Settings](https://github.com/opensearch-project/sql/blob/main/docs/user/admin/settings.rst#pluginsqueryexecutionengineasync_queryexternal_schedulerinterval).
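
As a minimal sketch, both settings can also be applied in one request, because the cluster settings API accepts several keys in a single body. The `10 minutes` value is only the example value used above, not a recommendation:

```json
PUT /_cluster/settings
{
  "transient": {
    "plugins.query.executionengine.async_query.enabled": "true",
    "plugins.query.executionengine.async_query.external_scheduler.interval": "10 minutes"
  }
}
```
{% include copy-curl.html %}

To check that the values were applied, you can read the explicitly configured settings back. The `flat_settings=true` parameter returns each setting as a single dotted key, which makes the two entries easier to spot:

```json
GET /_cluster/settings?flat_settings=true
```
{% include copy-curl.html %}

Note that `transient` settings do not survive a full cluster restart. If the values should persist, the same keys can be placed under `persistent` instead.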

### Step 2: Configure Apache Spark settings

Check failure on line 85 in _dashboards/management/scheduled-query-acceleration.md (GitHub Actions / style-job, [vale] reported by reviewdog): [OpenSearch.HeadingCapitalization] 'Step 2: Configure Apache Spark settings' is a heading and should be in sentence case. Severity: ERROR.

@@ -86,6 +90,8 @@ Configure the following Apache Spark settings:

- Configure `spark.flint.job.externalScheduler.interval` (default is `5 minutes`). This setting specifies the interval at which an external scheduler triggers index refresh operations. For valid time units, see [Time units](#time-units).

For more information, see the [OpenSearch Spark documentation](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#apache-spark).

### Step 3: Configure a data source

Connect OpenSearch to your Amazon S3 data source using the OpenSearch Dashboards interface. For more information, see [Connecting Amazon S3 to OpenSearch]({{site.url}}{{site.baseurl}}/dashboards/management/S3-data-source/).
@@ -243,7 +249,14 @@ We recommend the following practices to reduce costs:

## Validations

You can validate your settings by running test queries and verifying the scheduler configurations.
You can validate your settings by running a test query and verifying the scheduler configurations:

```sql
SHOW FLINT INDEXES EXTENDED
```
{% include copy.html %}

For more information, see [OpenSearch Spark documentation](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#all-indexes).
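
One way to submit this statement outside of OpenSearch Dashboards is through the SQL plugin's asynchronous query API. The following is a sketch under stated assumptions: `my_glue` is a hypothetical data source name that you would replace with your own, and the query string is copied verbatim from the example above. Depending on your version, the statement may need a catalog qualifier, so check the linked Spark documentation for the exact syntax:

```json
POST /_plugins/_async_query
{
  "datasource": "my_glue",
  "lang": "sql",
  "query": "SHOW FLINT INDEXES EXTENDED"
}
```
{% include copy-curl.html %}

The response includes a `queryId`. Because the query runs asynchronously, you retrieve the result with a follow-up request to `GET /_plugins/_async_query/<queryId>`, repeating it until the query has finished.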

## Troubleshooting
