Commit

[DOC] Add new documentation for data sources (#5127)
* Add new documentation for 2.11

---------

Signed-off-by: Melissa Vagi <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
3 people authored Oct 16, 2023
1 parent 3af3257 commit da7a701
Showing 23 changed files with 288 additions and 37 deletions.
2 changes: 1 addition & 1 deletion _dashboards/dev-tools/index-dev.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
layout: default
title: Dev Tools
nav_order: 110
nav_order: 120
has_children: true
---

61 changes: 61 additions & 0 deletions _dashboards/management/S3-data-source.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
---
layout: default
title: Connecting Amazon S3 to OpenSearch
parent: Data sources
nav_order: 15
has_children: true
---

# Connecting Amazon S3 to OpenSearch
Introduced 2.11
{: .label .label-purple }

Starting with OpenSearch 2.11, you can connect OpenSearch to your Amazon Simple Storage Service (Amazon S3) data source using the OpenSearch Dashboards user interface (UI). You can then query that data, optimize query performance, define tables, and integrate your S3 data from a single UI.

## Prerequisites

To connect data from Amazon S3 to OpenSearch using OpenSearch Dashboards, you must have:

- Access to Amazon S3 and the [AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst#id2).
- Access to OpenSearch and OpenSearch Dashboards.
- An understanding of OpenSearch data source and connector concepts. See the [developer documentation](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst#introduction) for information about these concepts.

## Connect your Amazon S3 data source

To connect your Amazon S3 data source, follow these steps:

1. From the OpenSearch Dashboards main menu, select **Management** > **Data sources**.
2. On the **Data sources** page, select **New data source** > **S3**. An example UI is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/data-sources-UI.png" alt="Amazon S3 data sources UI" width="700"/>

3. On the **Configure Amazon S3 data source** page, enter the required **Data source details**, **AWS Glue authentication details**, **AWS Glue index store details**, and **Query permissions**. An example UI is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/S3-config-UI.png" alt="Amazon S3 configuration UI" width="700"/>

4. Select the **Review Configuration** button and verify the details.
5. Select the **Connect to Amazon S3** button.
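
After the connection is created, the data source is available in Query Workbench, and you can run SQL against the tables in your AWS Glue Data Catalog. The following is a minimal sketch of such a query; the data source name `mys3`, the database `http_logs`, and the table `access_logs` are hypothetical placeholders for your own catalog objects.

```sql
-- Query the external table through the connected data source.
-- Table references follow the pattern <data source>.<database>.<table>.
SELECT *
FROM mys3.http_logs.access_logs
LIMIT 10;
```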

## Manage your Amazon S3 data source

Once you've connected your Amazon S3 data source, you can explore your data through the **Manage data sources** tab. The following steps guide you through using this functionality:

1. On the **Manage data sources** tab, choose a data source from the list.
2. On that data source's page, you can manage the data source, choose a use case, and manage access controls and configurations. An example UI is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/manage-data-source-UI.png" alt="Manage data sources UI" width="700"/>

3. (Optional) Explore the Amazon S3 use cases, including querying your data and optimizing query performance. Go to **Next steps** to learn more about each use case.

## Limitations

This feature is still under development, including the data integration functionality. For real-time updates, see the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations).

## Next steps

- Learn about [querying your data in Data Explorer]({{site.url}}{{site.baseurl}}/dashboards/management/query-data-source/) through OpenSearch Dashboards.
- Learn about ways to [optimize the query performance of your external data sources]({{site.url}}{{site.baseurl}}/dashboards/management/accelerate-external-data/), such as Amazon S3, through Query Workbench.
- Learn about [Amazon S3 and AWS Glue Data Catalog](https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst) and the APIs used with Amazon S3 data sources, including configuration settings and query examples.
- Learn about [managing your indexes]({{site.url}}{{site.baseurl}}/dashboards/im-dashboards/index/) through OpenSearch Dashboards.

66 changes: 66 additions & 0 deletions _dashboards/management/accelerate-external-data.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
---
layout: default
title: Optimize query performance using OpenSearch indexing
parent: Connecting Amazon S3 to OpenSearch
grand_parent: Data sources
nav_order: 15
has_children: false
---

# Optimize query performance using OpenSearch indexing
Introduced 2.11
{: .label .label-purple }


Query performance can be slow when using external data sources for reasons such as network latency, data transformation, and data volume. You can optimize your query performance by using OpenSearch indexes, such as a skipping index or a covering index. A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest and create compact aggregate data structures. This makes it an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality. See the [Flint Index Reference Manual](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md) for comprehensive guidance on this feature's indexing process.
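
To illustrate the kind of direct query that these indexes are designed to accelerate, the following sketch aggregates log data stored in Amazon S3; the data source `mys3`, database `http_logs`, table `access_logs`, and partition columns `year` and `month` are hypothetical.

```sql
-- Without an acceleration index, this aggregation scans the S3 objects directly.
-- A skipping index lets the engine prune files that cannot match the WHERE clause,
-- while a covering index serves the selected columns from OpenSearch storage.
SELECT status, COUNT(*) AS request_count
FROM mys3.http_logs.access_logs
WHERE year = 2023 AND month = 10
GROUP BY status;
```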

## Data sources use case: Accelerate performance

To get started with the **Accelerate performance** use case available in **Data sources**, follow these steps:

1. Go to **OpenSearch Dashboards** > **Query Workbench** and select your Amazon S3 data source from the **Data sources** dropdown menu in the upper-left corner.
2. From the left-side navigation menu, select a database. An example using the `http_logs` database is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-workbench-accelerate-data.png" alt="Query Workbench accelerate data UI" width="700"/>

3. View the results in the table and confirm that you have the desired data.
4. Create an OpenSearch index by following these steps:
1. Select the **Accelerate data** button. A pop-up window appears. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/accelerate-data-popup.png" alt="Accelerate data pop-up window" width="700"/>

2. Enter your details in **Select data fields**. In the **Database** field, select the desired acceleration index: **Skipping index** or **Covering index**. A _skipping index_ uses skip acceleration methods, such as partition, minimum and maximum values, and value sets, to ingest data using compact aggregate data structures. This makes it an economical option for direct querying scenarios. A _covering index_ ingests all or some of the data from the source into OpenSearch and makes it possible to use all OpenSearch Dashboards and plugin functionality.

5. Under **Index settings**, enter the information for your acceleration index. For information about naming, select **Help**. Note that an Amazon S3 table can only have one skipping index at a time. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/skipping-index-settings.png" alt="Skipping index settings" width="700"/>

### Define skipping index settings

1. Under **Skipping index definition**, select the **Add fields** button to define the skipping index acceleration method and choose the fields you want to add. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/add-fields-skipping-index.png" alt="Skipping index add fields" width="700"/>

2. Select the **Copy Query to Editor** button to apply your skipping index settings.
3. View the skipping index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-workbench-S3.png" alt="Run a skipping or covering index UI" width="700"/>
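
For reference, the statement that **Copy Query to Editor** places in the editor takes a form similar to the following sketch, which is modeled on the Flint index syntax in the reference manual linked above; the table name, field names, and refresh option are hypothetical.

```sql
-- Sketch of a skipping index definition; names are placeholders.
-- Each field is paired with the skip acceleration method used to summarize it:
-- PARTITION, VALUE_SET, or MIN_MAX.
CREATE SKIPPING INDEX ON mys3.http_logs.access_logs (
  year PARTITION,
  status VALUE_SET,
  request_size MIN_MAX
)
WITH (auto_refresh = true);
```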

### Define covering index settings

1. Under **Index settings**, enter a valid index name. Note that each Amazon S3 table can have multiple covering indexes. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/covering-index-naming.png" alt="Covering index settings" width="700"/>

2. Once you have added the index name, define the covering index fields by selecting `(add fields here)` under **Covering index definition**. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/covering-index-fields.png" alt="Covering index field naming" width="700"/>

3. Select the **Copy Query to Editor** button to apply your covering index settings.
4. View the covering index query details in the table pane and then select the **Run** button. Your index is added to the left-side navigation menu containing the list of your databases. An example UI is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/run-index-query-workbench.png" alt="Run index in Query Workbench" width="700"/>
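
The generated covering index statement follows a similar pattern. In the following sketch, the index name, table name, and selected fields are hypothetical.

```sql
-- Sketch of a covering index definition; names are placeholders.
-- The selected columns are materialized into an OpenSearch index that
-- OpenSearch Dashboards and plugins can use directly.
CREATE INDEX status_and_size_covering ON mys3.http_logs.access_logs (
  status,
  client_ip,
  request_size
)
WITH (auto_refresh = true);
```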

## Limitations

This feature is still under development, so there are some limitations. For real-time updates, see the [developer documentation on GitHub](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md#limitations).
73 changes: 73 additions & 0 deletions _dashboards/management/data-sources.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
---
layout: default
title: Data sources
nav_order: 110
has_children: true
---

# Data sources

OpenSearch data sources are the applications that OpenSearch can connect to and ingest data from. Once your data sources have been connected and your data has been ingested, it can be indexed, searched, and analyzed using [REST APIs]({{site.url}}{{site.baseurl}}/api-reference/index/) or the OpenSearch Dashboards UI.

This documentation focuses on using the OpenSearch Dashboards interface to connect and manage your data sources. For information about using an API to connect data sources, see the developer resources linked under [Next steps](#next-steps).

## Prerequisites

The first step in connecting your data sources to OpenSearch is to install OpenSearch and OpenSearch Dashboards on your system. You can follow the installation instructions in the [OpenSearch documentation]({{site.url}}{{site.baseurl}}/install-and-configure/index/) to install these tools.

Once you have installed OpenSearch and OpenSearch Dashboards, you can use Dashboards to connect your data sources to OpenSearch and then manage those data sources, create index patterns based on them, run queries against a specific data source, and combine visualizations in one dashboard.

You must also configure the [YAML files]({{site.url}}{{site.baseurl}}/install-and-configure/configuration/#configuration-file) and install the `dashboards-observability` and `opensearch-sql` plugins. For more information, see [OpenSearch plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/).

## Create a data source connection

A data source connection specifies the parameters needed to connect to a data source. These parameters form a connection string for the data source. Using Dashboards, you can add new data source connections or manage existing ones.

The following steps guide you through the basics of creating a data source connection:

1. From the OpenSearch Dashboards main menu, select **Dashboards Management** > **Data sources** > **Create data source connection**. The UI is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/data-source-UI.png" alt="Connecting a data source UI" width="700"/>

2. Create the data source connection by entering the appropriate information into the **Connection Details** and **Authentication Method** fields.

- Under **Connection Details**, enter a title and the endpoint URL of the OpenSearch cluster you want to connect to (for example, `http://localhost:9200` for a local cluster). Entering a description is optional.

- Under **Authentication Method**, select an authentication method from the dropdown list. Once an authentication method is selected, the applicable fields for that method appear. You can then enter the required details. The authentication method options are:
- **No authentication**: No authentication is used to connect to the data source.
- **Username & Password**: A basic username and password are used to connect to the data source.
- **AWS SigV4**: An AWS Signature Version 4 authenticating request is used to connect to the data source. AWS Signature Version 4 requires an access key and a secret key.
- For AWS Signature Version 4 authentication, first specify the **Region**. Next, select the OpenSearch service in the **Service Name** list. The options are **Amazon OpenSearch Service** and **Amazon OpenSearch Serverless**. Lastly, enter the **Access Key** and **Secret Key** for authorization.

After you have populated the required fields, the **Test connection** and **Create data source** buttons become active. You can select **Test connection** to confirm that the connection is valid.

3. Select **Create data source** to save your settings. The connection is created. The active window returns to the **Data sources** main page, and the new connection appears in the list of data sources.

4. To delete a data source connection, select the checkbox to the left of the data source **Title** and then select the **Delete 1 connection** button. Selecting multiple checkboxes for multiple connections is supported. An example UI is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/delete-data-source.png" alt="Deleting a data source UI" width="700"/>

### Modify a data source connection

To make changes to a data source connection, select a connection in the list on the **Data sources** main page. The **Connection Details** window opens.

To make changes to **Connection Details**, edit one or both of the **Title** and **Description** fields and select **Save changes** in the lower-right corner of the screen. You can also cancel changes here. To change the **Authentication Method**, choose a different authentication method, enter your credentials (if applicable), and then select **Save changes** in the lower-right corner of the screen. The changes are saved.

When **Username & Password** is the selected authentication method, you can update the password by choosing **Update stored password** next to the **Password** field. In the pop-up window, enter a new password in the first field and then enter it again in the second field to confirm. Select **Update stored password** in the pop-up window. The new password is saved. Select **Test connection** to confirm that the connection is valid.

When **AWS SigV4** is the selected authentication method, you can update the credentials by selecting **Update stored AWS credential**. In the pop-up window, enter a new access key in the first field and a new secret key in the second field. Select **Update stored AWS credential** in the pop-up window. The new credentials are saved. Select **Test connection** in the upper-right corner of the screen to confirm that the connection is valid.

To delete the data source connection, select the delete icon ({::nomarkdown}<img src="{{site.url}}{{site.baseurl}}/images/dashboards/trash-can-icon.png" class="inline-icon" alt="delete icon"/>{:/}).

## Create an index pattern

Once you've created a data source connection, you can create an index pattern for the data source. An _index pattern_ is a template that OpenSearch uses to create indexes for data from the data source. See [Index patterns]({{site.url}}{{site.baseurl}}/dashboards/management/index-patterns/) for more information and a tutorial.

## Next steps

- Learn about [managing index patterns]({{site.url}}{{site.baseurl}}/dashboards/management/index-patterns/) through OpenSearch Dashboards.
- Learn about [indexing data using Index Management]({{site.url}}{{site.baseurl}}/dashboards/im-dashboards/index/) through OpenSearch Dashboards.
- Learn about how to connect [multiple data sources]({{site.url}}{{site.baseurl}}/dashboards/management/multi-data-sources/).
- Learn about how to [connect OpenSearch and Amazon S3 through OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/dashboards/management/S3-data-source/).
- Learn about the [Integrations]({{site.url}}{{site.baseurl}}/integrations/index/) tool, which gives you the flexibility to use various data ingestion methods and connect data from the Dashboards UI.

2 changes: 1 addition & 1 deletion _dashboards/management/index-patterns.md
Original file line number Diff line number Diff line change
@@ -57,7 +57,7 @@ An example of step 1 is shown in the following image. Note that the index patter

Once the index pattern has been created, you can view the mapping of the matching indexes. Within the table, you can see the list of fields, along with their data type and properties. An example is shown in the following image.

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/index-pattern-table.png" alt="Index pattern table UI " width="700"/>
<img src="{{site.url}}{{site.baseurl}}//images/dashboards/index-pattern-table.png" alt="Index pattern table UI " width="700"/>

## Next steps

