[DOC] Add new documentation for data sources #5127

vagimeli · 2023-10-02T21:27:21Z

Description

Add new end-user documentation for data sources, including connecting external data sources (S3, Prometheus) and speeding up external data ingestion. Revise the multiple data sources page to remove redundancy with the general data sources page.

Issues Resolved

Fixes #5061

Checklist

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developer Certificate of Origin.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

vagimeli · 2023-10-05T20:16:53Z

_dashboards/management/spark.md

+
+Configuration of the YAML files and installation of certain [OpenSearch plugins]({{site.url}}{{site.baseurl}}/opensearch-sql/) are necessary. The following plugins are required to use the Apache Spark integration feature: `opensearch-sql`, `opensearch-security`, and `opensearch-observability`.
+
+<SME provide information: What are prerequisites? Do you need to have OpenSearch Service to use this feature? What YAML configuration is necessary? What settings need to be configured? Do users need to enable `data_sources` in the YAML file? Provide configuration examples.>


Questions for SMEs:

Where is the demo? Where is the endpoint to test this feature as I write?

What are the prerequisites to use this feature?

Who are the target users? Do they need to be using OpenSearch Service?

What YAML configurations need to be made? What settings need to be configured? Do users need to change data_source.enabled from false to true?

Provide configuration examples.

vagimeli · 2023-10-10T19:25:57Z

10/10: Connected with the Managed Services UX team and tech writer to sync on documentation updates for open source and services; awaiting response.

_dashboards/management/accelerate-external-data.md

vagimeli · 2023-10-12T17:42:52Z

_dashboards/management/S3-data-source.md

+
+Starting with OpenSearch 2.11, you can connect OpenSearch to your Amazon S3 data source using the OpenSearch Dashboards user interface (UI). You can then query that data, optimize query performance, define tables, and integrate your S3 data from a single UI.  
+
+## Prerequisites


<SMEs: What are the prerequisites? Do users need to install specific plugins or update cluster settings? Provide settings examples.>

<Do we need to mention anything about the API?>

<Does Snapshot Management S3 documentation relate to this topic? https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore/#amazon-s3>

This feature is not related to snapshots.

Here are the details on the prerequisites for the plugin: https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst#id2

Here are the API samples, similar to how data sources were added for Prometheus earlier: https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst#introduction
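
For reference while drafting, here is a minimal sketch of what a data source creation request might look like. It assumes the endpoint (`_plugins/_query/_datasources`) and the field names (`name`, `connector`, `properties`, `prometheus.uri`) as I understand them from the linked datasources.rst page; please verify them against that page before documenting anything. The helper name `build_datasource_request` and the sample values are hypothetical, and the sketch only builds the request without sending it.

```python
import json

# Assumed endpoint path, taken from the linked datasources.rst; verify before use.
DATASOURCES_PATH = "/_plugins/_query/_datasources"


def build_datasource_request(name, connector, properties):
    """Return (method, path, json_body) for a data source creation request.

    Hypothetical helper for illustration only; it performs no network calls.
    """
    body = {
        "name": name,
        "connector": connector,
        "properties": properties,
    }
    return "POST", DATASOURCES_PATH, json.dumps(body, indent=2)


# Example values are placeholders, not a real deployment.
method, path, body = build_datasource_request(
    name="my_prometheus",
    connector="prometheus",
    properties={"prometheus.uri": "http://localhost:9090"},
)
print(method, path)
print(body)
```

An S3 data source would presumably use the same request shape with a different connector name and properties, as described on the s3glue_connector.rst page linked above.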

_dashboards/management/S3-data-source.md

_dashboards/management/accelerate-external-data.md

vagimeli · 2023-10-12T17:45:57Z

_dashboards/management/data-sources.md

+
+Data sources in OpenSearch are the systems and applications that OpenSearch can connect to and ingest data from. Once your data sources have been connected and your data has been ingested, it can be indexed, searched, and analyzed using [REST APIs]({{site.url}}{{site.baseurl}}/api-reference/index/) or the OpenSearch Dashboards user interface.
+
+The focus of this documentation is on using the OpenSearch Dashboards interface to connect and manage your data sources. For information about using an API to connect data sources, see the following:


Which APIs should we link users to? Please provide the links.

vagimeli · 2023-10-12T17:47:44Z

_dashboards/management/data-sources.md

+
+Once you have installed OpenSearch and OpenSearch Dashboards, you can use Dashboards to connect your data sources to OpenSearch and then manage those data sources, create index patterns based on them, run queries against a specific data source, and combine visualizations in one dashboard.
+
+Configuration of the [YAML files]({{site.url}}{{site.baseurl}}/install-and-configure/configuration/#configuration-file) and installation of certain [OpenSearch plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/) are necessary. The data sources feature flag `data_source.enabled:` must be set to `true`. The default is `false`. The following plugins are also required for integrating your data sources and OpenSearch: `opensearch-sql`, `opensearch-security`, and `opensearch-observability`.


SME provide information: What are prerequisites? Do you need to have OpenSearch Service to use this feature? What YAML configuration is necessary? What settings need to be configured? Provide configuration examples.

@vamsi-amazon and @derek-ho can provide more info here.

I believe the statement "The data sources feature flag data_source.enabled: must be set to true" is incorrect. That flag belongs to a different data sources feature, the one in core Dashboards. Technically, neither opensearch-security nor opensearch-observability is required.

The only requirements are dashboards-observability and opensearch-sql, although we may want to call out the others as optional in case users want to use other parts of the product.

brijos

Please change all references to AWS Glue to AWS Glue Data Catalog. AWS Glue has many different features and we are working specifically with AWS Glue Data Catalog. Thank you!

_dashboards/management/accelerate-external-data.md

_dashboards/management/data-sources.md

kolchfa-aws · 2023-10-13T17:57:44Z

_dashboards/management/data-sources.md

+
+Data sources in OpenSearch are the systems and applications that OpenSearch can connect to and ingest data from. Once your data sources have been connected and your data has been ingested, it can be indexed, searched, and analyzed using [REST APIs]({{site.url}}{{site.baseurl}}/api-reference/index/) or the OpenSearch Dashboards user interface.
+
+The focus of this documentation is on using the OpenSearch Dashboards interface to connect and manage your data sources. For information about using an API to connect data sources, see <insert links to API references>.


Don't forget to add the link.

Revised and directed the reader to the next steps section. I still don't know which API information the user needs to be directed to. SMEs to revisit after this version of the documentation is released.

_dashboards/management/data-sources.md