[DOC] Add new documentation for data sources #5127

Merged
merged 72 commits into from
Oct 16, 2023

Conversation

vagimeli
Contributor

@vagimeli vagimeli commented Oct 2, 2023

Description

Add new end-user documentation for data sources, including connecting external data sources (S3, Prometheus) and speeding up external data ingestion; revise multiple data sources to remove redundancy with general data sources page.

Issues Resolved

Fixes #5061

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@vagimeli vagimeli added 2 - In progress Issue/PR: The issue or PR is in progress. v2.11.0 labels Oct 2, 2023
@vagimeli vagimeli self-assigned this Oct 2, 2023
@vagimeli vagimeli added this to the v2.11 milestone Oct 3, 2023

Configuration of the YAML files and installation of certain [OpenSearch plugins]({{site.url}}{{site.baseurl}}/opensearch-sql/) is necessary. The following plugins are required for using the Apache Spark integration feature: `opensearch-sql`, `opensearch-security`, and `opensearch-observability`.

<SME provide information: What are prerequisites? Do you need to have OpenSearch Service to use this feature? What YAML configuration is necessary? What settings need to be configured? Do users need to enable `data_sources` in the YAML file? Provide configuration examples.>
Contributor Author

@vagimeli vagimeli Oct 5, 2023

Questions for SMEs:

  1. Where is the demo? Where is the endpoint to test this feature as I write?
  2. What are the prerequisites to use this feature?
  3. Who are the target users? Do they need to be using OpenSearch Service?
  4. What YAML configurations need to be made? What settings need to be configured? Do users need to change data_source.enabled from false to true?
  5. Provide configuration examples.

@vagimeli vagimeli changed the title [DOC] Add new documentation for Spark [DOC] Add new documentation for data sources Oct 9, 2023
@vagimeli
Contributor Author

10/10 Connected with Managed Services UX and tech writer to sync on documentation updates for open source and services; awaiting response


Starting with OpenSearch 2.11, you can connect OpenSearch to your Amazon S3 data source using the OpenSearch Dashboards user interface (UI). You can then query that data, optimize query performance, define tables, and integrate your S3 data from a single UI.

## Prerequisites
Contributor Author

Revised


Data sources in OpenSearch are the systems and applications that OpenSearch can connect to and ingest data from. Once your data sources have been connected and your data has been ingested, it can be indexed, searched, and analyzed using [REST APIs]({{site.url}}{{site.baseurl}}/api-reference/index/) or the OpenSearch Dashboards user interface.

The focus of this documentation is on using the OpenSearch Dashboards interface to connect and manage your data sources. For information about using an API to connect data sources, see the following:
Contributor Author

  • Which APIs do we link users to? Please provide the links.


Once you have installed OpenSearch and OpenSearch Dashboards, you can use Dashboards to connect your data sources and OpenSearch and then use Dashboards to manage data sources, create index patterns based on those data sources, run queries against a specific data source, and combine visualizations in one dashboard.

Configuration of the [YAML files]({{site.url}}{{site.baseurl}}/install-and-configure/configuration/#configuration-file) and installation of certain [OpenSearch plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/) is necessary. The data sources feature flag `data_source.enabled:` must be set to `true`. The default is `false`. The following plugins also are required for integrating your data sources and OpenSearch: `opensearch-sql`, `opensearch-security`, and `opensearch-observability`.
Contributor Author

SME provide information: What are prerequisites? Do you need to have OpenSearch Service to use this feature? What YAML configuration is necessary? What settings need to be configured? Provide configuration examples.

Member

@vamsi-amazon and @derek-ho can provide more info here.

Contributor

I believe "The data sources feature flag `data_source.enabled:` must be set to `true`" is incorrect. That flag belongs to a different data sources feature, the one in core Dashboards. Technically, opensearch-security is not required, and neither is opensearch-observability.

Contributor

The only requirements are dashboards-observability and opensearch-sql, although we may want to call out the others as optional in case users want to use other parts of the product.
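
For readers who want to confirm the `opensearch-sql` requirement noted above, here is a minimal Python sketch. It assumes you have already captured the plain-text output of `GET _cat/plugins` (one row per plugin: node name, component, version) and pass it in as a string. The function name `missing_plugins` and the sample rows are hypothetical. It only checks names that appear in that text, so a Dashboards-side plugin such as dashboards-observability would likely need a separate check.

```python
def missing_plugins(cat_plugins_text, required):
    """Return the required plugin names that do not appear in the listing."""
    installed = set()
    for line in cat_plugins_text.splitlines():
        parts = line.split()
        # Assumed row layout: <node name> <component> <version>
        if len(parts) >= 2:
            installed.add(parts[1])
    return sorted(set(required) - installed)


# Hypothetical sample of `GET _cat/plugins` output, for illustration only.
sample = """\
node-1 opensearch-sql 2.11.0.0
node-1 opensearch-security 2.11.0.0
"""

print(missing_plugins(sample, ["opensearch-sql"]))
```

An empty list means every name passed in `required` was found in the listing.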

Contributor Author

Revised

@brijos brijos left a comment

Please change all references to AWS Glue to AWS Glue Data Catalog. AWS Glue has many different features and we are working specifically with AWS Glue Data Catalog. Thank you!

@hdhalter hdhalter added 3 - Tech review PR: Tech review in progress and removed 2 - In progress Issue/PR: The issue or PR is in progress. labels Oct 12, 2023
@vagimeli vagimeli added 4 - Doc review PR: Doc review in progress and removed 3 - Tech review PR: Tech review in progress labels Oct 13, 2023
_dashboards/management/accelerate-external-data.md (outdated, resolved)
_dashboards/management/accelerate-external-data.md (outdated, resolved)
_dashboards/management/accelerate-external-data.md (outdated, resolved)
_dashboards/management/data-sources.md (outdated, resolved)

Data sources in OpenSearch are the systems and applications that OpenSearch can connect to and ingest data from. Once your data sources have been connected and your data has been ingested, it can be indexed, searched, and analyzed using [REST APIs]({{site.url}}{{site.baseurl}}/api-reference/index/) or the OpenSearch Dashboards user interface.

The focus of this documentation is on using the OpenSearch Dashboards interface to connect and manage your data sources. For information about using an API to connect data sources, see <insert links to API references>.
Collaborator

Don't forget to add the link

Contributor Author

Revised and directed the reader to the next steps section. I still don't know what API info the user needs to be directed to. SMEs to revisit after this version of the documentation is released.

_dashboards/management/data-sources.md (outdated, resolved)
_dashboards/management/query-data-source.md (outdated, resolved)
@kolchfa-aws
Collaborator

Also, please fix the links before merging

Collaborator

@natebower natebower left a comment

@vagimeli Please see my comments and changes. Apologies for the number of comments re: capitalization of UI elements, but even in the screenshots, it looks like there is a page named "Data Sources" and a page named "Data sources." I just flagged what looked potentially odd to me, so feel free to ignore any of my comments on this particular issue if you know the capitalization to be correct as reflected in the UI. Thanks!

_dashboards/management/S3-data-source.md (resolved)
_dashboards/management/S3-data-source.md (outdated, resolved)
_dashboards/management/S3-data-source.md (outdated, resolved)
_dashboards/management/S3-data-source.md (outdated, resolved)
_dashboards/management/S3-data-source.md (outdated, resolved)

## Use Query Workbench with your Amazon S3 data source

[Query Workbench]({{site.url}}{{site.baseurl}}/search-plugins/sql/workbench/) runs on-demand SQL queries, translates SQL into its REST equivalent, and views and saves results as text, JSON, JDBC, or CSV.
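
As background for the "translates SQL into its REST equivalent" wording in the draft line above, the sketch below builds the kind of REST request the SQL plugin accepts. It assumes the `POST _plugins/_sql` endpoint with a `format` query parameter taking `jdbc`, `json`, `csv`, or `raw`. The helper name `build_sql_request` and the table name `my_table` are hypothetical, and the request is only constructed, not sent.

```python
import json

# Assumption: response formats accepted by the SQL plugin's `format` parameter.
ALLOWED_FORMATS = {"jdbc", "json", "csv", "raw"}


def build_sql_request(query, fmt="jdbc"):
    """Describe a SQL plugin REST call as a plain dict (nothing is sent)."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return {
        "method": "POST",
        "path": f"/_plugins/_sql?format={fmt}",
        "body": json.dumps({"query": query}),
    }


# Hypothetical table name, for illustration only.
request = build_sql_request("SELECT * FROM my_table LIMIT 5", fmt="csv")
print(request["method"], request["path"])
print(request["body"])
```

Returning a plain dict keeps the sketch independent of any HTTP client; the caller can pass the pieces to whichever client and authentication setup the cluster uses.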
Collaborator

"saves results in a text, JSON, JDBC, or CSV format"?

_dashboards/management/query-data-source.md (outdated, resolved)
<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-workbench-S3.png" alt="Query Workbench Amazon S3 data loading UI" width="700">

3. View the databases listed in the left-side navigation menu and select a database to view its details. Any information about acceleration indexes is listed under **Acceleration index destination**.
4. Choose the **Describe Index** button to learn more about how data is stored in that particular index.
Collaborator

Is this button in title case in the UI?


3. View the databases listed in the left-side navigation menu and select a database to view its details. Any information about acceleration indexes is listed under **Acceleration index destination**.
4. Choose the **Describe Index** button to learn more about how data is stored in that particular index.
5. Choose the **Drop index** button to delete and clear both the OpenSearch index and the Amazon S3 Spark job that refreshes the data.
Collaborator

As formatted, "Amazon S3 Spark" looks like the name of a product or service. If we need to reference both services, would "Amazon S3 Apache Spark job" work?

Contributor Author

@vagimeli vagimeli Oct 13, 2023

I'm not sure that's the intended product. For this version, I revised to use "the job that refreshes...." We'll be updating this content for 2.12, and the Spark topic will be discussed with SMEs because we need clarity.

_dashboards/management/query-data-source.md (outdated, resolved)
vagimeli and others added 22 commits October 13, 2023 17:30
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
This reverts commit f3007fc.

Signed-off-by: Melissa Vagi <[email protected]>
@vagimeli vagimeli merged commit da7a701 into main Oct 16, 2023
@vagimeli vagimeli deleted the spark-support branch October 16, 2023 14:42
harshavamsi pushed a commit to harshavamsi/documentation-website that referenced this pull request Oct 31, 2023
*Add new documentation for 2.11

---------

Signed-off-by: Melissa Vagi <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
vagimeli added a commit that referenced this pull request Dec 21, 2023
*Add new documentation for 2.11

---------

Signed-off-by: Melissa Vagi <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Labels
4 - Doc review PR: Doc review in progress v2.11.0
Development

Successfully merging this pull request may close these issues.

[DOC] Spark Support - Dashboard Sources, Materialized Views, and Covering Indexes
9 participants