[DOC] Add new documentation for data sources #5127
_dashboards/management/spark.md
Outdated
Configuration of the YAML files and installation of certain [OpenSearch plugins]({{site.url}}{{site.baseurl}}/opensearch-sql/) are necessary. The following plugins are required for using the Apache Spark integration feature: `opensearch-sql`, `opensearch-security`, and `opensearch-observability`.

<SME provide information: What are prerequisites? Do you need to have OpenSearch Service to use this feature? What YAML configuration is necessary? What settings need to be configured? Do users need to enable `data_sources` in the YAML file? Provide configuration examples.>

Questions for SMEs:
- Where is the demo? Where is the endpoint to test this feature as I write?
- What are the prerequisites to use this feature?
- Who are the target users? Do they need to be using OpenSearch Service?
- What YAML configurations need to be made? What settings need to be configured? Do users need to set `data_source.enabled: false` to `true`?
- Provide configuration examples.
10/10 Connected with Managed Services UX and tech writer to sync on documentation updates for open source and services; awaiting response

Starting with OpenSearch 2.11, you can connect OpenSearch to your Amazon S3 data source using the OpenSearch Dashboards user interface (UI). You can then query that data, optimize query performance, define tables, and integrate your S3 data from a single UI.

## Prerequisites

- <SMEs: What are the prerequisites? Installing specific plugins? update cluster settings? Provide settings examples.>
- <Do we need to mention anything about the API?>
- <Does Snapshot Management S3 documentation relate to this topic? https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore/#amazon-s3>
- This feature is not related to snapshots.
- Here are the details on prereqs for plugin: https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/connectors/s3glue_connector.rst#id2
- Here are the API samples similar to how datasources were added for prometheus earlier: https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/admin/datasources.rst#introduction
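
For illustration, the kind of request those API samples describe can be sketched roughly as follows. This is a minimal sketch, assuming the `POST /_plugins/_query/_datasources` endpoint and the `name`, `connector`, and `properties` fields described in the linked `datasources.rst`; the helper function, the data source name, and the Prometheus URI are hypothetical placeholders, and nothing is sent over the network.

```python
import json

# Assumed endpoint, as described in the SQL plugin datasources documentation linked above.
DATASOURCES_ENDPOINT = "/_plugins/_query/_datasources"


def build_datasource_request(name, connector, properties):
    """Return (method, path, json_body) for a create-data-source call.

    Hypothetical helper: it only builds the request and does not send it.
    """
    if not name:
        raise ValueError("a data source needs a non-empty name")
    body = {"name": name, "connector": connector, "properties": dict(properties)}
    return "POST", DATASOURCES_ENDPOINT, json.dumps(body, indent=2)


# Example values are placeholders, not a real deployment.
method, path, body = build_datasource_request(
    name="my_prometheus",
    connector="prometheus",
    properties={"prometheus.uri": "http://localhost:9090"},
)
print(method, path)
print(body)
```

The exact property keys for each connector should be confirmed against the linked connector documentation before anything like this goes into the published page.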
Revised

Data sources in OpenSearch are the systems and applications that OpenSearch can connect to and ingest data from. Once your data sources have been connected and your data has been ingested, it can be indexed, searched, and analyzed using [REST APIs]({{site.url}}{{site.baseurl}}/api-reference/index/) or the OpenSearch Dashboards user interface.

The focus of this documentation is on using the OpenSearch Dashboards interface to connect and manage your data sources. For information about using an API to connect data sources, see the following:

- Which APIs do we link user to? Please provide the links.

Once you have installed OpenSearch and OpenSearch Dashboards, you can use Dashboards to connect your data sources to OpenSearch and then manage those data sources, create index patterns based on them, run queries against a specific data source, and combine visualizations in one dashboard.

Configuration of the [YAML files]({{site.url}}{{site.baseurl}}/install-and-configure/configuration/#configuration-file) and installation of certain [OpenSearch plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/) are necessary. The data sources feature flag `data_source.enabled:` must be set to `true`. The default is `false`. The following plugins are also required for integrating your data sources and OpenSearch: `opensearch-sql`, `opensearch-security`, and `opensearch-observability`.

SME provide information: What are prerequisites? Do you need to have OpenSearch Service to use this feature? What YAML configuration is necessary? What settings need to be configured? Provide configuration examples.
@vamsi-amazon and @derek-ho can provide more info here.
I believe "The data sources feature flag `data_source.enabled:` must be set to `true`" is incorrect. That flag belongs to a different data sources feature, the one in core Dashboards. Technically, `opensearch-security` is not required, and `opensearch-observability` is also not required.
The only requirements are `dashboards-observability` and `opensearch-sql`, although we may just want to call out the others as optional in case they want to use other parts of the product.
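
To make the required/optional split concrete, here is a small sketch of a check that could be run against a list of installed plugin names. The function name and the way the list is gathered are assumptions for illustration; the required and optional sets come only from the comments in this thread.

```python
# Plugin names as given in this review thread.
REQUIRED_PLUGINS = {"dashboards-observability", "opensearch-sql"}
OPTIONAL_PLUGINS = {"opensearch-security", "opensearch-observability"}


def check_plugins(installed):
    """Report which required and optional plugins are missing.

    `installed` is any iterable of plugin names; how it is collected
    (CLI output, API call, config management) is left to the caller.
    """
    present = set(installed)
    return {
        "missing_required": sorted(REQUIRED_PLUGINS - present),
        "missing_optional": sorted(OPTIONAL_PLUGINS - present),
    }


# Hypothetical input: only two of the four plugins are installed.
result = check_plugins(["opensearch-sql", "opensearch-security"])
print(result)
```

Listing the optional plugins separately mirrors the suggestion to call them out as optional rather than dropping them from the page.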
Revised
Please change all references to AWS Glue to AWS Glue Data Catalog. AWS Glue has many different features and we are working specifically with AWS Glue Data Catalog. Thank you!

Data sources in OpenSearch are the systems and applications that OpenSearch can connect to and ingest data from. Once your data sources have been connected and your data has been ingested, it can be indexed, searched, and analyzed using [REST APIs]({{site.url}}{{site.baseurl}}/api-reference/index/) or the OpenSearch Dashboards user interface.

The focus of this documentation is on using the OpenSearch Dashboards interface to connect and manage your data sources. For information about using an API to connect data sources, see <insert links to API references>.

Don't forget to add the link
Revised and directed the reader to the next steps section. I still don't know what API info the user needs to be directed to. SMEs to revisit after this version of the documentation is released.
Also, please fix the links before merging.
@vagimeli Please see my comments and changes. Apologies for the number of comments re: capitalization of UI elements, but even in the screenshots, it looks like there is a page named "Data Sources" and a page named "Data sources." I just flagged what looked potentially odd to me, so feel free to ignore any of my comments on this particular issue if you know the capitalization to be correct as reflected in the UI. Thanks!

## Use Query Workbench with your Amazon S3 data source

[Query Workbench]({{site.url}}{{site.baseurl}}/search-plugins/sql/workbench/) runs on-demand SQL queries, translates SQL into its REST equivalent, and views and saves results as text, JSON, JDBC, or CSV.

"saves results in a text, JSON, JDBC, or CSV format"?

<img src="{{site.url}}{{site.baseurl}}/images/dashboards/query-workbench-S3.png" alt="Query Workbench Amazon S3 data loading UI" width="700">

3. View the databases listed in the left-side navigation menu and select a database to view its details. Any information about acceleration indexes is listed under **Acceleration index destination**.
4. Choose the **Describe Index** button to learn more about how data is stored in that particular index.

Is this button in title case in the UI?

3. View the databases listed in the left-side navigation menu and select a database to view its details. Any information about acceleration indexes is listed under **Acceleration index destination**.
4. Choose the **Describe Index** button to learn more about how data is stored in that particular index.
5. Choose the **Drop index** button to delete and clear both the OpenSearch index and the Amazon S3 Spark job that refreshes the data.

As formatted, "Amazon S3 Spark" looks like the name of a product or service. If we need to reference both services, would "Amazon S3 Apache Spark job" work?
I'm not sure that's the intended product. For this version, I revised to use "the job that refreshes...." We'll be updating this content for 2.12, and the Spark topic will be discussed with SMEs because we need clarity.
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: Melissa Vagi <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
This reverts commit f3007fc. Signed-off-by: Melissa Vagi <[email protected]>
02d6f94 to c384b10
This reverts commit c384b10.
Signed-off-by: Melissa Vagi <[email protected]>
*Add new documentation for 2.11 --------- Signed-off-by: Melissa Vagi <[email protected]> Co-authored-by: kolchfa-aws <[email protected]> Co-authored-by: Nathan Bower <[email protected]>
Description
Add new end user documentation for data sources, including connecting external data sources (S3, Prometheus) and speeding up external data ingestion. Revise the multiple data sources page to remove redundancy with the general data sources page.
Issues Resolved
Fixes #5061
Checklist
For more information on following Developer Certificate of Origin and signing off your commits, please check here.