Add clarification about OpenSearch ingest pipelines compared to Data … #6786

Merged: 11 commits merged into main from add-pipelines-use-cases on Apr 4, 2024

@vagimeli (Contributor) commented Mar 25, 2024

…Prepper

Description

Clarifies preferred data ingestion tools for OpenSearch

Issues Resolved

Closes parts of #6429

Checklist

  • [X] By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developer Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@vagimeli (Contributor Author)

@dlvenable @gaobinlong Thank you for providing context about ingest pipelines and Data Prepper. Please review this PR at your availability. I'll be out of office Mar 26-April 2, so please reach out to @hdhalter if this PR needs changes or reviews before I return. Thank you, Melissa

@hdhalter (Contributor)

Thanks, Melissa! I will update the label to 'tech review' for sign off by @dlvenable and @gaobinlong.

@hdhalter added the "3 - Tech review" label (PR: Tech review in progress) and removed the "2 - In progress" label (Issue/PR: The issue or PR is in progress.) on Mar 25, 2024
@hdhalter (Contributor)

Tagging @Naarcha-AWS for doc review.

@hdhalter assigned Naarcha-AWS and unassigned vagimeli on Mar 25, 2024
@hdhalter assigned dlvenable and unassigned Naarcha-AWS on Mar 26, 2024
@vagimeli (Contributor Author) commented Apr 3, 2024

Tagging @Naarcha-AWS for doc review.

Hi @Naarcha-AWS @dlvenable Please review this PR as soon as your schedule permits. Thank you.

@vagimeli requested a review from epugh as a code owner on April 3, 2024 19:36
@dlvenable (Member) left a comment

Thank you for adding this clarity @vagimeli !

@natebower (Collaborator) left a comment

@vagimeli Please see my comments and changes and let me know if you have any questions. Thanks!

_data-prepper/index.md (outdated, resolved)
_data-prepper/index.md (outdated, resolved)

- Data Prepper lets users build custom pipelines to improve the operational view of applications. Two common uses for Data Prepper are trace and log analytics. [Trace analytics]({{site.url}}{{site.baseurl}}/observability-plugin/trace/index/) can help you visualize the flow of events and identify performance problems, and [log analytics]({{site.url}}{{site.baseurl}}/observability-plugin/log-analytics/) can improve searching, analyzing and provide insights into your application.
+ With Data Prepper you can build custom pipelines to improve the operational view of applications. Two common uses for Data Prepper are trace analytics and log analytics. [Trace analytics]({{site.url}}{{site.baseurl}}/observability-plugin/trace/index/) can help you visualize events flow and identify performance problems. [Log analytics]({{site.url}}{{site.baseurl}}/observability-plugin/log-analytics/) can help improve searching, analyzing and give you deeper insights into your application.
Collaborator

The last sentence needs some clarification. Improve search and analytics what or how?

@vagimeli (Contributor Author), Apr 4, 2024

Does this read better? Log analytics equips you with tools to enhance your search capabilities, conduct comprehensive analysis, and gain insights into your applications' performance and behavior.

Collaborator

Perfect!

_ingest-pipelines/index.md (outdated, resolved)
_ingest-pipelines/index.md (outdated, resolved)
_ingest-pipelines/index.md (outdated, resolved)

OpenSearch ingest pipelines perform actions on indexes and are preferred for use cases involving pre-processing simple datasets, [machine learning processors]({{site.url}}{{site.baseurl}}/ingest-pipelines/processors/sparse-encoding/), and [vector embedding processors]({{site.url}}{{site.baseurl}}/ingest-pipelines/processors/text-image-embedding/). OpenSearch ingest pipelines are recommended for simple data pre-processing and small datasets.

Data Prepper is the preferred data ingestion tool for OpenSearch. Data Prepper is recommended for any data processing that it supports and for use cases involving the transferring and fetching of large datasets and complex data pre-processing. Refer to the [Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/) documentation to learn more information.
Collaborator

First sentence: This reads as though a noun should follow "data processing". Data processing what that it supports?

Contributor Author

Is this a better rewrite? Data Prepper is recommended for any data processing tasks it supports, particularly when dealing with large datasets and complex data pre-processing requirements. It streamlines the process of transferring and fetching large datasets, while providing robust capabilities for intricate data preparation and transformation operations.

Collaborator

Yes, that works, but no comma in the second sentence 😄


Data Prepper is the preferred data ingestion tool for OpenSearch. Data Prepper is recommended for any data processing that it supports and for use cases involving the transferring and fetching of large datasets and complex data pre-processing. Refer to the [Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/) documentation to learn more information.

OpenSearch ingest pipelines can only be managed using [ingest API operations]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/).
Collaborator

In the link, should "Ingest" be capitalized (is it the name of the API)?

_ingest-pipelines/processors/grok.md (outdated, resolved)
vagimeli and others added 2 commits April 4, 2024 09:36
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
vagimeli and others added 6 commits April 4, 2024 09:37
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Melissa Vagi <[email protected]>
@vagimeli merged commit fd8bd45 into main on Apr 4, 2024
5 checks passed
@vagimeli deleted the add-pipelines-use-cases branch on April 4, 2024 16:48
@vagimeli removed the "3 - Tech review" label (PR: Tech review in progress) on Apr 4, 2024
@vagimeli added the "backport 2.13" label (PR: Backport label for 2.13) on May 8, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request May 8, 2024
#6786)

* Add clarification about OpenSearch ingest pipelines compared to Data Prepper

---------

Signed-off-by: Melissa Vagi <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
(cherry picked from commit fd8bd45)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>