
Created the blog post announcing Data Prepper 2.0 #1066

Merged

Conversation

dlvenable
Member

@dlvenable dlvenable commented Oct 7, 2022

Description

We are releasing Data Prepper 2.0.0 on Oct 10. This is our announcement blog post.

This requires the bio for @oeyh as supplied in #1067.

Issues Resolved

N/A

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

Co-authored-by: Hai Yan <[email protected]>
Signed-off-by: David Venable <[email protected]>
@oeyh oeyh mentioned this pull request Oct 8, 2022
Co-authored-by: Hai Yan <[email protected]>
Signed-off-by: David Venable <[email protected]>
@dlvenable dlvenable marked this pull request as ready for review October 8, 2022 17:58
@dlvenable dlvenable requested a review from a team as a code owner October 8, 2022 17:58
@dlvenable
Member Author

I added Hai to the list of authors based on the username he supplied in #1067. We will need to merge that in prior to this PR.

Contributor

@Naarcha-AWS Naarcha-AWS left a comment


Added my rewrites for each section in the review. One comment = one section.

Might need to wait to add documentation links until this PR is merged: opensearch-project/documentation-website#1510

- technical-post
---

Today the maintainers are announcing the release of Data Prepper 2.0. It has been over a year since Data Prepper 1.0 was first introduced
Contributor


Let's change this paragraph to:

The Data Prepper maintainers are proud to announce the release of Data Prepper 2.0. This release makes Data Prepper easier to use and helps you improve your observability stack based on feedback from our users.

Here are some of the major changes and enhancements made for Data Prepper 2.0.

Contributor


Or maybe:

The Data Prepper maintainers are proud to announce the release of Data Prepper 2.0. This release makes Data Prepper easier to use and helps you improve your observability stack based on feedback from you, our users.

Here are some of the major changes and enhancements made for Data Prepper 2.0.

Contributor


@dlvenable: Could we add a line in this intro or somewhere in the blog about OpenSearch compatibility? Data Prepper 2.0 is compatible with all OpenSearch versions, correct?

Member Author


I added the following:

Data Prepper 2.0 retains compatibility with all current versions of OpenSearch.

_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md
* The HTTP source now supports loading TLS/SSL credentials from either Amazon S3 or Amazon Certificate Manager. The OTel Trace Source supported these options and now pipeline authors can configure them for their log ingestion use-cases as well.
* Data Prepper now requires Java 11 and the Docker image deploys with JDK 17.

Please see our release notes for a complete list.
Contributor


Do we have a link to these release notes?

Signed-off-by: David Venable <[email protected]>
@dlvenable
Member Author

Thanks @Naarcha-AWS! I took most of the changes for all sections except Directory Structure. I want to check with @oeyh on those first.

I did make some tweaks to your suggestions, mostly to be more accurate.

I also wasn't quite sure about some of the paragraph breaks. Did you intend all of those paragraphs? The ones in the examples read as too broken up and didn't keep the same train of thought.

Contributor

@Naarcha-AWS Naarcha-AWS left a comment


A few more minor tweaks before we pass them off to @natebower.

accepts log data from external sources such as Fluent Bit.

The pipeline then uses the `grok` processor to split the log line into multiple fields.
The `grok` processor adds named `loglevel` to the event. Pipeline authors can use that field in routes. This pipeline has two OpenSearch sinks. The first sink only receives
Contributor


Let's break this up a little more:


The pipeline then uses the grok processor to split the log line into multiple fields. The grok processor adds a named loglevel to the event. Pipeline authors can use that field in routes.

This pipeline contains two OpenSearch sinks. The first sink will only receive logs with a log level of WARN or ERROR. Data Prepper will route all events to the second sink.

Member Author


I took your suggestion and made one clarification by adding "field" which you can see here: "... adds a
field named loglevel ..."
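For readers following this thread, the grok-plus-routing behavior under discussion can be sketched as a pipeline configuration. This is an illustrative example only: the pipeline name, hosts, index names, grok pattern, and route expression below are invented, and the key names approximate the Data Prepper 2.0 syntax rather than quoting the blog post.

```yaml
# Hypothetical pipeline: grok parsing plus conditional routing to two sinks.
log-pipeline:
  source:
    http:                            # accepts log data from sources such as Fluent Bit
  processor:
    - grok:
        match:
          log: ["%{LOGLEVEL:loglevel} %{GREEDYDATA:message}"]   # adds a field named loglevel
  route:
    - warn-and-error: '/loglevel == "WARN" or /loglevel == "ERROR"'
  sink:
    - opensearch:                    # first sink: receives only WARN/ERROR events
        hosts: ["https://opensearch:9200"]
        index: critical-logs
        routes:
          - warn-and-error
    - opensearch:                    # second sink: receives all events
        hosts: ["https://opensearch:9200"]
        index: all-logs
```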

Contributor

@oeyh oeyh left a comment


Two small things:

Comment on lines 27 to 28
Simply pick a name appropriate for the domain and a Data Prepper expression.
Then for any sink that should only have some data coming through, define one or more routes to apply Data Prepper will evaluate
Contributor

@oeyh oeyh Oct 11, 2022


I think this is supposed to be a space, not a line break; it's also missing a period in front of Data Prepper will evaluate...:

Suggested change
Simply pick a name appropriate for the domain and a Data Prepper expression.
Then for any sink that should only have some data coming through, define one or more routes to apply Data Prepper will evaluate
Simply pick a name appropriate for the domain and a Data Prepper expression. Then for any sink that should only have some data coming through, define one or more routes to apply. Data Prepper will evaluate

Member Author


Line breaks should not affect the rendered page.

Member Author


There was a space and line break which did create a new paragraph in the rendered page. Thanks for noting that!

@dlvenable dlvenable force-pushed the data-prepper-2.0.0-blog-post branch from 509bfed to 3bc00b6 Compare October 11, 2022 19:12
@dlvenable
Member Author

I took all the suggested changes.

Collaborator

@natebower natebower left a comment


@dlvenable Please see my changes and comments, and let me know if you have any questions. Thanks!


One common use case for conditional routing is reducing the volume of data going to some clusters.
When you want info logs that produce large volumes of data to go to a cluster, index with more frequent rollovers, or add deletions to clear out large volumes of data, you can now configure pipelines to route the data with your chosen action.
deletions to clear out these large volumes of data, you now configure pipelines to route your data.
Collaborator


Suggested change
deletions to clear out these large volumes of data, you now configure pipelines to route your data.



Simply pick a name appropriate for the domain and a Data Prepper expression.
Then for any sink that should only have some data coming through, define one or more routes to apply. Data Prepper will evaluate
Collaborator


Second sentence: "to route these events to"?

For example, when one large object includes a serialized JSON string, you can use the `parse_json` processor to extract
the fields from the JSON into your event.

Data Prepper can now import CSV or TSV formatted files from Amazon S3 sources. This is useful for systems like Amazon CloudFront
Collaborator


Can we remove "formatted"? Otherwise, this would need to be "CSV- or TSV-formatted files".
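As a sketch of the S3 CSV ingestion being discussed: the queue URL, bucket contents, and index name below are placeholders, and the option names approximate the Data Prepper 2.0 syntax rather than quoting the post or its documentation.

```yaml
# Hypothetical pipeline: read CSV objects from Amazon S3
# (useful for systems like Amazon CloudFront that emit CSV/TSV logs).
csv-pipeline:
  source:
    s3:
      notification_type: sqs
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/logs-queue"
      codec:
        csv:                         # parse each object as CSV rows
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index: cloudfront-logs
```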

Data Prepper 2.0 includes a number of other improvements. We want to highlight a few of them.

* The OpenSearch sink now supports `create` actions for OpenSearch when writing documents. Pipeline authors can configure their pipelines to only create new documents and not update existing ones.
* The HTTP source now supports loading TLS/SSL credentials from either Amazon S3 or Amazon Certificate Manager. Pipeline authors can now configure them for their log ingestion use cases. Before Data Prepper 2.0, only the OTel Trace Source supported these options.
Collaborator


Suggested change
* The HTTP source now supports loading TLS/SSL credentials from either Amazon S3 or Amazon Certificate Manager. Pipeline authors can now configure them for their log ingestion use cases. Before Data Prepper 2.0, only the OTel Trace Source supported these options.
* The HTTP source now supports loading SSL/TLS credentials from either Amazon S3 or AWS Certificate Manager (ACM). Pipeline authors can now configure them for their log ingestion use cases. Before Data Prepper 2.0, only the OTel Trace Source supported these options.

Member Author


I believe either SSL/TLS or TLS/SSL is in use. I intentionally chose TLS/SSL because we are using TLS; the SSL part is mostly there for historical reasons.

You can also see that the term TLS/SSL is used in the following Wikipedia article.

https://en.wikipedia.org/wiki/Transport_Layer_Security
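For context, the two bullets under review might translate into configuration along these lines. The key names here are illustrative approximations (the exact options should be checked against the Data Prepper documentation), and the ARN, hosts, and index are placeholders.

```yaml
# Hypothetical HTTP source loading TLS/SSL credentials from ACM,
# writing to OpenSearch with the new `create` action.
log-pipeline:
  source:
    http:
      ssl: true
      use_acm_certificate_for_ssl: true
      acm_certificate_arn: "arn:aws:acm:us-east-1:123456789012:certificate/example"
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index: app-logs
        action: create     # only create new documents; never update existing ones
```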

Data Prepper 2.0 includes a number of other improvements. We want to highlight a few of them.

* The OpenSearch sink now supports `create` actions for OpenSearch when writing documents. Pipeline authors can configure their pipelines to only create new documents and not update existing ones.
* The HTTP source now supports loading TLS/SSL credentials from either Amazon S3 or Amazon Certificate Manager. Pipeline authors can now configure them for their log ingestion use cases. Before Data Prepper 2.0, only the OTel Trace Source supported these options.
Collaborator


I'm assuming we were referring to AWS Certificate Manager (ACM).

* The HTTP source now supports loading TLS/SSL credentials from either Amazon S3 or Amazon Certificate Manager. Pipeline authors can now configure them for their log ingestion use cases. Before Data Prepper 2.0, only the OTel Trace Source supported these options.
* Data Prepper now requires Java 11 or higher. The Docker image deploys with JDK 17.

Please see our [release notes](https://github.com/opensearch-project/data-prepper/releases/tag/2.0.0) for a complete list.
Collaborator


The only thing we're missing here is a call to action. We need to conclude with a couple of sentences telling the reader what we'd like them to do next or where they can go to learn more. The following is an example from a recent blog post announcing Snapshot Management (SM):

Wrapping it up

SM automates taking snapshots of your cluster and provides useful features like notifications. To learn more about SM, check out the SM documentation section. For more technical details, read the SM meta issue.

If you’re interested in snapshots, consider contributing to the next improvement we’re working on: searchable snapshots.

@dlvenable dlvenable force-pushed the data-prepper-2.0.0-blog-post branch from 4fd8501 to 0f160f1 Compare October 12, 2022 18:18
Member Author

@dlvenable dlvenable left a comment


I pushed all the changes except the final call-to-action section. I will push that soon.

With peer forwarding as a core feature, pipeline authors can perform stateful
aggregations on multiple Data Prepper nodes. When performing stateful aggregations, Data Prepper uses a hash ring to determine
which nodes are responsible for processing different events based on the values of certain fields. Peer forwarder
routes events to the node responsible for processing the event. That node then holds all the state necessary for performing the aggregation.
Member Author


I'm not sure about the change to "states" here. Using a singular noun for state is quite common.

In information technology and computer science, a system is described as stateful if it is designed to remember preceding events or user interactions; the remembered information is called the state of the system.

https://en.wikipedia.org/wiki/State_(computer_science)
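The core peer forwarder described in this thread is configured in the Data Prepper server configuration rather than in a pipeline. A rough sketch, with an invented domain name and approximate key names:

```yaml
# Hypothetical data-prepper-config.yaml fragment: core peer forwarding.
# Nodes discovered via DNS form a hash ring; each event is routed to the node
# responsible for its field values, and that node holds the aggregation state.
peer_forwarder:
  discovery_mode: dns
  domain_name: "data-prepper-cluster.example.com"
  ssl: true
```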

@natebower
Collaborator

@dlvenable I changed to "states" to match @Naarcha-AWS edits and also to avoid "all the state". If you want to use "state", remove "all".

@dlvenable
Member Author

@dlvenable I changed to "states" to match @Naarcha-AWS edits and also to avoid "all the state". If you want to use "state", remove "all".

That makes sense. I've removed "all" from the sentence.

@dlvenable
Member Author

I have also pushed a short conclusion section.


## Try Data Prepper 2.0

Data Prepper 2.0 is available for [download](https://opensearch.org/downloads.html#data-prepper) now. The maintainers encourage you to
Collaborator


Because this is a blog, an exclamation point would work after the first sentence. Other than that small nit, LGTM.

natebower previously approved these changes Oct 12, 2022
Collaborator

@natebower natebower left a comment


LGTM

Signed-off-by: David Venable <[email protected]>
Member

@krisfreedain krisfreedain left a comment


looks good!

@krisfreedain krisfreedain merged commit b535e82 into opensearch-project:main Oct 12, 2022
@dlvenable dlvenable deleted the data-prepper-2.0.0-blog-post branch July 12, 2023 17:18