From 1b1a91e3163b61c881b49620a38a43987c8bb51f Mon Sep 17 00:00:00 2001 From: David Venable Date: Fri, 7 Oct 2022 18:14:45 -0500 Subject: [PATCH 01/11] Created the blog post announcing Data Prepper 2.0. Co-authored-by: Hai Yan Signed-off-by: David Venable --- ...022-10-10-Announcing-Data-Prepper-2.0.0.md | 150 ++++++++++++++++++ 1 file changed, 150 insertions(+) create mode 100644 _posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md diff --git a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md new file mode 100644 index 0000000000..8859d40ce4 --- /dev/null +++ b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md @@ -0,0 +1,150 @@ +--- +layout: post +title: "Announcing Data Prepper 2.0.0" +authors: +- dlv +date: 2022-10-10 15:00:00 -0500 +categories: + - technical-post +--- + +Today the maintainers are announcing the release of Data Prepper 2.0. It has been over a year since Data Prepper 1.0 was first introduced +and this release introduces significant changes based on feedback from our users. This release makes Data Prepper easier to use and helps +you improve your observability stack. This post will highlight some major changes and enhancements in this release. + +## Conditional routing + +Often time with log ingestion, pipeline authors need to send different logs to certain OpenSearch clusters. One example of this is routing logs based on log levels. +Perhaps you want info logs which produce large volumes of data to go to a cluster or index that has more frequent rollovers or deletions to clear out these large volumes of data. + +Now Data Prepper supports conditional routing to help with use-cases such as these. A pipeline author can configure routes. +The author will define a name that is appropriate for the domain and a Data Prepper expression. +Then for any sink that should only have some data coming through, define one or more routes to apply Data Prepper will evaluate +these expressions for each event to determine which sinks to route these events to. Any sink that has no routes defined will accept all events. + +Continuing with log-levels, consider an application log which includes log data. A common Java application log might look like the following. + +``` +2022-10-10T10:10:10,421 [main] INFO org.example.Application - Saving 10 records to SQL table "orders" +``` + +The text that reads `INFO` indicates that this is an INFO-level log. Data Prepper pipeline authors can now route logs with this level to only certain OpenSearch clusters. + +The following example pipeline shows how this works. This pipeline takes application logs from the `http` source. This source +accepts log data from external sources such as Fluent Bit. The pipeline then uses the `grok` processor to split the log line into multiple fields. +Now the event has a field named `loglevel` that authors can use in routes. This pipeline has two OpenSearch sinks. The first sink only receives +logs with a log level of `WARN` or `ERROR`. Data Prepper will route all events to the second sink. 
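To make the routing concrete, here is a sketch of the event that the `grok` processor in the following pipeline would produce for the example log line; the field values are our illustration of the pattern's output (shown as JSON), not taken from the release:

```
{
  "log": "2022-10-10T10:10:10,421 [main] INFO org.example.Application - Saving 10 records to SQL table \"orders\"",
  "time": "2022-10-10T10:10:10,421",
  "thread": "[main]",
  "loglevel": "INFO",
  "class": "org.example.Application",
  "message": "Saving 10 records to SQL table \"orders\""
}
```

Because `loglevel` is `INFO`, the `warn_and_above` route below evaluates to false for this event, so only the sink without routes receives it.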
+ +``` +application-log-pipeline: + workers: 4 + delay: "50" + source: + http: + processor: + - grok: + match: + log: [ "%{NOTSPACE:time} %{NOTSPACE:thread} %{NOTSPACE:loglevel} %{NOTSPACE:class} - %{GREEDYDATA:message}" ] + + route: + - warn_and_above: '/loglevel == "WARN" or /loglevel == "ERROR"' + sink: + - opensearch: + routes: + - warn_and_above + hosts: ["https://opensearch:9200"] + insecure: true + username: "admin" + password: "admin" + index: warn-and-above-logs + - opensearch: + hosts: ["https://opensearch:9200"] + insecure: true + username: "admin" + password: "admin" + index: all-logs +``` + +There are many other use-cases that conditional routing can support. If there are other conditional expressions +you’d like to see support for, please create an issue in GitHub. + +## Peer Forwarder + +Data Prepper supports stateful aggregations for traces and logs. With these, pipeline authors can improve the quality of the data going into OpenSearch. +Previous to Data Prepper 2.0, performing stateful trace aggregations required using the `peer-forwarder` processor plugin. +But this plugin only worked for traces and would send data back to the source. Also, log aggregations only worked on a single node. + +Data Prepper introduces peer forwarding as a core feature in Data Prepper 2.0. This allows pipeline authors to perform stateful +aggregations on multiple Data Prepper nodes. When performing stateful aggregations, Data Prepper uses a hash ring to determine +which nodes are responsible for processing different events based on the values of certain fields. Data Prepper's core peer-forwarder +routes events to the node responsible for processing the event. That node then holds all the state necessary for performing the aggregation. + +To use peer forwarding, you will configure how Data Prepper discovers other nodes and the security for connections in your +`data-prepper-config.yaml` file. The following snippet shows an example of how to do this. + +``` +peer_forwarder: + discovery_mode: dns + domain_name: "my-data-prepper-cluster.production" + ssl_certificate_file: /usr/share/data-prepper/config/my-certificate.crt + ssl_key_file: /usr/share/data-prepper/config/my-certificate.key + ssl_fingerprint_verification_only: true + authentication: + mutual_tls: +``` + +In the example above, Data Prepper will discover other peers using DNS. It will perform a DNS query on the domain `my-data-prepper-cluster.production`. +This DNS record should be an A record with a list of IP addresses for peers. The configuration uses a custom certificate and private key. +It performs host verification by checking the fingerprint of the certificate. And finally it configures each server to authenticate requests using +Mutual TLS (mTLS) to prevent tampering of data. + +## Directory structure + +Previously, Data Prepper was distributed as a single executable JAR file. This is simple and convenient, but also makes it difficult for Data Prepper +to include custom plugins. Data Prepper 2.0 introduces a change for it and now distributes the application in a bundled directory structure. +The new directory structure features a shell script to launch Data Prepper and dedicated subdirectories for JAR files, configurations, pipelines, logs, and more. 
+The directory structure looks like this: + +``` +data-prepper-2.0.0/ + bin/ + data-prepper # Shell script to run Data Prepper + config/ + data-prepper-config.yaml # The Data Prepper configuration file + log4j.properties # Logging configuration + pipelines/ # New directory for pipelines + trace-analytics.yaml + log-ingest.yaml + lib/ + data-prepper-core.jar + ... any other jar files + logs/ +``` + +With this change, a user can launch Data Prepper by simply running `bin/data-prepper`. No additional command line arguments or Java system property definitions +are required. Instead, the application will load configurations from `config/` subdirectory. + +Data Prepper will also read pipeline configurations from `pipelines/` subdirectory. Users can now define pipelines across +multiple YAML files in the subdirectory, where each file contains the configuration for one or more pipelines. This will +allow users to keep their pipeline definitions distinct and thus more compact and focused. + +## JSON & CSV parsing + +Many of our users have incoming data with embedded JSON or CSV fields. Now Data Prepper supports parsing either JSON or CSV. + +A common example of this is when one larger object includes a serialized JSON string. If your incoming event data has a +serialized JSON string, you can use the `parse_json` processor to extract the fields from the JSON into your event. + +Data Prepper can now import CSV or TSV formatted files from Amazon S3 sources. This is useful for systems like Amazon CloudFront +which write their access logs as TSV files. Now you can parse these logs using Data Prepper. Additionally, if your events have +CSV or TSV fields, Data Prepper has a `csv` processor which can create fields from your incoming CSV data. + +## Other improvements + +Data Prepper 2.0 includes a number of other improvements. We’d like to highlight a few of them. + +* The OpenSearch sink now supports create actions to OpenSearch. When Data Prepper writes documents to OpenSearch it normally does this via an update action. This will create the document if it does not exist or update it. Now a pipeline author can configure Data Prepper to use the create action. When this is configured, the OpenSearch cluster will not update the document if it already exists. Some scenarios call of for using this so that documents are only saved once and never updated. +* The HTTP source now supports loading TLS/SSL credentials from either Amazon S3 or Amazon Certificate Manager. The OTel Trace Source supported these options and now pipeline authors can configure them for their log ingestion use-cases as well. +* Data Prepper now requires Java 11 and the Docker image deploys with JDK 17. + +Please see our release notes for a complete list. From 616f36269d120f2287a0aeac6e86c046f43548dc Mon Sep 17 00:00:00 2001 From: David Venable Date: Sat, 8 Oct 2022 12:57:32 -0500 Subject: [PATCH 02/11] Adding oeyh to the author list. 
Co-authored-by: Hai Yan Signed-off-by: David Venable --- _posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md index 8859d40ce4..8128fe41b0 100644 --- a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md +++ b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md @@ -3,6 +3,7 @@ layout: post title: "Announcing Data Prepper 2.0.0" authors: - dlv +- oeyh date: 2022-10-10 15:00:00 -0500 categories: - technical-post From 8e1ec9e2cc569386c7a92650ee08b39d7f63c0bd Mon Sep 17 00:00:00 2001 From: David Venable Date: Tue, 11 Oct 2022 12:19:56 -0500 Subject: [PATCH 03/11] PR feedback on the blog post. Signed-off-by: David Venable --- ...022-10-10-Announcing-Data-Prepper-2.0.0.md | 78 +++++++++++-------- 1 file changed, 46 insertions(+), 32 deletions(-) diff --git a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md index 8128fe41b0..9a8c4cdab7 100644 --- a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md +++ b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md @@ -9,21 +9,26 @@ categories: - technical-post --- -Today the maintainers are announcing the release of Data Prepper 2.0. It has been over a year since Data Prepper 1.0 was first introduced -and this release introduces significant changes based on feedback from our users. This release makes Data Prepper easier to use and helps -you improve your observability stack. This post will highlight some major changes and enhancements in this release. +The Data Prepper maintainers are proud to announce the release of Data Prepper 2.0. This release makes Data Prepper +easier to use and helps you improve your observability stack based on feedback from our users. Data Prepper 2.0 retains +compatibility with all current versions of OpenSearch. + +Here are some of the major changes and enhancements made for Data Prepper 2.0. ## Conditional routing -Often time with log ingestion, pipeline authors need to send different logs to certain OpenSearch clusters. One example of this is routing logs based on log levels. -Perhaps you want info logs which produce large volumes of data to go to a cluster or index that has more frequent rollovers or deletions to clear out these large volumes of data. +Now Data Prepper 2.0 supports conditional routing to help pipeline authors send different logs to specific OpenSearch clusters. + +One common use-case this supports is to reducing the volume of data going to some clusters. +When you want info logs that produce large volumes of data to go to a cluster or index with more frequent rollovers or +deletions to clear out these large volumes of data, you now configure pipelines to route your data. -Now Data Prepper supports conditional routing to help with use-cases such as these. A pipeline author can configure routes. -The author will define a name that is appropriate for the domain and a Data Prepper expression. + +Simply pick a name appropriate for the domain and a Data Prepper expression. Then for any sink that should only have some data coming through, define one or more routes to apply Data Prepper will evaluate these expressions for each event to determine which sinks to route these events to. Any sink that has no routes defined will accept all events. -Continuing with log-levels, consider an application log which includes log data. A common Java application log might look like the following. 
+For example, consider an application log that includes log data. A typical Java application log might look like the following. ``` 2022-10-10T10:10:10,421 [main] INFO org.example.Application - Saving 10 records to SQL table "orders" @@ -31,9 +36,11 @@ Continuing with log-levels, consider an application log which includes log data. The text that reads `INFO` indicates that this is an INFO-level log. Data Prepper pipeline authors can now route logs with this level to only certain OpenSearch clusters. -The following example pipeline shows how this works. This pipeline takes application logs from the `http` source. This source -accepts log data from external sources such as Fluent Bit. The pipeline then uses the `grok` processor to split the log line into multiple fields. -Now the event has a field named `loglevel` that authors can use in routes. This pipeline has two OpenSearch sinks. The first sink only receives +The following example pipeline takes application logs from the `http` source. This source +accepts log data from external sources such as Fluent Bit. + +The pipeline then uses the `grok` processor to split the log line into multiple fields. +The `grok` processor adds named `loglevel` to the event. Pipeline authors can use that field in routes. This pipeline has two OpenSearch sinks. The first sink only receives logs with a log level of `WARN` or `ERROR`. Data Prepper will route all events to the second sink. ``` @@ -71,17 +78,25 @@ you’d like to see support for, please create an issue in GitHub. ## Peer Forwarder -Data Prepper supports stateful aggregations for traces and logs. With these, pipeline authors can improve the quality of the data going into OpenSearch. -Previous to Data Prepper 2.0, performing stateful trace aggregations required using the `peer-forwarder` processor plugin. -But this plugin only worked for traces and would send data back to the source. Also, log aggregations only worked on a single node. +Data Prepper 2.0 introduces peer forwarding as a core feature. -Data Prepper introduces peer forwarding as a core feature in Data Prepper 2.0. This allows pipeline authors to perform stateful +Previous to Data Prepper 2.0, performing stateful trace aggregations required using the peer-forwarder processor plugin. +But this plugin only worked for traces and would send data back to the source. Also, log aggregations only worked on a +single node. + +With peer forwarding as a core feature, pipeline authors can perform stateful aggregations on multiple Data Prepper nodes. When performing stateful aggregations, Data Prepper uses a hash ring to determine -which nodes are responsible for processing different events based on the values of certain fields. Data Prepper's core peer-forwarder +which nodes are responsible for processing different events based on the values of certain fields. Peer forwarder routes events to the node responsible for processing the event. That node then holds all the state necessary for performing the aggregation. -To use peer forwarding, you will configure how Data Prepper discovers other nodes and the security for connections in your -`data-prepper-config.yaml` file. The following snippet shows an example of how to do this. +To use peer forwarding, configure how Data Prepper discovers other nodes and the security for connections in your +`data-prepper-config.yaml` file. + +In the following example, Data Prepper discovers other peers using a DNS query on the `my-data-prepper-cluster.production` domain. 
+When using peer forwarder with DNS, the DNS record should be an A record with a list of IP addresses for peers. The example also uses a custom certificate and private key. +For host verification, it checks the fingerprint of the certificate. Lastly, it configures each server to authenticate requests using +Mutual TLS (mTLS) to prevent data tampering. + ``` peer_forwarder: @@ -94,10 +109,6 @@ peer_forwarder: mutual_tls: ``` -In the example above, Data Prepper will discover other peers using DNS. It will perform a DNS query on the domain `my-data-prepper-cluster.production`. -This DNS record should be an A record with a list of IP addresses for peers. The configuration uses a custom certificate and private key. -It performs host verification by checking the fingerprint of the certificate. And finally it configures each server to authenticate requests using -Mutual TLS (mTLS) to prevent tampering of data. ## Directory structure @@ -131,21 +142,24 @@ allow users to keep their pipeline definitions distinct and thus more compact an ## JSON & CSV parsing -Many of our users have incoming data with embedded JSON or CSV fields. Now Data Prepper supports parsing either JSON or CSV. +Many of our users have incoming data with embedded JSON or CSV fields. To help in these use-cases, Data Prepper 2.0 +supports parsing JSON or CSV. -A common example of this is when one larger object includes a serialized JSON string. If your incoming event data has a -serialized JSON string, you can use the `parse_json` processor to extract the fields from the JSON into your event. +For example, when one large object includes a serialized JSON string, you can use the `parse_json` processor to extract +the fields from the JSON into your event. Data Prepper can now import CSV or TSV formatted files from Amazon S3 sources. This is useful for systems like Amazon CloudFront -which write their access logs as TSV files. Now you can parse these logs using Data Prepper. Additionally, if your events have -CSV or TSV fields, Data Prepper has a `csv` processor which can create fields from your incoming CSV data. +which write their access logs as TSV files. Now you can parse these logs using Data Prepper. + +Additionally, if your events have +CSV or TSV fields, Data Prepper 2.0 now contains a `csv` processor which can create fields from your incoming CSV data. ## Other improvements -Data Prepper 2.0 includes a number of other improvements. We’d like to highlight a few of them. +Data Prepper 2.0 includes a number of other improvements. We want to highlight a few of them. -* The OpenSearch sink now supports create actions to OpenSearch. When Data Prepper writes documents to OpenSearch it normally does this via an update action. This will create the document if it does not exist or update it. Now a pipeline author can configure Data Prepper to use the create action. When this is configured, the OpenSearch cluster will not update the document if it already exists. Some scenarios call of for using this so that documents are only saved once and never updated. -* The HTTP source now supports loading TLS/SSL credentials from either Amazon S3 or Amazon Certificate Manager. The OTel Trace Source supported these options and now pipeline authors can configure them for their log ingestion use-cases as well. -* Data Prepper now requires Java 11 and the Docker image deploys with JDK 17. +* The OpenSearch sink now supports `create` actions for OpenSearch when writing documents. 
Pipeline authors can configure their pipelines to only create new documents and not update existing ones. +* The HTTP source now supports loading TLS/SSL credentials from either Amazon S3 or Amazon Certificate Manager. The OTel Trace Source supported these options; pipeline authors can now configure them for their log ingestion use-cases. +* Data Prepper now requires Java 11 or higher, and the Docker image deploys with JDK 17. -Please see our release notes for a complete list. +Please see our [release notes](https://github.com/opensearch-project/data-prepper/releases/tag/2.0.0) for a complete list. From 39fb09486b14686738361358b33715e5b0342d69 Mon Sep 17 00:00:00 2001 From: David Venable Date: Tue, 11 Oct 2022 12:35:10 -0500 Subject: [PATCH 04/11] PR feedback to the Directory structure section. Signed-off-by: David Venable --- ...022-10-10-Announcing-Data-Prepper-2.0.0.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md index 9a8c4cdab7..47d082dd87 100644 --- a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md +++ b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md @@ -112,10 +112,11 @@ peer_forwarder: ## Directory structure -Previously, Data Prepper was distributed as a single executable JAR file. This is simple and convenient, but also makes it difficult for Data Prepper -to include custom plugins. Data Prepper 2.0 introduces a change for it and now distributes the application in a bundled directory structure. -The new directory structure features a shell script to launch Data Prepper and dedicated subdirectories for JAR files, configurations, pipelines, logs, and more. -The directory structure looks like this: +Before the release of Data Prepper 2.0, we distributed Data Prepper as a single executable JAR file. While convenient, +it made it difficult for us to include custom plugins. + +We now distribute Data Prepper 2.0 in a bundled directory structure. This structure features a shell script to launch +Data Prepper and dedicated subdirectories for JAR files, configurations, pipelines, logs, and more. ``` data-prepper-2.0.0/ @@ -133,12 +134,12 @@ data-prepper-2.0.0/ logs/ ``` -With this change, a user can launch Data Prepper by simply running `bin/data-prepper`. No additional command line arguments or Java system property definitions -are required. Instead, the application will load configurations from `config/` subdirectory. +You now can launch Data Prepper by running `bin/data-prepper`; no need for additional command line arguments or Java system +property definitions. Instead, the application loads configurations from the `config/` subdirectory. -Data Prepper will also read pipeline configurations from `pipelines/` subdirectory. Users can now define pipelines across -multiple YAML files in the subdirectory, where each file contains the configuration for one or more pipelines. This will -allow users to keep their pipeline definitions distinct and thus more compact and focused. +Data Prepper 2.0 reads pipeline configurations from the `pipelines/` subdirectory. You can now define pipelines across +multiple YAML files in the subdirectory, where each file contains the definition for one or more pipelines. The directory +also helps keep pipeline definition distinct and, therefore, more compact and focused. 
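As an illustration of that layout, the `pipelines/log-ingest.yaml` file from the directory listing above could hold one focused pipeline. The contents below are an assumption for illustration, following the configuration style of the examples in this post:

```
# pipelines/log-ingest.yaml (hypothetical contents)
log-pipeline:
  source:
    http:
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: "admin"
        password: "admin"
        index: apache-logs
```

A second file, such as `trace-analytics.yaml`, would define the trace pipelines in the same way, keeping each file small enough to review at a glance.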
## JSON & CSV parsing From 3bc00b6248c5b07e23ad7c8baee9ef12b915beb9 Mon Sep 17 00:00:00 2001 From: David Venable Date: Tue, 11 Oct 2022 14:12:43 -0500 Subject: [PATCH 05/11] Applied suggestions from review. Signed-off-by: David Venable --- ...022-10-10-Announcing-Data-Prepper-2.0.0.md | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md index 47d082dd87..3b758de749 100644 --- a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md +++ b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md @@ -19,14 +19,14 @@ Here are some of the major changes and enhancements made for Data Prepper 2.0. Now Data Prepper 2.0 supports conditional routing to help pipeline authors send different logs to specific OpenSearch clusters. -One common use-case this supports is to reducing the volume of data going to some clusters. -When you want info logs that produce large volumes of data to go to a cluster or index with more frequent rollovers or +One common use case for conditional routing is reducing the volume of data going to some clusters. +When you want info logs that produce large volumes of data to go to a cluster, index with more frequent rollovers, or add deletions to clear out large volumes of data, you can now configure pipelines to route the data with your chosen action. deletions to clear out these large volumes of data, you now configure pipelines to route your data. Simply pick a name appropriate for the domain and a Data Prepper expression. -Then for any sink that should only have some data coming through, define one or more routes to apply Data Prepper will evaluate -these expressions for each event to determine which sinks to route these events to. Any sink that has no routes defined will accept all events. +Then for any sink that should only have some data coming through, define one or more routes to apply. Data Prepper will evaluate +these expressions for each event to determine which sinks to route these events. Any sink that has no routes defined will accept all events. For example, consider an application log that includes log data. A typical Java application log might look like the following. @@ -73,14 +73,14 @@ application-log-pipeline: index: all-logs ``` -There are many other use-cases that conditional routing can support. If there are other conditional expressions +There are many other use cases that conditional routing can support. If there are other conditional expressions you’d like to see support for, please create an issue in GitHub. ## Peer Forwarder Data Prepper 2.0 introduces peer forwarding as a core feature. -Previous to Data Prepper 2.0, performing stateful trace aggregations required using the peer-forwarder processor plugin. +Previous to Data Prepper 2.0, performing stateful trace aggregations required using the peer forwarder processor plugin. But this plugin only worked for traces and would send data back to the source. Also, log aggregations only worked on a single node. @@ -143,7 +143,7 @@ also helps keep pipeline definition distinct and, therefore, more compact and fo ## JSON & CSV parsing -Many of our users have incoming data with embedded JSON or CSV fields. To help in these use-cases, Data Prepper 2.0 +Many of our users have incoming data with embedded JSON or CSV fields. To help in these use cases, Data Prepper 2.0 supports parsing JSON or CSV. 
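As a minimal sketch of the new capability (the `source` field name and the `stdout` sink are our assumptions for illustration), a pipeline that parses an embedded JSON field might look like the following:

```
json-parse-pipeline:
  source:
    http:
  processor:
    - parse_json:
        source: "message"   # the event field that holds the serialized JSON string
  sink:
    - stdout:
```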
For example, when one large object includes a serialized JSON string, you can use the `parse_json` processor to extract @@ -153,14 +153,14 @@ Data Prepper can now import CSV or TSV formatted files from Amazon S3 sources. T which write their access logs as TSV files. Now you can parse these logs using Data Prepper. Additionally, if your events have -CSV or TSV fields, Data Prepper 2.0 now contains a `csv` processor which can create fields from your incoming CSV data. +CSV or TSV fields, Data Prepper 2.0 now contains a `csv` processor that can create fields from your incoming CSV data. ## Other improvements Data Prepper 2.0 includes a number of other improvements. We want to highlight a few of them. * The OpenSearch sink now supports `create` actions for OpenSearch when writing documents. Pipeline authors can configure their pipelines to only create new documents and not update existing ones. -* The HTTP source now supports loading TLS/SSL credentials from either Amazon S3 or Amazon Certificate Manager. The OTel Trace Source supported these options; pipeline authors can now configure them for their log ingestion use-cases. -* Data Prepper now requires Java 11 or higher, and the Docker image deploys with JDK 17. +* The HTTP source now supports loading TLS/SSL credentials from either Amazon S3 or Amazon Certificate Manager. Pipeline authors can now configure them for their log ingestion use cases. Before Data Prepper 2.0, only the OTel Trace Source supported these options. +* Data Prepper now requires Java 11 or higher. The Docker image deploys with JDK 17. Please see our [release notes](https://github.com/opensearch-project/data-prepper/releases/tag/2.0.0) for a complete list. From b57dd0b4e5536fc8d7f9b76c8df735252549e304 Mon Sep 17 00:00:00 2001 From: David Venable Date: Tue, 11 Oct 2022 14:35:29 -0500 Subject: [PATCH 06/11] Other corrections from the review. Signed-off-by: David Venable --- _posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md index 3b758de749..463aad576f 100644 --- a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md +++ b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md @@ -24,7 +24,7 @@ When you want info logs that produce large volumes of data to go to a cluster, i deletions to clear out these large volumes of data, you now configure pipelines to route your data. -Simply pick a name appropriate for the domain and a Data Prepper expression. +Simply pick a name appropriate for the domain and a Data Prepper expression. Then for any sink that should only have some data coming through, define one or more routes to apply. Data Prepper will evaluate these expressions for each event to determine which sinks to route these events. Any sink that has no routes defined will accept all events. @@ -39,9 +39,11 @@ The text that reads `INFO` indicates that this is an INFO-level log. Data Preppe The following example pipeline takes application logs from the `http` source. This source accepts log data from external sources such as Fluent Bit. -The pipeline then uses the `grok` processor to split the log line into multiple fields. -The `grok` processor adds named `loglevel` to the event. Pipeline authors can use that field in routes. This pipeline has two OpenSearch sinks. The first sink only receives -logs with a log level of `WARN` or `ERROR`. Data Prepper will route all events to the second sink. 
+The pipeline then uses the `grok` processor to split the log line into multiple fields. The `grok` processor adds a
+field named `loglevel` to the event. Pipeline authors can use that field in routes.
+
+This pipeline contains two OpenSearch sinks. The first sink will only receive logs with a log level of `WARN` or `ERROR`.
+Data Prepper will route all events to the second sink.
 
 ```
 application-log-pipeline:

From 0f160f1472282852f6e4f2ded9ffbe7e0fbbacdf Mon Sep 17 00:00:00 2001
From: David Venable
Date: Wed, 12 Oct 2022 13:18:41 -0500
Subject: [PATCH 07/11] Applied suggestions from review.

Signed-off-by: David Venable
---
 ...2022-10-10-Announcing-Data-Prepper-2.0.0.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md
index 463aad576f..154ab2f683 100644
--- a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md
+++ b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md
@@ -17,14 +17,14 @@
 
 ## Conditional routing
 
-Now Data Prepper 2.0 supports conditional routing to help pipeline authors send different logs to specific OpenSearch clusters.
+Data Prepper 2.0 supports conditional routing to help pipeline authors send different logs to specific OpenSearch clusters.
 
 One common use case for conditional routing is reducing the volume of data going to some clusters.
-When you want info logs that produce large volumes of data to go to a cluster, index with more frequent rollovers, or add deletions to clear out large volumes of data, you can now configure pipelines to route the data with your chosen action.
-deletions to clear out these large volumes of data, you now configure pipelines to route your data.
+When you want info logs, which produce large volumes of data, to go to a cluster or index with more frequent rollovers or
+deletions, you can now configure pipelines to route the data accordingly.
 
-Simply pick a name appropriate for the domain and a Data Prepper expression. 
+Simply choose a name appropriate for the domain and a Data Prepper expression.
 Then for any sink that should only have some data coming through, define one or more routes to apply. Data Prepper will evaluate
-these expressions for each event to determine which sinks to route these events. Any sink that has no routes defined will accept all events.
+these expressions for each event to determine which sinks receive these events. Any sink that has no routes defined will accept all events.
 
 For example, consider an application log that includes log data. A typical Java application log might look like the following.
@@ -115,7 +115,7 @@
 
 ## Directory structure
 
 Before the release of Data Prepper 2.0, we distributed Data Prepper as a single executable JAR file. While convenient,
-it made it difficult for us to include custom plugins.
+this made it difficult for us to include custom plugins.
 
 We now distribute Data Prepper 2.0 in a bundled directory structure. This structure features a shell script to launch
 Data Prepper and dedicated subdirectories for JAR files, configurations, pipelines, logs, and more.
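A minimal sketch of what lands in the `config/` subdirectory: a `data-prepper-config.yaml` with just the core server options (values illustrative, not taken from the release):

```
ssl: false
serverPort: 4900
```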
@@ -136,22 +136,22 @@ data-prepper-2.0.0/ logs/ ``` -You now can launch Data Prepper by running `bin/data-prepper`; no need for additional command line arguments or Java system +You now can launch Data Prepper by running `bin/data-prepper`; there is no need for additional command line arguments or Java system property definitions. Instead, the application loads configurations from the `config/` subdirectory. Data Prepper 2.0 reads pipeline configurations from the `pipelines/` subdirectory. You can now define pipelines across multiple YAML files in the subdirectory, where each file contains the definition for one or more pipelines. The directory also helps keep pipeline definition distinct and, therefore, more compact and focused. -## JSON & CSV parsing +## JSON and CSV parsing Many of our users have incoming data with embedded JSON or CSV fields. To help in these use cases, Data Prepper 2.0 -supports parsing JSON or CSV. +supports parsing JSON and CSV. For example, when one large object includes a serialized JSON string, you can use the `parse_json` processor to extract the fields from the JSON into your event. -Data Prepper can now import CSV or TSV formatted files from Amazon S3 sources. This is useful for systems like Amazon CloudFront +Data Prepper can now import CSV or TSV formatted files from Amazon Simple Storage Service (Amazon S3) sources. This is useful for systems like Amazon CloudFront, which write their access logs as TSV files. Now you can parse these logs using Data Prepper. Additionally, if your events have From faa17e2e07d9bcb6d6d6bbbc058e33aa5ccbee5c Mon Sep 17 00:00:00 2001 From: David Venable Date: Wed, 12 Oct 2022 13:31:16 -0500 Subject: [PATCH 08/11] Other minor tweaks from review that were not auto-accepted. Signed-off-by: David Venable --- _posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md index 154ab2f683..dafd907e55 100644 --- a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md +++ b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md @@ -89,12 +89,12 @@ single node. With peer forwarding as a core feature, pipeline authors can perform stateful aggregations on multiple Data Prepper nodes. When performing stateful aggregations, Data Prepper uses a hash ring to determine which nodes are responsible for processing different events based on the values of certain fields. Peer forwarder -routes events to the node responsible for processing the event. That node then holds all the state necessary for performing the aggregation. +routes events to the node responsible for processing them. That node then holds all the state necessary for performing the aggregation. To use peer forwarding, configure how Data Prepper discovers other nodes and the security for connections in your `data-prepper-config.yaml` file. -In the following example, Data Prepper discovers other peers using a DNS query on the `my-data-prepper-cluster.production` domain. +In the following example, Data Prepper discovers other peers by using a DNS query on the `my-data-prepper-cluster.production` domain. When using peer forwarder with DNS, the DNS record should be an A record with a list of IP addresses for peers. The example also uses a custom certificate and private key. For host verification, it checks the fingerprint of the certificate. 
Lastly, it configures each server to authenticate requests using Mutual TLS (mTLS) to prevent data tampering. @@ -149,7 +149,7 @@ Many of our users have incoming data with embedded JSON or CSV fields. To help i supports parsing JSON and CSV. For example, when one large object includes a serialized JSON string, you can use the `parse_json` processor to extract -the fields from the JSON into your event. +the fields from the JSON string into your event. Data Prepper can now import CSV or TSV formatted files from Amazon Simple Storage Service (Amazon S3) sources. This is useful for systems like Amazon CloudFront, which write their access logs as TSV files. Now you can parse these logs using Data Prepper. @@ -162,7 +162,7 @@ CSV or TSV fields, Data Prepper 2.0 now contains a `csv` processor that can crea Data Prepper 2.0 includes a number of other improvements. We want to highlight a few of them. * The OpenSearch sink now supports `create` actions for OpenSearch when writing documents. Pipeline authors can configure their pipelines to only create new documents and not update existing ones. -* The HTTP source now supports loading TLS/SSL credentials from either Amazon S3 or Amazon Certificate Manager. Pipeline authors can now configure them for their log ingestion use cases. Before Data Prepper 2.0, only the OTel Trace Source supported these options. +* The HTTP source now supports loading TLS/SSL credentials from either Amazon S3 or AWS Certificate Manager (ACM). Pipeline authors can now configure them for their log ingestion use cases. Before Data Prepper 2.0, only the OTel Trace Source supported these options. * Data Prepper now requires Java 11 or higher. The Docker image deploys with JDK 17. Please see our [release notes](https://github.com/opensearch-project/data-prepper/releases/tag/2.0.0) for a complete list. From 87cb277419388e9783994b08e072c21f1a0b6461 Mon Sep 17 00:00:00 2001 From: David Venable Date: Wed, 12 Oct 2022 14:12:22 -0500 Subject: [PATCH 09/11] Added a call to action at the end. Signed-off-by: David Venable --- _posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md index dafd907e55..f8c3d42670 100644 --- a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md +++ b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md @@ -166,3 +166,8 @@ Data Prepper 2.0 includes a number of other improvements. We want to highlight a * Data Prepper now requires Java 11 or higher. The Docker image deploys with JDK 17. Please see our [release notes](https://github.com/opensearch-project/data-prepper/releases/tag/2.0.0) for a complete list. + +## Try Data Prepper 2.0 + +Data Prepper 2.0 is available for [download](https://opensearch.org/downloads.html#data-prepper) now. The maintainers encourage you to +read the [latest documentation](https://opensearch.org/docs/latest/clients/data-prepper/index/) and try out the new features. From 64fb60720842a9ce10c37f2b300e14652dc781fb Mon Sep 17 00:00:00 2001 From: David Venable Date: Wed, 12 Oct 2022 14:13:12 -0500 Subject: [PATCH 10/11] Removed "all" per recommendation. 
Signed-off-by: David Venable --- _posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md index f8c3d42670..0be046984d 100644 --- a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md +++ b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md @@ -89,7 +89,7 @@ single node. With peer forwarding as a core feature, pipeline authors can perform stateful aggregations on multiple Data Prepper nodes. When performing stateful aggregations, Data Prepper uses a hash ring to determine which nodes are responsible for processing different events based on the values of certain fields. Peer forwarder -routes events to the node responsible for processing them. That node then holds all the state necessary for performing the aggregation. +routes events to the node responsible for processing them. That node then holds the state necessary for performing the aggregation. To use peer forwarding, configure how Data Prepper discovers other nodes and the security for connections in your `data-prepper-config.yaml` file. From c65831f56815f0beac983896ee02405ebe0ac336 Mon Sep 17 00:00:00 2001 From: David Venable Date: Wed, 12 Oct 2022 16:07:50 -0500 Subject: [PATCH 11/11] Exclamation point! Signed-off-by: David Venable --- _posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md index 0be046984d..afc624db76 100644 --- a/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md +++ b/_posts/2022-10-10-Announcing-Data-Prepper-2.0.0.md @@ -169,5 +169,5 @@ Please see our [release notes](https://github.com/opensearch-project/data-preppe ## Try Data Prepper 2.0 -Data Prepper 2.0 is available for [download](https://opensearch.org/downloads.html#data-prepper) now. The maintainers encourage you to +Data Prepper 2.0 is available for [download](https://opensearch.org/downloads.html#data-prepper) now! The maintainers encourage you to read the [latest documentation](https://opensearch.org/docs/latest/clients/data-prepper/index/) and try out the new features.
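For a first experiment with the features described above, here is a minimal pipeline to adapt; all names and values are illustrative, and the documentation linked above covers the full option set:

```
try-2-0-pipeline:
  source:
    http:
  processor:
    - parse_json:
  route:
    - errors: '/loglevel == "ERROR"'
  sink:
    - opensearch:
        routes:
          - errors
        hosts: ["https://localhost:9200"]
        insecure: true
        username: "admin"
        password: "admin"
        index: error-logs
```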