Add deriving metrics from logs use case to Data Prepper #6248

Merged
merged 29 commits into from
Jul 3, 2024
Changes from 24 commits
39d2161
Add use case to Data Prepper
vagimeli Jan 23, 2024
06fb2a3
Add content
vagimeli Jan 23, 2024
a0fe1f0
Copy edits
vagimeli Jan 23, 2024
a5bcd1b
Merge branch 'main' into metrics-logs
vagimeli Jan 31, 2024
a47d898
Merge branch 'main' into metrics-logs
vagimeli Feb 5, 2024
217cee3
Merge branch 'main' into metrics-logs
vagimeli Feb 22, 2024
803b748
Merge branch 'main' into metrics-logs
vagimeli Feb 26, 2024
364619c
Update metrics-logs.md
vagimeli Mar 6, 2024
6ecd3db
Merge branch 'main' into metrics-logs
vagimeli Apr 3, 2024
e60fdb9
Merge branch 'main' into metrics-logs
vagimeli Apr 4, 2024
223b20c
Merge branch 'main' into metrics-logs
vagimeli Apr 4, 2024
a6b0a6a
Merge branch 'main' into metrics-logs
vagimeli Apr 9, 2024
39a2c4a
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli Apr 25, 2024
111b669
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli Apr 25, 2024
889cada
Merge branch 'main' into metrics-logs
vagimeli Apr 25, 2024
8c54196
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli May 8, 2024
e453c50
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli May 8, 2024
d38691d
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli May 8, 2024
c02adb0
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli May 8, 2024
44500d8
Merge branch 'main' into metrics-logs
vagimeli May 8, 2024
c761ef5
Merge branch 'main' into metrics-logs
vagimeli May 13, 2024
db890e3
Merge branch 'main' into metrics-logs
vagimeli Jun 5, 2024
4b0d81b
Merge branch 'main' into metrics-logs
vagimeli Jun 26, 2024
3afc5da
Update metrics-logs.md
vagimeli Jun 26, 2024
9cb87a5
Merge branch 'main' into metrics-logs
vagimeli Jun 28, 2024
540b97c
Update metrics-logs.md
vagimeli Jun 28, 2024
871bf11
Update _data-prepper/common-use-cases/metrics-logs.md
vagimeli Jun 28, 2024
9b0b40a
Merge branch 'main' into metrics-logs
vagimeli Jul 2, 2024
20a6d8f
Merge branch 'main' into metrics-logs
vagimeli Jul 3, 2024
71 changes: 71 additions & 0 deletions _data-prepper/common-use-cases/metrics-logs.md
@@ -0,0 +1,71 @@
---
layout: default
title: Deriving metrics from logs
parent: Common use cases
nav_order: 15
---

# Deriving metrics from logs

You can use Data Prepper to derive metrics from logs.

The following example pipeline receives incoming logs using the [`http` source plugin]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/http-source) and parses them using the [`grok` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/grok/). It then uses the [`aggregate` processor]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/aggregate/) to aggregate the `bytes` values extracted from the logs over a 30-second window and derives histograms from the results.
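As a rough illustration of how a client might hand logs to the `http` source, the following Python sketch wraps raw log lines in a JSON array of events, each with a `log` key (the key that the `grok` processor matches against in this use case). The host and port in `INGEST_URL`, the `build_request` helper, and the JSON array body shape are assumptions made for this example; verify them against your own `http` source configuration:

```python
import json
import urllib.request

# Hypothetical endpoint: the host and port are assumptions. The path follows
# the "/${pipelineName}/logs" setting used by the example pipeline.
INGEST_URL = "http://localhost:2021/apache-log-pipeline-with-metrics/logs"


def build_request(log_lines, url=INGEST_URL):
    """Wrap raw log lines as a JSON array of {"log": ...} events (assumed body shape)."""
    body = json.dumps([{"log": line} for line in log_lines]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
request = build_request([sample])
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

The sketch only builds the request so that it can be inspected without a running pipeline. In practice, a log shipper such as Fluent Bit typically performs this step.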

This pipeline writes data to two different OpenSearch indexes:
Review comment from the contributor (PR author):

@dlvenable Please review the text added at lines 14-27 and let me know whether these additions address your feedback and are accurate. Thank you.


- `logs`: This index stores the original, unaggregated log events after they have been processed by the `grok` processor.
- `histogram_metrics`: This index stores the histogram metrics that the `aggregate` processor derives from the log events.

The pipeline contains two sub-pipelines:

- `apache-log-pipeline-with-metrics`: Receives logs from an HTTP client, such as Fluent Bit, and uses `grok` to extract important values from the logs by matching the value in the `log` key against the [Apache Common Log Format](https://httpd.apache.org/docs/2.4/logs.html#accesslog). It then forwards the grokked logs to two destinations:

  - An OpenSearch index named `logs` to store the original log events.
  - The `log-to-metrics-pipeline` for further aggregation and metric derivation.

- `log-to-metrics-pipeline`: Receives the grokked logs from the `apache-log-pipeline-with-metrics` pipeline, groups them by the values in the `clientip` and `request` keys, and derives histogram metrics from the `bytes` value of each group. Finally, it sends the derived histogram metrics to an OpenSearch index named `histogram_metrics`.
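To make the grok step more concrete, the following Python sketch approximates the extraction using a regular expression. It is not the `COMMONAPACHELOG_DATATYPED` grok pattern itself. The regular expression is a simplified stand-in for the Apache Common Log Format, and the `timestamp` and `response` field names, as well as the `parse_log` helper, are assumptions made for this example. Only `clientip`, `request`, and `bytes` are keys named in this use case:

```python
import re

# Simplified stand-in for the Apache Common Log Format. This is an
# illustration only, NOT the grok pattern that Data Prepper applies.
COMMON_LOG = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request_line>[^"]*)" (?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def parse_log(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = COMMON_LOG.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The request line is "METHOD path PROTOCOL"; keep only the path as `request`.
    parts = fields.pop("request_line").split()
    fields["request"] = parts[1] if len(parts) >= 2 else ""
    # Convert numeric fields so that they can be aggregated later.
    fields["response"] = int(fields["response"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
event = parse_log(sample)
```

Converting `bytes` to an integer mirrors the intent of a data-typed pattern: the value must be numeric before a histogram can be derived from it.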

#### Example pipeline

```yaml
apache-log-pipeline-with-metrics:
  source:
    http:
      # Provide the path for ingestion. ${pipelineName} will be replaced with the name configured for this pipeline.
      # In this case, it would be "/apache-log-pipeline-with-metrics/logs". This will be the Fluent Bit output URI value.
      path: "/${pipelineName}/logs"
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG_DATATYPED}" ]
  sink:
    - opensearch:
        ...
        index: "logs"
    - pipeline:
        name: "log-to-metrics-pipeline"

log-to-metrics-pipeline:
  source:
    pipeline:
      name: "apache-log-pipeline-with-metrics"
  processor:
    - aggregate:
        # Specify the required identification keys
        identification_keys: ["clientip", "request"]
        action:
          histogram:
            # Specify the appropriate values for each of the following fields
            key: "bytes"
            record_minmax: true
            units: "bytes"
            buckets: [0, 25000000, 50000000, 75000000, 100000000]
        # Pick the required aggregation period
        group_duration: "30s"
  sink:
    - opensearch:
        ...
        index: "histogram_metrics"
```
{% include copy-curl.html %}
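
The following Python sketch shows, in simplified form, what a histogram aggregation over one group window computes. It groups events by the identification keys and counts each `bytes` value in a bucket, using the same bucket boundaries as the example pipeline. The `derive_histograms` helper, the output field names, and the boundary semantics (lower bound inclusive, upper bound exclusive) are assumptions made for this example and may differ from what the `aggregate` processor emits. Time windowing is omitted:

```python
from bisect import bisect_right
from collections import defaultdict

# Bucket boundaries taken from the example pipeline.
BUCKETS = [0, 25000000, 50000000, 75000000, 100000000]


def derive_histograms(events, keys=("clientip", "request"), value_key="bytes", buckets=BUCKETS):
    """Group events by identification keys and count values per bucket."""
    groups = defaultdict(lambda: {
        "count": 0, "sum": 0, "min": None, "max": None,
        "bucket_counts": [0] * (len(buckets) + 1),
    })
    for event in events:
        value = event[value_key]
        hist = groups[tuple(event[k] for k in keys)]
        hist["count"] += 1
        hist["sum"] += value
        hist["min"] = value if hist["min"] is None else min(hist["min"], value)
        hist["max"] = value if hist["max"] is None else max(hist["max"], value)
        # Slot 0 holds values below the first boundary; the last slot holds
        # values at or above the final boundary (assumed semantics).
        hist["bucket_counts"][bisect_right(buckets, value)] += 1
    return dict(groups)


events = [
    {"clientip": "1.1.1.1", "request": "/a", "bytes": 2326},
    {"clientip": "1.1.1.1", "request": "/a", "bytes": 30000000},
    {"clientip": "2.2.2.2", "request": "/b", "bytes": 100000000},
]
histograms = derive_histograms(events)
```

Each distinct combination of `clientip` and `request` produces its own histogram, which is why the choice of identification keys determines how many metric documents are written to the `histogram_metrics` index.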