---
description: Select or exclude records per patterns
---
The Grep Filter plugin allows you to match or exclude specific records based on regular expression patterns for values or nested values.
The plugin supports the following configuration parameters:
Key | Value Format | Description |
---|---|---|
Regex | KEY REGEX | Keep records in which the content of KEY matches the regular expression. |
Exclude | KEY REGEX | Exclude records in which the content of KEY matches the regular expression. |
Logical_Op | Operation | Specifies which logical operator to use. AND, OR, and legacy are allowed as an Operation. The default is legacy, for backward compatibility. In legacy mode, the behaviour is either AND or OR, depending on whether the rules include records (Regex uses AND) or exclude them (Exclude uses OR). Only available from 2.1+. |
This plugin supports the Record Accessor feature for specifying the KEY. Using the record accessor is recommended when you want to match against nested values.
To start filtering records, run the filter from the command line or through the configuration file. The following example assumes that you have a file called lines.txt with the following content:
{"log": "aaa"}
{"log": "aab"}
{"log": "bbb"}
{"log": "ccc"}
{"log": "ddd"}
{"log": "eee"}
{"log": "fff"}
{"log": "ggg"}
Note: Command-line mode requires special care to quote the regular expressions properly. Using a configuration file is recommended.
The following command loads the tail plugin and reads the content of the lines.txt file. The grep filter then applies a regular expression rule to the log field (created by the tail plugin) and passes only the records whose field value matches aa:
$ bin/fluent-bit -i tail -p 'path=lines.txt' -F grep -p 'regex=log aa' -m '*' -o stdout
{% tabs %} {% tab title="fluent-bit.conf" %}
[SERVICE]
    parsers_file /path/to/parsers.conf

[INPUT]
    name tail
    path lines.txt
    parser json

[FILTER]
    name grep
    match *
    regex log aa

[OUTPUT]
    name stdout
    match *
{% endtab %}
{% tab title="fluent-bit.yaml" %}
service:
  parsers_file: /path/to/parsers.conf
pipeline:
  inputs:
    - name: tail
      path: lines.txt
      parser: json
  filters:
    - name: grep
      match: '*'
      regex: log aa
  outputs:
    - name: stdout
      match: '*'
{% endtab %} {% endtabs %}
The filter supports multiple rules, which are applied in order. You can add as many Regex and Exclude entries as required.
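To make the rule ordering concrete, here is a minimal Python sketch (a model, not Fluent Bit code) of how a list of rules could be evaluated in the default legacy mode described in the parameter table: every Regex rule must match, and any matching Exclude rule drops the record. The `keep` helper and the second rule (an Exclude on `b$`) are hypothetical additions for illustration, and Python's `re` module stands in for Fluent Bit's regex engine, so pattern syntax may differ in edge cases.

```python
import re

# Records from lines.txt, assumed to be already parsed from JSON.
records = [{"log": v} for v in
           ["aaa", "aab", "bbb", "ccc", "ddd", "eee", "fff", "ggg"]]

# Hypothetical rule list in configuration order: (rule type, key, pattern).
rules = [
    ("regex", "log", "aa"),    # keep only records whose log matches aa
    ("exclude", "log", "b$"),  # then drop records whose log ends with b
]

def keep(record, rules):
    """Return True if the record survives all rules (legacy-style model)."""
    for kind, key, pattern in rules:
        value = record.get(key)
        matched = isinstance(value, str) and re.search(pattern, value) is not None
        if kind == "regex" and not matched:
            return False  # a Regex rule that does not match drops the record
        if kind == "exclude" and matched:
            return False  # an Exclude rule that matches drops the record
    return True

kept = [r["log"] for r in records if keep(r, rules)]
print(kept)  # ['aaa']
```

In this model, aab passes the Regex rule but is then removed by the Exclude rule, so only aaa remains.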
If you want to match or exclude records based on nested values, you can use a Record Accessor format as the KEY name. Consider the following record example:
{
"log": "something",
"kubernetes": {
"pod_name": "myapp-0",
"namespace_name": "default",
"pod_id": "216cd7ae-1c7e-11e8-bb40-000c298df552",
"labels": {
"app": "myapp"
},
"host": "minikube",
"container_name": "myapp",
"docker_id": "370face382c7603fdd309d8c6aaaf434fd98b92421ce"
}
}
If you want to exclude records that match a given nested field (for example, kubernetes.labels.app), you can use the following rule:
{% tabs %} {% tab title="fluent-bit.conf" %}
[FILTER]
    Name grep
    Match *
    Exclude $kubernetes['labels']['app'] myapp
{% endtab %}
{% tab title="fluent-bit.yaml" %}
filters:
  - name: grep
    match: '*'
    exclude: $kubernetes['labels']['app'] myapp
{% endtab %} {% endtabs %}
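The effect of this Exclude rule on the sample record can be modelled with a short Python sketch. It is an illustration only: the `lookup` and `excluded` helpers are hypothetical, the key path is written as a plain list rather than parsed from the Record Accessor string, and the sketch assumes that a missing or non-string value never matches.

```python
import re

record = {
    "log": "something",
    "kubernetes": {
        "pod_name": "myapp-0",
        "namespace_name": "default",
        "labels": {"app": "myapp"},
        "host": "minikube",
        "container_name": "myapp",
    },
}

def lookup(rec, path):
    """Walk a list of keys into nested dicts; return None if a key is missing."""
    current = rec
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def excluded(rec, path, pattern):
    """True when the nested value exists, is a string, and matches the pattern."""
    value = lookup(rec, path)
    return isinstance(value, str) and re.search(pattern, value) is not None

# Models: Exclude $kubernetes['labels']['app'] myapp
path = ["kubernetes", "labels", "app"]
other = {"log": "x", "kubernetes": {"labels": {"app": "nginx"}}}

print(excluded(record, path, "myapp"))         # True: the record is dropped
print(excluded(other, path, "myapp"))          # False: the record is kept
print(excluded({"log": "x"}, path, "myapp"))   # False: no such key, kept
```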
In your processing pipeline, you might want to drop records that are missing certain keys.
A simple way to do this is to require a match with a Regex rule whose pattern matches anything: a record with a missing key fails this check and is dropped.
The following example also checks that the key holds a specific valid value:
{% tabs %} {% tab title="fluent-bit.conf" %}
# Use Grep to verify the contents of the iot_timestamp value.
# If the iot_timestamp key does not exist, this will fail
# and exclude the row.
[FILTER]
    Name grep
    Alias filter-iots-grep
    Match iots_thread.*
    Regex iot_timestamp ^\d{4}-\d{2}-\d{2}
{% endtab %}
{% tab title="fluent-bit.yaml" %}
filters:
  - name: grep
    alias: filter-iots-grep
    match: iots_thread.*
    regex: iot_timestamp ^\d{4}-\d{2}-\d{2}
{% endtab %} {% endtabs %}
The specified key iot_timestamp must match the expected expression. If it does not match, or if it is missing or empty, the record is excluded.
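As a rough illustration of this check, the following Python sketch applies the same pattern to a few made-up records. The sample records and the `passes` helper are hypothetical, and Python's `re` module is only a stand-in for Fluent Bit's regex engine.

```python
import re

# Same pattern as the Regex rule above: value must start with YYYY-MM-DD.
PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}")

def passes(record):
    """Model of the Regex rule: a missing, empty, or non-matching value fails."""
    value = record.get("iot_timestamp")
    return isinstance(value, str) and PATTERN.search(value) is not None

# Hypothetical sample records.
samples = [
    {"iot_timestamp": "2023-01-22T09:46:49Z", "value": 1},  # valid date prefix
    {"iot_timestamp": "22/01/2023", "value": 2},            # wrong format
    {"iot_timestamp": "", "value": 3},                      # empty value
    {"value": 4},                                           # key missing
]

kept = [s["value"] for s in samples if passes(s)]
print(kept)  # [1]
```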
If you want to set multiple Regex or Exclude rules, you can use the Logical_Op property to combine them with a logical conjunction or disjunction.
Note: If Logical_Op is set, setting both Regex and Exclude results in an error.
{% tabs %} {% tab title="fluent-bit.conf" %}
[INPUT]
    Name dummy
    Dummy {"endpoint":"localhost", "value":"something"}
    Tag dummy

[FILTER]
    Name grep
    Match *
    Logical_Op or
    Regex value something
    Regex value error

[OUTPUT]
    Name stdout
{% endtab %}
{% tab title="fluent-bit.yaml" %}
pipeline:
  inputs:
    - name: dummy
      dummy: '{"endpoint":"localhost", "value":"something"}'
      tag: dummy
  filters:
    - name: grep
      match: '*'
      logical_op: or
      regex:
        - value something
        - value error
  outputs:
    - name: stdout
{% endtab %} {% endtabs %}
The output will be similar to the following:
Fluent Bit v2.0.9
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
[2023/01/22 09:46:49] [ info] [fluent bit] version=2.0.9, commit=16eae10786, pid=33268
[2023/01/22 09:46:49] [ info] [storage] ver=1.2.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/01/22 09:46:49] [ info] [cmetrics] version=0.5.8
[2023/01/22 09:46:49] [ info] [ctraces ] version=0.2.7
[2023/01/22 09:46:49] [ info] [input:dummy:dummy.0] initializing
[2023/01/22 09:46:49] [ info] [input:dummy:dummy.0] storage_strategy='memory' (memory only)
[2023/01/22 09:46:49] [ info] [filter:grep:grep.0] OR mode
[2023/01/22 09:46:49] [ info] [sp] stream processor started
[2023/01/22 09:46:49] [ info] [output:stdout:stdout.0] worker #0 started
[0] dummy: [1674348410.558341857, {"endpoint"=>"localhost", "value"=>"something"}]
[0] dummy: [1674348411.546425499, {"endpoint"=>"localhost", "value"=>"something"}]
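The dummy record is kept because at least one of the two Regex rules matches. A small Python model of this combination logic is sketched below. It reflects one reading of the AND and OR modes described in the parameter table rather than the plugin's actual implementation, and the `matches` and `keep` helpers and the `rule_type` parameter are hypothetical names.

```python
import re

def matches(record, key, pattern):
    """True when the key exists, holds a string, and the pattern matches it."""
    value = record.get(key)
    return isinstance(value, str) and re.search(pattern, value) is not None

def keep(record, rules, logical_op, rule_type="regex"):
    """Combine all rule results with AND or OR, then keep or drop the record.

    All rules share one type because, with Logical_Op set, Regex and
    Exclude cannot be mixed.
    """
    results = [matches(record, key, pattern) for key, pattern in rules]
    combined = any(results) if logical_op == "or" else all(results)
    # Regex rules keep on a combined match; Exclude rules drop on one.
    return combined if rule_type == "regex" else not combined

record = {"endpoint": "localhost", "value": "something"}
rules = [("value", "something"), ("value", "error")]

print(keep(record, rules, "or"))   # True: one rule matches, record is kept
print(keep(record, rules, "and"))  # False: not all rules match, record is dropped
```

Under the same model, switching the two rules to Exclude with OR would drop the record, since any matching rule is then enough to exclude it.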