ES|QL: improve docs about escaping for GROK, DISSECT, LIKE, RLIKE
luigidellaquila committed Oct 22, 2024
1 parent ebd363d commit 72292ed
Showing 11 changed files with 197 additions and 38 deletions.
30 changes: 20 additions & 10 deletions docs/reference/esql/esql-process-data-with-dissect-grok.asciidoc
@@ -40,7 +40,7 @@ delimiter-based pattern, and extracts the specified keys as columns.
For example, the following pattern:
[source,txt]
----
%{clientip} [%{@timestamp}] %{status}
%{clientip} [%{@timestamp}] %{status}
----

matches a log line of this format:
@@ -76,8 +76,8 @@ ignore certain fields, append fields, skip over padding, etc.
===== Terminology

dissect pattern::
the set of fields and delimiters describing the textual
format. Also known as a dissection.
the set of fields and delimiters describing the textual
format. Also known as a dissection.
The dissection is described using a set of `%{}` sections:
`%{a} - %{b} - %{c}`

@@ -91,14 +91,14 @@ Any set of characters other than `%{`, `'not }'`, or `}` is a delimiter.
key::
+
--
the text between the `%{` and `}`, exclusive of the `?`, `+`, `&` prefixes
and the ordinal suffix.
the text between the `%{` and `}`, exclusive of the `?`, `+`, `&` prefixes
and the ordinal suffix.

Examples:

* `%{?aaa}` - the key is `aaa`
* `%{+bbb/3}` - the key is `bbb`
* `%{&ccc}` - the key is `ccc`
* `%{?aaa}` - the key is `aaa`
* `%{+bbb/3}` - the key is `bbb`
* `%{&ccc}` - the key is `ccc`
--

[[esql-dissect-examples]]
@@ -239,7 +239,7 @@ with a `\`. For example, in the earlier pattern:
%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}
----
In {esql} queries, the backslash character itself is a special character that
In {esql} queries, when a string is enclosed in regular double quotes (`"`) rather than triple quotes, the backslash character itself is a special character that
needs to be escaped with another `\`. For this example, the corresponding {esql}
query becomes:
[source.merge.styled,esql]
@@ -248,6 +248,16 @@ include::{esql-specs}/docs.csv-spec[tag=grokWithEscape]
----
====

For this reason, it is generally more convenient to write GROK patterns in triple-quoted strings (`"""`),
which do not require backslashes to be escaped.

[source.merge.styled,esql]
----
include::{esql-specs}/docs.csv-spec[tag=grokWithEscapeTripleQuotes]
----
====
[[esql-grok-patterns]]
===== Grok patterns
@@ -318,4 +328,4 @@ as the `GROK` command.
The `GROK` command does not support configuring <<custom-patterns,custom
patterns>>, or <<trace-match,multiple patterns>>. The `GROK` command is not
subject to <<grok-watchdog,Grok watchdog settings>>.
// end::grok-limitations[]
// end::grok-limitations[]
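
The difference between the two quoting styles described in this file can be sketched outside ES|QL. The Python helpers below are hypothetical and not part of Elasticsearch. They only model how one GROK pattern would be written as a regular double-quoted literal (backslashes doubled) and as a triple-quoted literal (backslashes unchanged). The sketch assumes that `\` and `"` are the only characters needing attention in the double-quoted form and that a triple-quoted literal cannot contain `"""`.

```python
def esql_double_quoted(pattern: str) -> str:
    """Render pattern as a regular "..." ES|QL literal (hypothetical helper).

    Assumption: only backslashes and double quotes need escaping here.
    """
    escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def esql_triple_quoted(pattern: str) -> str:
    """Render pattern as a triple-quoted literal; backslashes pass through."""
    if '"""' in pattern:
        # Assumption: a triple-quoted literal cannot contain its own delimiter.
        raise ValueError('pattern contains """ and cannot be triple-quoted')
    return '"""' + pattern + '"""'


# The GROK pattern from the docs, exactly as the grok engine should see it.
grok_pattern = r"%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}"

print(esql_double_quoted(grok_pattern))  # brackets appear as \\[ and \\]
print(esql_triple_quoted(grok_pattern))  # brackets appear as \[ and \]
```

The triple-quoted output is character-for-character the pattern from the `grokWithEscapeTripleQuotes` example, which is the convenience the added paragraph points to.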
4 changes: 2 additions & 2 deletions docs/reference/esql/functions/kibana/definition/like.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions docs/reference/esql/functions/kibana/definition/rlike.json

4 changes: 2 additions & 2 deletions docs/reference/esql/functions/kibana/docs/like.md

4 changes: 2 additions & 2 deletions docs/reference/esql/functions/kibana/docs/rlike.md

18 changes: 18 additions & 0 deletions docs/reference/esql/functions/like.asciidoc
@@ -23,4 +23,22 @@ include::{esql-specs}/docs.csv-spec[tag=like]
|===
include::{esql-specs}/docs.csv-spec[tag=like-result]
|===

Matching the exact characters `*` and `.` requires escaping.
The escape character is the backslash (`\`). Because the backslash is also a special character in string literals,
it must itself be escaped.

[source.merge.styled,esql]
----
include::{esql-specs}/string.csv-spec[tag=likeEscapingSingleQuotes]
----
====
To reduce the overhead of escaping, we suggest using triple-quoted strings (`"""`):
[source.merge.styled,esql]
----
include::{esql-specs}/string.csv-spec[tag=likeEscapingTripleQuotes]
----
====
// end::body[]
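
When the text to match comes from outside the query, the escaping described above can be applied mechanically. The following is a minimal sketch, not an Elasticsearch API: `like_escape` is a hypothetical helper. It assumes the wildcard characters are `*` and `?` (as in the `?b*` example in this commit) and that a literal backslash is escaped by doubling it.

```python
def like_escape(text: str) -> str:
    """Backslash-escape characters assumed to be special in a LIKE pattern."""
    special = {"*", "?", "\\"}  # assumption: wildcards plus the escape character
    return "".join("\\" + ch if ch in special else ch for ch in text)


# Pattern as the LIKE operator should receive it.
pattern = like_escape("foo * bar")

# Regular double-quoted literal: every backslash is doubled once more.
double_quoted = '"' + pattern.replace("\\", "\\\\") + '"'

# Triple-quoted literal: the pattern is used unchanged.
triple_quoted = '"""' + pattern + '"""'

print(pattern)        # foo \* bar
print(double_quoted)  # "foo \\* bar"
print(triple_quoted)  # """foo \* bar"""
```

The two printed literals correspond to the `likeEscapingSingleQuotes` and `likeEscapingTripleQuotes` examples included above.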
18 changes: 18 additions & 0 deletions docs/reference/esql/functions/rlike.asciidoc
@@ -18,4 +18,22 @@ include::{esql-specs}/docs.csv-spec[tag=rlike]
|===
include::{esql-specs}/docs.csv-spec[tag=rlike-result]
|===

Matching special characters (e.g. `.`, `*`, `(`) requires escaping.
The escape character is the backslash (`\`). Because the backslash is also a special character in string literals,
it must itself be escaped.

[source.merge.styled,esql]
----
include::{esql-specs}/string.csv-spec[tag=rlikeEscapingSingleQuotes]
----
====
To reduce the overhead of escaping, we suggest using triple-quoted strings (`"""`):
[source.merge.styled,esql]
----
include::{esql-specs}/string.csv-spec[tag=rlikeEscapingTripleQuotes]
----
====
// end::body[]
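
The two layers of escaping for `RLIKE` can be illustrated with Python's `re` module standing in for the regular-expression engine. This is only an analogy: ES|QL does not use Python's regex syntax, and `unescape_double_quoted` is a hypothetical, simplified model of how a regular double-quoted literal is decoded (it handles doubled backslashes only).

```python
import re

message = "foo ( bar"

# What the regex engine must receive: the parenthesis escaped once.
regex_text = r"foo \( bar"


def unescape_double_quoted(body: str) -> str:
    """Simplified model of decoding a "..." literal: \\\\ becomes \\."""
    return body.replace("\\\\", "\\")


# Triple-quoted style: the literal body is already the regex text.
matches_triple = re.fullmatch(regex_text, message) is not None

# Double-quoted style: the query text carries two backslashes,
# which decode to the single backslash the regex engine needs.
query_body = r"foo \\( bar"
matches_double = re.fullmatch(unescape_double_quoted(query_body), message) is not None

# Without any escaping, the parenthesis opens a group that never closes.
try:
    re.compile("foo ( bar")
    unescaped_is_valid = True
except re.error:
    unescaped_is_valid = False

print(matches_triple, matches_double, unescaped_is_valid)
```

Both escaped forms match the message, while the unescaped pattern is rejected by the regex compiler, which is why the escape is needed in the first place.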
42 changes: 29 additions & 13 deletions x-pack/plugin/esql/qa/testFixtures/src/main/resources/docs.csv-spec
@@ -382,7 +382,7 @@ count:long | languages:integer
basicGrok
// tag::basicGrok[]
ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 some.email@foo.com 42"
| GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num}"
| GROK a """%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num}"""
| KEEP date, ip, email, num
// end::basicGrok[]
;
@@ -396,7 +396,7 @@ date:keyword | ip:keyword | email:keyword | num:keyword
grokWithConversionSuffix
// tag::grokWithConversionSuffix[]
ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 some.email@foo.com 42"
| GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}"
| GROK a """%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}"""
| KEEP date, ip, email, num
// end::grokWithConversionSuffix[]
;
@@ -410,7 +410,7 @@ date:keyword | ip:keyword | email:keyword | num:integer
grokWithToDatetime
// tag::grokWithToDatetime[]
ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 some.email@foo.com 42"
| GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}"
| GROK a """%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}"""
| KEEP date, ip, email, num
| EVAL date = TO_DATETIME(date)
// end::grokWithToDatetime[]
@@ -436,11 +436,27 @@ ROW a = "1.2.3.4 [2023-01-23T12:15:00.000Z] Connected"
// end::grokWithEscape-result[]
;


grokWithEscapeTripleQuotes
// tag::grokWithEscapeTripleQuotes[]
ROW a = "1.2.3.4 [2023-01-23T12:15:00.000Z] Connected"
| GROK a """%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}"""
// end::grokWithEscapeTripleQuotes[]
| KEEP @timestamp
;

// tag::grokWithEscapeTripleQuotes-result[]
@timestamp:keyword
2023-01-23T12:15:00.000Z
// end::grokWithEscapeTripleQuotes-result[]
;


grokWithDuplicateFieldNames
// tag::grokWithDuplicateFieldNames[]
FROM addresses
| KEEP city.name, zip_code
| GROK zip_code "%{WORD:zip_parts} %{WORD:zip_parts}"
| GROK zip_code """%{WORD:zip_parts} %{WORD:zip_parts}"""
// end::grokWithDuplicateFieldNames[]
| SORT city.name
;
@@ -456,7 +472,7 @@ Tokyo | 100-7014 | null
basicDissect
// tag::basicDissect[]
ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1"
| DISSECT a "%{date} - %{msg} - %{ip}"
| DISSECT a """%{date} - %{msg} - %{ip}"""
| KEEP date, msg, ip
// end::basicDissect[]
;
@@ -470,7 +486,7 @@ date:keyword | msg:keyword | ip:keyword
dissectWithToDatetime
// tag::dissectWithToDatetime[]
ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1"
| DISSECT a "%{date} - %{msg} - %{ip}"
| DISSECT a """%{date} - %{msg} - %{ip}"""
| KEEP date, msg, ip
| EVAL date = TO_DATETIME(date)
// end::dissectWithToDatetime[]
@@ -485,7 +501,7 @@ some text | 127.0.0.1 | 2023-01-23T12:15:00.000Z
dissectRightPaddingModifier
// tag::dissectRightPaddingModifier[]
ROW message="1998-08-10T17:15:42 WARN"
| DISSECT message "%{ts->} %{level}"
| DISSECT message """%{ts->} %{level}"""
// end::dissectRightPaddingModifier[]
;

@@ -498,7 +514,7 @@ message:keyword | ts:keyword | level:keyword
dissectEmptyRightPaddingModifier#[skip:-8.11.2, reason:Support for empty right padding modifiers introduced in 8.11.2]
// tag::dissectEmptyRightPaddingModifier[]
ROW message="[1998-08-10T17:15:42] [WARN]"
| DISSECT message "[%{ts}]%{->}[%{level}]"
| DISSECT message """[%{ts}]%{->}[%{level}]"""
// end::dissectEmptyRightPaddingModifier[]
;

@@ -511,7 +527,7 @@ ROW message="[1998-08-10T17:15:42] [WARN]"
dissectAppendModifier
// tag::dissectAppendModifier[]
ROW message="john jacob jingleheimer schmidt"
| DISSECT message "%{+name} %{+name} %{+name} %{+name}" APPEND_SEPARATOR=" "
| DISSECT message """%{+name} %{+name} %{+name} %{+name}""" APPEND_SEPARATOR=" "
// end::dissectAppendModifier[]
;

@@ -524,7 +540,7 @@ john jacob jingleheimer schmidt|john jacob jingleheimer schmidt
dissectAppendWithOrderModifier
// tag::dissectAppendWithOrderModifier[]
ROW message="john jacob jingleheimer schmidt"
| DISSECT message "%{+name/2} %{+name/4} %{+name/3} %{+name/1}" APPEND_SEPARATOR=","
| DISSECT message """%{+name/2} %{+name/4} %{+name/3} %{+name/1}""" APPEND_SEPARATOR=","
// end::dissectAppendWithOrderModifier[]
;

@@ -537,7 +553,7 @@ john jacob jingleheimer schmidt|schmidt,john,jingleheimer,jacob
dissectNamedSkipKey
// tag::dissectNamedSkipKey[]
ROW message="1.2.3.4 - - 30/Apr/1998:22:00:52 +0000"
| DISSECT message "%{clientip} %{?ident} %{?auth} %{@timestamp}"
| DISSECT message """%{clientip} %{?ident} %{?auth} %{@timestamp}"""
// end::dissectNamedSkipKey[]
;

@@ -550,7 +566,7 @@ message:keyword | clientip:keyword | @timestamp:keyword
docsLike
// tag::like[]
FROM employees
| WHERE first_name LIKE "?b*"
| WHERE first_name LIKE """?b*"""
| KEEP first_name, last_name
// end::like[]
| SORT first_name
@@ -566,7 +582,7 @@ Eberhardt |Terkki
docsRlike
// tag::rlike[]
FROM employees
| WHERE first_name RLIKE ".leja.*"
| WHERE first_name RLIKE """.leja.*"""
| KEEP first_name, last_name
// end::rlike[]
;
string.csv-spec
@@ -1799,3 +1799,59 @@ warning:Line 1:29: java.lang.IllegalArgumentException: single-value function enc
x:keyword
null
;


likeEscapingSingleQuotes
// tag::likeEscapingSingleQuotes[]
ROW message = "foo * bar"
| WHERE message LIKE "foo \\* bar"
// end::likeEscapingSingleQuotes[]
;

// tag::likeEscapingSingleQuotes-result[]
message:keyword
foo * bar
// end::likeEscapingSingleQuotes-result[]
;


likeEscapingTripleQuotes
// tag::likeEscapingTripleQuotes[]
ROW message = "foo * bar"
| WHERE message LIKE """foo \* bar"""
// end::likeEscapingTripleQuotes[]
;

// tag::likeEscapingTripleQuotes-result[]
message:keyword
foo * bar
// end::likeEscapingTripleQuotes-result[]
;


rlikeEscapingSingleQuotes
// tag::rlikeEscapingSingleQuotes[]
ROW message = "foo ( bar"
| WHERE message RLIKE "foo \\( bar"
// end::rlikeEscapingSingleQuotes[]
;

// tag::rlikeEscapingSingleQuotes-result[]
message:keyword
foo ( bar
// end::rlikeEscapingSingleQuotes-result[]
;


rlikeEscapingTripleQuotes
// tag::rlikeEscapingTripleQuotes[]
ROW message = "foo ( bar"
| WHERE message RLIKE """foo \( bar"""
// end::rlikeEscapingTripleQuotes[]
;

// tag::rlikeEscapingTripleQuotes-result[]
message:keyword
foo ( bar
// end::rlikeEscapingTripleQuotes-result[]
;
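
To make the intent of the `LIKE` test cases above concrete, here is a rough Python model of wildcard matching with backslash escapes. It is a sketch under stated assumptions, not how ES|QL implements `LIKE`: `like_to_regex` and `like_match` are hypothetical names, `*` is taken to match any run of characters, `?` exactly one character, and `\x` the literal character `x`.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a LIKE-style pattern into a Python regex (illustrative only)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: match it literally and skip both characters.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")   # any run of characters, including none
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)


def like_match(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the LIKE-style pattern."""
    return re.fullmatch(like_to_regex(pattern), text, flags=re.DOTALL) is not None


print(like_match(r"foo \* bar", "foo * bar"))  # True: escaped star is literal
print(like_match(r"foo \* bar", "foo x bar"))  # False: literal star required
print(like_match("foo * bar", "foo x bar"))    # True: unescaped star is a wildcard
```

The second and third calls show what the escape buys: without it, the pattern also accepts rows that do not contain a literal `*`.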