Add planning document for PPL revisions #751

Closed
wants to merge 10 commits into from
4 changes: 3 additions & 1 deletion README.md
@@ -17,6 +17,8 @@ Please refer to the [Flint Index Reference Manual](./docs/index.md) for more inf

* For additional details on Spark PPL Architecture, see [PPL Architecture](docs/ppl-lang/PPL-on-Spark.md)

* For the planned PPL release content, see [release-plan](docs/ppl-lang/planning/release-plan.md)

* For additional details on Spark PPL commands project, see [PPL Project](https://github.com/orgs/opensearch-project/projects/214/views/2)

## Prerequisites
@@ -92,7 +94,7 @@ If you discover a potential security issue in this project we ask that you notif

## License

See the [LICENSE](./LICENSE.txt) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.

## Copyright

34 changes: 17 additions & 17 deletions docs/ppl-lang/PPL-Example-Commands.md
@@ -17,7 +17,7 @@
- `explain simple | describe table`

#### **Fields**
[See additional command details](ppl-fields-command.md)
[See additional command details](commands/ppl-fields-command.md)
- `source = table`
- `source = table | fields a,b,c`
- `source = table | fields + a,b,c`
@@ -75,7 +75,7 @@ _- **Limitation: new field added by eval command with a function cannot be dropp


#### **Eval**:
[See additional command details](ppl-eval-command.md)
[See additional command details](commands/ppl-eval-command.md)

Assumptions: `a`, `b`, `c` are existing fields in `table`
- `source = table | eval f = 1 | fields a,b,c,f`
@@ -133,7 +133,7 @@ source = table | where ispresent(a) |
- `source = table | eval a = signum(a) | where a < 0`

#### **Aggregations**
[See additional command details](ppl-stats-command.md)
[See additional command details](commands/ppl-stats-command.md)

- `source = table | stats avg(a) `
- `source = table | where a < 50 | stats avg(c) `
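The aggregation examples above filter rows and then reduce them. As a rough mental model (not the Flint/Spark implementation), a hypothetical `stats_avg` helper over a list of dict rows shows what `where a < 50 | stats avg(c)` computes, plus the grouped `by` form used elsewhere in this document. It assumes every aggregated value is numeric and non-null.

```python
from collections import defaultdict


def stats_avg(rows, field, by=None):
    """Hypothetical model of `stats avg(field) [by key]` over a list of dict rows.

    Assumes every value of `field` is numeric and non-null.
    """
    if by is None:
        values = [row[field] for row in rows]
        return sum(values) / len(values)
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[field])  # one bucket per distinct `by` value
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


table = [{"a": 10, "b": "x", "c": 1}, {"a": 40, "b": "y", "c": 3}, {"a": 70, "b": "x", "c": 100}]

# source = table | where a < 50 | stats avg(c)
filtered = [row for row in table if row["a"] < 50]
print(stats_avg(filtered, "c"))        # 2.0
print(stats_avg(table, "c", by="b"))   # {'x': 50.5, 'y': 3.0}
```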
@@ -160,7 +160,7 @@ source = table | where ispresent(a) |

#### **Dedup**

[See additional command details](ppl-dedup-command.md)
[See additional command details](commands/ppl-dedup-command.md)

- `source = table | dedup a | fields a,b,c`
- `source = table | dedup a,b | fields a,b,c`
@@ -177,31 +177,31 @@ source = table | where ispresent(a) |
- `source = table | dedup 1 a consecutive=true| fields a,b,c` (Consecutive deduplication is unsupported)

#### **Rare**
[See additional command details](ppl-rare-command.md)
[See additional command details](commands/ppl-rare-command.md)

- `source=accounts | rare gender`
- `source=accounts | rare age by gender`

#### **Top**
[See additional command details](ppl-top-command.md)
[See additional command details](commands/ppl-top-command.md)

- `source=accounts | top gender`
- `source=accounts | top 1 gender`
- `source=accounts | top 1 age by gender`
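`top` and `rare` rank field values by frequency. The sketch below is a hypothetical frequency-count model using `collections.Counter`, not the command's actual plan; the default of 10 values and the tie order (first seen wins) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def top(rows, field, n=10, by=None):
    """Hypothetical model of `top [n] field [by group]`: most frequent values first.

    The default n=10 and the tie order (first seen wins) are assumptions.
    """
    if by is not None:
        groups = defaultdict(list)
        for row in rows:
            groups[row[by]].append(row)
        return {key: top(group, field, n) for key, group in groups.items()}
    counts = Counter(row[field] for row in rows)
    return [value for value, _ in counts.most_common(n)]


def rare(rows, field, n=10):
    """Hypothetical model of `rare field`: least frequent values first."""
    counts = Counter(row[field] for row in rows)
    # sorted() is stable, so equally rare values keep first-seen order
    ordered = sorted(counts.items(), key=lambda kv: kv[1])
    return [value for value, _ in ordered[:n]]


accounts = [
    {"gender": "M", "age": 32},
    {"gender": "M", "age": 36},
    {"gender": "M", "age": 32},
    {"gender": "F", "age": 28},
]
print(top(accounts, "gender", 1))            # ['M']
print(rare(accounts, "gender"))              # ['F', 'M']
print(top(accounts, "age", 1, by="gender"))  # {'M': [32], 'F': [28]}
```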

#### **Parse**
[See additional command details](ppl-parse-command.md)
[See additional command details](commands/ppl-parse-command.md)

- `source=accounts | parse email '.+@(?<host>.+)' | fields email, host `
- `source=accounts | parse email '.+@(?<host>.+)' | top 1 host `
- `source=accounts | parse email '.+@(?<host>.+)' | stats count() by host`
- `source=accounts | parse email '.+@(?<host>.+)' | eval eval_result=1 | fields host, eval_result`
- `source=accounts | parse email '.+@(?<host>.+)' | where age > 45 | sort - age | fields age, email, host`
- `source=accounts | parse address '(?<streetNumber>\d+) (?<street>.+)' | where streetNumber > 500 | sort num(streetNumber) | fields streetNumber, street`
- Limitation: [see limitations](ppl-parse-command.md#limitations)
- Limitation: [see limitations](commands/ppl-parse-command.md#limitations)
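`parse` turns each named regex group into a new field. A minimal sketch, assuming a full-string match and an empty string for non-matching rows (both assumptions, not confirmed by this page): since PPL uses Java-style `(?<name>...)` groups and Python's `re` spells them `(?P<name>...)`, the helper rewrites the group syntax first while leaving lookbehinds alone. In this sketch the extracted values are strings.

```python
import re


def parse(rows, field, pattern):
    """Hypothetical model of `parse <field> '<regex>'`: named groups become new fields.

    Rewrites Java-style `(?<name>` to Python's `(?P<name>`; lookbehinds such as
    `(?<=` and `(?<!` are not followed by a letter, so they are left untouched.
    """
    regex = re.compile(re.sub(r"\(\?<(?=[A-Za-z])", "(?P<", pattern))
    out = []
    for row in rows:
        match = regex.fullmatch(str(row[field]))
        # assumption: rows that do not match get "" for every named group
        groups = match.groupdict() if match else {name: "" for name in regex.groupindex}
        out.append({**row, **groups})
    return out


accounts = [{"email": "amber@pyrami.com"}, {"email": "not-an-address"}]
rows = parse(accounts, "email", r".+@(?<host>.+)")
print([row["host"] for row in rows])  # ['pyrami.com', '']

streets = parse([{"address": "880 Holmes Lane"}], "address", r"(?<streetNumber>\d+) (?<street>.+)")
print(streets[0]["streetNumber"], streets[0]["street"])  # 880 Holmes Lane
```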

#### **Grok**
[See additional command details](ppl-grok-command.md)
[See additional command details](commands/ppl-grok-command.md)

- `source=accounts | grok email '.+@%{HOSTNAME:host}' | top 1 host`
- `source=accounts | grok email '.+@%{HOSTNAME:host}' | stats count() by host`
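Grok expressions are regexes built from named building blocks: `%{NAME:field}` captures into `field`, while `%{NAME}` matches without capturing. The sketch below expands grok syntax into a Python regex using a tiny hypothetical `GROK_PATTERNS` table; the three definitions are simplified stand-ins, not the real grok library patterns.

```python
import re

# Simplified stand-ins for three standard grok patterns. These definitions are
# assumptions for illustration; real grok pattern libraries are more elaborate.
GROK_PATTERNS = {
    "HOSTNAME": r"[0-9A-Za-z][0-9A-Za-z.-]*",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "GREEDYDATA": r".*",
}


def grok_to_regex(expr):
    """Expand %{NAME:field} into a named group and %{NAME} into a non-capturing group."""
    def expand(m):
        body = GROK_PATTERNS[m.group(1)]
        return f"(?P<{m.group(2)}>{body})" if m.group(2) else f"(?:{body})"

    return re.sub(r"%\{(\w+)(?::(\w+))?\}", expand, expr)


host_re = re.compile(grok_to_regex(".+@%{HOSTNAME:host}"))
print(host_re.fullmatch("amber@pyrami.com").group("host"))  # pyrami.com

addr_re = re.compile(grok_to_regex("%{NUMBER} %{GREEDYDATA:address}"))
print(addr_re.fullmatch("880 Holmes Lane").group("address"))  # Holmes Lane
```

The second expression captures into `address`, the same name as the source field, which is the override case listed as unsupported below.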
@@ -212,26 +212,26 @@ source = table | where ispresent(a) |

- **Limitation: Overriding an existing field is unsupported:**
- `source=accounts | grok address '%{NUMBER} %{GREEDYDATA:address}' | fields address`
- [see limitations](ppl-parse-command.md#limitations)
- [see limitations](commands/ppl-parse-command.md#limitations)

#### **Patterns**
[See additional command details](ppl-patterns-command.md)
[See additional command details](commands/ppl-patterns-command.md)

- `source=accounts | patterns email | fields email, patterns_field `
- `source=accounts | patterns email | where age > 45 | sort - age | fields email, patterns_field`
- `source=apache | patterns new_field='no_numbers' pattern='[0-9]' message | fields message, no_numbers`
- `source=apache | patterns new_field='no_numbers' pattern='[0-9]' message | stats count() by no_numbers`
- Limitation: [see limitations](ppl-parse-command.md#limitations)
- Limitation: [see limitations](commands/ppl-parse-command.md#limitations)
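`patterns` strips every character that matches a regex and keeps the residue as a new field, so values with the same shape collapse to the same pattern (useful with `stats count() by ...`). A minimal sketch; the defaults used here (remove alphanumerics, write to `patterns_field`) are assumptions based on the OpenSearch PPL `patterns` command and may differ in this project.

```python
import re


def patterns(rows, field, new_field="patterns_field", pattern=r"[a-zA-Z\d]"):
    """Hypothetical model of `patterns [new_field=...] [pattern=...] <field>`.

    Removes every character matching `pattern` from `field` and stores the
    residue in `new_field`. The defaults are assumptions.
    """
    regex = re.compile(pattern)
    return [{**row, new_field: regex.sub("", str(row[field]))} for row in rows]


rows = patterns([{"email": "amber@pyrami.com"}], "email")
print(rows[0]["patterns_field"])  # @.

logs = patterns([{"message": "GET /index.html 200"}], "message", "no_numbers", "[0-9]")
print(repr(logs[0]["no_numbers"]))  # 'GET /index.html '
```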

#### **Rename**
[See additional command details](ppl-rename-command.md)
[See additional command details](commands/ppl-rename-command.md)

- `source=accounts | rename email as user_email | fields id, user_email`
- `source=accounts | rename id as user_id, email as user_email | fields user_id, user_email`


#### **Join**
[See additional command details](ppl-join-command.md)
[See additional command details](commands/ppl-join-command.md)

- `source = table1 | inner join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c`
- `source = table1 | left join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c`
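The difference between the `inner join` and `left join` examples is what happens to left rows with no match. The nested-loop sketch below is a hypothetical illustration of that difference only (Spark chooses its own physical join strategy); `None` stands in for the null right-hand side, and null keys never match, as in a SQL equi-join.

```python
def join(left, right, key, kind="inner"):
    """Hypothetical nested-loop model of `inner join` / `left join ... on l.key = r.key`.

    Returns (left_row, right_row) pairs; a left join pairs unmatched left rows
    with None. Null keys never match.
    """
    if kind not in ("inner", "left"):
        raise ValueError(f"unsupported join kind: {kind}")
    out = []
    for l_row in left:
        matches = [r_row for r_row in right
                   if l_row[key] is not None and l_row[key] == r_row[key]]
        out.extend((l_row, r_row) for r_row in matches)
        if not matches and kind == "left":
            out.append((l_row, None))
    return out


table1 = [{"a": 1, "b": "one"}, {"a": 2, "b": "two"}]
table2 = [{"a": 1, "c": "uno"}, {"a": 3, "c": "tres"}]
print(len(join(table1, table2, "a")))          # 1
print(len(join(table1, table2, "a", "left")))  # 2
```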
@@ -245,7 +245,7 @@ _- **Limitation: sub-searches is unsupported in join right side now**_


#### **Lookup**
[See additional command details](ppl-lookup-command.md)
[See additional command details](commands/ppl-lookup-command.md)

- `source = table1 | lookup table2 id`
- `source = table1 | lookup table2 id, name`
@@ -260,7 +260,7 @@ _- **Limitation: "REPLACE" or "APPEND" clause must contain "AS"**_


#### **InSubquery**
[See additional command details](ppl-subquery-command.md)
[See additional command details](commands/ppl-subquery-command.md)

- `source = outer | where a in [ source = inner | fields b ]`
- `source = outer | where (a) in [ source = inner | fields b ]`
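An `in [ subquery ]` filter keeps outer rows whose value appears in the subquery's single output column, which behaves like a semi-join. The hypothetical `where_in` helper below materialises the subquery as a set; it deliberately ignores SQL three-valued null logic, so it is a sketch of the matching rule, not of the generated Spark plan.

```python
def where_in(outer, field, inner, inner_field):
    """Hypothetical model of `where a in [ source = inner | fields b ]`.

    Each outer row is kept if its `field` value appears among the inner
    rows' `inner_field` values. Null semantics are ignored.
    """
    values = {row[inner_field] for row in inner}
    return [row for row in outer if row[field] in values]


outer = [{"a": 1}, {"a": 2}, {"a": 3}]
inner = [{"b": 2}, {"b": 3}, {"b": 3}]
print([row["a"] for row in where_in(outer, "a", inner, "b")])  # [2, 3]
```

Duplicate `b` values in the subquery do not duplicate outer rows, which is the key behavioural difference from an inner join.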
@@ -352,7 +352,7 @@ Assumptions: `a`, `b` are fields of table outer, `c`, `d` are fields of table in

---
#### Experimental Commands:
[See additional command details](ppl-correlation-command.md)
[See additional command details](commands/ppl-correlation-command.md)

```sql
- `source alb_logs, traces, metrics | where ip="10.0.0.1" AND cloud.provider="aws"| correlate exact on (ip, port) scope(@timestamp, 2018-07-02T22:23:00, 1 D)`
38 changes: 19 additions & 19 deletions docs/ppl-lang/README.md
@@ -15,7 +15,7 @@ source=accounts
```

For additional examples, see the [PPL example commands](PPL-Example-Commands.md) documentation.

For the planned PPL release content, see [release-plan](planning/release-plan.md).

---
### Commands Specifications

@@ -24,45 +24,45 @@ For additional examples see the next [documentation](PPL-Example-Commands.md).

- [`explain command`](PPL-Example-Commands.md#explain)

- [`dedup command `](ppl-dedup-command.md)
- [`dedup command`](commands/ppl-dedup-command.md)
Collaborator


could we add version / since on each command doc?


- [`describe command`](PPL-Example-Commands.md#describe)

- [`fillnull command`](ppl-fillnull-command.md)

- [`eval command`](ppl-eval-command.md)
- [`eval command`](commands/ppl-eval-command.md)

- [`fields command`](ppl-fields-command.md)
- [`fields command`](commands/ppl-fields-command.md)

- [`grok command`](ppl-grok-command.md)
- [`grok command`](commands/ppl-grok-command.md)

- [`parse command`](ppl-parse-command.md)
- [`parse command`](commands/ppl-parse-command.md)

- [`patterns command`](ppl-patterns-command.md)
- [`patterns command`](commands/ppl-patterns-command.md)

- [`rename command`](ppl-rename-command.md)
- [`rename command`](commands/ppl-rename-command.md)

- [`search command`](ppl-search-command.md)
- [`search command`](commands/ppl-search-command.md)

- [`sort command`](ppl-sort-command.md)
- [`sort command`](commands/ppl-sort-command.md)

- [`stats command`](ppl-stats-command.md)
- [`stats command`](commands/ppl-stats-command.md)

- [`where command`](ppl-where-command.md)
- [`where command`](commands/ppl-where-command.md)

- [`head command`](ppl-head-command.md)
- [`head command`](commands/ppl-head-command.md)

- [`rare command`](ppl-rare-command.md)
- [`rare command`](commands/ppl-rare-command.md)

- [`top command`](ppl-top-command.md)
- [`top command`](commands/ppl-top-command.md)

- [`join commands`](ppl-join-command.md)
- [`join commands`](commands/ppl-join-command.md)

- [`lookup commands`](ppl-lookup-command.md)
- [`lookup commands`](commands/ppl-lookup-command.md)

- [`subquery commands`](ppl-subquery-command.md)
- [`subquery commands`](commands/ppl-subquery-command.md)

- [`correlation commands`](ppl-correlation-command.md)
- [`correlation commands`](commands/ppl-correlation-command.md)


* **Functions**
@@ -1,7 +1,19 @@
## PPL Correlation Command

> This is an experimental command - it may be removed in future versions

## PPL `correlation` Command

<table>
<tr>
<th style="color:gainsboro;">Spark</th>
<th style="color:greenyellow;">3.0.0+ </th>
</tr>
<tr>
<th style="color:gainsboro;">Status</th>
<th style="color:yellow;">Experimental</th>
</tr>
<tr>
<th style="color:gainsboro;">Introduced In</th>
<th style="color:lightgreen;">0.4.0</th>
</tr>
</table>

## Overview

@@ -1,14 +1,25 @@
# PPL dedup command
# PPL `dedup` command

<table>
<tr>
<th style="color:gainsboro;">Spark</th>
<th style="color:greenyellow;">3.5.1+ </th>
</tr>
<tr>
<th style="color:gainsboro;">Status</th>
<th style="color:yellow;">Experimental</th>
</tr>
<tr>
<th style="color:gainsboro;">Introduced In</th>
<th style="color:lightgreen;">0.5.0</th>
</tr>
</table>


## Table of contents

- [Description](#description)
- [Syntax](#syntax)
- [Examples](#examples)
- [Example 1: Dedup by one field](#example-1-dedup-by-one-field)
- [Example 2: Keep 2 duplicates documents](#example-2-keep-2-duplicates-documents)
- [Example 3: Keep or Ignore the empty field by default](#example-3-keep-or-ignore-the-empty-field-by-default)
- [Example 4: Dedup in consecutive document](#example-4-dedup-in-consecutive-document)
- [Limitation](#limitation)

## Description
@@ -126,7 +137,7 @@ PPL query:

### Limitation:

**Spark Support** ( >= 3.4)
**Spark Support** (3.5.1+)

To translate a `dedup` command with `allowedDuplication > 1` (for example `| dedup 2 a,b`) into a Spark plan, the command is translated into a plan that computes a Window function (e.g. `row_number`) into a new column `row_number_col` and then applies a Filter on that column.
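The window-plus-filter translation described above can be modelled in plain Python: number the rows within each partition of the dedup fields (the role of `row_number`) and keep those whose number is at most `allowedDuplication` (the Filter on `row_number_col`). This is a hypothetical sketch, not the Flint code; input order stands in for the window's ordering, and dropping rows with a null dedup field is an assumption matching `keepempty=false`.

```python
from collections import defaultdict


def dedup(rows, fields, allowed=1):
    """Hypothetical model of `dedup <allowed> <fields>` as a window plan.

    Mirrors row_number() OVER (PARTITION BY fields) AS row_number_col,
    followed by Filter(row_number_col <= allowed).
    """
    row_number = defaultdict(int)  # running row_number per partition key
    kept = []
    for row in rows:
        key = tuple(row[f] for f in fields)
        if any(v is None for v in key):
            continue  # assumption: keepempty=false drops rows with a null dedup field
        row_number[key] += 1
        if row_number[key] <= allowed:  # the Filter on row_number_col
            kept.append(row)
    return kept


rows = [
    {"id": 1, "a": 1, "b": "x"},
    {"id": 2, "a": 1, "b": "x"},
    {"id": 3, "a": 1, "b": "x"},
    {"id": 4, "a": 2, "b": "y"},
    {"id": 5, "a": None, "b": "z"},
]
# | dedup 2 a,b
print([row["id"] for row in dedup(rows, ["a", "b"], allowed=2)])  # [1, 2, 4]
```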

@@ -1,5 +1,21 @@
# PPL `eval` command

<table>
<tr>
<th style="color:gainsboro;">Spark</th>
<th style="color:greenyellow;">3.5.1+ </th>
</tr>
<tr>
<th style="color:gainsboro;">Status</th>
<th style="color:yellow;">Experimental</th>
</tr>
<tr>
<th style="color:gainsboro;">Introduced In</th>
<th style="color:lightgreen;">0.5.0</th>
</tr>
</table>


## Description
The ``eval`` command evaluates the expression and appends the result to the search result.
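Conceptually, `eval` computes one new field per expression for every row and appends it. The hypothetical `eval_` helper below (named with a trailing underscore to avoid shadowing Python's built-in `eval`) models expressions as functions of the incoming row; raising on an existing field name only mimics the limitation listed for this command and is not how Spark reports it internally.

```python
def eval_(rows, **exprs):
    """Hypothetical model of `eval name = expr, ...`: append computed fields to each row.

    Each expression is a function of the incoming row. Reusing an existing field
    name raises, mimicking the documented "Reference 'a' is ambiguous" limitation.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        for name, expr in exprs.items():
            if name in row:
                raise ValueError(f"Reference '{name}' is ambiguous")
            new_row[name] = expr(row)
        out.append(new_row)
    return out


table = [{"a": 1, "b": 2, "c": 3}]
# source = table | eval f = 1 | fields a,b,c,f
print(eval_(table, f=lambda r: 1))                    # [{'a': 1, 'b': 2, 'c': 3, 'f': 1}]
print(eval_(table, n=lambda r: r["a"] + r["b"] * 2))  # [{'a': 1, 'b': 2, 'c': 3, 'n': 5}]
```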

@@ -106,7 +122,7 @@ eval status_category =
```

### Limitation:
- `eval` with comma separated expression needs spark version >= 3.4
**Spark Support** (3.5.1+)

- Overriding an existing field is unsupported; the following queries throw an exception with "Reference 'a' is ambiguous":

@@ -1,5 +1,21 @@
## PPL `fields` command

<table>
<tr>
<th style="color:gainsboro;">Spark</th>
<th style="color:greenyellow;">3.0.0+ </th>
</tr>
<tr>
<th style="color:gainsboro;">Status</th>
<th style="color:greenyellow;">Stable</th>
</tr>
<tr>
<th style="color:gainsboro;">Introduced In</th>
<th style="color:lightgreen;">0.4.0</th>
</tr>
</table>


**Description**
Use the ``fields`` command to keep or remove fields from the search result.
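As a rough model of the keep/remove behaviour (not the generated Spark projection), the hypothetical `fields` helper below treats `+` (the default) as "keep only these fields, in the listed order" and `-` as "drop these fields".

```python
def fields(rows, names, mode="+"):
    """Hypothetical model of `fields [+|-] <names>` over a list of dict rows.

    "+" (the default) keeps only `names`, in the listed order; "-" drops them.
    """
    if mode == "+":
        return [{name: row[name] for name in names} for row in rows]
    if mode == "-":
        return [{k: v for k, v in row.items() if k not in names} for row in rows]
    raise ValueError("mode must be '+' or '-'")


table = [{"a": 1, "b": 2, "c": 3}]
print(fields(table, ["c", "a"]))       # [{'c': 3, 'a': 1}]
print(fields(table, ["b"], mode="-"))  # [{'a': 1, 'c': 3}]
```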

@@ -1,4 +1,19 @@
## PPL Correlation Command
## PPL `grok` Command

<table>
<tr>
<th style="color:gainsboro;">Spark</th>
<th style="color:greenyellow;">3.0.0+ </th>
</tr>
<tr>
<th style="color:gainsboro;">Status</th>
<th style="color:yellow;">Experimental</th>
</tr>
<tr>
<th style="color:gainsboro;">Introduced In</th>
<th style="color:lightgreen;">0.5.0</th>
</tr>
</table>


### Description
@@ -1,5 +1,20 @@
## PPL `head` Command

<table>
<tr>
<th style="color:gainsboro;">Spark</th>
<th style="color:greenyellow;">3.0.0+ </th>
</tr>
<tr>
<th style="color:gainsboro;">Status</th>
<th style="color:greenyellow;">Stable</th>
</tr>
<tr>
<th style="color:gainsboro;">Introduced In</th>
<th style="color:lightgreen;">0.4.0</th>
</tr>
</table>

**Description**
The ``head`` command returns the first N results, after an optional offset, in search order.
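In list terms, `head` skips `offset` results and returns the next N. The sketch below is a hypothetical model; the defaults (N = 10, offset = 0) and the `head N from M` spelling in the comment are assumptions carried over from OpenSearch PPL, not confirmed by this page.

```python
def head(rows, n=10, offset=0):
    """Hypothetical model of `head [n] [from offset]`: skip `offset` rows, return the next `n`.

    The defaults (n=10, offset=0) are assumptions for this sketch.
    """
    if n < 0 or offset < 0:
        raise ValueError("n and offset must be non-negative")
    return rows[offset:offset + n]


rows = list(range(1, 21))  # stand-ins for 20 results in search order
print(head(rows))               # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(head(rows, 3, offset=5))  # [6, 7, 8]  (assumed syntax: head 3 from 5)
```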
