Translate PPL LOOKUP Command #686

Merged
merged 5 commits into from
Sep 25, 2024
69 changes: 69 additions & 0 deletions docs/PPL-Lookup-command.md
@@ -0,0 +1,69 @@
## PPL Lookup Command

## Overview
The lookup command enriches your search data by adding or replacing data from a lookup index (dimension table).
You can extend the fields of an index with values from a dimension table, and append or replace values when the lookup condition is matched.
As an alternative to the [Join command](../docs/PPL-Join-command.md), the lookup command is better suited to enriching the source data with a static dataset.


### Syntax of Lookup Command

```sql
SEARCH source=<sourceIndex>
| <other piped command>
| LOOKUP <lookupIndex> (<lookupMappingField> [AS <sourceMappingField>])...
[(REPLACE | APPEND) (<inputField> [AS <outputField>])...]
| <other piped command>
```
**lookupIndex**
- Required
- Description: The name of the lookup index (dimension table).

**lookupMappingField**
- Required
- Description: A mapping key in \<lookupIndex\>, analogous to a join key from the right table. You can specify multiple \<lookupMappingField\> values, separated by commas.

**sourceMappingField**
- Optional
- Default: \<lookupMappingField\>
- Description: A mapping key from the source **query**, analogous to a join key from the left side. If you don't specify a \<sourceMappingField\>, it defaults to \<lookupMappingField\>.

**inputField**
- Optional
- Default: All fields of \<lookupIndex\>.
- Description: A field in \<lookupIndex\> whose matched values are applied to the result output. You can specify multiple \<inputField\> values, separated by commas. If you don't specify any \<inputField\>, the matched values of all fields of \<lookupIndex\> are applied to the result output.

**outputField**
- Optional
- Default: \<inputField\>
- Description: A field in the output. You can specify multiple \<outputField\> values. If \<outputField\> names an existing field in the source query, its values are replaced or appended by the matched values from \<inputField\>. If \<outputField\> names a new field, that field is added to the results.

**REPLACE | APPEND**
- Optional
- Default: REPLACE
- Description: If you specify REPLACE, matched values from the \<lookupIndex\> field overwrite the values in the result. If you specify APPEND, matched values from the \<lookupIndex\> field only fill in values that are missing in the result.
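
The difference between REPLACE and APPEND can be sketched outside PPL. The following Python model is only an illustration of the semantics described above, not the command's implementation. The `lookup` helper and the sample rows are hypothetical, and the pass-through handling of unmatched source rows is an assumption that this document does not specify.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def lookup(source: List[Row], lookup_index: List[Row], key: str,
           field: str, mode: str = "REPLACE") -> List[Row]:
    """Toy model of `LOOKUP <lookupIndex> key (REPLACE | APPEND) field`."""
    # Index the lookup rows by their mapping key.
    by_key = {row[key]: row for row in lookup_index}
    result = []
    for row in source:
        out = dict(row)  # copy so the caller's rows are not mutated
        match = by_key.get(row[key])
        if match is not None:
            # REPLACE always overwrites; APPEND only fills a missing value.
            if mode == "REPLACE" or out.get(field) is None:
                out[field] = match[field]
        # Assumption: a source row with no match passes through unchanged.
        result.append(out)
    return result


# Hypothetical data, not taken from the documentation.
source = [
    {"id": 1, "email": "old@example.com"},
    {"id": 2, "email": None},
]
accounts = [
    {"id": 1, "email": "new@example.com"},
    {"id": 2, "email": "b@example.com"},
]

replaced = lookup(source, accounts, "id", "email", mode="REPLACE")
appended = lookup(source, accounts, "id", "email", mode="APPEND")
print([r["email"] for r in replaced])  # ['new@example.com', 'b@example.com']
print([r["email"] for r in appended])  # ['old@example.com', 'b@example.com']
```

In this sketch, REPLACE overwrites the existing `email` of the first row, while APPEND leaves it alone and only fills the second row, whose `email` was missing.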

### Usage
> LOOKUP \<lookupIndex\> id AS cid REPLACE mail AS email<br/>
> LOOKUP \<lookupIndex\> name REPLACE mail AS email<br/>
> LOOKUP \<lookupIndex\> id AS cid, name APPEND address, mail AS email<br/>
> LOOKUP \<lookupIndex\> id<br/>
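
To make the third form concrete (`id AS cid, name APPEND address, mail AS email`), here is a rough Python model of how two mapping keys and two field pairs could line up. It is a sketch under assumptions: the `lookup_append` helper and the `customers` and `orders` rows are hypothetical, and a row is treated as matched only when every mapping key agrees.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def lookup_append(source: List[Row], lookup_index: List[Row],
                  keys: List[Tuple[str, str]],
                  fields: List[Tuple[str, str]]) -> List[Row]:
    """Toy APPEND lookup.

    keys:   (lookupMappingField, sourceMappingField) pairs
    fields: (inputField, outputField) pairs
    """
    # Index lookup rows by the tuple of their mapping-key values.
    by_key = {tuple(row[lk] for lk, _ in keys): row for row in lookup_index}
    result = []
    for row in source:
        out = dict(row)
        match = by_key.get(tuple(row[sk] for _, sk in keys))
        if match is not None:
            for input_field, output_field in fields:
                # APPEND: only fill values that are missing in the source row.
                if out.get(output_field) is None:
                    out[output_field] = match[input_field]
        result.append(out)
    return result


# Hypothetical lookup index and source rows.
customers = [
    {"id": 1, "name": "Ann", "address": "1 Main St", "mail": "ann@new.example"},
]
orders = [
    {"cid": 1, "name": "Ann", "address": None, "email": "ann@old.example"},
    {"cid": 1, "name": "Bob", "address": None, "email": None},
]

enriched = lookup_append(
    orders,
    customers,
    keys=[("id", "cid"), ("name", "name")],            # id AS cid, name
    fields=[("address", "address"), ("mail", "email")],  # address, mail AS email
)
print(enriched[0]["address"], enriched[0]["email"])
print(enriched[1]["address"], enriched[1]["email"])
```

The first order matches on both keys, so its missing `address` is filled while its existing `email` is kept. The second order shares `cid` but not `name`, so in this model it is left untouched.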

### Example
```sql
SEARCH source=<sourceIndex>
| WHERE orderType = 'Cancelled'
| LOOKUP account_list mkt_id AS mkt_code REPLACE amount, account_name AS name
| STATS count(mkt_code), avg(amount) BY name
```
```sql
SEARCH source=<sourceIndex>
| DEDUP market_id
| EVAL category=replace(category, "-", ".")
| EVAL category=ltrim(category, "dvp.")
| LOOKUP bounce_category category AS category APPEND classification
```
```sql
SEARCH source=<sourceIndex>
| LOOKUP bounce_category category
```
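
As a worked illustration of adding a new field, the sketch below applies a hypothetical `LOOKUP <workInformationTable> uid AS id REPLACE department` to rows copied from the people and work-information test tables created in this change (columns trimmed for brevity). The `lookup_new_field` helper is an assumption made for illustration; it models only the left-join-like behavior implied by the description above.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def lookup_new_field(source: List[Row], lookup_index: List[Row],
                     lookup_key: str, source_key: str,
                     input_field: str, output_field: str) -> List[Row]:
    """Toy model: add output_field to every source row from the matching lookup row."""
    by_key = {row[lookup_key]: row[input_field] for row in lookup_index}
    # Source rows without a match get None for the new field.
    return [{**row, output_field: by_key.get(row[source_key])} for row in source]


# Rows from the people test table (id and name columns only).
people = [
    {"id": 1000, "name": "Jake"},
    {"id": 1001, "name": "Hello"},
    {"id": 1002, "name": "John"},
    {"id": 1003, "name": "David"},
    {"id": 1004, "name": "David"},
    {"id": 1005, "name": "Jane"},
]
# Rows from the work-information test table (uid and department columns only).
work_info = [
    {"uid": 1000, "department": "IT"},
    {"uid": 1002, "department": "DATA"},
    {"uid": 1003, "department": "HR"},
    {"uid": 1005, "department": "DATA"},
    {"uid": 1006, "department": "SALES"},
]

enriched = lookup_new_field(people, work_info, "uid", "id",
                            "department", "department")
print([row["department"] for row in enriched])
# ['IT', None, 'DATA', 'HR', None, 'DATA']
```

In this model the output keeps exactly the six source rows: ids 1001 and 1004 have no lookup match and receive no department, and the lookup-only row with uid 1006 does not appear.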
@@ -277,6 +277,54 @@ trait FlintSparkSuite extends QueryTest with FlintSuite with OpenSearchSuite wit
| """.stripMargin)
}

  protected def createPeopleTable(testTable: String): Unit = {
    sql(s"""
      | CREATE TABLE $testTable
      | (
      |   id INT,
      |   name STRING,
      |   occupation STRING,
      |   country STRING,
      |   salary INT
      | )
      | USING $tableType $tableOptions
      |""".stripMargin)

    // Insert data into the new table
    sql(s"""
      | INSERT INTO $testTable
      | VALUES (1000, 'Jake', 'Engineer', 'England', 100000),
      |        (1001, 'Hello', 'Artist', 'USA', 70000),
      |        (1002, 'John', 'Doctor', 'Canada', 120000),
      |        (1003, 'David', 'Doctor', null, 120000),
      |        (1004, 'David', null, 'Canada', 0),
      |        (1005, 'Jane', 'Scientist', 'Canada', 90000)
      | """.stripMargin)
  }

  protected def createWorkInformationTable(testTable: String): Unit = {
    sql(s"""
      | CREATE TABLE $testTable
      | (
      |   uid INT,
      |   name STRING,
      |   department STRING,
      |   occupation STRING
      | )
      | USING $tableType $tableOptions
      |""".stripMargin)

    // Insert data into the new table
    sql(s"""
      | INSERT INTO $testTable
      | VALUES (1000, 'Jake', 'IT', 'Engineer'),
      |        (1002, 'John', 'DATA', 'Scientist'),
      |        (1003, 'David', 'HR', 'Doctor'),
      |        (1005, 'Jane', 'DATA', 'Engineer'),
      |        (1006, 'Tom', 'SALES', 'Artist')
      | """.stripMargin)
  }

protected def createOccupationTopRareTable(testTable: String): Unit = {
sql(s"""
| CREATE TABLE $testTable