[FEATURE] PPL `lookup` Functionality #2651

brijos · 2024-05-03T17:40:35Z

Is your feature request related to a problem?
OpenSearch users want an easy way to enrich the data they have stored in OpenSearch and external data sources using content from an OpenSearch index. This is common in security analytics scenarios where one wants to enrich their IP reputation lists, vulnerability databases, or threat feeds.

What solution would you like?
Do a lookup of a field/value, from another log group and use that to convert to user friendly name/error code.

The lookup feature should support the OpenSearch indexes as data sources for lookup tables
Users should be able to perform a static lookup using the OpenSearch index and external data sources based on the Spark integration
Users should be able to perform a static lookup using a user generated CSV such as an org unit mapping or GeoIP from MaxMind and OpenSearch index
Users should be able to perform a static lookup using a user generated CSV such as an org unit mapping or GeoIP from MaxMind and an external data source
Admins can use Index State Management to control how long the reference lookup index is available
Include helpful error messages

*** Out of Scope ***

Defining new lookup data sources beyond listed types
Automatic field mapping between events and lookups

What alternatives have you considered?
Performing joins using SQL

Do you have any additional context?
None.

dblock · 2024-06-24T16:32:26Z

Catch All Triage - 1 2 3 4 5 6

salyh · 2024-07-16T11:53:02Z

PPL Lookup Design Proposal

As implemented in PR 2698 the proposed design (and so far implemented) syntax is:

Design

The lookup command can be implemented as a simple search for documents in the lookup index. For every row in a search result, a search with the given match fields are performed. If a single document is found, the fields and values of the lookup document are copied to the current row of the search result. If no document is found a no-op is performed. If multiple documents are found, an error is thrown. The implementation is mainly done in core/src/main/java/org/opensearch/sql/analysis/Analyzer.java and opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java. The "lookup search" is performed as a term and a match query. So both cases are catched: When a field is not analyzed or analyzed with respect to it mapping.

The Spark PPL Lookup command implementation is done in separate PR in the opensearch-spark repo: PR 407. Here we can (and need) to implement it as a join.

Syntax

lookup <lookup index> <lookup field> [AS <local lookup field>] [<lookup field> [AS <local lookup field>]]… [appendonly=true|false] [<source field> [AS <local field>]]...

lookup is the name of the lookup operation and it is supposed to be changed to something else. Normally we would use lookup but this seems to already used otherwise in the AST.

<lookup index> is the name of the lookup index (mandatory).

Then we need at least one <lookup field> which is a field in the lookup index used to match to a local field (in the current search) to get the lookup document. When there is no lookup document we just do nothing, if there is more than one we fail with an error.

If more than one <lookup field> is provided, all of them must match (we do a term and a match query for the field value as of now)

If the field has a different name in the current search result use <local lookup field> to map it.

appendonly is false by default abnd inidicates if the values we copy over to the search result from the lookup documemnt should overwrite existing values. If appendonly is true we do not overwrite existing values.

<source field> are the fields that should be copied. If no such fields are given all fields are copied. If the field should have a different name than in the lookup document use [AS <local field>

Examples:

{"query":"source=logins | lookup users uid AS id appendonly=true"}

{"query":"source=logins | lookup users uid,name phone,department AS thedepartment"}

Gokul-Radhakrishnan · 2024-09-05T12:02:39Z

+1 for this feature

LantaoJin · 2024-09-24T14:17:46Z

PPL Lookup Command Design

As implemented in opensearch-project/opensearch-spark#686 the proposed design doc:

Overview

Lookup command enriches your search data by adding or replacing data from a lookup index (dimension table).
You can extend fields of an index with values from a dimension table, append or replace values when lookup condition is matched.
As an alternative of Join command, lookup command is more suitable for enriching the source data with a static dataset.

Syntax of Lookup Command

SEARCH source=<sourceIndex>
| <other piped command>
| LOOKUP <lookupIndex> (<lookupMappingField> [AS <sourceMappingField>])...
    [(REPLACE | APPEND) (<inputField> [AS <outputField>])...]
| <other piped command>

lookupIndex

Required
Description: the name of lookup index (dimension table)

lookupMappingField

Required
Description: A mapping key in <lookupIndex>, analogy to a join key from right table. You can specify multiple <lookupMappingField> with comma-delimited.

sourceMappingField

Optional
Default: <lookupMappingField>
Description: A mapping key from source query, analogy to a join key from left side. If you don't specify any <sourceMappingField>, its default value is <lookupMappingField>.

inputField

Optional
Default: All fields of <lookupIndex> where matched values are applied to result output if no field is specified.
Description: A field in <lookupIndex> where matched values are applied to result output. You can specify multiple <inputField> with comma-delimited. If you don't specify any <inputField>, all fields of <lookupIndex> where matched values are applied to result output.

outputField

Optional
Default: <inputField>
Description: A field of output. You can specify multiple <outputField>. If you specify <outputField> with an existing field name in source query, its values will be replaced or appended by matched values from <inputField>. If the field specified in <outputField> is a new field, an extended new field will be applied to the results.

REPLACE | APPEND

Optional
Default: REPLACE
Description: If you specify REPLACE, matched values in <lookupIndex> field overwrite the values in result. If you specify APPEND, matched values in <lookupIndex> field only append to the missing values in result.

Usage

source = table1 | lookup table2 id
source = table1 | lookup table2 id, name
source = table1 | lookup table2 id as cid, name
source = table1 | lookup table2 id as cid, name replace dept as department
source = table1 | lookup table2 id as cid, name replace dept as department, city as location
source = table1 | lookup table2 id as cid, name append dept as department
source = table1 | lookup table2 id as cid, name append dept as department, city as location

Examples

SEARCH source=<sourceIndex>
| WHERE orderType = 'Cancelled'
| LOOKUP account_list, mkt_id AS mkt_code REPLACE amount, account_name AS name
| STATS count(mkt_code), avg(amount) BY name

SEARCH source=<sourceIndex>
| DEDUP market_id
| EVAL category=replace(category, "-", ".")
| EVAL category=ltrim(category, "dvp.")
| LOOKUP bounce_category category AS category APPEND classification

SEARCH source=<sourceIndex>
| LOOKUP bounce_category category

salyh · 2024-10-08T14:11:03Z

@YANG-DB guess we can close this one because it seems done in opensearch-project/opensearch-spark#686 ?

YANG-DB · 2024-10-08T17:08:33Z

@YANG-DB guess we can close this one because it seems done in opensearch-project/opensearch-spark#686 ?

We are keeping this open since this issue refers to the PPL OpenSearch engine

brijos added enhancement New feature or request untriaged labels May 3, 2024

lukasz-soszynski-eliatra mentioned this issue May 29, 2024

Lookup #2698

Open

6 tasks

dblock removed the untriaged label Jun 24, 2024

salyh mentioned this issue Jul 2, 2024

PPL Lookup opensearch-project/opensearch-spark#407

Closed

YANG-DB mentioned this issue Jul 3, 2024

[META]Add Missing PPL commands - campaign opensearch-project/opensearch-spark#408

Open

35 tasks

LantaoJin mentioned this issue Sep 20, 2024

Translate PPL LOOKUP Command opensearch-project/opensearch-spark#686

Merged

4 tasks

anasalkouz added this to PPL Commands Oct 4, 2024

YANG-DB moved this to Design in PPL Commands Oct 7, 2024

YANG-DB added the PPL Piped processing language label Oct 7, 2024

YANG-DB moved this from Design to In Progress in PPL Commands Oct 9, 2024

YANG-DB moved this from In Progress to Design in PPL Commands Oct 9, 2024

YANG-DB changed the title ~~[FEATURE] PPL LOOKUP Functionality~~ [FEATURE] PPL lookup Functionality Nov 28, 2024

YANG-DB self-assigned this Nov 29, 2024

YANG-DB moved this from Design to In Progress in PPL Commands Nov 30, 2024

This was referenced Dec 2, 2024

Lookup join #3179

Closed

Lookup - ppl command #3181

Closed

YANG-DB removed the status in PPL Commands Dec 12, 2024

YANG-DB moved this to InReview in PPL Commands Dec 12, 2024

Swiddis added the catch-all acknowledged Marks issues caught in catch-all triage for searchability label Jan 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] PPL `lookup` Functionality #2651

[FEATURE] PPL `lookup` Functionality #2651

brijos commented May 3, 2024 •

edited

Loading

dblock commented Jun 24, 2024

salyh commented Jul 16, 2024

Gokul-Radhakrishnan commented Sep 5, 2024

LantaoJin commented Sep 24, 2024

salyh commented Oct 8, 2024

YANG-DB commented Oct 8, 2024

[FEATURE] PPL lookup Functionality #2651

[FEATURE] PPL lookup Functionality #2651

Comments

brijos commented May 3, 2024 • edited Loading

dblock commented Jun 24, 2024

salyh commented Jul 16, 2024

PPL Lookup Design Proposal

Design

Syntax

Gokul-Radhakrishnan commented Sep 5, 2024

LantaoJin commented Sep 24, 2024

PPL Lookup Command Design

Overview

Syntax of Lookup Command

Usage

Examples

salyh commented Oct 8, 2024

YANG-DB commented Oct 8, 2024

[FEATURE] PPL `lookup` Functionality #2651

[FEATURE] PPL `lookup` Functionality #2651

brijos commented May 3, 2024 •

edited

Loading