[FEATURE]add appendcol ppl command #3172

Open
YANG-DB opened this issue Nov 28, 2024 · 1 comment
Labels
enhancement New feature or request PPL Piped processing language

Comments

YANG-DB commented Nov 28, 2024

Is your feature request related to a problem?
This feature request proposes adding an appendcol command to OpenSearch’s Piped Processing Language (PPL). The appendcol command, modeled after Splunk’s command of the same name, allows users to append the results of one or more searches as additional columns to the existing search results, offering a powerful mechanism for enriching data by combining results horizontally.

What solution would you like?
The following query:

source=logs | stats count by status | appendcol [ stats avg(response_time) by status ]

In this example, the appendcol command adds a new column containing the average response time for each status alongside the original column that shows the count by status. The original row structure is preserved.
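
To make the expected result concrete, here is a minimal Python sketch of what the query above might return. It is an illustration only, not the PPL implementation: the sample `logs` rows and the helper names are hypothetical, the sub-pipeline is assumed to run over the same source as the base query, and on a field-name collision (here `status`) the base value is assumed to be kept, which this proposal does not specify.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the `logs` index.
logs = [
    {"status": 200, "response_time": 100},
    {"status": 200, "response_time": 300},
    {"status": 404, "response_time": 50},
]


def count_by_status(rows):
    """Emulates `stats count by status`."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["status"]] += 1
    return [{"status": s, "count": c} for s, c in sorted(counts.items())]


def avg_response_time_by_status(rows):
    """Emulates `stats avg(response_time) by status`."""
    times = defaultdict(list)
    for row in rows:
        times[row["status"]].append(row["response_time"])
    return [
        {"status": s, "avg(response_time)": sum(t) / len(t)}
        for s, t in sorted(times.items())
    ]


def appendcol(base, extra):
    """Pairs rows by position; base fields win on a name collision (assumption)."""
    merged_rows = []
    for base_row, extra_row in zip(base, extra):
        merged = dict(base_row)
        for field, value in extra_row.items():
            merged.setdefault(field, value)
        merged_rows.append(merged)
    return merged_rows


result = appendcol(count_by_status(logs), avg_response_time_by_status(logs))
for row in result:
    print(row)
```

Note that the rows are paired purely by position, so the two result sets only line up per status here because both are sorted the same way.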

The implementation could probably rewrite the command into a join query.
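
One way such a join rewrite might look, sketched in Python rather than in the query planner: number the rows of both result sets, left-join them on that row number, then drop the helper column. The `_row` column and the function names are hypothetical, and the choice of a left join is an assumption, not something this issue specifies.

```python
def with_row_number(rows):
    """Adds a synthetic 1-based row number to use as the join key."""
    return [{"_row": i, **row} for i, row in enumerate(rows, start=1)]


def left_join_on_row_number(left, right):
    """Left outer join on `_row`; left-side fields win on a name collision."""
    right_by_row = {row["_row"]: row for row in right}
    joined = []
    for left_row in left:
        match = right_by_row.get(left_row["_row"], {})
        merged = dict(left_row)
        for field, value in match.items():
            merged.setdefault(field, value)
        del merged["_row"]  # the helper key is not part of the output
        joined.append(merged)
    return joined


# Hypothetical result sets of the base query and the sub-pipeline.
base = [{"status": 200, "count": 2}, {"status": 404, "count": 1}]
sub = [{"avg_rt": 200.0}, {"avg_rt": 50.0}]

joined = left_join_on_row_number(with_row_number(base), with_row_number(sub))
print(joined)
```

With a left join, surplus rows from the sub-pipeline are dropped; keeping them would require a full outer join instead.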

Add Proposal Document

In many use cases, users need to compare or merge multiple result sets side by side without altering the existing row structure. The appendcol command addresses this by merging datasets horizontally, and bringing it to OpenSearch PPL would greatly enhance data processing flexibility.

By adding appendcol to PPL, users can:

  • Combine different queries into a unified result set without duplicating rows.
  • Enrich an existing dataset with additional metrics or fields from other searches.
  • Improve the efficiency and readability of complex queries.

Technical Details

Syntax
The appendcol command would accept a query inside square brackets [ ], representing the additional pipeline that produces the column(s) to append to the original result set.

Behavior
The new column(s) would be aligned with the rows of the original dataset based on their order of appearance. Each appended column should produce the same number of rows as the base dataset to ensure proper alignment; where the row counts differ, the mismatched rows could be filled with null values.
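
The alignment rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the proposed implementation: the function name and field lists are hypothetical, and `None` stands in for a PPL null.

```python
from itertools import zip_longest


def appendcol_padded(base, extra, base_fields, extra_fields):
    """Aligns rows by position and pads the shorter side with None."""
    rows = []
    for base_row, extra_row in zip_longest(base, extra, fillvalue=None):
        # A missing row on either side yields None for all of that side's fields.
        row = {f: (base_row or {}).get(f) for f in base_fields}
        row.update({f: (extra_row or {}).get(f) for f in extra_fields})
        rows.append(row)
    return rows


# Hypothetical result sets: the sub-pipeline returns one row fewer than the base.
base_rows = [
    {"status": 200, "count": 2},
    {"status": 404, "count": 1},
    {"status": 500, "count": 4},
]
extra_rows = [{"avg_rt": 200.0}, {"avg_rt": 50.0}]

padded = appendcol_padded(base_rows, extra_rows, ["status", "count"], ["avg_rt"])
for row in padded:
    print(row)
```

Here the third base row has no counterpart in the sub-pipeline, so its `avg_rt` is `None`. If the sub-pipeline returned more rows than the base, the base fields of the surplus rows would be `None` instead.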

@YANG-DB YANG-DB added enhancement New feature or request untriaged PPL Piped processing language labels Nov 28, 2024
@YANG-DB YANG-DB moved this to Todo in PPL Commands Nov 28, 2024
@YANG-DB YANG-DB assigned YANG-DB and unassigned YANG-DB Nov 29, 2024
@YANG-DB YANG-DB moved this from Todo to Design in PPL Commands Dec 3, 2024
@dblock dblock removed the untriaged label Dec 16, 2024

dblock commented Dec 16, 2024

[Catch All Triage - 1, 2, 3]

Projects
Status: Design

2 participants