[FEATURE]Extend ppl stats command functionality #660

Closed
YANG-DB opened this issue Sep 14, 2024 · 2 comments · Fixed by #800
Assignees
Labels
0.6 enhancement New feature or request Lang:PPL Pipe Processing Language support

Comments

@YANG-DB
Member

YANG-DB commented Sep 14, 2024

High level Review

The OpenSearch Piped Processing Language (PPL) currently lacks some advanced statistical aggregation capabilities similar to those provided by the eventstats command in Splunk Search Processing Language (SPL).
This feature request proposes adding new functions and syntax to PPL to enable statistical calculations and aggregations on event data.

Proposed Functionality:

  1. Aggregate statistical calculations:

    • Calculate common statistical measures like sum, count, min, max, avg, etc., on specific fields or expressions.
    • Support grouping events by one or more fields and performing statistical calculations within each group.
    • Allow renaming the calculated fields with custom names.
  2. Conditional aggregations:

    • Perform statistical calculations based on conditional expressions or filters.
    • Evaluate conditional expressions for each event and aggregate the results (e.g., sum of a conditional expression).
  3. Chaining and nesting:

    • Enable chaining and nesting of statistical calculations, similar to how eventstats commands can be chained in SPL.
    • Allow performing multiple levels of aggregations and calculations in a single query.
  4. Integration with existing PPL syntax:

    • Seamlessly integrate the new statistical aggregation capabilities with the existing PPL syntax and functions.
    • Ensure compatibility with other PPL features and maintain the overall usability and readability of the language.

Examples:

  1. Calculate the sum of a conditional expression grouped by a field:
stats sum(if(field1 = "value" and field2 like "%pattern%", 1, 0)) as conditional_sum by group_field
  2. Calculate minimum and maximum values of a field grouped by another field:
stats min(latency_field) as min_latency, max(latency_field) as max_latency by operation_id
  3. Chain multiple statistical calculations:
stats sum(count) as total_count by client_id | stats sum(total_count) as overall_total
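
To make the intended results of the three example queries concrete, here is a minimal Python sketch that emulates them over an in-memory list of events. It only illustrates the expected semantics and says nothing about how PPL or Spark would execute the queries. The field names come from the examples; the sample rows, the helper names (`conditional_sum`, `min_max`, `chained_total`, `matches`), and the treatment of `like "%pattern%"` as a plain substring test are assumptions made for illustration.

```python
from collections import defaultdict


def conditional_sum(events, group_key, predicate):
    """Emulates: stats sum(if(<predicate>, 1, 0)) as conditional_sum by <group_key>."""
    totals = defaultdict(int)
    for event in events:
        # Adding 0 for non-matching rows keeps groups with no matches in the output.
        totals[event[group_key]] += 1 if predicate(event) else 0
    return dict(totals)


def min_max(events, group_key, field):
    """Emulates: stats min(<field>), max(<field>) by <group_key>."""
    result = {}
    for event in events:
        value = event[field]
        low, high = result.get(event[group_key], (value, value))
        result[event[group_key]] = (min(low, value), max(high, value))
    return result


def chained_total(events):
    """Emulates: stats sum(count) as total_count by client_id
    | stats sum(total_count) as overall_total."""
    per_client = defaultdict(int)
    for event in events:
        per_client[event["client_id"]] += event["count"]
    # The second stats stage aggregates the output rows of the first stage.
    return dict(per_client), sum(per_client.values())


def matches(event):
    # Assumption: like "%pattern%" is treated as a substring test.
    return event["field1"] == "value" and "pattern" in event["field2"]


# Hypothetical sample rows, invented for this sketch.
events = [
    {"group_field": "g1", "field1": "value", "field2": "has pattern here",
     "operation_id": "op1", "latency_field": 120, "client_id": "c1", "count": 3},
    {"group_field": "g1", "field1": "other", "field2": "has pattern here",
     "operation_id": "op1", "latency_field": 80, "client_id": "c1", "count": 2},
    {"group_field": "g2", "field1": "value", "field2": "no match",
     "operation_id": "op2", "latency_field": 45, "client_id": "c2", "count": 5},
]

print(conditional_sum(events, "group_field", matches))
print(min_max(events, "operation_id", "latency_field"))
print(chained_total(events))
```

Note that in the chained case the second stage sees only the per-client totals, not the original events, which is the same behavior as piping one PPL command into the next.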
@salyh
Contributor

salyh commented Oct 5, 2024

@YANG-DB @vamsi-amazon

  • Are the “Conditional aggregations” related to the general availability of the if statement, or is it a different “if” than the regular one -> https://github.com/opensearch-project/opensearch-spark/issues/398 (which is now CASE)?
  • Chaining (as in Example 3) seems to refer to the regular chaining of PPL commands in general, right?
  • It seems that stats avg/sum/etc. is already supported/implemented as of now according to the docs, pls confirm
  • It seems that “by” grouping is already supported/implemented as of now according to the docs, pls confirm
  • Same for "Allow renaming the calculated fields with custom names."

So I am not exactly sure what the scope of this issue is, because examples 2 and 3 can already be executed successfully.
Example 1 can be executed when rewritten as CASE, like `stats sum(case(device-id = 'value1', 1, device-name = 'value2', 2 else 1))`
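
As a rough illustration of what the CASE rewrite above would compute, the following Python sketch evaluates the same branches per event and sums the results. It assumes CASE checks its conditions in order and returns the value of the first match, falling back to the `else` value, as in SQL; that ordering is an assumption here, not something confirmed in this thread. The helper names and sample rows are hypothetical.

```python
def case_value(event):
    """Mirrors: case(device-id = 'value1', 1, device-name = 'value2', 2 else 1).

    Assumption: branches are tested in order and the first match wins.
    """
    if event.get("device-id") == "value1":
        return 1
    if event.get("device-name") == "value2":
        return 2
    return 1  # the "else" branch


def stats_sum_case(events):
    """Mirrors: stats sum(case(...)) with no "by" clause (one global group)."""
    return sum(case_value(event) for event in events)


# Hypothetical sample rows, invented for this sketch.
sample = [
    {"device-id": "value1", "device-name": "x"},       # first branch
    {"device-id": "z", "device-name": "value2"},       # second branch
    {"device-id": "z", "device-name": "x"},            # else branch
]

print(stats_sum_case(sample))
```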

@salyh salyh moved this from Todo to unknown in PPL Commands Oct 5, 2024
@YANG-DB YANG-DB moved this from unknown to Design in PPL Commands Oct 9, 2024
@YANG-DB
Member Author

YANG-DB commented Oct 11, 2024

@salyh I've assigned this task to @LantaoJin.
Thanks for all your help!

@YANG-DB YANG-DB added the 0.6 label Oct 11, 2024
@LantaoJin LantaoJin moved this from Design to Todo in PPL Commands Oct 14, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in PPL Commands Oct 25, 2024
Projects
Status: Done
3 participants