Support Eventstats in PPL #800
Signed-off-by: Lantao Jin <[email protected]>
- `source = table | eventstats percentile(c, 90)`
- `source = table | eventstats percentile_approx(c, 99)`

**Limitation: distinct aggregation cannot be used in `eventstats`:**
Please also add that it can't be used in conjunction with `stats`. It's probably obvious, but it still needs to be noted...
This limitation applies only to the new `eventstats` command, not to `stats`. If we add a limitation note about using it in conjunction with `stats`, then similar notes for every other command would have to be considered too. Would that be gilding the lily?
OK, makes sense...
This is an excellent document. Could you please create a similar doc for the `stats` command, in a new PR?
Thanks!!
The `stats` command already has one: https://github.com/opensearch-project/opensearch-spark/blob/main/docs/ppl-lang/ppl-stats-command.md. The `stats` command is also more straightforward, and we even have a website doc, https://opensearch.org/docs/latest/search-plugins/sql/ppl/functions/#stats, that introduces it.
Key aspects of `eventstats`:

1. It performs calculations across the entire result set or within defined groups.
2. The original events remain intact, with new fields added to contain the statistical results.
In some cases I would only want to see a subset of fields with the enriched aggregation. Should we allow adding the `fields` command after it?
> in some cases I would only want to see a subset of fields with the enriched aggregation - should we allow adding the `fields` command after ?

Yes. This command only enriches each row with new columns (one for each aggregation used in `eventstats`), so a `fields` command, or any other type of command, can be added after it with the `|` symbol.
* Support Eventstats in PPL
  Signed-off-by: Lantao Jin <[email protected]>
* add doc
  Signed-off-by: Lantao Jin <[email protected]>
---------
Signed-off-by: Lantao Jin <[email protected]>
Co-authored-by: YANGDB <[email protected]>
Description

PPL `eventstats` command

Description

The `eventstats` command enriches your event data with calculated summary statistics. It operates by analyzing specified fields within your events, computing various statistical measures, and then appending these results as new fields to each original event.

Key aspects of `eventstats`:

1. It performs calculations across the entire result set or within defined groups.
2. The original events remain intact, with new fields added to contain the statistical results.

Difference between `stats` and `eventstats`

The `stats` and `eventstats` commands are both used for calculating statistics, but they differ in how they operate and what they produce:

- `stats`: Produces a summary table with only the calculated statistics.
- `eventstats`: Adds the calculated statistics as new fields to the existing events, preserving the original data.

- `stats`: Reduces the result set to only the statistical summary, discarding individual events.
- `eventstats`: Retains all original events and adds new fields with the calculated statistics.

- `stats`: Best for creating summary reports or dashboards. Often used as a final command to summarize results.
- `eventstats`: Useful when you need to enrich events with statistical context for further analysis or filtering. Can be used mid-search to add statistics that can be used in subsequent commands.

Syntax

(check "docs/ppl-lang/ppl-eventstats-command.md" for details)
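To make the difference between `stats` and `eventstats` described above concrete, here is a minimal Python sketch of the two behaviors for `max(c) by b`. It is only an illustration of the semantics as this description states them, not the implementation in this PR; the sample events, the helper names, and the `max(c)` output column name are all assumptions made for the example.

```python
# Hypothetical sample events; the field names b and c mirror the PPL examples.
events = [
    {"b": "x", "c": 10},
    {"b": "x", "c": 30},
    {"b": "y", "c": 5},
]


def group_max(rows, value, by):
    """Return {group key: maximum of `value`} over the given rows."""
    result = {}
    for row in rows:
        key = row[by]
        result[key] = max(result.get(key, row[value]), row[value])
    return result


def stats_max(rows, value, by):
    """Sketch of `stats max(c) by b`: one summary row per group."""
    maxima = group_max(rows, value, by)
    return [{by: key, f"max({value})": m} for key, m in maxima.items()]


def eventstats_max(rows, value, by):
    """Sketch of `eventstats max(c) by b`: every row kept, group max appended."""
    maxima = group_max(rows, value, by)
    return [{**row, f"max({value})": maxima[row[by]]} for row in rows]


stats_rows = stats_max(events, "c", "b")            # 2 rows, one per group
eventstats_rows = eventstats_max(events, "c", "b")  # 3 rows, originals enriched
```

The `stats`-style result collapses the three events into two group rows, while the `eventstats`-style result keeps all three events and adds the group maximum to each.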
Event Aggregations

See additional command details

- `source = table | eventstats avg(a)`
- `source = table | where a < 50 | eventstats avg(c)`
- `source = table | eventstats max(c) by b`
- `source = table | eventstats count(c) by b | head 5`
- `source = table | eventstats stddev_samp(c)`
- `source = table | eventstats stddev_pop(c)`
- `source = table | eventstats percentile(c, 90)`
- `source = table | eventstats percentile_approx(c, 99)`
Limitation: distinct aggregation cannot be used in `eventstats`:

- `source = table | eventstats distinct_count(c)` (throws an exception)

Aggregations With Span

- `source = table | eventstats count(a) by span(a, 10) as a_span`
- `source = table | eventstats sum(age) by span(age, 5) as age_span | head 2`
- `source = table | eventstats avg(age) by span(age, 20) as age_span, country | sort - age_span | head 2`
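As a rough illustration of `eventstats count(a) by span(a, 10)`, the sketch below groups values into fixed-width buckets and attaches the bucket's count to every event. It assumes that `span(a, 10)` maps a value to the lower bound of its bucket (floor division); that assumption, the sample data, and the `count(a)` column name are not taken from this PR. Whether the span alias itself appears as an output column is not stated here, so the sketch only attaches the count.

```python
from collections import Counter


def span_bucket(value, width):
    """Lower bound of the fixed-width bucket containing `value` (assumed semantics)."""
    return (value // width) * width


# Hypothetical values for field `a`.
events = [{"a": 3}, {"a": 7}, {"a": 12}, {"a": 25}, {"a": 29}]

# Count the events that fall into each bucket of width 10.
counts = Counter(span_bucket(e["a"], 10) for e in events)

# Keep every event and append the count of its bucket.
enriched = [{**e, "count(a)": counts[span_bucket(e["a"], 10)]} for e in events]
```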
Aggregations With TimeWindow Span (tumble windowing function)

- `source = table | eventstats sum(productsAmount) by span(transactionDate, 1d) as age_date | sort age_date`
- `source = table | eventstats sum(productsAmount) by span(transactionDate, 1w) as age_date, productId`
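The time-window variant can be pictured the same way. The sketch below approximates `eventstats sum(productsAmount) by span(transactionDate, 1d)` by assigning each event to a non-overlapping one-day window and attaching that window's total to the event. It assumes windows aligned to midnight; the alignment, the sample transactions, and the `sum(productsAmount)` column name are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transactions; field names mirror the PPL examples.
events = [
    {"transactionDate": datetime(2024, 1, 1, 9, 0), "productsAmount": 2},
    {"transactionDate": datetime(2024, 1, 1, 18, 30), "productsAmount": 3},
    {"transactionDate": datetime(2024, 1, 2, 8, 15), "productsAmount": 4},
]


def day_window_start(ts):
    """Start of the one-day tumbling window containing `ts` (midnight-aligned, assumed)."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


# Sum productsAmount per one-day window.
totals = defaultdict(int)
for e in events:
    totals[day_window_start(e["transactionDate"])] += e["productsAmount"]

# Keep every event and append the total of its window.
enriched = [
    {**e, "sum(productsAmount)": totals[day_window_start(e["transactionDate"])]}
    for e in events
]
```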
Aggregations Group by Multiple Times

- `source = table | eventstats avg(age) as avg_state_age by country, state | eventstats avg(avg_state_age) as avg_country_age by country`
- `source = table | eventstats avg(age) as avg_city_age by country, state, city | eval new_avg_city_age = avg_city_age - 1 | eventstats avg(new_avg_city_age) as avg_state_age by country, state | where avg_state_age > 18 | eventstats avg(avg_state_age) as avg_adult_country_age by country`
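Chaining works because each `eventstats` keeps every event, so the next one can aggregate over the column the previous one added. The Python sketch below mimics the first example above under that model; `eventstats_avg` and the sample rows are hypothetical and not part of this PR.

```python
from collections import defaultdict


def eventstats_avg(rows, value, by, alias):
    """Attach the per-group mean of `value` to every row under the name `alias`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[field] for field in by)
        sums[key] += row[value]
        counts[key] += 1
    enriched = []
    for row in rows:
        key = tuple(row[field] for field in by)
        enriched.append({**row, alias: sums[key] / counts[key]})
    return enriched


# Hypothetical events: two in one state, one in another.
events = [
    {"country": "USA", "state": "CA", "age": 30},
    {"country": "USA", "state": "CA", "age": 50},
    {"country": "USA", "state": "NY", "age": 70},
]

# eventstats avg(age) as avg_state_age by country, state
step1 = eventstats_avg(events, "age", ["country", "state"], "avg_state_age")
# eventstats avg(avg_state_age) as avg_country_age by country
step2 = eventstats_avg(step1, "avg_state_age", ["country"], "avg_country_age")
```

In this model the second average runs over events rather than over distinct states, so a state with more events carries more weight: `avg_country_age` here is the mean of 40, 40 and 70, not the mean of the two state averages.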
Related Issues
Resolves #660
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.