PPL fieldsummary command #766
- antlr syntax - ast expression builder - ast node builder - catalyst ast builder Signed-off-by: YANGDB <[email protected]>
- antlr syntax - ast expression builder - ast node builder - catalyst ast builder Signed-off-by: YANGDB <[email protected]>
fix scala style format Signed-off-by: YANGDB <[email protected]>
# Conflicts: # ppl-spark-integration/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
…ng table identifier only has 2 parts) Signed-off-by: YANGDB <[email protected]>
Signed-off-by: YANGDB <[email protected]>
Signed-off-by: YANGDB <[email protected]>
Signed-off-by: YANGDB <[email protected]>
Signed-off-by: YANGDB <[email protected]>
Signed-off-by: YANGDB <[email protected]>
# Conflicts: # ppl-spark-integration/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
Signed-off-by: YANGDB <[email protected]>
Would you mind changing the PR status to DRAFT if you are going to refactor?
@LantaoJin please review
@ToString
@RequiredArgsConstructor
@EqualsAndHashCode(callSuper = false)
public class NamedExpression extends UnresolvedExpression {
Could you remove this definition or find an alternative, for example reusing Argument? Because:
- NamedExpression should have a name.
- NamedExpression generally should be an abstract class and the parent of Attribute or Alias. We would need to refactor a lot of code if we really need it.
- It is easy to confuse with Spark's NamedExpression. IMO it is not worth introducing a new expression for the fieldsummary command. At least, not one called NamedExpression.
@@ -39,7 +42,8 @@
 */
public interface DataTypeTransformer {
  static <T> Seq<T> seq(T... elements) {
    return seq(List.of(elements));
    return seq(Arrays.stream(elements).filter(Objects::nonNull)
Could you comment on this change? What case did you see?
Yes, it's not relevant anymore. Thanks for pointing it out.
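For readers following this thread, here is a minimal sketch of the behavioral difference between the two `return` lines in the diff above. It assumes the change was about null elements, which the thread does not confirm. The class `NullSafeListDemo` and its methods `strict` and `lenient` are hypothetical names, and plain `java.util.List` stands in for the Scala `Seq` used in the PR.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the PR.
class NullSafeListDemo {
    // Same shape as the original line: List.of rejects null elements.
    @SafeVarargs
    static <T> List<T> strict(T... elements) {
        return List.of(elements); // throws NullPointerException on a null element
    }

    // Same shape as the changed line: nulls are dropped before collecting.
    @SafeVarargs
    static <T> List<T> lenient(T... elements) {
        return Arrays.stream(elements)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lenient("a", null, "b"));
        try {
            strict("a", null, "b");
        } catch (NullPointerException e) {
            System.out.println("List.of rejected a null element");
        }
    }
}
```

Silently filtering nulls can hide a caller bug, which may be why the reviewer asked for the concrete case.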
SUM(FunctionName.of("sum")),
COUNT(FunctionName.of("count")),
COUNT_DISTINCT(FunctionName.of("count_distinct")),
We already have a DISTINCT_COUNT (did we miss it here?). Also, it seems you are not using it as a built-in function name in the code that follows; count_distinct is used as an alias name instead.
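One way to read this suggestion is to keep a single constant and resolve the alias at lookup time. The sketch below is an illustration only: the enum `AggregateFunction`, its `of` method, and the `"distinct_count"` spelling are hypothetical and are not the project's actual built-in function registry.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; names are illustrative, not the project's API.
enum AggregateFunction {
    SUM("sum"),
    COUNT("count"),
    DISTINCT_COUNT("distinct_count", "count_distinct"); // second name is an alias

    private final String[] names;

    AggregateFunction(String... names) {
        this.names = names;
    }

    // Maps every canonical name and alias to its single enum constant.
    private static final Map<String, AggregateFunction> BY_NAME = new HashMap<>();

    static {
        for (AggregateFunction f : values()) {
            for (String n : f.names) {
                BY_NAME.put(n, f);
            }
        }
    }

    // Case-insensitive lookup; empty when the name is unknown.
    static Optional<AggregateFunction> of(String name) {
        return Optional.ofNullable(BY_NAME.get(name.toLowerCase(Locale.ROOT)));
    }
}
```

With this shape, `count_distinct` and `distinct_count` both resolve to the same constant, so no second enum entry is needed.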
Signed-off-by: YANGDB <[email protected]>
FIELDSUMMARY: 'FIELDSUMMARY';
INCLUDEFIELDS: 'INCLUDEFIELDS';
NULLS: 'NULLS';
We should add these keywords to keywordsCanBeId too.
os> source = t | fieldsummary includefields= id, status_code, request_path nulls=true
+------------------+-------------+------------+------------+------------+------------+------------+------------+----------------|
| Fiels | COUNT | COUNT_DISTINCT | MIN | MAX | AVG | MEAN | STDDEV | NUlls | TYPEOF |
ditto
os> source = t | where status_code != 200 | fieldsummary includefields= status_code nulls=true
+------------------+-------------+------------+------------+------------+------------+------------+------------+----------------|
| Fiels | COUNT | COUNT_DISTINCT | MIN | MAX | AVG | MEAN | STDDEV | NUlls | TYPEOF |
Fiels -> Fields?
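To make the columns in the sample output concrete, here is a rough plain-Java sketch of the per-field statistics they name. It is not how the PR computes them (the PR builds a Catalyst plan), and several choices are assumptions: COUNT ignores nulls, STDDEV is the sample standard deviation, and the class `FieldSummarySketch` is hypothetical.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical illustration of per-field summary statistics; not the PR's implementation.
class FieldSummarySketch {
    final long count, distinct, nulls;
    final double min, max, avg, stddev;

    FieldSummarySketch(List<Double> values) {
        List<Double> present = values.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
        DoubleSummaryStatistics s = present.stream()
                .mapToDouble(Double::doubleValue)
                .summaryStatistics();
        long n = s.getCount();            // assumption: COUNT ignores nulls
        double mean = s.getAverage();
        // assumption: sample standard deviation (n - 1 in the denominator)
        double squares = present.stream()
                .mapToDouble(v -> (v - mean) * (v - mean))
                .sum();
        this.count = n;
        this.nulls = values.size() - n;
        this.distinct = present.stream().distinct().count();
        this.min = s.getMin();
        this.max = s.getMax();
        this.avg = mean;
        this.stddev = n > 1 ? Math.sqrt(squares / (n - 1)) : Double.NaN;
    }
}
```

For example, `new FieldSummarySketch(java.util.Arrays.asList(200.0, 404.0, null, 200.0))` yields three non-null values, two distinct values, and one null.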
* add support for FieldSummary
  - antlr syntax
  - ast expression builder
  - ast node builder
  - catalyst ast builder
  Signed-off-by: YANGDB <[email protected]>
* add support for FieldSummary
  - antlr syntax
  - ast expression builder
  - ast node builder
  - catalyst ast builder
  Signed-off-by: YANGDB <[email protected]>
* update sample query
  fix scala style format
  Signed-off-by: YANGDB <[email protected]>
* support spark prior to 3.5 with its extended table identifier (existing table identifier only has 2 parts)
  Signed-off-by: YANGDB <[email protected]>
* update union queries based summary
  Signed-off-by: YANGDB <[email protected]>
* update scala fmt style
  Signed-off-by: YANGDB <[email protected]>
* update scala fmt style
  Signed-off-by: YANGDB <[email protected]>
* update query with where clause predicate
  Signed-off-by: YANGDB <[email protected]>
* update command and remove the topvalues
  Signed-off-by: YANGDB <[email protected]>
* update command docs
  Signed-off-by: YANGDB <[email protected]>
* update with comments feedback
  Signed-off-by: YANGDB <[email protected]>
* update `FIELD SUMMARY` symbols to the keywordsCanBeId bag of words
  Signed-off-by: YANGDB <[email protected]>
---------
Signed-off-by: YANGDB <[email protected]>
Description
This PR implements the fieldsummary PPL command.
Issues Resolved
#662
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.