Ppl count approximate support #884
Changes from 5 commits
fd45d52
61c6bb8
724cbe9
424fad4
8dac8fe
b7f0855
0ae73e4
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,11 +6,12 @@ Using ``top`` command to find the most common tuple of values of all fields in t | |
|
||
### Syntax | ||
`top [N] <field-list> [by-clause]` | ||
`top_approx [N] <field-list> [by-clause]` | ||
|
||
* N: number of results to return. **Default**: 10 | ||
* field-list: mandatory. comma-delimited list of field names. | ||
* by-clause: optional. one or more fields to group the results by. | ||
|
||
* top_approx: approximate the count by using estimated [cardinality by HyperLogLog++ algorithm](https://spark.apache.org/docs/3.5.2/sql-ref-functions-builtin.html). | ||
|
||
### Example 1: Find the most common values in a field | ||
|
||
|
@@ -19,6 +20,7 @@ The example finds most common gender of all the accounts. | |
PPL query: | ||
|
||
os> source=accounts | top gender; | ||
os> source=accounts_approx | top gender; | ||
Review comment: why do we add this line? Reply: typing error, thanks! |
||
fetched rows / total rows = 2/2 | ||
+----------+ | ||
| gender | | ||
|
@@ -33,7 +35,7 @@ The example finds most common gender of all the accounts. | |
|
||
PPL query: | ||
|
||
os> source=accounts | top 1 gender; | ||
os> source=accounts_approx | top 1 gender; | ||
Review comment: ditto |
||
fetched rows / total rows = 1/1 | ||
+----------+ | ||
| gender | | ||
|
@@ -48,6 +50,7 @@ The example finds most common age of all the accounts group by gender. | |
PPL query: | ||
|
||
os> source=accounts | top 1 age by gender; | ||
os> source=accounts | top_approx 1 age by gender; | ||
fetched rows / total rows = 2/2 | ||
+----------+-------+ | ||
| gender | age | | ||
|
Review comment: should be `rare_approx` here?