[BUG] Rare with group by causes job abort due to result size too large #611
Comments
LIMIT 10 is not the same as rare: LIMIT 10 caps the entire result at 10 rows, whereas rare returns up to 10 results for each group. To achieve the desired result, we should use a window function to limit the output to 10 records per group.
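A minimal sketch of the difference between a global LIMIT and a per-group limit via a window function. It uses Python's built-in sqlite3 (window functions need SQLite 3.25 or later) as a stand-in for the Spark SQL engine; the actual rewritten query from this thread is not preserved here. The two-column http_logs table, the sample rows, and the helper names rare_per_group and global_limit are hypothetical, and this does not reproduce the result-size failure itself.

```python
import sqlite3


def build_demo_db():
    """Hypothetical in-memory http_logs table with only clientip and request."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE http_logs (clientip TEXT, request TEXT)")
    rows = (
        [("10.0.0.1", "/x")] * 1 + [("10.0.0.1", "/y")] * 2 + [("10.0.0.1", "/z")] * 3
        + [("10.0.0.2", "/x")] * 1 + [("10.0.0.2", "/y")] * 1 + [("10.0.0.2", "/w")] * 5
    )
    conn.executemany("INSERT INTO http_logs VALUES (?, ?)", rows)
    return conn


def rare_per_group(conn, limit=10):
    """Least frequent requests, at most `limit` rows for EACH clientip (rare-like)."""
    sql = """
        WITH counts AS (
            SELECT clientip, request, COUNT(*) AS cnt
            FROM http_logs
            GROUP BY clientip, request
        ),
        ranked AS (
            -- number the rows inside each clientip group, least frequent first
            SELECT clientip, request, cnt,
                   ROW_NUMBER() OVER (
                       PARTITION BY clientip ORDER BY cnt ASC, request ASC
                   ) AS rn
            FROM counts
        )
        SELECT clientip, request, cnt
        FROM ranked
        WHERE rn <= ?
        ORDER BY clientip, cnt, request
    """
    return conn.execute(sql, (limit,)).fetchall()


def global_limit(conn, limit=10):
    """The issue's SQL shape: at most `limit` rows in TOTAL across all groups."""
    sql = """
        SELECT COUNT(*) AS cnt, request, clientip
        FROM http_logs
        GROUP BY request, clientip
        ORDER BY cnt ASC, clientip, request
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(rare_per_group(conn, 2))  # up to 2 rows per clientip
    print(global_limit(conn, 2))    # 2 rows overall
```

With a limit of 2, the windowed query keeps two rows for each client, while the global LIMIT stops after two rows in total. The extra request tie-breaker in each ORDER BY only makes the output deterministic for the demo.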
Modifying the query to:
results in:
The physical plan being:
With the window function, the plan now looks very different.
But regardless, the same failure can happen with SQL as well. Not a PPL
The new
What is the bug?
Running the PPL query
source=myglue_test.default.http_logs | rare request by clientip
returns an error:
However, the equivalent SQL query
select count(*) as cnt, request, clientip from myglue_test.default.http_logs group by request, clientip order by cnt asc limit 10
returns the result correctly:
SQL query result
The LIMIT 10 clause may make the SQL query behave differently, but I added it because the rare command also defaults to a size of 10.
Physical plan for PPL query:
Physical plan for SQL query:
What is your host/environment?