Incorrect calculation of aggregated columns with universe samples #147

Open
dongyoungy opened this issue Apr 29, 2018 · 1 comment

@dongyoungy
Contributor

I think there is a bug in how aggregated columns are calculated with universe samples when the query's group-by includes the column(s) on which the universe sample was built.

For example, suppose a universe sample on column X of table A is used and the query groups by columns X and Y of table A. In this case, every group that appears in the sample contains all of its rows, so its aggregated values need no further adjustment by the sampling probability. However, the current algorithm always adjusts aggregated values by the sampling probability, so the estimated aggregated values have a very high error in this particular case.
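
To make the case concrete, here is a minimal sketch in Python. It is not VerdictDB code: the hash-based `in_universe_sample` helper, the 0.5 sampling ratio, and the toy table are all assumptions made for illustration. It builds a universe sample on X, groups by (X, Y), and compares the unscaled and scaled sums against the exact ones.

```python
import hashlib
from collections import defaultdict

SAMPLING_RATIO = 0.5  # hypothetical sampling probability


def in_universe_sample(x, ratio=SAMPLING_RATIO):
    """Keep a row iff a hash of its X value falls below the ratio.

    All rows sharing the same X are therefore kept or dropped together.
    """
    digest = hashlib.md5(str(x).encode()).hexdigest()
    return int(digest, 16) % 1000 < ratio * 1000


def group_sums(rows):
    """SUM(v) ... GROUP BY x, y over rows of the form (x, y, v)."""
    sums = defaultdict(int)
    for x, y, v in rows:
        sums[(x, y)] += v
    return dict(sums)


# Toy table A: 20 distinct X values, 2 Y values, 5 rows per (X, Y) group.
table_a = [(x, y, 1) for x in range(20) for y in ("a", "b") for _ in range(5)]
sample = [row for row in table_a if in_universe_sample(row[0])]

exact = group_sums(table_a)
unscaled = group_sums(sample)  # no adjustment
scaled = {g: s / SAMPLING_RATIO for g, s in unscaled.items()}  # always adjust

for group in sorted(unscaled):
    print(group, exact[group], unscaled[group], scaled[group])
```

For every group present in the sample, the unscaled sum equals the exact sum, while the scaled value is off by a factor of 1 / `SAMPLING_RATIO`. Groups whose X value was not sampled are missing from the result altogether, which scaling does not fix either.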

@dongyoungy dongyoungy added the bug label Apr 30, 2018
@pyongjoo
Member

I will prevent this case in a future update. Let me keep this issue open until the pull request is merged.


2 participants