Incorrect calculation of aggregated columns with universe samples #147

dongyoungy · 2018-04-29T19:34:51Z

I think there is a bug in how aggregated columns are calculated with universe samples when the query's group-by includes the columns the universe sample is built on.

For example, suppose a universe sample on column X of table A is used and the query groups by columns X and Y of table A. In this case, the sample contains every row of each group that appears in it, so the aggregated values need no further adjustment by the sampling probability. However, the current algorithm always adjusts aggregated values by the sampling probability, which gives the estimated aggregates a very high error in this particular case.
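To make the case above concrete, here is a minimal Python sketch, not the project's actual code. The toy table, the `SAMPLING_RATIO` constant, and the `in_universe_sample` predicate (a simple modulo rule standing in for a hash on X) are all assumptions made for illustration. It compares the exact per-group sum with the unscaled and scaled sums computed from the sample.

```python
from collections import defaultdict

SAMPLING_RATIO = 0.25  # hypothetical: fraction of distinct X values kept


def in_universe_sample(x):
    # Hypothetical stand-in for a hash-based predicate on X:
    # keeps one in four distinct X values, with all of their rows.
    return x % 4 == 0


def group_sum(rows):
    # SUM(value) ... GROUP BY X, Y
    totals = defaultdict(int)
    for x, y, value in rows:
        totals[(x, y)] += value
    return dict(totals)


# Toy table A with columns (X, Y, value): 3 rows per (X, Y) group.
table_a = [(x, y, 10) for x in range(8) for y in ("a", "b") for _ in range(3)]
sample = [row for row in table_a if in_universe_sample(row[0])]

exact = group_sum(table_a)
unscaled = group_sum(sample)
# What always dividing by the sampling probability would produce.
scaled = {group: total / SAMPLING_RATIO for group, total in unscaled.items()}

for group in sorted(unscaled):
    print(group, exact[group], unscaled[group], scaled[group])
```

In this toy setup every group that survives sampling already has its exact sum, and dividing by the sampling probability inflates it by a factor of four. Groups whose X value was not sampled are simply absent from the result, which is a separate concern from the scaling error described here.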

pyongjoo · 2018-11-13T07:51:13Z

I will prevent this case in a future update. Let me keep this issue open until the pull request is merged.

dongyoungy added the bug label Apr 30, 2018
