How to interpret AQP errors? #99

Open
xuebinsu opened this issue Jan 26, 2018 · 4 comments

@xuebinsu

What do 0.0 and NaN mean in the error column? I notice that the reported error is 0.0 even though the difference between the approximate result (returned by Verdict) and the true result (returned by Spark SQL) is not 0. Why?

@pyongjoo
Member

pyongjoo commented Feb 2, 2018

Sorry for the late response. I just wanted to let you know that we are working on this issue.

What we know so far is that our error estimation logic can produce unexpected results (such as NaN or 0.0) when the sample size is small. We will first identify the root causes and then add checks to prevent such cases.

@GaoleMeng

Current progress on fixing this bug:
The bug occurs when the subsample size used for estimation equals one, which makes the stddev of the subsample values NaN. We are adding two verdict_default properties to fix this:

verdict.error_bound.minimum_subsample_size = 10
When the number of subsample groups is smaller than this value (default: 10), we set the error bound of the value to -1 (which represents infinity).

verdict.error_bound.trust_error_bound = 0.1
When the error exceeds this fraction of the value (default: 0.1, i.e. 10%), we set the error bound to -1 (which represents infinity).

So before the user sees the error bound, we check it and report either -1 or an interpretable value.
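
For illustration only, here is a minimal Python sketch of the check described above. It is not the code under review: the function names, the constant names, and the explicit NaN check are assumptions made for this example, and only the two property names and their defaults come from this comment. It also shows why a single subsample gives NaN: the sample standard deviation divides by n - 1, which is zero when n = 1.

```python
import math

# Hypothetical constants mirroring the two proposed properties.
MINIMUM_SUBSAMPLE_SIZE = 10  # verdict.error_bound.minimum_subsample_size
TRUST_ERROR_BOUND = 0.1      # verdict.error_bound.trust_error_bound
UNBOUNDED = -1.0             # sentinel that stands for an infinite error bound


def sample_stddev(values):
    """Sample standard deviation (n - 1 denominator); NaN when n < 2."""
    n = len(values)
    if n < 2:
        # With one subsample the formula is 0 / 0, which is undefined.
        return math.nan
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def revise_error_bound(estimate, error, subsample_count):
    """Return the error bound to report, or UNBOUNDED if it cannot be trusted."""
    # Too few subsample groups: the estimate of the error is unreliable.
    if subsample_count < MINIMUM_SUBSAMPLE_SIZE:
        return UNBOUNDED
    # Assumption for this sketch: never show NaN to the user.
    if math.isnan(error):
        return UNBOUNDED
    # Error larger than the trusted fraction of the value itself.
    if abs(error) > TRUST_ERROR_BOUND * abs(estimate):
        return UNBOUNDED
    return error
```

With these defaults, an estimate of 100 with an error of 5 from 20 subsample groups is reported as 5, while the same estimate with an error of 25, or with only 3 subsample groups, is reported as -1.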

@xuebinsu
Author

xuebinsu commented Apr 4, 2018

Thanks very much for your work! I'll test it.

@GaoleMeng

Thx!
In fact, the code is still under review and has not been merged yet. We will post an update once it is finished.
