-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(experiments): New trends calculation methods #26256
Conversation
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from a first preliminary glance, this looks statistically sound to me!
over the coming days, I would like to spend more time on digging deeper down into the stats, so likely I'll have more input on the methodology in general then
intervals = {} | ||
|
||
for variant in variants: | ||
posterior = stats.gamma(a=variant.count + 1, scale=1 / variant.absolute_exposure) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to add the beta to be statistically correct.
also, suggest to add PRIOR_ALPHA, and PRIOR_BETA as constants in this file, to avoid hard coding them.
posterior = stats.gamma(a=variant.count + 1, scale=1 / variant.absolute_exposure) | |
posterior = stats.gamma(a=variant.count + 1, scale=1 / (variant.absolute_exposure + 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, here we use variant.absolute_exposure
where as in calculate_probabilities we use variant.exposure
. What's the difference between them and why is the other used here?
def calculate_probabilities( | ||
control_variant: ExperimentVariantTrendsBaseStats, | ||
test_variants: list[ExperimentVariantTrendsBaseStats], | ||
simulations: int = 100_000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is an arbitrary large number for now, which is fine. we should do some quick testing to see if we really need such a large number
wins = sum( | ||
target_sample > max(other_variant_samples) | ||
for target_sample, other_variant_samples in zip(target_samples, zip(*other_samples)) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can use vectorization here, much faster than a loop. haven't tested, but something like this perhaps (assuming we are working with numpy arrays here).
wins = (target_samples[:, None] > other_samples).sum(axis=1).sum()
Closing in favor of #26760, which is based on the new Jupyter notebook. |
Changes
TBD
How did you test this code?
Tests should pass.