Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(experiments): New trends calculation methods #26256

Closed
wants to merge 6 commits into from

Conversation

danielbachhuber
Copy link
Contributor

Changes

TBD

How did you test this code?

Tests should pass.

@posthog-bot
Copy link
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week. If you want to permanentely keep it open, use the waiting label.

@posthog-bot
Copy link
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week. If you want to permanentely keep it open, use the waiting label.

Copy link
Contributor

@andehen andehen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from a first preliminary glance, this looks statistically sound to me!

over the coming days, I would like to spend more time on digging deeper down into the stats, so likely I'll have more input on the methodology in general then

intervals = {}

for variant in variants:
posterior = stats.gamma(a=variant.count + 1, scale=1 / variant.absolute_exposure)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to add the beta to be statistically correct.
also, suggest to add PRIOR_ALPHA, and PRIOR_BETA as constants in this file, to avoid hard coding them.

Suggested change
posterior = stats.gamma(a=variant.count + 1, scale=1 / variant.absolute_exposure)
posterior = stats.gamma(a=variant.count + 1, scale=1 / (variant.absolute_exposure + 1))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, here we use variant.absolute_exposure where as in calculate_probabilities we use variant.exposure. What's the difference between them and why is the other used here?

def calculate_probabilities(
control_variant: ExperimentVariantTrendsBaseStats,
test_variants: list[ExperimentVariantTrendsBaseStats],
simulations: int = 100_000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is an arbitrary large number for now, which is fine. we should do some quick testing to see if we really need such a large number

Comment on lines +83 to +86
wins = sum(
target_sample > max(other_variant_samples)
for target_sample, other_variant_samples in zip(target_samples, zip(*other_samples))
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can use vectorization here, much faster than a loop. haven't tested, but something like this perhaps (assuming we are working with numpy arrays here).

wins = (target_samples[:, None] > other_samples).sum(axis=1).sum()

@posthog-bot posthog-bot removed the stale label Dec 6, 2024
@danielbachhuber
Copy link
Contributor Author

Closing in favor of #26760, which is based on the new Jupyter notebook.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants