Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: Speed up filtering persons #25604

Merged
merged 92 commits into from
Oct 21, 2024
Merged

perf: Speed up filtering persons #25604

merged 92 commits into from
Oct 21, 2024

Conversation

timgl
Copy link
Collaborator

@timgl timgl commented Oct 15, 2024

Problem

Currently, we first group/argmax every single person, then we filter that result set down. It'd be much faster to filter first, then group by.

Changes

Filter first, then group by, then filter again to make sure the results are still correct.

The resulting query is a bit ugly (basically 3 sub queries) but it takes the filtering of the persons page for researchgate down from 80-90 seconds to 2 seconds!

Also removed the personsArgMaxVersion and optimizeJoinedFilters modifiers as those were making things more complicated and weren't relevant anymore.

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Does this work well for both Cloud and self-hosted?

How did you test this code?

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

3 snapshot changes in total. 0 added, 3 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

3 snapshot changes in total. 0 added, 3 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@timgl timgl merged commit 3f75806 into master Oct 21, 2024
94 checks passed
@timgl timgl deleted the speed-up-person-queries branch October 21, 2024 15:12
Copy link

sentry-io bot commented Oct 21, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ NotImplementedError: IsTimeOrIntervalConstantVisitor has no method visit_array posthog.tasks.tasks.process_query_task View Issue
  • ‼️ NotImplementedError: IsTimeOrIntervalConstantVisitor has no method visit_array /api/projects/{parent_lookup_team_id}/query/ View Issue
  • ‼️ NotImplementedError: IsTimeOrIntervalConstantVisitor has no method visit_array posthog.tasks.tasks.process_query_task View Issue
  • ‼️ NotImplementedError: IsTimeOrIntervalConstantVisitor has no method visit_array posthog.tasks.tasks.process_query_task View Issue
  • ‼️ NotImplementedError: IsTimeOrIntervalConstantVisitor has no method visit_array posthog.tasks.tasks.process_query_task View Issue

Did you find this useful? React with a 👍 or 👎

timgl added a commit that referenced this pull request Oct 21, 2024
fuziontech added a commit that referenced this pull request Oct 21, 2024
* master: (100 commits)
  revert: perf: Speed up filtering persons (#25715)
  chore: count data ingested by source (#25714)
  fix(bi): Dont try to be too clever with variables (#25712)
  fix(cdp): Ask for kinesis stream name, not the ARN (#25711)
  fix(batch-exports): Deal with deeply nested events (#25710)
  fix(batch-exports): Ingest as JSON in Redshift super type (#25709)
  perf: Speed up filtering persons (#25604)
  feat(data-warehouse): Add button to cancel batch export backfill run (#25706)
  feat(cdp): add discord template (#25649)
  feat(max): Show query summary (#25696)
  feat(errors): Use exception list and remove stack trace raw (#25655)
  fix: smaller cohort resource usage for replay filters (#25699)
  chore: use meta columns in data visualization table (#25697)
  feat(cdp): add klaviyo templates (#25693)
  feat(max): Answer feedback buttons (#25670)
  feat(cdp): add google ads integration warning (#25698)
  feat(hog): print ast in console (#25608)
  feat(data-warehouse): edit SQL based import configs (#25685)
  fix(insights): handle negation property in cohort property filters (#25691)
  fix: warming and filter (#25674)
  ...
timgl added a commit that referenced this pull request Oct 25, 2024
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Tomás Farías Santana <[email protected]>
Co-authored-by: Marius Andra <[email protected]>
Co-authored-by: Georgiy Tarasov <[email protected]>
Co-authored-by: David Newell <[email protected]>
Co-authored-by: PostHog Bot <[email protected]>
Co-authored-by: Richard Borcsik <[email protected]>
Co-authored-by: Michael Matloka <[email protected]>
Co-authored-by: Eric Duong <[email protected]>
Co-authored-by: Paul D'Ambra <[email protected]>
Co-authored-by: Juraj Majerik <[email protected]>
Co-authored-by: Phani Raj <[email protected]>
Co-authored-by: Tom Owers <[email protected]>
Co-authored-by: Dylan Martin <[email protected]>
Co-authored-by: Oliver Browne <[email protected]>
Co-authored-by: Marcus Hof <[email protected]>
Co-authored-by: Michael Matloka <[email protected]>
Co-authored-by: Anirudh Pillai <[email protected]>
Co-authored-by: Sandy Spicer <[email protected]>
Co-authored-by: Neil Kakkar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.