feat(web-analytics): Add Sessions Table V2 #23023
Size Change: +485 B (+0.05%). Total Size: 1.06 MB.
📸 UI snapshots have been updated. 1 snapshot change in total: 0 added, 1 modified, 0 deleted.
I see Paul reviewed https://github.com/PostHog/product-internal/pull/601, so he probably has some context already – tagged him here.
The code looks good to me, but sadly I don't have any context on sessions table v2
Suspect Issues: This pull request was deployed and Sentry observed the following issues:
* Add raw_sessions table
* Fix
* Change time chunking to 5 minutes
* Add modes of operation, and some comments
* WIP wire up sessions table V2
* More working v2 sessions tests
* Optimize imports
* Fix v1 sessions table test
* Fix more tests
* Fix channel type tests
* Fix session replay joining with v2
* Web analytics queries and their tests working
* Fix where clause extractor tests for v1
* Fix backfill script
* Show last select query instead of first
* Run ruff
* Fix ids in tests
* Fix database init
* spelling
* Fix test_query
* Formatting
* Handle session properties with v2 session table
* Add more columns, fix some properties
* Add new properties to taxonomy
* Fix trends tests
* Fix modifiers
* Set v1 sessions table to default
* Capture viewport size
* Fix keyword arg rename
* Update query snapshots
* Update query snapshots
* Fix test_utils
* Fix test_trends
* Make it easier to run test_parser_cpp from pytcharm
* Update query snapshots
* Update query snapshots
* Add last external click url to the sessions MV
* Update query snapshots
* Add test_last_external_click_url
* Add ingest from date
* Run schema build after a rebase
* Tweak test_all
* Update query snapshots
* Run schema after rebase
* Update query snapshots
* Update UI snapshots for `chromium` (2)
* Update UI snapshots for `chromium` (2)
* Change ingestion date and add explaining comment

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Don't merge this until #22954 is in, as this PR uses that as a base.
This PR currently has some failing tests. These tests are also failing on master and are not introduced by this PR; I'll rebase when they are fixed. See https://posthog.slack.com/archives/C0113360FFV/p1718961268054819
Before this PR gets merged, we should change the where clause of the MV to only include events in the future. I'll do this as a ~last step before merging, as we might need to change it depending on when this gets merged.
Problem
The previous session design was not fast enough, and did not allow for sampling. See https://github.com/PostHog/product-internal/pull/601
Changes
The biggest difference between the v1 and v2 tables is that v2 relies on the session_id being a UUIDv7, and can use this assumption to have a much better index. For date range queries against the sessions table alone, v2 was around twice as fast, and it should be even more of an upgrade when joining to the events table.
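To illustrate why a UUIDv7 session_id helps with indexing: a UUIDv7 stores a 48-bit Unix timestamp in milliseconds in its most significant bits, so ids sort by creation time and a date range maps to a contiguous range of ids. The sketch below is only an illustration in Python, not the code in this PR; the function names are hypothetical, and the example id is the UUIDv7 test vector from RFC 9562.

```python
import uuid
from datetime import datetime, timezone


def uuid7_timestamp_ms(session_id: str) -> int:
    """Return the Unix timestamp (ms) embedded in a UUIDv7 string.

    Hypothetical helper for illustration; not part of this PR.
    """
    value = uuid.UUID(session_id)
    if value.version != 7:
        raise ValueError(f"expected a UUIDv7, got version {value.version}")
    # The top 48 of the 128 bits hold the timestamp, so drop the low 80 bits.
    return value.int >> 80


def uuid7_lower_bound(timestamp_ms: int) -> uuid.UUID:
    """Smallest 128-bit value whose top 48 bits equal timestamp_ms.

    Any UUIDv7 created at or after timestamp_ms compares >= this value,
    which is what lets a date filter become a range filter on the id.
    The result is a comparison bound, not a valid UUIDv7 itself.
    """
    return uuid.UUID(int=timestamp_ms << 80)


example_id = "017f22e2-79b0-7cc3-98c4-dc0c0c07398f"  # RFC 9562 test vector
ts_ms = uuid7_timestamp_ms(example_id)
created_at = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
```

Comparing `uuid.UUID(example_id) >= uuid7_lower_bound(ts_ms)` holds for any id created at or after `ts_ms`, which is the ordering property the index can exploit.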
It will also support sampling. I expect we will have to tune the index granularity for this to work well, but in theory we should be able to get very fast queries working, sampling by session.
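As a rough picture of what "sampling by session" means: the keep/drop decision is a deterministic function of the session_id, so every event in a session is either in the sample or out of it. The Python sketch below only shows that idea; it is not how this PR or ClickHouse implements sampling, and `in_sample` is a hypothetical name.

```python
import hashlib


def in_sample(session_id: str, sample_rate: float) -> bool:
    """Decide deterministically whether a session belongs to the sample.

    Hypothetical helper for illustration. The id is hashed to a 64-bit
    bucket, and the session is kept when the bucket falls below
    sample_rate of the 64-bit range. The same id always gives the same
    answer, so a session is never partially sampled.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < sample_rate * 2**64


# All events of one session share the same decision.
events = [
    {"session_id": "017f22e2-79b0-7cc3-98c4-dc0c0c07398f", "event": "$pageview"},
    {"session_id": "017f22e2-79b0-7cc3-98c4-dc0c0c07398f", "event": "$pageleave"},
]
decisions = {in_sample(e["session_id"], 0.1) for e in events}
```

Results computed on the sample would then be scaled by `1 / sample_rate` to estimate totals.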
Does this work well for both Cloud and self-hosted?
Yes
How did you test this code?
Added tests for the new mode, and kept running the existing tests against the current mode.