feat: count replay events in ClickHouse as we ingest them #16994
Conversation
Ah, @fuziontech has been auto-tagged, useful! The last couple of ClickHouse migrations have felt super painful (particularly in EU), so I was going to tag you to see if we need to gear up to get this through. There's no rush on this; I'm not expecting to merge it this week.
Overall looks good. I assume it's safe to be adding this field, even though we won't be writing to it yet 🤔
It's "auto-written" to: we simply count the kafka messages coming in, so you get the count "for free" (https://github.com/PostHog/posthog/pull/16994/files#diff-c561c6c55dc4ab44eb990cfb6c1f1776dbff40818e9edd594147cfc8833fd9e9R1386)
Ooohhhhhhhhhh.... Then I'm not sure I like it 😅 If we are just counting the rows then we are counting "number of events after all of our batching efforts" rather than "number of events regardless of our batching efforts". I guess the question is: why do we want to count them? It's useful to know which browsers are sending us lots of individual rrweb events, as that might indicate bugs or weird behaviour. The batched event count is less "useful", perhaps, as there are so many different things that could affect it (and we might improve it over time). So I feel like the raw count of events is the more interesting thing. But I could of course be wrong here...
So clever it's gone around the circle back to stupid? 🤣
sum(size) as size,
-- we can count the number of kafka messages
-- as a proxy for the number of events
count(*) as event_count
We will rename this to message_count, i.e. how much flows through kafka, counting session_replay_summaries. We will add event_count, which is the count of rrweb events in each session replay summary. That way we can see how well we're batching and how much traffic we receive.
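Putting the two counts together, the aggregating SELECT could look roughly like this. This is a hedged sketch, not the actual migration from the PR: the source table name and the grouping columns are assumptions, only size, message_count, and event_count come from the discussion above.

```sql
SELECT
    session_id,
    team_id,
    sum(size) as size,
    -- one row arrives per kafka message, so counting rows
    -- gives the message count "for free"
    count(*) as message_count,
    -- the raw rrweb event count has to be sent explicitly
    -- on each message, then summed here
    sum(event_count) as event_count
FROM session_replay_events_queue  -- hypothetical source table name
GROUP BY session_id, team_id
```

The split means message_count reflects batching behaviour while event_count reflects what browsers actually sent, which is the distinction argued for earlier in the thread.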
- sum(size) as size
+ sum(size) as size,
+ -- we can count the number of kafka messages instead of sending it explicitly
+ count(*) as message_count,
this is clever... that's not always a good thing 🤣 Is it better to explicitly add message_count: 1 in the plugin server to aid the future traveller?
I think it might be a bit more clear with message_count: 1
@fuziontech @benjackwhite I've added it. Tested by migrating from scratch, and by migrating from the current state in master; could ingest recordings in both states and see the expected values.
lgtm!
📸 UI snapshots have been updated: 1 snapshot change in total (0 added, 1 modified, 0 deleted). Triggered by this commit.
Problem

Currently when debugging or responding to incidents we want to see how many $snapshot events we're receiving per team per unit time. That's easy with the session_recording_events table, but will be impossible with the session_replay_events table, as that's an aggregating merge tree table.

Changes

Count the kafka messages on the way in and add them to a new column in the session_replay_events table. And count events received in each session.
This isn't quite the same because we can't say events per unit time.
A 30 minute session with 10k events spread evenly will be indistinguishable from a 30 minute session that had 9k events in the first second and 1k over the rest of the time.
But it's better than nothing
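With both columns in place, an incident-debugging query might look something like this sketch. It assumes the new columns land on session_replay_events as discussed; everything else about the query shape is illustrative, not from the PR.

```sql
SELECT
    team_id,
    sum(message_count) as messages,
    sum(event_count) as events,
    -- a high ratio suggests batching is working well;
    -- a low one means many near-empty kafka messages
    sum(event_count) / sum(message_count) as events_per_message
FROM session_replay_events
GROUP BY team_id
ORDER BY events DESC
```

As noted above, this gives per-session totals rather than a true events-per-unit-time rate, but it is enough to spot teams or browsers sending unusual volumes.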
How did you test this code?
Checking that the counts look reasonable locally.