Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: count replay events in ClickHouse as we ingest them #16994

Merged
merged 30 commits into from
Sep 14, 2023

Conversation

pauldambra
Copy link
Member

@pauldambra pauldambra commented Aug 10, 2023

Problem

Currently when debugging or responding to incidents we want to see how many $snapshot events we're receiving per team per unit time. That's easy with the session_recording_events table but will be impossible with session_replay_events table as that's an aggregating merge table.

Changes

Count the kafka messages on the way in and add them to a new column in session_replay_events table
And count events received in each session

Screenshot 2023-09-11 at 13 55 53

This isn't quite the same because we can't say events per unit time.

A 30 minute session with 10k events spread evenly will be indistinguishable from a 30 minute session that had 9k events in the first second and 1k over the rest of the time.

But it's better than nothing

How did you test this code?

Checking the counts look reasonable locally

@pauldambra
Copy link
Member Author

Ah @fuziontech has been auto-tagged useful!

The last couple of ClickHouse migrations have felt super painful (particularly in EU) was going to tag you to see if we need to gear up to get this through.

There's no rush on this, am not expecting to merge it this week

Copy link
Contributor

@benjackwhite benjackwhite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I assume its safe to be adding this field, even though we won't be writing to it yet 🤔

@pauldambra
Copy link
Member Author

pauldambra commented Aug 14, 2023

even though we won't be writing to it yet 🤔

It's "auto-written" to

We simply count the kafka messages in. So you get the count "for free" (https://github.com/PostHog/posthog/pull/16994/files#diff-c561c6c55dc4ab44eb990cfb6c1f1776dbff40818e9edd594147cfc8833fd9e9R1386)

@benjackwhite
Copy link
Contributor

even though we won't be writing to it yet 🤔

It's "auto-written" to

We simply count the kafka messages in. So you get the count "for free" (https://github.com/PostHog/posthog/pull/16994/files#diff-c561c6c55dc4ab44eb990cfb6c1f1776dbff40818e9edd594147cfc8833fd9e9R1386)

Ooohhhhhhhhhh.... Then I'm not sure I like it 😅

If we are just counting the rows then we are counting "number of events after all of our batching efforts" rather than "number of events regardless of our batching efforts". I guess the question is - why do we want to count them? It's useful to know which browsers are sending us lots of individual rrweb events as that might indicate bugs / weird behaviour. The batched event count is less "useful" perhaps as there are so many different things that could affect it (and we might improve it over time). So I feel like the raw count of events is the more interesting thing. But I could of course be wrong here...

@pauldambra
Copy link
Member Author

Ooohhhhhhhhhh.... Then I'm not sure I like it 😅

So clever it's gone around the circle back to stupid? 🤣

sum(size) as size,
-- we can count the number of kafka messages
-- as a proxy for the number of events
count(*) as event_count
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will rename this message_count i.e. how much flows through kafka count session_replay_summaries

we will add event_count which is the count of rrweb_events in each session replay summary

So we can see how well we're batching, how much traffic

@PostHog PostHog deleted a comment from posthog-bot Aug 28, 2023
@pauldambra pauldambra removed the stale label Aug 28, 2023
@posthog-bot
Copy link
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

sum(size) as size
sum(size) as size,
-- we can count the number of kafka messages instead of sending it explicitly
count(*) as message_count,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is clever... that's not always a good thing 🤣

Is it better to explicitly add message_count: 1 in the plugin server to aid the future traveller?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be a bit more clear with message_count: 1

@pauldambra pauldambra marked this pull request as ready for review September 11, 2023 12:58
@pauldambra pauldambra removed the stale label Sep 11, 2023
@pauldambra
Copy link
Member Author

@fuziontech @benjackwhite I've added _timestamp to this as well.

Tested by migrating from scratch, and migrating from the current state in master and could ingest recordings in both states and see the expected values

Screenshot 2023-09-12 at 17 46 07

Copy link
Member

@fuziontech fuziontech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

sum(size) as size
sum(size) as size,
-- we can count the number of kafka messages instead of sending it explicitly
count(*) as message_count,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be a bit more clear with message_count: 1

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@pauldambra pauldambra merged commit 68a4e18 into master Sep 14, 2023
@pauldambra pauldambra deleted the feat/count-replay-events branch September 14, 2023 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants