Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(blobby): add new recordings-blob-ingestion-overflow role #21161

Merged
merged 12 commits into from
Apr 2, 2024

Conversation

xvello
Copy link
Contributor

@xvello xvello commented Mar 26, 2024

Problem

Implements the last piece of https://github.com/PostHog/product-internal/pull/573, consuming the new topic with the same logic.

Changes

  • Add a new recordings-blob-ingestion-overflow plugin server role for cloud, triggering a new capability. Capability is turned off on hobby by default.
  • Add a new consumeOverflow constructor param to SessionRecordingIngester. Reusing the same code to avoid duplication and increasing maintenance cost. Only difference so far is the different topic and consumergroup, and never running overflow detection.

Does this work well for both Cloud and self-hosted?

Not enabled by default on hobby, can be enabled for local devenv by setting SESSION_RECORDING_OVERFLOW_ENABLED

How did you test this code?

  • All integration tests (except overflow detection) run on both consumer flavors
  • Will test the full flow on dev before rolling out the prod consumers

Running local devenv with REPLAY_OVERFLOW_SESSIONS_ENABLED=true CAPTURE_CONFIG_REDIS_HOST=localhost:6379 SESSION_RECORDING_OVERFLOW_BUCKET_CAPACITY=1 SESSION_RECORDING_OVERFLOW_ENABLED=true DEBUG=1 ./bin/start, session starts on main topic, redis is updated, session moved to overflow topic, data is still readable through the app:

image

Consumers are properly commiting offsets: "messages behind" is zero

image image

@xvello xvello force-pushed the xvello/replay-overflow-detection-3 branch from c05ffef to fd5477d Compare March 27, 2024 13:29
Base automatically changed from xvello/replay-overflow-detection-2 to master March 27, 2024 13:47
@xvello xvello force-pushed the xvello/replay-overflow-detection-3 branch from fd5477d to c39f28d Compare March 27, 2024 13:53
@xvello xvello requested review from pauldambra and a team March 27, 2024 14:12
Copy link
Member

@pauldambra pauldambra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

Comment on lines +170 to +173
this.topic = consumeOverflow
? KAFKA_SESSION_RECORDING_SNAPSHOT_ITEM_OVERFLOW
: KAFKA_SESSION_RECORDING_SNAPSHOT_ITEM_EVENTS
this.consumerGroupId = this.consumeOverflow ? KAFKA_CONSUMER_GROUP_ID_OVERFLOW : KAFKA_CONSUMER_GROUP_ID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👨‍🍳 👌

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! I grepped for raw uses of the topic name (var name and value) and could not find any. Hopefully I tracked them all down.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any others either. 👍

@@ -43,6 +46,7 @@ require('@sentry/tracing')

// WARNING: Do not change this - it will essentially reset the consumer
const KAFKA_CONSUMER_GROUP_ID = 'session-recordings-blob'
const KAFKA_CONSUMER_GROUP_ID_OVERFLOW = 'session-recordings-blob-overflow'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish github would let you comment on unchanged lines

reading through this again got me thinking of the various prometheus metrics. we have on our dashboard

I guess we need to change the dashboard so I can view metrics from not-overflow and overflow deployments separately rather than labelling them all on whether they're overflow

Copy link
Contributor Author

@xvello xvello Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All prom metrics inherit the role tag from the deploy's label, I'm planning on adding that tag as a dashboard var, keeping the default to the main consumer. We'll have to duplicate some kafka widgets, but the rest will be reusable without duping the whole dash.

@xvello xvello merged commit db30cd2 into master Apr 2, 2024
77 checks passed
@xvello xvello deleted the xvello/replay-overflow-detection-3 branch April 2, 2024 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants