feat: blobby ingestion warnings #20963
Conversation
📸 UI snapshots have been updated. 2 snapshot changes in total: 0 added, 2 modified, 0 deleted. Triggered by this commit.
Size Change: +2.52 kB (0%). Total Size: 824 kB
Resolved review threads:
- frontend/src/scenes/data-management/ingestion-warnings/IngestionWarningsView.tsx (outdated)
- frontend/src/scenes/data-management/ingestion-warnings/IngestionWarningsView.tsx
- plugin-server/src/main/ingestion-queues/session-recording/services/replay-events-ingester.ts
Nice improvement, this is important for Mobile since the date and time are often not up to date.
Left a few comments but otherwise LGTM
Co-authored-by: Manoel Aranda Neto <[email protected]>
👍 on not passing the full `db` anymore, and happy that we're extending ingestion warnings!
There's a scalability issue with ingestion warnings though, as the app currently loads the full dataset instead of paginating it when you unfurl a section. This causes the app to become unresponsive if there are too many warnings.
For the overflow warning we debounce per distinct_id once an hour per pod, and it is not enough on some accounts.
We'll need to fix this with better CH deduplication and pagination on the read API, but in the meantime, we might want to move the token-bucket debouncing into `captureIngestionWarning` itself?
- add a new optional `debounce_key` param, that'll be the `session_id` in your case
- repurpose `OverflowWarningLimiter` into a wider `IngestionWarningLimiter`, still once per hour as currently set up
- `captureIngestionWarning` consumes the limiter with `${type}:${teamId}:${debounce_key}` if the key is present, and skips the Kafka produce if no token is available?
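A minimal sketch of that proposal, using an in-memory token bucket as a stand-in for the existing overflow limiter; the parameter names, `queueMessage` shape and topic name here are illustrative, not the plugin-server's actual API:

```ts
// In-memory stand-in for a token-bucket limiter: one token per key per hour.
const REFILL_MS = 60 * 60 * 1000
const lastConsumedAt = new Map<string, number>()

function consumeToken(key: string): boolean {
    const now = Date.now()
    const last = lastConsumedAt.get(key)
    if (last !== undefined && now - last < REFILL_MS) {
        return false // no token available yet for this key
    }
    lastConsumedAt.set(key, now)
    return true
}

// Structural stand-in for the Kafka producer wrapper.
interface WarningProducer {
    queueMessage(message: { topic: string; messages: { value: string }[] }): Promise<void>
}

async function captureIngestionWarning(
    producer: WarningProducer,
    teamId: number,
    type: string,
    details: Record<string, unknown>,
    debounceKey?: string // e.g. session_id
): Promise<void> {
    // If a debounce key is given, consume the limiter with `${type}:${teamId}:${debounceKey}`
    // and skip the Kafka produce when no token is available.
    if (debounceKey && !consumeToken(`${type}:${teamId}:${debounceKey}`)) {
        return
    }
    await producer.queueMessage({
        topic: 'ingestion_warnings', // placeholder topic name
        messages: [
            {
                value: JSON.stringify({
                    team_id: teamId,
                    type,
                    details: JSON.stringify(details),
                    timestamp: new Date().toISOString(),
                }),
            },
        ],
    })
}
```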
Please add a debounce key to the person state warnings then merge! Looks great
@@ -323,15 +323,15 @@ export class PersonState {
         return undefined
     }
     if (isDistinctIdIllegal(mergeIntoDistinctId)) {
-        await captureIngestionWarning(this.db, teamId, 'cannot_merge_with_illegal_distinct_id', {
+        await captureIngestionWarning(this.db.kafkaProducer, teamId, 'cannot_merge_with_illegal_distinct_id', {
             illegalDistinctId: mergeIntoDistinctId,
             otherDistinctId: otherPersonDistinctId,
             eventUuid: this.event.uuid,
         })
I think it's best not to rate-limit these, as they contain useful info and merges are supposed to be rare.
I support not making the rate-limiting optional to avoid deploying a bomb, but let's add `mergeIntoDistinctId` as the debounce key for the three warnings in this file.
pushed a solution to let us override the debounce key construction so that the three uses in the person state are debounced together.... is that what you meant?
(happy to change it if I'm being silly :))
Sorry for being unclear: I do not wish to debounce these warnings; let's send all of them to CH. They should be extremely infrequent unless there's a subtle instrumentation bug, and in that case it's better to have as much info as possible.
So, keeping it safe by default, I've made always-sending opt-in rather than implied by an absent debounce key, e.g.:
- always debounce on team id and type
- unless you provide a debounce key, in which case it is team id, type, and key
- unless you say `alwaysSend`, in which case it always sends
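For illustration, a small sketch of that decision logic; the names and signature are assumptions, not necessarily what landed in the PR:

```ts
// Options controlling how an ingestion warning is debounced (illustrative).
interface WarningDebounce {
    debounceKey?: string // e.g. session_id or mergeIntoDistinctId
    alwaysSend?: boolean // explicit opt-in to skip debouncing entirely
}

function shouldSendIngestionWarning(
    consumeToken: (key: string) => boolean, // token bucket, one token per key per hour
    teamId: number,
    type: string,
    debounce: WarningDebounce = {}
): boolean {
    if (debounce.alwaysSend) {
        return true // always sends, no limiter involved
    }
    // Safe by default: debounce on team id and type, plus the key when one is provided.
    const limiterKey = debounce.debounceKey
        ? `${type}:${teamId}:${debounce.debounceKey}`
        : `${type}:${teamId}`
    return consumeToken(limiterKey)
}
```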
📸 UI snapshots have been updated. 1 snapshot change in total: 0 added, 1 modified, 0 deleted. Triggered by this commit.
📸 UI snapshots have been updated. 1 snapshot change in total: 0 added, 1 modified, 0 deleted. Triggered by this commit.
Perfect! Thanks and sorry for the back-and-forth
As an incident follow-up we want to write ingestion warnings from blob ingestion about the version of the web SDK that is writing the data. But we don't record that info in blobby, and we don't write ingestion warnings from blobby at all.
Let's start writing ingestion warnings from blobby...
Let's start with data we already drop: timestamp skew. Right now there's no way for a customer to see that this is happening.
Currently we tolerate up to a month of skew, which is way too much. Let's allow 7 days of skew before we start dropping data. (Even if this isn't handling lag correctly, Kafka would drop data based on age before this code does.)
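A rough sketch of what that check could look like in the blobby (session replay) ingester; the constant, the warning type name and the field names are assumptions for illustration:

```ts
// Allow up to 7 days of clock skew before dropping replay data (assumed constant).
const MAX_REPLAY_TIMESTAMP_SKEW_MS = 7 * 24 * 60 * 60 * 1000

function isTimestampTooSkewed(eventTimestampMs: number, nowMs: number = Date.now()): boolean {
    return Math.abs(nowMs - eventTimestampMs) > MAX_REPLAY_TIMESTAMP_SKEW_MS
}

// In the ingester this would drop the event and emit a (debounced) ingestion warning,
// roughly along these lines:
//
//   if (isTimestampTooSkewed(event.timestamp)) {
//       await captureIngestionWarning(producer, teamId, 'replay_timestamp_too_far', {
//           timestamp: event.timestamp,
//           sessionId: event.session_id,
//       }, event.session_id /* debounce key */)
//       return // drop the event
//   }
```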