feat: blobby ingestion warnings #20963
Conversation
📸 UI snapshots have been updated. 2 snapshot changes in total: 0 added, 2 modified, 0 deleted. Triggered by this commit.
Size Change: +2.52 kB (0%). Total Size: 824 kB
Resolved review threads:
- frontend/src/scenes/data-management/ingestion-warnings/IngestionWarningsView.tsx (outdated)
- frontend/src/scenes/data-management/ingestion-warnings/IngestionWarningsView.tsx
- plugin-server/src/main/ingestion-queues/session-recording/services/replay-events-ingester.ts
Nice improvement, this is important for Mobile since the date and time are often not up to date.
Left a few comments but otherwise LGTM
Co-authored-by: Manoel Aranda Neto <[email protected]>
👍 on not passing the full `db` anymore, and happy that we're extending ingestion warnings!
There's a scalability issue with ingestion warnings though, as the app currently loads the full dataset instead of paginating it when you unfurl a section. This causes the app to become unresponsive if there are too many warnings.
For the overflow warning we debounce per distinct_id once an hour per pod, and it is not enough on some accounts.
We'll need to fix this with better CH deduplication and pagination on the read API, but in the meantime, we might want to move the token-bucket debouncing into `captureIngestionWarning` itself?
- add a new optional `debounce_key` param, that'll be the `session_id` in your case
- repurpose `OverflowWarningLimiter` into a wider `IngestionWarningLimiter`, still once per hour as currently set up
- `captureIngestionWarning` consumes the limiter with `${type}:${teamId}:${debounce_key}` if the key is present, and skips the Kafka produce if no token is available?
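A minimal sketch of that proposal, using an in-memory token bucket as a stand-in for the existing overflow limiter; the parameter names, `queueMessage` shape and topic name here are illustrative, not the plugin-server's actual API:

```ts
// In-memory stand-in for a token-bucket limiter: one token per key per hour.
const REFILL_MS = 60 * 60 * 1000
const lastConsumedAt = new Map<string, number>()

function consumeToken(key: string): boolean {
    const now = Date.now()
    const last = lastConsumedAt.get(key)
    if (last !== undefined && now - last < REFILL_MS) {
        return false // no token available yet for this key
    }
    lastConsumedAt.set(key, now)
    return true
}

// Structural stand-in for the Kafka producer wrapper.
interface WarningProducer {
    queueMessage(message: { topic: string; messages: { value: string }[] }): Promise<void>
}

async function captureIngestionWarning(
    producer: WarningProducer,
    teamId: number,
    type: string,
    details: Record<string, unknown>,
    debounceKey?: string // e.g. session_id
): Promise<void> {
    // If a debounce key is given, consume the limiter with `${type}:${teamId}:${debounceKey}`
    // and skip the Kafka produce when no token is available.
    if (debounceKey && !consumeToken(`${type}:${teamId}:${debounceKey}`)) {
        return
    }
    await producer.queueMessage({
        topic: 'ingestion_warnings', // placeholder topic name
        messages: [
            {
                value: JSON.stringify({
                    team_id: teamId,
                    type,
                    details: JSON.stringify(details),
                    timestamp: new Date().toISOString(),
                }),
            },
        ],
    })
}
```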
Please add a debounce key to the person state warnings then merge! Looks great
@@ -323,15 +323,15 @@ export class PersonState {
         return undefined
     }
     if (isDistinctIdIllegal(mergeIntoDistinctId)) {
-        await captureIngestionWarning(this.db, teamId, 'cannot_merge_with_illegal_distinct_id', {
+        await captureIngestionWarning(this.db.kafkaProducer, teamId, 'cannot_merge_with_illegal_distinct_id', {
             illegalDistinctId: mergeIntoDistinctId,
             otherDistinctId: otherPersonDistinctId,
             eventUuid: this.event.uuid,
         })
I think it's best not to rate-limit these, as they contain useful info and merges are supposed to be rare.
I support not making the rate-limiting optional to avoid deploying a bomb, but let's add `mergeIntoDistinctId` as the debounce key for the three warnings in this file.
pushed a solution to let us override the debounce key construction so that the three uses in the person state are debounced together.... is that what you meant?
(happy to change it if I'm being silly :))
Sorry for being unclear: I do not wish to debounce these warnings; let's send all of them to CH. They should be extremely infrequent unless there's a subtle instrumentation bug, and in that case it's better to have as much info as possible.
So, keeping it safe by default, I've made always-sending opt-in rather than implied by an absent debounce key, e.g.:
- always debounce on team id and type
- unless you provide a debounce key, in which case it is team id, type, and key
- unless you say `alwaysSend`, in which case it always sends
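For illustration, a small sketch of that decision logic; the names and signature are assumptions, not necessarily what landed in the PR:

```ts
// Options controlling how an ingestion warning is debounced (illustrative).
interface WarningDebounce {
    debounceKey?: string // e.g. session_id or mergeIntoDistinctId
    alwaysSend?: boolean // explicit opt-in to skip debouncing entirely
}

function shouldSendIngestionWarning(
    consumeToken: (key: string) => boolean, // token bucket, one token per key per hour
    teamId: number,
    type: string,
    debounce: WarningDebounce = {}
): boolean {
    if (debounce.alwaysSend) {
        return true // always sends, no limiter involved
    }
    // Safe by default: debounce on team id and type, plus the key when one is provided.
    const limiterKey = debounce.debounceKey
        ? `${type}:${teamId}:${debounce.debounceKey}`
        : `${type}:${teamId}`
    return consumeToken(limiterKey)
}
```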
📸 UI snapshots have been updated. 1 snapshot change in total: 0 added, 1 modified, 0 deleted. Triggered by this commit.
📸 UI snapshots have been updated. 1 snapshot change in total: 0 added, 1 modified, 0 deleted. Triggered by this commit.
Perfect! Thanks and sorry for the back-and-forth
As an incident follow-up we want to write ingestion warnings from blob ingestion about the version of the web SDK that is writing the data. But we don't record that info in blobby, and we don't write ingestion warnings from blobby at all.
Let's start writing ingestion warnings from blobby...
Let's start with data we already drop: timestamp skew. Right now there's no way for a customer to see that this is happening.
Currently we tolerate up to a month of skew, which is way too much. Let's allow 7 days of skew before we start dropping data. (Even if this isn't handling lag correctly, Kafka would drop data based on age before this code does.)
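A rough sketch of what that check could look like in the blobby (session replay) ingester; the constant, the warning type name and the field names are assumptions for illustration:

```ts
// Allow up to 7 days of clock skew before dropping replay data (assumed constant).
const MAX_REPLAY_TIMESTAMP_SKEW_MS = 7 * 24 * 60 * 60 * 1000

function isTimestampTooSkewed(eventTimestampMs: number, nowMs: number = Date.now()): boolean {
    return Math.abs(nowMs - eventTimestampMs) > MAX_REPLAY_TIMESTAMP_SKEW_MS
}

// In the ingester this would drop the event and emit a (debounced) ingestion warning,
// roughly along these lines:
//
//   if (isTimestampTooSkewed(event.timestamp)) {
//       await captureIngestionWarning(producer, teamId, 'replay_timestamp_too_far', {
//           timestamp: event.timestamp,
//           sessionId: event.session_id,
//       }, event.session_id /* debounce key */)
//       return // drop the event
//   }
```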