Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: New ingestion consumer #27668

Open
wants to merge 73 commits into
base: fix/tests
Choose a base branch
from

Conversation

benjackwhite
Copy link
Contributor

@benjackwhite benjackwhite commented Jan 18, 2025

Problem

This PR is the first in a few steps to start refactoring the plugin server to be much cleaner and more minimal

Changes

  • New consumer based of the same class based consumer for cdp
  • Essentially does the same thing as the old each-batch-ingestion process but cleaner and removing a lot of complex code paths
  • Changes to be much more config driven - this allows us to run multiple version of exactly the same consumer just with different source / destination topics and consumer group IDs
  • Adds a new mode for local running (not default) so that we can test all ingestion consumers this way - this will become the default once we release and remove the older options
  • Tests to cover most of this

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Does this work well for both Cloud and self-hosted?

How did you test this code?

@benjackwhite benjackwhite changed the base branch from master to feat/producer-refactor January 19, 2025 10:46
@benjackwhite benjackwhite changed the title wip: New ingestion consumer feat: New ingestion consumer Jan 19, 2025
Copy link
Contributor

@pl pl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good step forward, the new IngestionConsumer is much cleaner than what I've seen in Mr Blobby.

I haven't done a super-detailed review of the ingestion code, as it would take me a bit more time to properly understand the old code and compare it with the new implementation. Let me know if you want me to do that.

I left some comments in a couple of places, they're not super critical, so I'll approve the PR and leave it to you what to do with them.

plugin-server/src/capabilities.ts Outdated Show resolved Hide resolved
plugin-server/src/types.ts Show resolved Hide resolved
plugin-server/src/main/pluginsServer.ts Show resolved Hide resolved

beforeEach(async () => {
const now = new Date(2025, 1, 1).getTime()
jest.spyOn(Date, 'now').mockReturnValue(now)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: This looks like it's gonna break when you run it in a different timezone – in the snapshot it's already a different date. I assume we don't want to mock the timers using jest because it's an integration test and it might be running timers somewhere in the middle of the stack. Could we do the same as we do with UUIDs or completely mask out the timestamps in the snapshots?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I tried a few things and this seemed to be the best option. I'll circle back and see if I can make it nicer

})

describe('dropping events', () => {
describe.each(['headers', 'payload'] as const)('via %s', (kind) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: For me it's quite a bit of effort to understand those tests – there's quite a lot of back-and-forth:

  • to understand why the payload tests pass, I had to scroll up quite a bit to find that there's a default token in the payload
  • in the header tests the payloads still have the tokens – the name of the describe block suggest that we're testing events with tokens only in the headers

The tests below are short enough (and very readable on their own) that they could be duplicated and the createEvent function could be explicit on where the token lands, instead of relying on addMessageHeaders filtering by the kind of test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool point taken. I think its too early for this to be reviewed tbh as I'm mostly trying to understand the existing logic myself...

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@benjackwhite benjackwhite marked this pull request as ready for review January 21, 2025 18:33
@benjackwhite benjackwhite requested review from meikelmosby and pl and removed request for pl and meikelmosby January 21, 2025 18:33
# Conflicts:
#	frontend/__snapshots__/replay-player-success--second-recording-in-list--dark.png
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found some bugs here - will pull out to a separate PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants