Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Redshift batch export uses spmc consumer #26897

Merged

Conversation

tomasfarias
Copy link
Contributor

@tomasfarias tomasfarias commented Dec 13, 2024

Problem

Redshift batch export is not using SPMC abstractions, instead it is doing its own thing. This leads to duplication and harder to read code. The main reason it wasn't migrated is that Redshift doesn't write to a file but instead to a query that then gets executed. We can still adapt the same abstractions as all other batch exports though, so that's what we do here.

Changes

  • Create a new BatchExportWriter: RedshiftInsertBatchExportWriter. This new writer will write INSERT queries for Redshift.
  • Redshift now runs a RedshiftConsumer that flushes these queries by executing them.
  • run_consumer_loop is now run_consumer as we no longer run multiple in a loop, this may change later.
  • Also run_consumer now takes an instance of Consumer which simplifies the signature slightly.

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Does this work well for both Cloud and self-hosted?

How did you test this code?

@tomasfarias tomasfarias changed the title refactor: Initialize consumer separate from run_consumer_loop refactor: Redshift batch export uses spmc consumer Dec 13, 2024
@tomasfarias tomasfarias force-pushed the refactor/redshift-batch-export-uses-spmc-consumer branch from 9e1e133 to 97bf04c Compare December 13, 2024 15:00
@tomasfarias tomasfarias requested a review from rossgray December 13, 2024 16:41
Copy link
Contributor

@rossgray rossgray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The only slightly confusing thing is that we're trying to fit a SQL-based writer into a temporary file-writer abstraction but think this is fine for now

@posthog-bot
Copy link
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week. If you want to permanentely keep it open, use the waiting label.

@tomasfarias
Copy link
Contributor Author

tomasfarias commented Dec 30, 2024

@rossgray

Looks good. The only slightly confusing thing is that we're trying to fit a SQL-based writer into a temporary file-writer abstraction but think this is fine for now

The abstraction should cover both cases: It's a writer, whether it's writing to a temporary file, or a SQL query buffer, should be left up to the subclass to decide. That being said, I am open to expressing this more clearly in the code, as it was certainly just a temporary file writer in the beginning. Maybe we need more separation between "writer" and "writee" to make it more obvious, and some of the sql buffer logic moved to a new "writee".

@tomasfarias tomasfarias merged commit c53f931 into master Jan 2, 2025
92 checks passed
@tomasfarias tomasfarias deleted the refactor/redshift-batch-export-uses-spmc-consumer branch January 2, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants