-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: Redshift batch export uses spmc consumer #26897
refactor: Redshift batch export uses spmc consumer #26897
Conversation
9e1e133
to
97bf04c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. The only slightly confusing thing is that we're trying to fit a SQL-based writer into a temporary file-writer abstraction but think this is fine for now
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the |
The abstraction should cover both cases: It's a writer, whether it's writing to a temporary file, or a SQL query buffer, should be left up to the subclass to decide. That being said, I am open to expressing this more clearly in the code, as it was certainly just a temporary file writer in the beginning. Maybe we need more separation between "writer" and "writee" to make it more obvious, and some of the sql buffer logic moved to a new "writee". |
Problem
Redshift batch export is not using SPMC abstractions, instead it is doing its own thing. This leads to duplication and harder to read code. The main reason it wasn't migrated is that Redshift doesn't write to a file but instead to a query that then gets executed. We can still adapt the same abstractions as all other batch exports though, so that's what we do here.
Changes
BatchExportWriter
:RedshiftInsertBatchExportWriter
. This new writer will write INSERT queries for Redshift.RedshiftConsumer
that flushes these queries by executing them.run_consumer_loop
is nowrun_consumer
as we no longer run multiple in a loop, this may change later.run_consumer
now takes an instance ofConsumer
which simplifies the signature slightly.👉 Stay up-to-date with PostHog coding conventions for a smoother review.
Does this work well for both Cloud and self-hosted?
How did you test this code?