-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(batch-exports): Add backfill workflow #17909
Conversation
78810d2
to
2faa175
Compare
02e2f42
to
ad344ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me (from what I know about backfills so far 😄).
I had a question about heartbeats, though.
""" | ||
|
||
async def heartbeat() -> None: | ||
"""Heartbeat factor times every heartbeat_timeout.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, it seems like this sleeps for timeout / factor
and then heartbeats once and returns? Should this recur or loop? (I'm probably missing something obvious.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, this is missing a while
loop, I'll add it in my next commit. Thank you!
Without the loop Python will run the task until completion whenever we await
, effectively heart-beating only once. This is not what we want, as we may be waiting for a while until a backfill finishes, and we will certainly hit the heartbeat_timeout
if we only heartbeat once.
I'm not super happy with the way we currently handle heartbeats: There is potential for us to abstract these so that we can re-use them in all workflows (and avoid me forgetting while loops!). Every workflow should implement heartbeats to allow cancellation and recovery, I just haven't wrapped my head around at what the heartbeat abstraction should look like; some Temporal samples use decorators which is effectively what I'm doing here...
Anyways, something to tackle in a follow-up PR.
a51041f
to
847032a
Compare
Had to rebase on |
0a16034
to
489b6f6
Compare
Problem
Backfills cannot go beyond 1000 runs due to limitations in Temporal buffer. Since we cannot control the buffer or even ping it for its status, let's manage backfills more manually with another level of indirection. This will allow users to backfill beyond the 1000 run limit, and could unblock us to add more frontend support for backfills. To avoid this PR getting too large, I'm not including any frontend changes here.
Changes
BackfillBatchExportWorkflow
to handle backfills.This workflow works by iterating over the periods to backfill and making backfill requests in batches of
buffer_limit
(by default 1 for safety). After a request is made, it waits for all the runs in the batch to be done, before sending the next request.BatchExportBackfill
model.This model exists purely for reporting purposes, similar to
BatchExportRun
. In the future, we can add information in the frontend for users like: Is there a backfill in progress? Which periods does it cover? Offer a cancel button, etc...BackfillBatchExportWorkflow
."NoOp"
as a destination, purely for testing reasons.👉 Stay up-to-date with PostHog coding conventions for a smoother review.
How did you test this code?
Unit tests.