Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(batch-exports): Add backfill workflow #17909

Merged
merged 15 commits into from
Oct 19, 2023
Merged

Conversation

tomasfarias
Copy link
Contributor

@tomasfarias tomasfarias commented Oct 11, 2023

Problem

Backfills cannot go beyond 1000 runs due to limitations in Temporal buffer. Since we cannot control the buffer or even ping it for its status, let's manage backfills more manually with another level of indirection. This will allow users to backfill beyond the 1000 run limit, and could unblock us to add more frontend support for backfills. To avoid this PR getting too large, I'm not including any frontend changes here.

Changes

  • Added a BackfillBatchExportWorkflow to handle backfills.

This workflow works by iterating over the periods to backfill and making backfill requests in batches of buffer_limit (by default 1 for safety). After a request is made, it waits for all the runs in the batch to be done, before sending the next request.

  • Added a BatchExportBackfill model.

This model exists purely for reporting purposes, similar to BatchExportRun. In the future, we can add information in the frontend for users like: Is there a backfill in progress? Which periods does it cover? Offer a cancel button, etc...

  • Backfill API endpoint now starts a BackfillBatchExportWorkflow.
  • Adds "NoOp" as a destination, purely for testing reasons.

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

How did you test this code?

Unit tests.

@tomasfarias tomasfarias changed the title feat(batch-exports): Add backfill model and service support feat(batch-exports): Add backfill abstraction Oct 11, 2023
@tomasfarias tomasfarias marked this pull request as ready for review October 11, 2023 13:35
@tomasfarias tomasfarias force-pushed the feat/backfill-abstraction branch 2 times, most recently from 78810d2 to 2faa175 Compare October 13, 2023 10:58
@tomasfarias tomasfarias changed the title feat(batch-exports): Add backfill abstraction feat(batch-exports): Add backfill workflow Oct 13, 2023
@tomasfarias tomasfarias force-pushed the feat/backfill-abstraction branch from 02e2f42 to ad344ac Compare October 13, 2023 15:12
@tomasfarias tomasfarias requested a review from a team October 13, 2023 20:49
Copy link
Contributor

@bretthoerner bretthoerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me (from what I know about backfills so far 😄).

I had a question about heartbeats, though.

posthog/batch_exports/service.py Outdated Show resolved Hide resolved
"""

async def heartbeat() -> None:
"""Heartbeat factor times every heartbeat_timeout."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, it seems like this sleeps for timeout / factor and then heartbeats once and returns? Should this recur or loop? (I'm probably missing something obvious.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, this is missing a while loop, I'll add it in my next commit. Thank you!

Without the loop Python will run the task until completion whenever we await, effectively heart-beating only once. This is not what we want, as we may be waiting for a while until a backfill finishes, and we will certainly hit the heartbeat_timeout if we only heartbeat once.

I'm not super happy with the way we currently handle heartbeats: There is potential for us to abstract these so that we can re-use them in all workflows (and avoid me forgetting while loops!). Every workflow should implement heartbeats to allow cancellation and recovery, I just haven't wrapped my head around at what the heartbeat abstraction should look like; some Temporal samples use decorators which is effectively what I'm doing here...

Anyways, something to tackle in a follow-up PR.

@tomasfarias tomasfarias force-pushed the feat/backfill-abstraction branch from a51041f to 847032a Compare October 17, 2023 09:29
@tomasfarias
Copy link
Contributor Author

tomasfarias commented Oct 17, 2023

Had to rebase on master to bump the migration.

@tomasfarias tomasfarias force-pushed the feat/backfill-abstraction branch from 0a16034 to 489b6f6 Compare October 19, 2023 12:21
@tomasfarias tomasfarias merged commit a7ae3b0 into master Oct 19, 2023
68 checks passed
@tomasfarias tomasfarias deleted the feat/backfill-abstraction branch October 19, 2023 13:09
Justicea83 pushed a commit to Justicea83/posthog that referenced this pull request Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants