feat(persons-on-events): Support backfilling multiple teams, and store backfill state #20886

Merged: 5 commits into master from distinct-id-overrides-backfill-all-teams, Mar 18, 2024

@tkaemming tkaemming commented Mar 13, 2024

Problem

Previously, the backfill command was not convenient for running a mass backfill over all teams, e.g. as a Kubernetes Job or manually from a toolbox pod.

This change lets the command run over all teams that have not already been marked as backfilled, unless a specific --team-id is provided. Tracking the state lets us make sure that no teams were missed, and prevents duplicate rows from being inserted unnecessarily on repeat executions.
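
The selection rule described above can be sketched in plain Python. This is only an illustration, not the code in this PR: `select_team_ids` and the `teams` mapping (team id to `extra_settings`) are hypothetical names, and the sketch assumes that explicitly requested ids bypass the "already backfilled" filter. The settings key and the error message are taken from the SQL and CLI output shown elsewhere in this PR.

```python
from typing import Iterable, Optional

# Key name taken from the UPDATE statement in this PR.
BACKFILLED_KEY = "distinct_id_overrides_backfilled"


def select_team_ids(
    teams: dict[int, Optional[dict]],
    requested_ids: Optional[Iterable[int]] = None,
) -> list[int]:
    """Pick the teams to backfill (hypothetical helper).

    `teams` maps team id -> extra_settings, which may be None.
    """
    if requested_ids:
        requested = set(requested_ids)
        missing = requested - set(teams)
        if missing:
            # Mirrors the CommandError message in the CLI output.
            raise ValueError(f"Teams with ids {missing} do not exist")
        # Assumption: explicit ids are used as-is, even if already backfilled.
        return sorted(requested)
    # No ids given: every team that is not yet marked as backfilled.
    return sorted(
        team_id
        for team_id, settings in teams.items()
        if not (settings or {}).get(BACKFILLED_KEY)
    )
```

With `teams = {1: None, 2: {"distinct_id_overrides_backfilled": True}, 3: {}}`, calling `select_team_ids(teams)` skips team 2, while `select_team_ids(teams, [2])` still returns it.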

How did you test this code?

The more important Backfill logic was already tested; I manually tested the CLI variations:

  1. Dry run, single --team-id
(env) ted@revuelto posthog % DEBUG=1 python manage.py backfill_distinct_id_overrides --team-id 1 
[…]
2024-03-13T00:03:02.937317Z [info     ] Starting backfill for 1 teams... [posthog.management.commands.backfill_distinct_id_overrides] pid=69010 tid=8013618880
2024-03-13T00:03:02.937486Z [info     ] Starting BackfillQuery(team_id=1)... [posthog.management.commands.backfill_distinct_id_overrides] pid=69010 tid=8013618880
2024-03-13T00:03:02.959076Z [info     ] BackfillQuery(team_id=1) would have inserted 0 records. [posthog.management.commands.backfill_distinct_id_overrides] pid=69010 tid=8013618880
  2. Dry run, multiple --team-id (including invalid team)
(env) ted@revuelto posthog % DEBUG=1 python manage.py backfill_distinct_id_overrides --team-id 1 --team-id 2
[…]
CommandError: Teams with ids {2} do not exist
  3. Live run, all teams (no existing setting state)
(env) ted@revuelto posthog % DEBUG=1 python manage.py backfill_distinct_id_overrides --live-run             
[…]
2024-03-13T00:03:27.389036Z [info     ] Starting backfill for 1 teams... [posthog.management.commands.backfill_distinct_id_overrides] pid=69148 tid=8013618880
2024-03-13T00:03:27.389224Z [info     ] Starting BackfillQuery(team_id=1)... [posthog.management.commands.backfill_distinct_id_overrides] pid=69148 tid=8013618880
2024-03-13T00:03:27.414098Z [info     ] Completed BackfillQuery(team_id=1) (marked 1 team as backfilled.) [posthog.management.commands.backfill_distinct_id_overrides] pid=69148 tid=8013618880
  4. Live run, all teams (but nothing to actually do since they were just backfilled)
(env) ted@revuelto posthog % DEBUG=1 python manage.py backfill_distinct_id_overrides --live-run
[…]
2024-03-13T00:03:42.634597Z [info     ] Starting backfill for 0 teams... [posthog.management.commands.backfill_distinct_id_overrides] pid=69247 tid=8013618880
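
For readers unfamiliar with how a repeatable `--team-id` and a `--live-run` switch can be parsed, here is a minimal standalone sketch using `argparse`, which Django management commands build on. The option names come from the invocations above; the `dest` names, help text, and `build_parser` function are assumptions for illustration and may not match the real command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Standalone stand-in for the command's argument setup (hypothetical)."""
    parser = argparse.ArgumentParser(prog="backfill_distinct_id_overrides")
    parser.add_argument(
        "--team-id",
        type=int,
        action="append",  # each repeated flag appends to the list
        dest="team_ids",
        help="team to backfill; may be given multiple times",
    )
    parser.add_argument(
        "--live-run",
        action="store_true",  # defaults to False, i.e. a dry run
        help="actually insert records instead of only counting them",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.team_ids, args.live_run)
```

When no `--team-id` is passed, `team_ids` is `None`, which a command could treat as "all teams not yet backfilled".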

@tkaemming tkaemming requested a review from a team March 13, 2024 00:24
Comment on lines 60 to 70
updated_teams = list(
    Team.objects.raw(
        """
        UPDATE posthog_team
        SET extra_settings = COALESCE(extra_settings, '{}'::jsonb) || jsonb_build_object('distinct_id_overrides_backfilled', true)
        WHERE id = %s
        RETURNING *
        """,
        [self.team_id],
    )
)

I had assumed there would be a more convenient way to update this via the ORM… not sure the "shortcut" was worth it at this point.
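
For reference, the effect of the `SET` expression in that statement can be mimicked in plain Python. This is a sketch of the semantics only, not a proposed replacement: `mark_backfilled` is a hypothetical name, and a read-modify-write in application code like this would generally not be atomic in the way the single `UPDATE` is.

```python
from typing import Optional


def mark_backfilled(extra_settings: Optional[dict]) -> dict:
    """Return a copy of the settings with the backfilled flag set (hypothetical helper)."""
    # COALESCE(extra_settings, '{}'::jsonb): treat NULL as an empty object.
    base = extra_settings if extra_settings is not None else {}
    # `||` on jsonb objects is a shallow merge where the right-hand side wins.
    return {**base, "distinct_id_overrides_backfilled": True}
```

Existing keys in `extra_settings` are preserved, and a previous `false` value for the flag is overwritten.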

@@ -14,10 +15,12 @@


@dataclass
class BackfillQuery:

This is a bit more than a Query at this point.

@tkaemming tkaemming merged commit 78c9184 into master Mar 18, 2024
73 checks passed
@tkaemming tkaemming deleted the distinct-id-overrides-backfill-all-teams branch March 18, 2024 20:26
@tkaemming tkaemming mentioned this pull request Mar 22, 2024
10 tasks