feat(persons): Add bulk deletion endpoint #24790
Conversation
Sounds great. A few suggestions to ensure the bulk deletes are efficient and fast.
posthog/api/person.py (Outdated)

```python
"""
This endpoint allows you to bulk delete persons, either by the PostHog persons ID or by distinct IDs. You can pass through a maximum of 100 ids per call.
"""
if request.data.get("distinct_ids"):
```
Total nit: the walrus operator would prevent us having to call request.data.get("distinct_ids") twice in this method.
```diff
-if request.data.get("distinct_ids"):
+if distinct_ids := request.data.get("distinct_ids"):
```
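For illustration, a minimal standalone sketch of the difference the suggestion makes; `payload` here stands in for `request.data` and is not from the PR:

```python
payload = {"distinct_ids": ["user-1", "user-2"]}  # stand-in for request.data, not PR code

# Without the walrus operator the payload is read twice:
if payload.get("distinct_ids"):
    distinct_ids = payload.get("distinct_ids")

# With it, the lookup happens once and the value is bound for reuse in the branch:
if distinct_ids := payload.get("distinct_ids"):
    print(distinct_ids)
```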
```python
AsyncDeletion.objects.bulk_create(
    [
        AsyncDeletion(
            deletion_type=DeletionType.Person,
            team_id=self.team_id,
            key=str(person.uuid),
            created_by=cast(User, self.request.user),
        )
    ],
    ignore_conflicts=True,
)
```
This is going to be up to 100 INSERTs per request; it would be great to move the AsyncDeletion.objects.bulk_create() outside of the loop.
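A rough sketch of that suggestion, reusing the names from the snippet above; the final shape in the PR may differ:

```python
# Sketch only: build every AsyncDeletion row first, then issue a single bulk
# INSERT instead of calling bulk_create() once per person inside the loop.
async_deletions = [
    AsyncDeletion(
        deletion_type=DeletionType.Person,
        team_id=self.team_id,
        key=str(person.uuid),
        created_by=cast(User, self.request.user),
    )
    for person in persons
]
AsyncDeletion.objects.bulk_create(async_deletions, ignore_conflicts=True)
```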
Same problem as #24790 (comment)
```python
for person in persons:
    delete_person(person=person)
    self.perform_destroy(person)
```
Why not bulk persons.objects.delete()?
I was worried about deletions failing halfway through, because we can't do transactions in ClickHouse. So if we first bulk-deleted all the persons in PostHog but failed for half the people in ClickHouse, we'd end up in a weird state. By doing it sequentially, at least the failure will be contained to one person, rather than potentially up to 100.
Hm, I don't think we can tell if a ClickHouse-side delete fails anyway, because all delete_person() does is queue a deletion row into Kafka, which is extremely unlikely to fail. So if we first bulk-delete in Postgres and then emit deletion rows to CH, that should be the highest level of integrity possible in this situation.
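A hedged sketch of the ordering being proposed here, reusing names from the snippets above; whether delete_person() can still work with an already-deleted in-memory instance is an assumption:

```python
# Sketch of the proposed ordering (an assumption, not the merged code):
# one bulk DELETE in Postgres, then emit the ClickHouse deletion rows.
persons_list = list(persons)      # materialize first so the objects survive the delete
persons.delete()                  # single bulk DELETE statement in Postgres
for person in persons_list:
    delete_person(person=person)  # queues a deletion row into Kafka for ClickHouse
```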
My point is we may not even emit those rows if we fail halfway through.
It's definitely a tradeoff – in this explicit bulk delete case it doesn't feel great to put this O(n) load on Postgres, but I see what you mean. For maximum integrity with this route, we should swap the delete_person(person=person) and self.perform_destroy(person) lines though.
Hm, this doesn't work, as the person gets deleted before ClickHouse gets a chance to delete it. I also think it's more likely for ClickHouse to fail, so the current order does make sense.
Problem
Most users only have distinct_ids saved locally, not our person ID. So in order to delete a person, they first make a call to the persons endpoint to get the UUID, then a call to delete that person. This is inefficient, especially when you have to delete a lot of users.
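To make the round trip concrete, a hedged illustration of the two-call flow described above; the exact paths and parameter names are assumptions:

```python
import requests

headers = {"Authorization": "Bearer <personal_api_key>"}
base = "https://app.posthog.com/api/projects/<project_id>"

# 1. Resolve the locally known distinct_id to a PostHog person (assumed route and params)
results = requests.get(
    f"{base}/persons/", params={"distinct_id": "user-123"}, headers=headers
).json()["results"]

# 2. Delete that person with a second call
requests.delete(f"{base}/persons/{results[0]['id']}/", headers=headers)
```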
PR has a lot of changes because of #24815, but the actual changes are in this commit
Changes
Create a bulk delete endpoint that can handle both uuids and distinct ids.
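A hedged sketch of how the new endpoint could be called, based on the docstring in the diff above (person IDs or distinct IDs, max 100 per call); the route name and the payload shape beyond distinct_ids are assumptions:

```python
import requests

headers = {"Authorization": "Bearer <personal_api_key>"}
base = "https://app.posthog.com/api/projects/<project_id>"

# One request instead of one lookup plus one delete per person (route name assumed)
requests.post(
    f"{base}/persons/bulk_delete/",
    json={"distinct_ids": ["user-123", "user-456"]},  # up to 100 ids per call
    headers=headers,
)
```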
Does this work well for both Cloud and self-hosted?
How did you test this code?