Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(environments): Make taxonomy reads + writes project–based #26766

Merged
merged 12 commits into from
Dec 16, 2024

Conversation

Twixes
Copy link
Member

@Twixes Twixes commented Dec 9, 2024

Problem

#26521 added project_id to EventDefinition, PropertyDefinition, and EventProperty – along with some indexes based on project_id. However, these indexes are only useful if every row in these tables has project_id set, which is not the case. For GroupTypeMapping, we just did a backfill – that table had 5k rows per region, not a lot.
THESE tables have 100-500 MILLION rows per region and that's problematic. We can technically backfill, but that'd be a multi-hour migration, a significant challenge. (CC @PostHog/team-infra this is the long-running migration I was asking about in your channel)

Changes

Let's be smarter, if slightly messier, about project_id here. We don't actually need to backfill the field. The exact same behavior can be achieved by using coalesce(project_id, team_id) – so that's the expression we need to index on.

This PR drops the useless indexes on just project_id, and adds useful ones on coalesce(project_id, team_id).
We make those reworked immediately indexes useful in two ways:

  • in rust/prop-defs, we're replacing ON CONFLICT (team_id, ...) clauses with ON CONFLICT (coalesce(project_id, team_id), ...)
  • in Django, we're replacing filtering on team_id = %(team_id)s with filtering on coalesce(project_id, team_id) = %(project_id)s

How did you test this code?

Should pass existing Django tests.
Would be useful to add a test ensuring the ON CONFLICT logic works in prop-defs, but not sure if there's already a relevant test to edit, or where to add one. @oliverb123

@Twixes Twixes requested a review from oliverb123 December 9, 2024 17:13
@Twixes Twixes changed the title feat(environments): Make taxonomy insertion project–based feat(environments): Make taxonomy reads and writes project–based Dec 9, 2024
@Twixes Twixes changed the title feat(environments): Make taxonomy reads and writes project–based feat(environments): Make taxonomy reads + writes project–based Dec 9, 2024
Copy link
Member

@rafaeelaudibert rafaeelaudibert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we ever plan to drop team_id? If so, we're simply delaying the work :)

@Twixes
Copy link
Member Author

Twixes commented Dec 9, 2024

@rafaeelaudibert Good question. I'm not planning that right now. This coalesce()-based filtering definitely is a kind of tech debt, but at the same time proper cleanup means a few migrations all-in-all affecting a billion+ Postgres rows, which we don't have a playbook for – I don't feel very comfortable running that, and it's a major time investment. I think this is a fairly sustainable shortcut as far as shortcuts go.

@Twixes Twixes requested a review from a team December 10, 2024 11:50
@oliverb123
Copy link
Contributor

Hey! Best place to add tests would be in the tests/update.rs file. My recommendation is to use pure SQL expressions to set up a couple of test data rows in the way you expect, and then assert that the conflicts work etc.

Copy link
Contributor

@oliverb123 oliverb123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, only note is that those queries you changed run a few thousand times a second at peak - when you merge this, you'll need to monitor RDS and be prepped to revert if it looks like the coalesce has added significant CPU cost to them (I don't think it will, just giving you a heads up). Once the deployments done, feel free to ping me so I can keep an eye on it too

@@ -175,8 +175,8 @@ impl Event {
let updates = self.into_updates_inner();
if updates.len() > skip_threshold {
warn!(
"Event {} for team {} has more than 10,000 properties, skipping",
event, team_id
"Event {} for team {} has more than {} properties, skipping",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, thanks!

@Twixes Twixes merged commit 78959b2 into master Dec 16, 2024
89 checks passed
@Twixes Twixes deleted the project-based-prop-defs branch December 16, 2024 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants