feat(environments): Make taxonomy reads + writes project–based #26766
Do we ever plan to drop `team_id`? If so, we're simply delaying the work :)
@rafaeelaudibert Good question. I'm not planning that right now. This
Hey! Best place to add tests would be in the
Looks good, only note is that those queries you changed run a few thousand times a second at peak. When you merge this, you'll need to monitor RDS and be prepared to revert if it looks like the `coalesce` has added significant CPU cost to them (I don't think it will, just giving you a heads up). Once the deployment's done, feel free to ping me so I can keep an eye on it too.
@@ -175,8 +175,8 @@ impl Event {
         let updates = self.into_updates_inner();
         if updates.len() > skip_threshold {
             warn!(
-                "Event {} for team {} has more than 10,000 properties, skipping",
-                event, team_id
+                "Event {} for team {} has more than {} properties, skipping",
Hah, thanks!
Problem
#26521 added `project_id` to `EventDefinition`, `PropertyDefinition`, and `EventProperty`, along with some indexes based on `project_id`. However, these indexes are only useful if every row in these tables has `project_id` set, which is not the case. For `GroupTypeMapping`, we just did a backfill: that table had 5k rows per region, not a lot.

THESE tables have 100-500 MILLION rows per region and that's problematic. We can technically backfill, but that'd be a multi-hour migration, a significant challenge. (CC @PostHog/team-infra this is the long-running migration I was asking about in your channel)
Changes
Let's be smarter, if slightly messier, about `project_id` here. We don't actually need to backfill the field. The exact same behavior can be achieved by using `coalesce(project_id, team_id)`, so that's the expression we need to index on.

This PR drops the useless indexes on just `project_id`, and adds useful ones on `coalesce(project_id, team_id)`.
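To illustrate why indexing on the expression removes the need for a backfill, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for Postgres. The table, index name, and sample rows are hypothetical, and it assumes a legacy row's `team_id` doubles as its project ID, which is what the `coalesce` relies on.

```python
import sqlite3

# SQLite stands in for Postgres here; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_definition (
        id INTEGER PRIMARY KEY,
        team_id INTEGER NOT NULL,
        project_id INTEGER,  -- NULL on rows that were never backfilled
        name TEXT NOT NULL
    );
    -- Index on the expression instead of on project_id alone.
    CREATE INDEX event_definition_proj_idx
        ON event_definition (coalesce(project_id, team_id), name);
""")
conn.executemany(
    "INSERT INTO event_definition (team_id, project_id, name) VALUES (?, ?, ?)",
    [
        (1, None, "pageview"),  # legacy row: project_id missing, team_id is 1
        (2, 1, "signup"),       # newer row: team 2 belongs to project 1
        (3, 3, "pageview"),     # unrelated project
    ],
)


def names_for_project(project_id):
    """Return event names for a project, matching legacy and new rows alike."""
    rows = conn.execute(
        "SELECT name FROM event_definition "
        "WHERE coalesce(project_id, team_id) = ? ORDER BY name",
        (project_id,),
    )
    return [name for (name,) in rows]


project_1_names = names_for_project(1)
```

Both the legacy row and the newer row resolve to project 1 without any row being rewritten, which is the behavior a backfill would otherwise have provided.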
We make those reworked indexes immediately useful in two ways:
- In `rust/prop-defs`, we're replacing `ON CONFLICT (team_id, ...)` clauses with `ON CONFLICT (coalesce(project_id, team_id), ...)`
- Filtering on `team_id = %(team_id)s` is replaced with filtering on `coalesce(project_id, team_id) = %(project_id)s`
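As a rough sketch of the `ON CONFLICT` change, the example below again uses Python's `sqlite3` in place of Postgres (SQLite 3.24 or newer is needed for upserts). The table, the unique index, and the `upsert` helper are hypothetical and are not the actual prop-defs code; the point is only that the conflict target has to match the expression in the unique index.

```python
import sqlite3

# SQLite stands in for Postgres; schema and helper names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property_definition (
        team_id INTEGER NOT NULL,
        project_id INTEGER,  -- NULL on legacy rows
        name TEXT NOT NULL,
        property_type TEXT
    );
    CREATE UNIQUE INDEX property_definition_proj_uniq
        ON property_definition (coalesce(project_id, team_id), name);
""")

UPSERT_SQL = """
    INSERT INTO property_definition (team_id, project_id, name, property_type)
    VALUES (:team_id, :project_id, :name, :property_type)
    ON CONFLICT (coalesce(project_id, team_id), name)
    DO UPDATE SET property_type = excluded.property_type
"""


def upsert(team_id, project_id, name, property_type):
    """Insert a definition, or update its type if the project already has it."""
    conn.execute(
        UPSERT_SQL,
        {
            "team_id": team_id,
            "project_id": project_id,
            "name": name,
            "property_type": property_type,
        },
    )


upsert(1, None, "$browser", None)   # legacy-style row keyed by team_id = 1
upsert(2, 1, "$browser", "String")  # same project via team 2: conflicts, updates
upsert(2, 1, "$os", "String")       # new property for the project: inserted

rows = conn.execute(
    "SELECT team_id, project_id, name, property_type "
    "FROM property_definition ORDER BY name"
).fetchall()
```

With a conflict target on `team_id` alone, the second call would have inserted a duplicate `$browser` row for the same project; keying on the coalesced expression makes it an update instead.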
How did you test this code?
Should pass existing Django tests.
Would be useful to add a test ensuring the `ON CONFLICT` logic works in prop-defs, but not sure if there's already a relevant test to edit, or where to add one. @oliverb123