feat: HogWatcher #23053
Conversation
…to feat/proper-templates
# Conflicts: # posthog/api/hog_function.py # posthog/cdp/validation.py
# Conflicts: # posthog/api/hog_function.py # posthog/cdp/templates/__init__.py # posthog/cdp/templates/hog_function_template.py # posthog/cdp/templates/slack/template_slack.py # posthog/cdp/validation.py
# Conflicts: # frontend/__snapshots__/scenes-app-insights--funnel-top-to-bottom-breakdown-edit--dark.png
Dumping thoughts mid-review. I effectively only looked through the frontend and Django parts... so just the plugin server part to go... 😅
Overall very solid 👍 and I'm coming around to the separate watcher idea. However, why not take it one step further and isolate the watcher into its own pod/service, especially if it's going to get more cleanup duties soon? I left a longer inline comment about it.
private async checkIsLeader() {
    const leaderId = await runRedis(this.hub.redisPool, 'getLeader', async (client) => {
        // Set the leader to this instance if it is not set and add an expiry to it of twice our observation period
        const pipeline = client.pipeline()

        // TODO: This can definitely be done in a single command - just need to make sure the ttl is always extended if the ID is the same

        // @ts-expect-error - IORedis types don't allow for NX and EX in the same command
        pipeline.set(`${BASE_REDIS_KEY}/leader`, this.instanceId, 'NX', 'EX', (OBSERVATION_PERIOD * 3) / 1000)
        pipeline.get(`${BASE_REDIS_KEY}/leader`)
        const [_, res] = await pipeline.exec()

        // NOTE: IORedis types don't allow for NX and GET in the same command so we have to cast it to any
        return res[1] as string
    })

    this.isLeader = leaderId === this.instanceId

    if (this.isLeader) {
        status.info('👀', '[HogWatcher] I am the leader')
    }
}
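As a side note, the TODO about collapsing this into a single command could plausibly be handled with a small Lua script that claims the leader key when it is unset and refreshes the TTL when this instance already holds it. A rough, untested sketch, reusing BASE_REDIS_KEY, OBSERVATION_PERIOD and the ioredis client from the snippet above:

    // Sketch only: claim-or-extend leadership in one round trip.
    // Returns the id of whichever instance currently holds the leader key.
    const CLAIM_OR_EXTEND_LEADER = `
        local current = redis.call('GET', KEYS[1])
        if current == false then
            redis.call('SET', KEYS[1], ARGV[1], 'EX', ARGV[2])
            return ARGV[1]
        end
        if current == ARGV[1] then
            redis.call('EXPIRE', KEYS[1], ARGV[2])
        end
        return current
    `

    const leaderId = (await client.eval(
        CLAIM_OR_EXTEND_LEADER,
        1, // one key follows
        `${BASE_REDIS_KEY}/leader`,
        this.instanceId,
        Math.round((OBSERVATION_PERIOD * 3) / 1000)
    )) as string

This would keep the NX semantics of the pipeline version while also extending the TTL for the existing leader, which, if I read the pipeline version right, it currently does not do.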
I wonder... if we should not just have an explicitly deployed leader? This would remove all that coordination noise, and give it breathing room from running hog code as well.
This happened before. The plugin server has a scheduler, which used something called redlock to make sure there's only one pod in the fleet running the scheduling commands. This made sense when there were 2 scheduled tasks running per hour on a self-hosted instance, but the Cloud required a different approach, as these scheduled bursts in ingestion nodes were causing problems.
Now we have a lot of schedulers:
Thus, I think we'd make this system more robust if we removed the Redis lock and made it just a single-node service. We would immediately buy some vertical scaling room as well.
I think this is a good idea tbh. I didn't do it as there was pushback on the leader idea in general, and I didn't want to go too far down that hole in case an alternative came up along the way.
Should be easy enough to set up, but again I might do that in a follow-up as it's more of an improvement.
const pipeline = client.pipeline()

changes.observations.forEach(({ id, observation }) => {
    // We key the observations by observerId and timestamp with a ttl of the max period we want to keep the data for
    const subKey = `observation:${id}:${this.instanceId}:${observation.timestamp}`
    pipeline.hset(`${BASE_REDIS_KEY}/state`, subKey, JSON.stringify(observation))
})

return pipeline.exec()
Where do we set the TTL for these keys... or is that the "TODO: Implement this" part? Isn't the easy solution to just use normal TTLs if this moves from hset to set?
No, the todo is just noise I removed in the follow-up PR.
The issue is Redis 6 (what we use) doesn't support TTL-ing hash fields, only the whole hash.
We could do this as a separate set, but for now it just felt easier to have it all in one hash so we can load the whole thing in one go and clean up after.
In practice it functions the same, so let's try it and see if it is cleaning up, and then we can move it out after if it still makes sense.
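For reference, the per-key variant suggested above could look roughly like the following if the hash is ever split out. This is only a sketch: OBSERVATION_TTL_SECONDS is a hypothetical constant standing in for the max retention period, and the key layout just mirrors the hash sub-keys in the diff above.

    // Sketch only: one Redis key per observation so each one can carry its own TTL.
    const OBSERVATION_TTL_SECONDS = 300 // hypothetical retention period

    const pipeline = client.pipeline()
    changes.observations.forEach(({ id, observation }) => {
        const key = `${BASE_REDIS_KEY}/observation/${id}/${this.instanceId}/${observation.timestamp}`
        // SET ... EX lets Redis expire each observation on its own, so no manual cleanup pass is needed
        pipeline.set(key, JSON.stringify(observation), 'EX', OBSERVATION_TTL_SECONDS)
    })
    return pipeline.exec()

The trade-off, as noted in the reply above, is that reading the state back would then need a SCAN/MGET over the key prefix instead of a single HGETALL, which is why keeping everything in one hash is simpler for now.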
Let's rock'n'roll
📸 UI snapshots have been updated: 1 snapshot change in total. 0 added, 1 modified, 0 deleted. Triggered by this commit.
📸 UI snapshots have been updated: 2 snapshot changes in total. 0 added, 2 modified, 0 deleted. Triggered by this commit.
Problem
We predict (based on existing pipeline work) that there will be cases of rogue functions or teams that clog up the execution pipeline.
To get ahead of this, we want to have a way of detecting and marking functions or teams as inefficient so that they are moved to the "slow lane", temporarily disabled, and eventually permanently disabled. At first I was going to do the "simple" thing and have a manual button for disabling functions, but where is the fun in that...
Changes
TODO
Follow up
👉 Stay up-to-date with PostHog coding conventions for a smoother review.
Does this work well for both Cloud and self-hosted?
How did you test this code?