-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(queries): Add possibility to run queries async #18571
Conversation
We can achieve the same current POST behavior using DRF's create
# Conflicts: # frontend/src/queries/query.ts
📸 UI snapshots have been updated1 snapshot changes in total. 0 added, 1 modified, 0 deleted:
Triggered by this commit. |
📸 UI snapshots have been updated1 snapshot changes in total. 0 added, 1 modified, 0 deleted:
Triggered by this commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good. Left a few thoughts inside, and some tests seem to still be failing.
I did manage to get it to crash locally as well. This simple query:
resulted in:
File "/Users/marius/Projects/PostHog/posthog/posthog/clickhouse/client/execute_async.py", line 92, in execute_process_query
redis_client.set(key, json.dumps(dataclasses.asdict(query_status)), ex=REDIS_STATUS_TTL_SECONDS)
File "/Users/marius/.pyenv/versions/3.10.10/lib/python3.10/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type UUID is not JSON serializable
and the query kept spinning.
One more thought, which we can definitely do in a follow up PR to keep this one contained: the HogQL queries themselves all have a 60sec timeout. This should get bumped to something larger like 10min when in the async query mode. We can probably reuse the "in_export_context" toggle for this, and set it to 10min when that's enabled. |
Size Change: +164 B (0%) Total Size: 2.01 MB
|
@mariusandra Ah, one thing I wanted to ask. Once we merge, the docs will show the new async option and endpoints immediately (with no mention of it being experimental or something). How do we proceed in such cases? Should I comment it out for now? |
📸 UI snapshots have been updated3 snapshot changes in total. 0 added, 3 modified, 0 deleted:
Triggered by this commit. |
📸 UI snapshots have been updated1 snapshot changes in total. 0 added, 1 modified, 0 deleted:
Triggered by this commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Let's get it in, start testing and see what's next for getting it live. My shortlist would be:
- HogQL statement timeout (60sec -> 10min?)
- Should we work towards rolling this out for everyone, or now just to clients who have trouble?
- Who are the clients who currently have trouble? We can use metabase to find teams who get the most timeouts, and see manually if we can make their lives better.
- Resource monitoring: keeping an eye on Celery and that it doesn't run capacity. Do we need to change something?
- Can we get a snapshot/dashboard of queries in progress, and queries queued (waiting for a worker)?
All good to merge this and start testing. Not sure why the visual regression tests changed here though? 🤷
Suspect IssuesThis pull request was deployed and Sentry observed the following issues:
Did you find this useful? React with a 👍 or 👎 |
Problem
Some queries will take longer than various timeouts (i.e. Nginx request timeout and so on).
We want a way to run these asynchronously out of the request response cycle.
Changes
Other things done
How did you test this code?