feat(queries): Add possibility to run queries async #18571

Merged: 26 commits, Nov 20, 2023

webjunkie
Contributor

@webjunkie webjunkie commented Nov 13, 2023

Problem

Some queries take longer than various timeouts allow (e.g. the Nginx request timeout).
We want a way to run these asynchronously, outside the request-response cycle.

Changes

  • kick off a Celery task with the query
  • track results via redis
  • reuse functionality from a hackathon project 🤪 (feat: Async Query Client from Hackathon #9353)
  • clean up ViewSet according to DRF conventions: now using create, retrieve, destroy methods
  • introduced client_query_id UUID as id for queries
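
As a rough illustration of the flow described in the list above (not the code in this PR), the sketch below enqueues a query, records its status under a key, and lets a caller read the status back later. It assumes a background thread as a stand-in for the Celery task and a lock-guarded dict as a stand-in for Redis; `enqueue_query`, `get_status`, and the key format are hypothetical names chosen for the example.

```python
import json
import threading
import uuid
from typing import Any, Callable, Dict, Optional, Tuple

# Stand-in for Redis (assumption): a dict of JSON strings guarded by a lock.
_store: Dict[str, str] = {}
_lock = threading.Lock()


def _key(team_id: int, query_id: str) -> str:
    # Hypothetical key layout; scoping by team keeps ids from colliding across teams.
    return f"query_status:{team_id}:{query_id}"


def _set_status(team_id: int, query_id: str, status: Dict[str, Any]) -> None:
    with _lock:
        _store[_key(team_id, query_id)] = json.dumps(status)


def get_status(team_id: int, query_id: str) -> Optional[Dict[str, Any]]:
    """Return the last stored status, or None if the id is unknown."""
    with _lock:
        raw = _store.get(_key(team_id, query_id))
    return json.loads(raw) if raw is not None else None


def _run(team_id: int, query_id: str, run_query: Callable[[], Any]) -> None:
    # Runs outside the request-response cycle and stores the outcome.
    try:
        results = run_query()
        _set_status(team_id, query_id, {"id": query_id, "complete": True, "error": False, "results": results})
    except Exception as exc:
        _set_status(team_id, query_id, {"id": query_id, "complete": True, "error": True,
                                        "error_message": str(exc), "results": None})


def enqueue_query(team_id: int, run_query: Callable[[], Any],
                  client_query_id: Optional[str] = None) -> Tuple[str, threading.Thread]:
    """Store an initial 'not complete' status, then start the worker."""
    query_id = client_query_id or str(uuid.uuid4())
    _set_status(team_id, query_id, {"id": query_id, "complete": False, "error": False, "results": None})
    worker = threading.Thread(target=_run, args=(team_id, query_id, run_query), daemon=True)
    worker.start()
    return query_id, worker


query_id, worker = enqueue_query(1, lambda: [[42]])
worker.join()  # a real client would poll get_status instead of joining
final_status = get_status(1, query_id)
```

Storing the id as a string keeps the status payload JSON-friendly from the start.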

Other things done

  • Move slow lane flag into query.ts
  • Check whether the frontend should exclude some queries, e.g. metadata queries
  • Add feature flag for async queries
  • Handle query aborts/cancellations
    • abort frontend polling
    • cancel running Celery task
    • cancel running CH query? 🤔
    • cancel Celery task in queue to not get orphans
  • Use client_query_id to identify query
    • check security implications
  • put experimental on things that appear in the docs
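
The "abort polling" item above could look roughly like the following. This is a sketch in Python for consistency with the backend code, whereas the actual polling lives in the TypeScript frontend; `poll_until_complete`, the `fetch_status` callable, and the interval and timeout defaults are all assumptions made for the example.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional


def poll_until_complete(
    fetch_status: Callable[[], Optional[Dict[str, Any]]],
    abort: threading.Event,
    interval_seconds: float = 0.01,
    timeout_seconds: float = 1.0,
) -> Optional[Dict[str, Any]]:
    """Poll until the status is complete; return None if aborted first."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if abort.is_set():
            return None  # the caller cancelled, so stop polling
        status = fetch_status()
        if status is not None and status.get("complete"):
            return status
        # Event.wait sleeps for the interval but wakes early if abort is set.
        abort.wait(interval_seconds)
    raise TimeoutError("query did not complete before the polling timeout")


# Fake status source (hypothetical): reports completion on the third call.
calls = {"n": 0}


def fake_fetch() -> Dict[str, Any]:
    calls["n"] += 1
    return {"complete": calls["n"] >= 3, "results": [1]}


result = poll_until_complete(fake_fetch, threading.Event())
```

Aborting the poll only stops the client from waiting. Cancelling the queued or running task, and possibly the ClickHouse query, is a separate step on the server.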

How did you test this code?

  • added tests
  • tried in frontend
  • added artificial delay to test long queries locally

We can achieve the same current POST behavior using DRF's create
@webjunkie webjunkie changed the title feat(queries): Run slow queries via Celery feat(queries): Run slow queries async Nov 13, 2023
@posthog-bot
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@webjunkie webjunkie changed the title feat(queries): Run slow queries async feat(queries): Add possibility to run queries async Nov 15, 2023
posthog/api/query.py (dismissed)
@webjunkie webjunkie marked this pull request as ready for review November 15, 2023 13:57
Collaborator

@mariusandra mariusandra left a comment

Code looks good. Left a few thoughts inside, and some tests seem to still be failing.

I did manage to get it to crash locally as well. This simple query:
image
resulted in:

  File "/Users/marius/Projects/PostHog/posthog/posthog/clickhouse/client/execute_async.py", line 92, in execute_process_query
    redis_client.set(key, json.dumps(dataclasses.asdict(query_status)), ex=REDIS_STATUS_TTL_SECONDS)
  File "/Users/marius/.pyenv/versions/3.10.10/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type UUID is not JSON serializable

and the query kept spinning.
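
One way to avoid this kind of `TypeError` is to give `json.dumps` a fallback for values it cannot encode. The sketch below assumes a simplified `QueryStatus` dataclass with hypothetical fields; it is not the class from `execute_async.py`, and the fix actually applied in the PR may differ.

```python
import dataclasses
import json
import uuid
from typing import Optional


@dataclasses.dataclass
class QueryStatus:
    # Hypothetical stand-in for the status object written to Redis.
    id: uuid.UUID
    complete: bool = False
    error_message: Optional[str] = None


def serialize_status(status: QueryStatus) -> str:
    # default=str is called only for values json cannot encode natively,
    # so the UUID is written in its canonical string form.
    return json.dumps(dataclasses.asdict(status), default=str)


status = QueryStatus(id=uuid.UUID("12345678-1234-5678-1234-567812345678"))
payload = serialize_status(status)
```

An alternative is to store the id as a `str` on the dataclass in the first place, which avoids relying on a catch-all `default` that would also stringify any other unexpected type.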

frontend/src/queries/query.ts (outdated, resolved)
posthog/api/process.py (outdated, resolved)
frontend/src/types.ts (outdated, resolved)
@mariusandra
Collaborator

One more thought, which we can definitely do in a follow-up PR to keep this one contained: the HogQL queries themselves all have a 60sec timeout. This should get bumped to something larger, like 10min, when in async query mode. We can probably reuse the "in_export_context" toggle for this, and set the timeout to 10min when that's enabled.
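
A minimal sketch of that idea, assuming a boolean `in_export_context` flag is available wherever the timeout is chosen. The constant names and the `max_execution_time_seconds` helper are hypothetical, and the 10min value is only the figure floated above.

```python
# Hypothetical constants: 60sec is the current HogQL timeout mentioned above,
# 10min is the proposed value for async/export-style execution.
DEFAULT_TIMEOUT_SECONDS = 60
EXTENDED_TIMEOUT_SECONDS = 10 * 60


def max_execution_time_seconds(in_export_context: bool) -> int:
    """Pick the statement timeout based on the execution context."""
    return EXTENDED_TIMEOUT_SECONDS if in_export_context else DEFAULT_TIMEOUT_SECONDS


sync_timeout = max_execution_time_seconds(in_export_context=False)
async_timeout = max_execution_time_seconds(in_export_context=True)
```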

Contributor

github-actions bot commented Nov 17, 2023

Size Change: +164 B (0%)

Total Size: 2.01 MB

Filename Size Change
frontend/dist/toolbar.js 2.01 MB +164 B (0%)

compressed-size-action

@webjunkie
Contributor Author

@mariusandra Ah, one thing I wanted to ask. Once we merge, the docs will show the new async option and endpoints immediately (with no mention of it being experimental or something). How do we proceed in such cases? Should I comment it out for now?

@posthog-bot
Contributor

📸 UI snapshots have been updated

3 snapshot changes in total. 0 added, 3 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

Collaborator

@mariusandra mariusandra left a comment

Looks good to me. Let's get it in, start testing and see what's next for getting it live. My shortlist would be:

  • HogQL statement timeout (60sec -> 10min?)
  • Should we work towards rolling this out for everyone, or for now just to clients who have trouble?
  • Who are the clients who currently have trouble? We can use metabase to find teams who get the most timeouts, and see manually if we can make their lives better.
  • Resource monitoring: keeping an eye on Celery so that it doesn't run out of capacity. Do we need to change something?
  • Can we get a snapshot/dashboard of queries in progress, and queries queued (waiting for a worker)?

All good to merge this and start testing. Not sure why the visual regression tests changed here though? 🤷

@mariusandra mariusandra merged commit ff4bfee into master Nov 20, 2023
76 checks passed
@mariusandra mariusandra deleted the feature/query-slow-lane branch November 20, 2023 11:26
sentry-io bot commented Nov 20, 2023

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ AttributeError: 'NoneType' object has no attribute 'get' /api/projects/{parent_lookup_team_id}/query/ View Issue
  • ‼️ ValidationError: [ErrorDetail(string='Unsupported query kind: PersonsNode', code='invalid')] posthog.tasks.exporter.export_asset View Issue
