Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(data-warehouse): use materialized view in hogql #25187

Merged
merged 86 commits into from
Oct 4, 2024

Conversation

EDsCODE
Copy link
Member

@EDsCODE EDsCODE commented Sep 24, 2024

Problem

  • materialized tables aren't queried right now

Changes

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Does this work well for both Cloud and self-hosted?

How did you test this code?

@EDsCODE EDsCODE changed the base branch from master to dw-materialized-database-tables-ui September 24, 2024 20:24
@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

Base automatically changed from dw-materialized-database-tables-ui to master September 30, 2024 20:02
@EDsCODE EDsCODE marked this pull request as ready for review September 30, 2024 20:08
@EDsCODE EDsCODE requested a review from Gilbert09 October 1, 2024 01:42
@EDsCODE EDsCODE requested review from a team and removed request for Gilbert09 October 1, 2024 01:42
Copy link
Member

@Gilbert09 Gilbert09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my one comment about using DataWarehouseTable - I reckon the pipelines should operate the same between external sources vs materialization - both create a DataWarehouseTable and set it on their equivalent models (Schema vs SavedQuery)

@@ -63,6 +68,7 @@ class Status(models.TextChoices):
null=True,
help_text="The timestamp of this SavedQuery's last run (if any).",
)
credential = models.ForeignKey(DataWarehouseCredential, on_delete=models.CASCADE, null=True, blank=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding credential's to saved query, lets add a foreign key to DataWarehouseTable instead, this means:

  • We only have one type of "Table", regardless of whether it's an external data sourced table, or a materialized view, they should both be synonymous as far as the system is concerned
  • The URL of the table is stored in the DB and not code, making easy changes to how we store the data with backward compatibility (this has been a godsend for all the DLT pipeline changes we've made)
  • It'll support different S3 formats (parquet, delta, csv, etc) - this will be helpful for when we eventually move to iceberg, and again, this will be stored in the DB and so migrations of tables don't have to be one big migration

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point let me give it a try

posthog/hogql/database/s3_table.py Outdated Show resolved Hide resolved
@EDsCODE EDsCODE marked this pull request as draft October 1, 2024 20:42
@EDsCODE
Copy link
Member Author

EDsCODE commented Oct 1, 2024

TODO:

  • how mat tables overwrite the saved queries (names will be shared)
  • UI issues (rerunning a model run)
  • UI: feedback when starting a model run
  • update tests

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (wasn't pushed!)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

4 snapshot changes in total. 0 added, 4 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

2 snapshot changes in total. 0 added, 2 modified, 0 deleted:

  • chromium: 0 added, 2 modified, 0 deleted (diff for shard 2)
  • webkit: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

4 snapshot changes in total. 0 added, 4 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

4 snapshot changes in total. 0 added, 4 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

4 snapshot changes in total. 0 added, 4 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

5 snapshot changes in total. 0 added, 5 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

3 snapshot changes in total. 0 added, 3 modified, 0 deleted:

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@EDsCODE EDsCODE merged commit 6e9e9e8 into master Oct 4, 2024
93 checks passed
@EDsCODE EDsCODE deleted the dw-materialized-use-new-tables branch October 4, 2024 01:27
Copy link

sentry-io bot commented Oct 4, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ CHQueryErrorTimeoutExceeded: DB::Exception: Timeout exceeded: elapsed 600.082745045 seconds, maximum: 600. Stack trace: posthog.tasks.exporter.export_asset View Issue
  • ‼️ AttributeError: 'AnonymousUser' object has no attribute 'distinct_id' /embedded/{access_token} View Issue
  • ‼️ AttributeError: 'AnonymousUser' object has no attribute 'distinct_id' /shared/{access_token} View Issue
  • ‼️ AttributeError: 'AnonymousUser' object has no attribute 'distinct_id' /embedded/{access_token} View Issue
  • ‼️ AttributeError: 'AnonymousUser' object has no attribute 'distinct_id' /embedded/{access_token} View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants