Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(persons-on-events): Add ClickHouse table for tracking distinct ID overrides (without Kafka ingestion) #20326

Merged
merged 6 commits into from
Feb 14, 2024

Conversation

tkaemming
Copy link
Contributor

@tkaemming tkaemming commented Feb 14, 2024

Problem

This splits the MergeTree table out of #19855 to unblock progress on squash refactoring and backfill development against real schemas without worrying about juggling unmerged development branches.

This doesn't include the Kafka engine and materialized view from #19855. I'd expect those tables to remain the same as they were implemented in that PR, but we don't want to actually start ingesting events until #20162 and #20226 have been merged to ensure that the overrides table only has data that is unaffected by the distinct ID reuse issues described in #20187.

More context on distinct ID overrides more broadly is here: https://github.com/PostHog/product-internal/pull/557

How did you test this code?

(env) ted@revuelto posthog % DEBUG=1 ./bin/migrate      
[…]
2024-02-14T01:05:19.735937Z [info     ] Applying migration 0052_add_person_distinct_id_overrides_table... [migrations] pid=40536 tid=8013618880
2024-02-14T01:05:19.736018Z [info     ]     Executing python operation run_sql [migrations] pid=40536 tid=8013618880
[…]
(env) ted@revuelto posthog % docker-compose -f docker-compose.dev.yml exec -it clickhouse clickhouse-client -q 'SHOW CREATE TABLE person_distinct_id_overrides format Vertical'
Row 1:
──────
statement: CREATE TABLE default.person_distinct_id_overrides
(
    `team_id` Int64,
    `distinct_id` String,
    `person_id` UUID,
    `is_deleted` Int8,
    `version` Int64,
    `_timestamp` DateTime,
    `_offset` UInt64,
    `_partition` UInt64,
    INDEX kafka_timestamp_minmax_person_distinct_id_overrides _timestamp TYPE minmax GRANULARITY 3
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/noshard/posthog.person_distinct_id_overrides', '{replica}-{shard}', version)
ORDER BY (team_id, distinct_id)
SETTINGS index_granularity = 512

Copy link
Member

@fuziontech fuziontech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I know we talked about some nullable is_deleted but you are pulling in from the existing PERSON_DISTINCT_ID2_TABLE_BASE_SQL I assume this makes things easier.

🚢 it

@tkaemming tkaemming force-pushed the poe-distinct-id-overrides-data-only branch from 2835ad8 to 9190bfa Compare February 14, 2024 15:49
@tkaemming
Copy link
Contributor Author

Looks good to me. I know we talked about some nullable is_deleted but you are pulling in from the existing PERSON_DISTINCT_ID2_TABLE_BASE_SQL I assume this makes things easier.

Ah yeah, that was around using a nullable person_id and dropping is_deleted (or making it an alias for person_id IS NULL) to keep the Postgres and ClickHouse schemas better aligned. I think we'll still want to do that eventually, but we don't need to be in a rush to do it quite yet. (More details here too.)

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 0 modified, 0 deleted
  • webkit: 0 added, 1 modified, 0 deleted (diff for shard 2)

Triggered by this commit.

👉 Review this PR's diff of snapshots.

Copy link
Contributor

Size Change: 0 B

Total Size: 2.22 MB

ℹ️ View Unchanged
Filename Size
frontend/dist/toolbar.js 2.22 MB

compressed-size-action

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 0 modified, 0 deleted
  • webkit: 0 added, 1 modified, 0 deleted (diff for shard 2)

Triggered by this commit.

👉 Review this PR's diff of snapshots.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants