refactor: Support batch export models as views #23052

Merged (49 commits) on Jun 24, 2024
Commits (49)
b43e7af refactor: Update metrics to fetch counts at request time (tomasfarias, Jun 7, 2024)
590f74f fix: Move import to method (tomasfarias, Jun 7, 2024)
53c71b6 fix: Add function (tomasfarias, Jun 10, 2024)
ea1d33f feat: Custom schemas for batch exports (tomasfarias, Jun 12, 2024)
81e929c feat: Frontend support for model field (tomasfarias, Jun 12, 2024)
054a826 fix: Clean-up (tomasfarias, Jun 12, 2024)
11b47d3 fix: Add missing migration (tomasfarias, Jun 12, 2024)
106a725 fix: Make new field nullable (tomasfarias, Jun 12, 2024)
8b19079 Update UI snapshots for `chromium` (1) (github-actions[bot], Jun 12, 2024)
fdb729a Update UI snapshots for `chromium` (1) (github-actions[bot], Jun 12, 2024)
ae9870c Update UI snapshots for `chromium` (1) (github-actions[bot], Jun 12, 2024)
091e380 Update UI snapshots for `chromium` (1) (github-actions[bot], Jun 12, 2024)
af52af9 Update UI snapshots for `chromium` (1) (github-actions[bot], Jun 12, 2024)
46c93de Update UI snapshots for `chromium` (1) (github-actions[bot], Jun 12, 2024)
c8bc6a6 fix: Bump migration number (tomasfarias, Jun 18, 2024)
fae1b59 fix: Bump migration number (tomasfarias, Jun 18, 2024)
85f5094 refactor: Update metrics to fetch counts at request time (tomasfarias, Jun 7, 2024)
b2382e2 fix: Actually use include and exclude events (tomasfarias, Jun 12, 2024)
bc6dd4e refactor: Switch to counting runs (tomasfarias, Jun 12, 2024)
ce7d4df refactor: Support batch export models as views (tomasfarias, Jun 18, 2024)
55f6a5a fix: Merge conflict (tomasfarias, Jun 18, 2024)
8e306a4 fix: Quality check fixes (tomasfarias, Jun 18, 2024)
a33563a refactor: Update metrics to fetch counts at request time (tomasfarias, Jun 7, 2024)
1f93d43 fix: Move import to method (tomasfarias, Jun 7, 2024)
a395f18 fix: Add function (tomasfarias, Jun 10, 2024)
eb0e581 fix: Typing fixes (tomasfarias, Jun 10, 2024)
4001daa feat: Custom schemas for batch exports (tomasfarias, Jun 12, 2024)
0f3cbe3 feat: Frontend support for model field (tomasfarias, Jun 12, 2024)
fde00d4 fix: Clean-up (tomasfarias, Jun 12, 2024)
ee354cb fix: Add missing migration (tomasfarias, Jun 12, 2024)
ccab5f6 fix: Make new field nullable (tomasfarias, Jun 12, 2024)
e7b5cb9 Update UI snapshots for `chromium` (1) (github-actions[bot], Jun 12, 2024)
6da056a Update UI snapshots for `chromium` (1) (github-actions[bot], Jun 12, 2024)
a8b3594 Update UI snapshots for `chromium` (1) (github-actions[bot], Jun 12, 2024)
724c208 Update UI snapshots for `chromium` (1) (github-actions[bot], Jun 12, 2024)
6b19e95 Update UI snapshots for `chromium` (1) (github-actions[bot], Jun 12, 2024)
618d0b8 Update UI snapshots for `chromium` (1) (github-actions[bot], Jun 12, 2024)
536cbdb fix: Bump migration number (tomasfarias, Jun 18, 2024)
0a9343e fix: Clean-up unused code (tomasfarias, Jun 18, 2024)
f921bb5 chore: Clean-up unused function and tests (tomasfarias, Jun 18, 2024)
dc9535f fix: Clean-up unused function (tomasfarias, Jun 18, 2024)
ee3fcb8 fix: HTTP Batch export default fields (tomasfarias, Jun 19, 2024)
c4deddf fix: Remove test case on new column not present in base table (tomasfarias, Jun 19, 2024)
f2cad91 chore: Clean-up unused functions and queries (tomasfarias, Jun 19, 2024)
86c86a2 fix: Only run extra clickhouse queries in batch exports tests (tomasfarias, Jun 19, 2024)
278a067 refactor: Remove coalesce and use only inserted_at in queries (tomasfarias, Jun 19, 2024)
740188b fix: Remove deprecated test (tomasfarias, Jun 19, 2024)
525c682 fix: Add person_id to person model and enforce ordering (tomasfarias, Jun 20, 2024)
2eabe0c refactor: Also add version column (tomasfarias, Jun 21, 2024)
23 changes: 0 additions & 23 deletions posthog/api/test/test_app_metrics.py
@@ -1,6 +1,5 @@
import datetime as dt
import json
import uuid
Comment (Contributor Author): Every change on this file is just cleaning up.

from unittest import mock

from freezegun.api import freeze_time
@@ -9,7 +8,6 @@
from posthog.api.test.batch_exports.conftest import start_test_worker
from posthog.api.test.batch_exports.operations import create_batch_export_ok
from posthog.batch_exports.models import BatchExportRun
from posthog.client import sync_execute
from posthog.models.activity_logging.activity_log import Detail, Trigger, log_activity
from posthog.models.plugin import Plugin, PluginConfig
from posthog.models.utils import UUIDT
@@ -20,20 +18,6 @@
SAMPLE_PAYLOAD = {"dateRange": ["2021-06-10", "2022-06-12"], "parallelism": 1}


def insert_event(team_id: int, timestamp: dt.datetime, event: str = "test-event"):
sync_execute(
"INSERT INTO `sharded_events` (uuid, team_id, event, timestamp) VALUES",
[
{
"uuid": uuid.uuid4(),
"team_id": team_id,
"event": event,
"timestamp": timestamp,
}
],
)


@freeze_time("2021-12-05T13:23:00Z")
class TestAppMetricsAPI(ClickhouseTestMixin, APIBaseTest):
maxDiff = None
@@ -149,9 +133,6 @@ def test_retrieve_batch_export_runs_app_metrics(self):
data_interval_start=last_updated_at - dt.timedelta(hours=1),
status=BatchExportRun.Status.COMPLETED,
)
for _ in range(3):
insert_event(team_id=self.team.pk, timestamp=last_updated_at - dt.timedelta(minutes=1))

BatchExportRun.objects.create(
batch_export_id=batch_export_id,
data_interval_end=last_updated_at - dt.timedelta(hours=2),
@@ -164,9 +145,6 @@ def test_retrieve_batch_export_runs_app_metrics(self):
data_interval_start=last_updated_at - dt.timedelta(hours=3),
status=BatchExportRun.Status.FAILED_RETRYABLE,
)
for _ in range(5):
timestamp = last_updated_at - dt.timedelta(hours=2, minutes=1)
insert_event(team_id=self.team.pk, timestamp=timestamp)

response = self.client.get(f"/api/projects/@current/app_metrics/{batch_export_id}?date_from=-7d")
self.assertEqual(response.status_code, status.HTTP_200_OK)
@@ -235,7 +213,6 @@ def test_retrieve_batch_export_runs_app_metrics_defaults_to_zero(self):
data_interval_start=last_updated_at - dt.timedelta(hours=1),
status=BatchExportRun.Status.COMPLETED,
)
insert_event(team_id=self.team.pk, timestamp=last_updated_at - dt.timedelta(minutes=1))

response = self.client.get(f"/api/projects/@current/app_metrics/{batch_export_id}?date_from=-7d")
self.assertEqual(response.status_code, status.HTTP_200_OK)
135 changes: 135 additions & 0 deletions posthog/batch_exports/sql.py
@@ -0,0 +1,135 @@
CREATE_PERSONS_BATCH_EXPORT_VIEW = """
CREATE OR REPLACE VIEW persons_batch_export AS (
SELECT
pd.team_id AS team_id,
pd.distinct_id AS distinct_id,
toString(p.id) AS person_id,
p.properties AS properties,
pd.version AS version,
pd._timestamp AS _inserted_at
FROM (
SELECT
team_id,
distinct_id,
max(version) AS version,
argMax(person_id, person_distinct_id2.version) AS person_id,
max(_timestamp) AS _timestamp
FROM
person_distinct_id2
WHERE
team_id = {team_id:Int64}
GROUP BY
team_id,
distinct_id
) AS pd
INNER JOIN
person p ON p.id = pd.person_id AND p.team_id = pd.team_id
WHERE
pd.team_id = {team_id:Int64}
AND p.team_id = {team_id:Int64}
AND pd._timestamp >= {interval_start:DateTime64}
AND pd._timestamp < {interval_end:DateTime64}
ORDER BY
_inserted_at
)
"""
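The inner `pd` subquery above collapses `person_distinct_id2` to one row per (team_id, distinct_id), taking `argMax(person_id, version)` alongside `max(version)` and `max(_timestamp)`. A minimal Python sketch of that reduction, with purely illustrative tuples rather than real PostHog rows:

```python
# Hedged sketch of the inner `pd` subquery: per (team_id, distinct_id),
# keep max(version), argMax(person_id, version), and max(_timestamp).
rows = [
    # (team_id, distinct_id, version, person_id, _timestamp)
    (1, "d1", 1, "p-old", 100),
    (1, "d1", 3, "p-new", 300),
    (1, "d2", 2, "p-x", 200),
]

latest: dict = {}
for team_id, distinct_id, version, person_id, ts in rows:
    key = (team_id, distinct_id)
    agg = latest.setdefault(key, {"version": -1, "person_id": None, "_timestamp": 0})
    if version > agg["version"]:
        # argMax(person_id, version): person_id that accompanies the max version
        agg["version"] = version
        agg["person_id"] = person_id
    # max(_timestamp) is tracked independently of the version
    agg["_timestamp"] = max(agg["_timestamp"], ts)
```

After the loop, `latest[(1, "d1")]` holds the row at version 3, mirroring how the view resolves each distinct_id to its most recent person mapping.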

CREATE_EVENTS_BATCH_EXPORT_VIEW = """
CREATE OR REPLACE VIEW events_batch_export AS (
SELECT
team_id AS team_id,
min(timestamp) AS timestamp,
event AS event,
any(distinct_id) AS distinct_id,
any(toString(uuid)) AS uuid,
min(COALESCE(inserted_at, _timestamp)) AS _inserted_at,
any(created_at) AS created_at,
any(elements_chain) AS elements_chain,
any(toString(person_id)) AS person_id,
any(nullIf(properties, '')) AS properties,
any(nullIf(person_properties, '')) AS person_properties,
nullIf(JSONExtractString(properties, '$set'), '') AS set,
nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
FROM
events
PREWHERE
events.inserted_at >= {interval_start:DateTime64}
AND events.inserted_at < {interval_end:DateTime64}
WHERE
team_id = {team_id:Int64}
AND events.timestamp >= {interval_start:DateTime64} - INTERVAL {lookback_days:Int32} DAY
AND events.timestamp < {interval_end:DateTime64} + INTERVAL 1 DAY
AND (length({include_events:Array(String)}) = 0 OR event IN {include_events:Array(String)})
AND (length({exclude_events:Array(String)}) = 0 OR event NOT IN {exclude_events:Array(String)})
GROUP BY
team_id, toDate(events.timestamp), event, cityHash64(events.distinct_id), cityHash64(events.uuid)
ORDER BY
_inserted_at, event
SETTINGS optimize_aggregation_in_order=1
)
"""
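The paired `length(...) = 0 OR event IN/NOT IN` predicates above make both event lists optional: an empty list disables that filter entirely. A hedged Python sketch of the same rule (the function name is mine, not from the codebase):

```python
def event_passes(event: str, include: list[str], exclude: list[str]) -> bool:
    """Mirror the view's filter: an empty include list means 'all events';
    a non-empty exclude list always removes its matches."""
    included = len(include) == 0 or event in include
    excluded = len(exclude) != 0 and event in exclude
    return included and not excluded
```

Note that an event listed in both `include` and `exclude` is dropped, since the two predicates are combined with AND.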

CREATE_EVENTS_BATCH_EXPORT_VIEW_UNBOUNDED = """
CREATE OR REPLACE VIEW events_batch_export_unbounded AS (
SELECT
team_id AS team_id,
min(timestamp) AS timestamp,
event AS event,
any(distinct_id) AS distinct_id,
any(toString(uuid)) AS uuid,
min(COALESCE(inserted_at, _timestamp)) AS _inserted_at,
any(created_at) AS created_at,
any(elements_chain) AS elements_chain,
any(toString(person_id)) AS person_id,
any(nullIf(properties, '')) AS properties,
any(nullIf(person_properties, '')) AS person_properties,
nullIf(JSONExtractString(properties, '$set'), '') AS set,
nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
FROM
events
PREWHERE
events.inserted_at >= {interval_start:DateTime64}
AND events.inserted_at < {interval_end:DateTime64}
WHERE
team_id = {team_id:Int64}
AND (length({include_events:Array(String)}) = 0 OR event IN {include_events:Array(String)})
AND (length({exclude_events:Array(String)}) = 0 OR event NOT IN {exclude_events:Array(String)})
GROUP BY
team_id, toDate(events.timestamp), event, cityHash64(events.distinct_id), cityHash64(events.uuid)
ORDER BY
_inserted_at, event
SETTINGS optimize_aggregation_in_order=1
)
"""

CREATE_EVENTS_BATCH_EXPORT_VIEW_BACKFILL = """
CREATE OR REPLACE VIEW events_batch_export_backfill AS (
SELECT
team_id AS team_id,
min(timestamp) AS timestamp,
event AS event,
any(distinct_id) AS distinct_id,
any(toString(uuid)) AS uuid,
min(COALESCE(inserted_at, _timestamp)) AS _inserted_at,
any(created_at) AS created_at,
any(elements_chain) AS elements_chain,
any(toString(person_id)) AS person_id,
any(nullIf(properties, '')) AS properties,
any(nullIf(person_properties, '')) AS person_properties,
nullIf(JSONExtractString(properties, '$set'), '') AS set,
nullIf(JSONExtractString(properties, '$set_once'), '') AS set_once
FROM
events
WHERE
team_id = {team_id:Int64}
AND events.timestamp >= {interval_start:DateTime64}
AND events.timestamp < {interval_end:DateTime64}
AND (length({include_events:Array(String)}) = 0 OR event IN {include_events:Array(String)})
AND (length({exclude_events:Array(String)}) = 0 OR event NOT IN {exclude_events:Array(String)})
GROUP BY
team_id, toDate(events.timestamp), event, cityHash64(events.distinct_id), cityHash64(events.uuid)
ORDER BY
_inserted_at, event
SETTINGS optimize_aggregation_in_order=1
)
"""
@@ -0,0 +1,17 @@
from posthog.batch_exports.sql import (
CREATE_EVENTS_BATCH_EXPORT_VIEW,
CREATE_EVENTS_BATCH_EXPORT_VIEW_BACKFILL,
CREATE_EVENTS_BATCH_EXPORT_VIEW_UNBOUNDED,
CREATE_PERSONS_BATCH_EXPORT_VIEW,
)
from posthog.clickhouse.client.migration_tools import run_sql_with_exceptions

operations = map(
run_sql_with_exceptions,
[
CREATE_PERSONS_BATCH_EXPORT_VIEW,
CREATE_EVENTS_BATCH_EXPORT_VIEW,
CREATE_EVENTS_BATCH_EXPORT_VIEW_UNBOUNDED,
CREATE_EVENTS_BATCH_EXPORT_VIEW_BACKFILL,
],
)
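One subtlety in the migration above: `map` returns a lazy, single-pass iterator, so `operations` can only be consumed once. A hedged sketch of the difference, using a stand-in `run_sql` in place of `run_sql_with_exceptions`:

```python
def run_sql(sql: str) -> str:
    # Stand-in for run_sql_with_exceptions; just wraps the statement.
    return f"operation({sql})"

queries = ["CREATE VIEW a AS ...", "CREATE VIEW b AS ..."]

lazy_ops = map(run_sql, queries)
first = list(lazy_ops)
second = list(lazy_ops)  # empty: the map iterator is already exhausted

eager_ops = [run_sql(q) for q in queries]  # re-iterable list alternative
```

This is fine as long as the migration runner iterates `operations` exactly once; a list comprehension would be the safer choice if re-iteration were needed.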
12 changes: 6 additions & 6 deletions posthog/conftest.py
@@ -12,11 +12,11 @@ def create_clickhouse_tables(num_tables: int):
# Create clickhouse tables to default before running test
# Mostly so that test runs locally work correctly
from posthog.clickhouse.schema import (
CREATE_DATA_QUERIES,
CREATE_DICTIONARY_QUERIES,
CREATE_DISTRIBUTED_TABLE_QUERIES,
CREATE_MERGETREE_TABLE_QUERIES,
CREATE_MV_TABLE_QUERIES,
CREATE_DATA_QUERIES,
CREATE_DICTIONARY_QUERIES,
CREATE_VIEW_QUERIES,
build_query,
)
@@ -53,24 +53,24 @@ def reset_clickhouse_tables():
from posthog.clickhouse.plugin_log_entries import (
TRUNCATE_PLUGIN_LOG_ENTRIES_TABLE_SQL,
)
from posthog.heatmaps.sql import TRUNCATE_HEATMAPS_TABLE_SQL
from posthog.models.app_metrics.sql import TRUNCATE_APP_METRICS_TABLE_SQL
from posthog.models.channel_type.sql import TRUNCATE_CHANNEL_DEFINITION_TABLE_SQL
from posthog.models.cohort.sql import TRUNCATE_COHORTPEOPLE_TABLE_SQL
from posthog.models.event.sql import TRUNCATE_EVENTS_TABLE_SQL
from posthog.models.group.sql import TRUNCATE_GROUPS_TABLE_SQL
from posthog.models.performance.sql import TRUNCATE_PERFORMANCE_EVENTS_TABLE_SQL
from posthog.models.person.sql import (
TRUNCATE_PERSON_DISTINCT_ID2_TABLE_SQL,
TRUNCATE_PERSON_DISTINCT_ID_OVERRIDES_TABLE_SQL,
TRUNCATE_PERSON_DISTINCT_ID_TABLE_SQL,
TRUNCATE_PERSON_STATIC_COHORT_TABLE_SQL,
TRUNCATE_PERSON_DISTINCT_ID_OVERRIDES_TABLE_SQL,
TRUNCATE_PERSON_TABLE_SQL,
)
from posthog.models.sessions.sql import TRUNCATE_SESSIONS_TABLE_SQL
from posthog.session_recordings.sql.session_recording_event_sql import (
TRUNCATE_SESSION_RECORDING_EVENTS_TABLE_SQL,
)
from posthog.models.channel_type.sql import TRUNCATE_CHANNEL_DEFINITION_TABLE_SQL
from posthog.models.sessions.sql import TRUNCATE_SESSIONS_TABLE_SQL
from posthog.heatmaps.sql import TRUNCATE_HEATMAPS_TABLE_SQL

# REMEMBER TO ADD ANY NEW CLICKHOUSE TABLES TO THIS ARRAY!
TABLES_TO_CREATE_DROP = [