Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support multiple models in S3 batch export #23105

Merged
merged 53 commits into from
Jun 25, 2024

Conversation

tomasfarias
Copy link
Contributor

@tomasfarias tomasfarias commented Jun 20, 2024

Problem

Users wish to export person information to allow identifying unique persons in their event batch exports.

Changes

This commit introduces a new batch_export_model input for all batch export destinations. This input is of type BatchExportModel which can be used to indicate which model a batch export is supposed to target. This opens the door to creating "persons" model exports, and eventually extending this further with more changes.

The change was done in a backwards compatible way, so batch_export_schema was not removed, and existing export configurations should continue to work (as evidenced by unit tests passing without changes).

However, moving forward all batch exports will be created with batch_export_schema set to None and batch_export_model defined in its place.

After updating existing exports, batch_export_schema and any other code associated with backwards compatibility can be deleted.

👉 Stay up-to-date with PostHog coding conventions for a smoother review.

Does this work well for both Cloud and self-hosted?

How did you test this code?

Updated all S3 unit tests to guarantee person batch export support. Follow-up PRs will work on other destinations.

This required generating person data, and checking it was successfully exported. To do this, new utilities were added under temporal/tests/utils/persons.py, they work similar to event utilities in temporal/tests/utils/event.py

Also, a bug was fixed in the test_s3_export_workflow_with_minio_bucket_and_custom_key_prefix test as it wasn't properly using custom keys.

@tomasfarias tomasfarias force-pushed the feat/person-batch-exports branch 2 times, most recently from 3d7b18b to acb3108 Compare June 20, 2024 16:36
@tomasfarias tomasfarias changed the title feat: Support multiple models in all batch export destinations feat: Support multiple models in S3 batch export Jun 20, 2024
@tomasfarias tomasfarias marked this pull request as ready for review June 20, 2024 16:49
@tomasfarias
Copy link
Contributor Author

Err, ignore the last few test failures, they seem to be on github's side:

 livestream Error Get "https://ghcr.io/v2/posthog/livestream/manifests/sha256:748d94295ed87bad468c059edd756d8bbed16059156812f4d17c626aaddbf76a": net/http: TLS handshake timeout
Error response from daemon: Get "https://ghcr.io/v2/posthog/livestream/manifests/sha256:748d94295ed87bad468c059edd756d8bbed16059156812f4d17c626aaddbf76a": net/http: TLS handshake timeout
Error: Process completed with exit code 18.

I'll retry the tests.

@tomasfarias
Copy link
Contributor Author

Hmm not sure why the hogql_query tests are failing, don't see any changes of mine around that part of the codebase. I'll investigate tomorrow.

Copy link
Member

@EDsCODE EDsCODE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, nice pattern for extending beyond persons

Base automatically changed from refactor/support-for-batch-export-model-views to master June 24, 2024 15:09
@tomasfarias tomasfarias requested a review from fuziontech as a code owner June 24, 2024 15:09
@tomasfarias tomasfarias force-pushed the feat/person-batch-exports branch from 5e86a92 to c894d48 Compare June 24, 2024 15:35
Comment on lines 51 to 52
extra_hosts:
- 'host.docker.internal:host-gateway'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this as locally was failing by builds. Shouldn't have been pushed though.

@tomasfarias tomasfarias removed the request for review from fuziontech June 24, 2024 15:58
@tomasfarias
Copy link
Contributor Author

tomasfarias commented Jun 24, 2024

Hey @fuziontech! You got tagged for review as I rebased this PR and commits from the previous PR you've reviewed popped up in this PR's git history.

Feel free to ignore, and apologies for the spam. Unless you are curious, then obviously comments are welcome!

tomasfarias and others added 23 commits June 25, 2024 10:37
This commit introduces a new `batch_export_model` input for all batch
export destinations. This input is of type `BatchExportModel` which
can be used to indicate which model a batch export is supposed to
target. This opens the door to creating "persons" model exports, and
eventually extending this further with more changes.

The change was done in a backwards compatible way, so
`batch_export_schema` was not removed, and existing export
configurations should continue to work (as evidenced by unit tests
passing without changes).

However, moving forward all batch exports will be created with
`batch_export_schema` set to `None` and `batch_export_model` defined
in its place.

After updating existing exports, `batch_export_schema` and any other code
associated with backwards compatibility can be deleted.
@tomasfarias tomasfarias force-pushed the feat/person-batch-exports branch from 91e4c08 to cb6ecca Compare June 25, 2024 08:38
@tomasfarias tomasfarias force-pushed the feat/person-batch-exports branch from cb6ecca to e32b4a5 Compare June 25, 2024 08:41
@tomasfarias tomasfarias merged commit ae3f5a0 into master Jun 25, 2024
84 checks passed
@tomasfarias tomasfarias deleted the feat/person-batch-exports branch June 25, 2024 09:14
Copy link

sentry-io bot commented Jun 25, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ ClickHouseError: Code: 46. DB::Exception: Unknown table function events_batch_export: or incorrect parameterized ... posthog.temporal.common.clickhouse in check_res... View Issue
  • ‼️ ClickHouseError: Code: 46. DB::Exception: Unknown table function events_batch_export_backfill: or incorrect param... posthog.temporal.common.clickhouse in check_res... View Issue
  • ‼️ ClickHouseError: Code: 46. DB::Exception: Unknown table function events_batch_export: or incorrect parameterized ... posthog.temporal.common.clickhouse in check_res... View Issue

Did you find this useful? React with a 👍 or 👎

timgl pushed a commit that referenced this pull request Jun 27, 2024
* refactor: Update metrics to fetch counts at request time

* fix: Move import to method

* fix: Add function

* feat: Custom schemas for batch exports

* feat: Frontend support for model field

* fix: Clean-up

* fix: Add missing migration

* fix: Make new field nullable

* Update UI snapshots for `chromium` (1)

* Update UI snapshots for `chromium` (1)

* Update UI snapshots for `chromium` (1)

* Update UI snapshots for `chromium` (1)

* Update UI snapshots for `chromium` (1)

* Update UI snapshots for `chromium` (1)

* fix: Bump migration number

* fix: Bump migration number

* refactor: Update metrics to fetch counts at request time

* refactor: Switch to counting runs

* refactor: Support batch export models as views

* fix: Merge conflict

* fix: Quality check fixes

* refactor: Update metrics to fetch counts at request time

* fix: Move import to method

* fix: Add function

* fix: Typing fixes

* feat: Custom schemas for batch exports

* feat: Frontend support for model field

* fix: Clean-up

* fix: Add missing migration

* fix: Make new field nullable

* Update UI snapshots for `chromium` (1)

* Update UI snapshots for `chromium` (1)

* Update UI snapshots for `chromium` (1)

* Update UI snapshots for `chromium` (1)

* Update UI snapshots for `chromium` (1)

* Update UI snapshots for `chromium` (1)

* fix: Bump migration number

* fix: Clean-up unused code

* fix: Clean-up unused function

* fix: Only run extra clickhouse queries in batch exports tests

* feat: Support multiple models in all batch export destinations

This commit introduces a new `batch_export_model` input for all batch
export destinations. This input is of type `BatchExportModel` which
can be used to indicate which model a batch export is supposed to
target. This opens the door to creating "persons" model exports, and
eventually extending this further with more changes.

The change was done in a backwards compatible way, so
`batch_export_schema` was not removed, and existing export
configurations should continue to work (as evidenced by unit tests
passing without changes).

However, moving forward all batch exports will be created with
`batch_export_schema` set to `None` and `batch_export_model` defined
in its place.

After updating existing exports, `batch_export_schema` and any other code
associated with backwards compatibility can be deleted.

* fix: Add additional type hints and update tests

* chore: Add test utilities for multi-model batch exports

* fix: Pass properties included as keyword arguments

* fix: Support custom key prefixes for multi-model

* feat: Unit testing for S3 batch exports with multi-model support

* fix: Add and correct type hints

* fix: Typo in parameter

* fix: API tests now work with model parameter

* fix: Re-add hosts to docker-compose

* fix: Use UTC timezone alias

* fix: Add missing test column

* revert: Python 3.11 alias

---------

Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants