Partition user and client tables #285

Open
dgitis opened this issue Nov 10, 2023 · 1 comment

Comments

@dgitis
Collaborator

dgitis commented Nov 10, 2023

It would be good to have daily-partitioned versions of dim_ga4__client_keys, fct_ga4__client_keys, and fct_ga4__user_ids, similar to what we have for sessions, so that larger sites can disable the non-partitioned models without needing to customize.

The new GA4 user export tables are day-partitioned.

I believe this should be tied to #251: we add an optional cutoff date for when to start using Google's user export (because, even once enabled, the export didn't immediately start receiving all of the data), merge the two sources of data in the daily tables, and then build the non-partitioned tables from the merged daily tables.
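To make the cutoff idea concrete, here is a rough sketch of the merge logic, written in Python rather than the package's SQL. Everything in it is an assumption for illustration: the function names, the row shape (dicts with `client_key` and `partition_date`), the `source` tag, and the `first_seen_date` / `last_seen_date` roll-up columns are hypothetical and are not part of the package or of #251.

```python
from datetime import date


def merge_daily_sources(derived_rows, export_rows, cutoff=None):
    """Pick one source per day partition.

    Days before the cutoff come from the event-derived rows; days on or
    after the cutoff come from Google's user export. With no cutoff,
    only the event-derived rows are used.
    """
    merged = []
    for row in derived_rows:
        if cutoff is None or row["partition_date"] < cutoff:
            merged.append({**row, "source": "derived"})
    if cutoff is not None:
        for row in export_rows:
            if row["partition_date"] >= cutoff:
                merged.append({**row, "source": "export"})
    return sorted(merged, key=lambda r: (r["partition_date"], r["client_key"]))


def build_non_partitioned(daily_rows):
    """Roll the merged daily rows up to one entry per client_key."""
    rollup = {}
    for row in daily_rows:
        day = row["partition_date"]
        entry = rollup.setdefault(
            row["client_key"],
            {"first_seen_date": day, "last_seen_date": day},
        )
        entry["first_seen_date"] = min(entry["first_seen_date"], day)
        entry["last_seen_date"] = max(entry["last_seen_date"], day)
    return rollup
```

The point of the sketch is that the non-partitioned table reads only from the merged daily rows, so a site that disables it loses nothing in the daily layer.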

Having compared our client_key fields with the equivalent pseudonymous_users table in the new export, I think it is best to set up our daily tables to contain basically the same data as the new export, renamed and unnested to our usual standard. We then try to build as much of the pre-cutoff data as possible into that table.
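As a loose illustration of "renamed and unnested", the sketch below flattens a nested record into underscore-joined column names and then applies a rename map. It is Python standing in for the SQL the models would actually use, and the sample field names and the rename mapping are assumptions made for the example, not a confirmed description of the export schema or of our naming standard.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into underscore-joined column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def to_daily_row(export_row, renames):
    """Unnest an export record, then rename columns via a lookup map."""
    flat = flatten(export_row)
    return {renames.get(name, name): value for name, value in flat.items()}


# Hypothetical sample record and rename map, for illustration only.
sample = {
    "pseudo_user_id": "123.456",
    "device": {"category": "mobile"},
    "geo": {"country": "Canada"},
}
row = to_daily_row(sample, renames={"pseudo_user_id": "user_pseudo_id"})
```

Columns without an entry in the rename map keep their flattened names, so the map only needs to list the fields whose names differ from our conventions.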

For the non-partitioned tables, do we try to maintain compatibility with our existing fields? For example, the first_device_* and first_geo_* fields don't have equivalents in the GA4 export.

While it would be nice to maintain compatibility, I personally don't use most of those fields.

If others use them, then I'm happy to rebuild that downstream of the daily models.

I am resistant to rebuilding that data in the daily models: if you're trying to reduce costs by using just the daily models despite their less accurate data, then you probably won't want to do the look-ups required to enhance the daily models either, particularly if you don't use the fields all that often.

Thoughts, @adamribaudo @willbryant?

@adamribaudo-velir
Collaborator

Waiting for access to a dataset that actually holds this data before weighing in. Should be soon.

@dgitis dgitis mentioned this issue Apr 13, 2024