multi-property option to analyze properties separately #318

Open
tessa-beijloos opened this issue Apr 17, 2024 · 5 comments

tessa-beijloos commented Apr 17, 2024

We can select multiple properties like this:

vars:
  ga4:
    property_ids: [11111111, 22222222, 33333333]
    static_incremental_days: 3
    combined_dataset: "my_combined_dataset"

And it then combines the datasets into one (with combined_property_data()). I need a way to analyze them separately in the same project. Is there a workaround for this? Thanks in advance!

@adamribaudo-velir
Collaborator

Yes, the events from each of those properties will carry a unique stream_id, so you can separate them downstream by filtering on it.
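
For illustration, here is a minimal sketch of a per-property downstream model that filters the package's base events model on stream_id. The model name base_ga4__events matches the package path mentioned later in this thread; the stream ID value is a placeholder, and note that data stream IDs are distinct from the property IDs in the vars block above.

```sql
-- Hypothetical downstream model, e.g. models/marts/property_one__events.sql.
-- Assumes the combined base model is base_ga4__events and that '1234567890'
-- is one of your data stream IDs (stream IDs differ from GA4 property IDs).
select *
from {{ ref('base_ga4__events') }}
where stream_id = '1234567890'
```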

@tessa-beijloos
Author

Thanks! But does that mean we still combine the datasets from both properties first? I am using the BigQuery connection, and if I create the base table, the query costs will be a lot higher if I first combine the tables and only afterwards select the different stream_ids to separate them.

@adamribaudo-velir
Collaborator

It's up to you whether you'd like 3 separate projects for 3 properties or 1 larger project with all 3 properties. You can either process and analyze them independently or jointly. In either case, all data is partitioned on date, so you can limit costs by including date filters.

Hopefully that helps.
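
To make the date-filter point concrete, here is a sketch of a downstream model that combines a stream_id filter with a partition filter. It assumes the base table's date partition column is named event_date_dt; verify the column name against the partition_by config in your generated base_ga4__events model.

```sql
-- Hypothetical downstream model restricted to one stream and a 28-day window.
-- The date predicate lets BigQuery prune partitions, which is what keeps the
-- scanned bytes (and cost) down even though the base table holds all streams.
-- event_date_dt is assumed to be the partition column; check your base model.
select
    event_date_dt,
    event_name,
    count(*) as event_count
from {{ ref('base_ga4__events') }}
where stream_id = '1234567890'  -- placeholder stream ID
  and event_date_dt >= date_sub(current_date(), interval 28 day)
group by 1, 2
```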

@DVDH-000

I suspect it would also help save cost if the base table created by dbt_packages/ga4/models/staging/base/base_ga4__events.sql were clustered on stream_id. What do you think @adamribaudo-velir?

@adamribaudo-velir
Collaborator

@DVDH-000 It's currently clustered on event_name. Whether clustering on event_name or stream_id is more performant likely depends on whether the user plans to analyze across streams or within a single stream, which we can't predict.

I'm pretty certain that config setting can be overridden in the project yaml. And if anyone wants to run some empirical tests to demonstrate which is more performant under various scenarios, by all means :)
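
For anyone trying this, here is a sketch of what a project-level override could look like in the root dbt_project.yml. Treat it as an assumption: it only takes effect if cluster_by isn't pinned in an in-file config() block inside base_ga4__events.sql, since in-file configs take precedence over the root project's dbt_project.yml.

```yaml
# Sketch of overriding the package model's clustering from the root project's
# dbt_project.yml. Assumes the override is honored (see the caveat above about
# in-file config() precedence); the nesting mirrors the package's folder path.
models:
  ga4:
    staging:
      base:
        base_ga4__events:
          +cluster_by: ["stream_id"]
```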
