Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to recreate GA4 reports using this schema? #337

Open
umairkarel opened this issue Aug 5, 2024 · 7 comments
Open

How to recreate GA4 reports using this schema? #337

umairkarel opened this issue Aug 5, 2024 · 7 comments

Comments

@umairkarel
Copy link

I am currently working on recreating some reports using the modeled schema provided. However, I am encountering some challenges in formulating the SQL queries needed to generate these reports accurately.

Could anyone please share guidance or examples on how to recreate the GA4 reports using SQL queries based on the modeled schema?

@dgitis
Copy link
Collaborator

dgitis commented Aug 5, 2024

You're going to need to be much more specific if you want help here.

  • What reports are you trying to recreate?
  • What query are you using?
  • What results are you getting?
  • What results are you expecting?

Regardless, don't expect numbers to exactly match.

There are a number of places where we have deliberately decided to diverge from GA4.

Google uses Hyperlog++ to estimate numbers. The dbt-GA4 package counts the numbers.

Google calculates source, medium, and campaign individually resulting in source/medium combinations that shouldn't have a campaign being assigned the next available campaign. The dbt-GA4 package assigns all attribution parameters to the first event with a valid source.

I've written a free course on how you are supposed to use the dbt-GA4 package. I think it might help you get started but the goal of the package is to produce better numbers than GA4 so be prepared to see differences.

@umairkarel
Copy link
Author

Thanks a lot, @dgitis!

I am trying to recreate the GA4 "User Acquisition" report. Could you please help me with the SQL query for this report?

image

Additionally, it would be fantastic if you could assist with the e-commerce purchase report as well.

image

Thank you for your assistance!

@dgitis
Copy link
Collaborator

dgitis commented Aug 6, 2024

For the first report, you'll want to blend or join (the terminology depends on the reporting tool) the fct_ga4__sessions_daily and the dim_ga4__sessions_daily tables on the session_partition_key.

You'll want either session_default_channel_grouping or last_non_direct_default_channel_grouping as your primary dimension along with the session_partition_date for your date range.

From fct_ga4__sessions_daily you'll want the following.

  • New users: sumif(session_number = 1)
  • Engaged sessions: sum(session_partition_max_session_engaged)
  • Engagement rate: sum(session_partition_max_session_engaged) / count(*)
  • Engaged sessions per user: sum(session_partition_max_session_engaged) / count(distinct client_key)
  • Average engagement time: sum(session_partition_sum_engagement_time_msec) / count(*)
  • Event count: requires customization; see if session_partition_count_page_views is an acceptable alternative
  • Key event count: configure conversion events and you will have a column named event_name_count, like purchase_count, for each conversion event
  • Total revenue: depending on how you set up data collection, you might be able to use sum(session_partition_sum_event_value_in_usd); otherwise this requires customization

Treat the above queryies as pseudo code. The actual query will depend on your reporting tool.

@dgitis
Copy link
Collaborator

dgitis commented Aug 6, 2024

The second report requires customization and it depends on how you set your warehouse up. Take the course I linked, set things up. Try to figure it out yourself, and then share your code.

It will be much easier to help you when you have a base setup.

Hint: You'll want to create a events fact table with all events in it. This table will mostly be used for funnel reporting so you can filter some events that aren't relevant to funnels, like scroll and user_engagement. Obviously, you'll want device data in this table and then when reporting you'll use a lot of countif(event_name = 'add_to_cart').

Treat this as pseudo code. The actual query will depend on your reporting tool.

@umairkarel
Copy link
Author

Thanks @dgitis!!!

@umairkarel
Copy link
Author

Hey @dgitis,

I wanted to get the report for the page path and the total revenue associated with it. My team decided to add user_pseudo_id to both the fact_ga4__pages and fact_ga4__items_ecommerce tables, and then join them on this field to calculate the total revenue. However, this approach doesn’t seem correct to me can you verify that and If possible, could you please help us identify the fields we could use for these two tables in a way that allows us to accurately calculate the total revenue for each page path after join?

Thanks in advance!

@dgitis
Copy link
Collaborator

dgitis commented Aug 29, 2024

You should be using the client_key which is based on user_pseudo_id but is more robust.

The pages model isn't meant to be joined. It's optimized for looking at data by page URL adding an ID to this table defeats the purpose of having a pre-aggregated table.

Instead, you should create a fct_ga4__event_page_view model with one row per page_view event and join that to fct_ga4__sessions on client_key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants