Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assign traffic source, medium, campaign for more traffic parameters beyond gclid #343

Open
sashage opened this issue Sep 1, 2024 · 3 comments

Comments

@sashage
Copy link

sashage commented Sep 1, 2024

Currently, there is the CTE detect_gclid in stg_ga4__events.sql that reconstructs source, medium and campaign if the URL parameter gclid is present in page_location

Now, there a several more URL parameters that can be linked to paid ad clicks:

gbraid and wbraid: these are alternatives to gclid that Google uses (https://www.customerlabs.com/blog/what-are-gclid-gbraid-and-wbraid-parameters/)

fbclid: click ID for Facebook/Meta/Instagram paid ads. In most cases source is set to l.facebook.com, m.facebook.com, or lm.facebook.com, which effectively splits the traffic. Same for instagram, which normally has the source instagram.com or l.instagram.com

ttclid: click ID for TikTok paid ads. Here the source is usually null unless set explicitly

This list is of course not complete, but should capture a much larger share of unassigned paid traffic.

Proposed change:

select
        * except (event_source, event_medium, event_campaign),
        case
            when (page_location like '%gclid%' and event_source is null) then "google"
            when (page_location like '%gbraid%' and event_source is null) then "google"
            when (page_location like '%wbraid%' and event_source is null) then "google"
            when (page_location like '%fbclid%' and event_source is null) then "facebook"
            when (page_location like '%fbclid%' and event_source LIKE "%facebook.com") then "facebook"
            when (page_location like '%fbclid%' and event_source LIKE "%instagram.com") then "instagram"
            when (page_location like '%ttclid%' and event_source is null) then "tiktok"
            when (page_location like '%ttclid%' and event_source = "tiktok.com") then "tiktok"
            else event_source
        end as event_source,
        case
            when (page_location like '%gclid%' and event_medium is null) then "cpc"
            when (page_location like '%gclid%' and event_medium = 'organic') then "cpc"
            when (page_location like '%gbraid%' and event_medium is null) then "cpc"
            when (page_location like '%gbraid%' and event_medium = 'organic') then "cpc"
            when (page_location like '%wbraid%' and event_medium is null) then "cpc"
            when (page_location like '%wbraid%' and event_medium = 'organic') then "cpc"
            when (page_location like '%fbclid%' and ( event_source is null or event_source LIKE "%facebook.com" or event_source LIKE "%instagram.com" ) and event_medium is null) then "cpc"
            when (page_location like '%fbclid%' and ( event_source is null or event_source LIKE "%facebook.com" or event_source LIKE "%instagram.com" ) and event_medium = "referral" ) then "cpc"
            when (page_location like '%ttclid%' and ( event_source is null or event_source = "tiktok.com" ) and event_medium is null) then "cpc"
            when (page_location like '%ttclid%' and ( event_source is null or event_source = "tiktok.com" ) and event_medium = "referral") then "cpc"
            else event_medium
        end as event_medium,
        case
            when (page_location like '%gclid%' and event_campaign is null) then "(cpc)"
            when (page_location like '%gclid%' and event_campaign = 'organic') then "(cpc)"
            when (page_location like '%gclid%' and event_campaign = '(organic)') then "(cpc)"
            when (page_location like '%gbraid%' and event_campaign is null) then "(cpc)"
            when (page_location like '%gbraid%' and event_campaign = 'organic') then "(cpc)"
            when (page_location like '%gbraid%' and event_campaign = '(organic)') then "(cpc)"
            when (page_location like '%wbraid%' and event_campaign is null) then "(cpc)"
            when (page_location like '%wbraid%' and event_campaign = 'organic') then "(cpc)"
            when (page_location like '%wbraid%' and event_campaign = '(organic)') then "(cpc)"
            when (page_location like '%fbclid%' and ( event_source is null or event_source LIKE "%facebook.com" or event_source LIKE "%instagram.com" ) and event_campaign is null ) then "(cpc)"
            when (page_location like '%fbclid%' and ( event_source is null or event_source LIKE "%facebook.com" or event_source LIKE "%instagram.com" ) and event_campaign = "(referral)" ) then "(cpc)"
            when (page_location like '%ttclid%' and ( event_source is null or event_source = "tiktok.com" ) and event_campaign is null) then "(cpc)"
            when (page_location like '%ttclid%' and ( event_source is null or event_source = "tiktok.com" ) and event_campaign = "(referral)") then "(cpc)"

            else event_campaign
        end as event_campaign
    from include_event_key
@adamribaudo-velir
Copy link
Collaborator

I'm on board with parsing gbraid, wbraid, fbclick, and ttclick.

I'm just not sure why there are so many checks in the case statements for variations of source/medium/campaign. If a gclid is present in the URL, why bother checking if the event_campaign is null / organic / (organic) before assigning the value to (cpc)? Shouldn't the page_location like '%gclid%' condition be sufficient?

@sashage
Copy link
Author

sashage commented Sep 2, 2024

It could be that advertisers pass along custom UTMs that should be respected.
null / organic / (organic) are auto-populated for campaign in absense of other utm_campaign values

It also checks for source value within medium and campaign, to make sure that all or none are updated. Otherwise we might get inconsistent UTMs

There are probably better ways to do all the checks. Might even make sense to move the entire logic to separate model for easier maintenance. Many ad networks have their own click IDs that we could add later.

@adamribaudo-velir
Copy link
Collaborator

Thanks for the reminder. This mirrors the logic we have implemented today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants