Update readme with the business use cases linked to each API #170

Closed
ALamraniAlaouiScibids opened this issue Jun 30, 2021 · 17 comments

@ALamraniAlaouiScibids

Hello,

Thanks for updating the readme file, it has been really helpful to better understand the links between the different APIs and to keep track of their current status.
It would be very helpful to add the business use cases that each API is intended to deal with.
My current understanding is:

  • billing (how much does the buyer owe the seller?) (Event Level API)
  • real-time budget monitoring (how much money did my campaign spend today?) (Event Level API)
  • manual optimization of campaign parameters (Aggregate API)
  • machine optimization of bidding models (Aggregate API)

I am not sure about manual optimization and machine optimization use cases.
It seems that the Event Level API could also be useful for these use cases (for prospecting campaigns that do not use user-level data, as we can get the impression context data by joining on the 64-bit IDs).

Could you please shed some light on this?

I would be happy to propose a Pull Request once the business use cases are clarified.

@maudnals
Contributor

maudnals commented Jul 1, 2021

Hi @ALamraniAlaouiScibids,

It would be very helpful to add the business use cases that each API is intended to deal with.

Definitely! Thank you for opening this issue and proposing this list of use cases.

Let's run through these. Note: I come to this discussion from a developer / API-mechanics point of view, and I may not have an in-depth view into how an adtech typically fulfills these use cases today. So I'll start by asking clarifying questions about the use cases themselves. @csharrison and @johnivdel have more context and depth on advertising use cases, so please chime in when needed!

Billing (how much does the buyer owe the seller?)

Event-level reports can tell you that X conversions took place for a given adtech-defined ID (attributionsourceeventid). It could be a campaignID, a creativeID, a combination of publisherID+campaignID, or anything else that makes sense for you to use as an ad-side ID.
But with event-level reports, you can't get many details on the conversion itself; this limit exists to prevent cross-site identity joins. For example, event-level reports (for clicks) can tell you that X conversions of type "purchase" (3 bits) are attributed to a given adtech-defined ID for ads running on a specific site. Event-level reports (for views) can tell you that X conversions (1 bit) are attributed to a given adtech-defined ID.
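To make the "anything that makes sense as an ad-side ID" idea concrete, here is a hypothetical sketch of packing several identifiers into the 64-bit source event ID. The field widths and helper names are made up for illustration, not part of any spec:

```python
# Hypothetical sketch: packing ad-side identifiers (publisherID +
# campaignID + creativeID) into the 64-bit attribution source event ID.
# The field widths below are illustrative assumptions, not from any spec.
PUBLISHER_BITS = 24
CAMPAIGN_BITS = 24
CREATIVE_BITS = 16  # 24 + 24 + 16 = 64 bits total

def pack_source_event_id(publisher_id: int, campaign_id: int, creative_id: int) -> int:
    """Combine three ad-side IDs into a single 64-bit integer."""
    assert publisher_id < (1 << PUBLISHER_BITS)
    assert campaign_id < (1 << CAMPAIGN_BITS)
    assert creative_id < (1 << CREATIVE_BITS)
    return (
        (publisher_id << (CAMPAIGN_BITS + CREATIVE_BITS))
        | (campaign_id << CREATIVE_BITS)
        | creative_id
    )

def unpack_source_event_id(event_id: int):
    """Recover the three ad-side IDs from the packed 64-bit value."""
    creative_id = event_id & ((1 << CREATIVE_BITS) - 1)
    campaign_id = (event_id >> CREATIVE_BITS) & ((1 << CAMPAIGN_BITS) - 1)
    publisher_id = event_id >> (CAMPAIGN_BITS + CREATIVE_BITS)
    return publisher_id, campaign_id, creative_id
```

An adtech would pack at impression time and unpack when a report arrives; the split between fields is entirely the adtech's choice.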
Also worth noting:

  • The event-level reports are sent with a delay, days or sometimes weeks after a conversion. IIUC this doesn't prevent billing, but may impact how it's done, as I'm assuming there are no delays in the way billing is performed today (?)
  • Some amount of noise is applied to conversion-side data. Yet it's possible to recover the true conversion count at an aggregate level, if you know how often the noise was applied (5% of conversions in Chrome). So IIUC this shouldn't hinder the billing use case.
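As a rough illustration of recovering the true count: under a simplified model (not the exact Chrome mechanism) where a known fraction p of reports carries a uniformly random value among k possible values, the observed count can be de-biased linearly:

```python
# Hedged sketch of de-biasing a noised conversion count, assuming a
# simplified model: a known fraction p of reports (5% in the comment
# above) has its trigger data replaced by a uniform random value among
# k possibilities. This is an illustration, not Chrome's exact mechanism.
def debias_count(observed: int, total_reports: int, p: float = 0.05, k: int = 8) -> float:
    # E[observed] = (1 - p) * true + p * total_reports / k, so invert:
    return (observed - p * total_reports / k) / (1 - p)
```

For example, with 800 total reports and 100 observed reports of a given type, the estimate is also 100, since the noise added and removed for that value cancel out in expectation.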

❓Q: With this in mind, do Event-Level reports cover billing use cases? (In particular, is this coarse conversion-side information sufficient for billing?)

Real-time budget monitoring (how much money did my campaign spend today?)

IIUC, your budget in this context defines how much you're willing to spend on displaying these ads.
Event-level reports are sent with a delay, days or sometimes weeks after a conversion. Aggregate reports are sent with a shorter delay.
❓Q: Do you need the reports to monitor your budget?
❓Q: IIUC, you don't, but you would use the attributionsourceeventids to keep track of what you're spending; is that how you're envisioning budget monitoring?

Manual optimization of campaign parameters

❓Q: Can you describe what manual optimization looks like? Something along the lines of (oversimplified): Last month, campaign1 led to more conversions than campaign2, so I'll display more campaign1 ads, or maybe Last month, campaign1 led to higher conversion values than campaign2, so I'll display more campaign1 ads?

Aggregate reports can give you a mapping of campaignID to conversion count or (for example) campaignID to total purchase value. With this in mind, and if the answer to the question above is yes, aggregate reports can be used for manual optimization of some campaign parameters.

❓Q: Event-level reports can give you a mapping of campaignID-conversion count (and even conversion type, for clicks). Isn't this also fulfilling some optimization use cases?

Machine optimization of bidding models

Event-level reports can be used to train models and optimize for a conversion count or a type of conversions (e.g. purchases).
[EDIT] Optimization use cases such as "optimize for higher purchase value" or "optimize for <anything that needs to be more granular than 3 or 1 bits>" are an area of active research. Only aggregate reports give you granular conversion-side data, and the data in these is noised to achieve differential privacy. How do we strike the right balance between usefulness (optimize for purchase value) and privacy, and what's the right amount of differentially private noise to add? This is an area of active research and discussion.

❓Q: Does Machine optimization here mean "train ML models to optimize for X"?
❓Q: If yes, what is typically X? A total purchase value?

One question for @csharrison / @johnivdel: ❓Q: Event-level reports give detailed ad-side data, down to e.g. a creativeID. Aggregate reports support detailed ad-side data too, but can they also be used for optimization, as in granular campaign-parameter tweaking (granular ad-side) in order to reach a higher conversion count (coarse)?

@ALamraniAlaouiScibids
Author

Hello @maudnals,

Thank you for your quick answer and explanations.
I will first answer the questions related to optimization, as that is what I am most familiar with:

❓ Q: Can you describe what manual optimization looks like? Something along the lines of (oversimplified): Last month, campaign1 led to more conversions than campaign2, so I'll display more campaign1 ads, or maybe Last month, campaign1 led to higher conversion values than campaign2, so I'll display more campaign1 ads?

❓ Q: Event-level reports can give you a mapping of campaignID-conversion count (and even conversion type, for clicks). Isn't this also fulfilling some optimization use cases?

I would define manual optimization as "any one-dimensional variable optimization that a human can easily do". It could be linked to the campaign ID or any other variable.
For instance: site_id=123 performs better than site_id=456, so I will spend more on site_id=123.

I understand that the campaign_ID use case is covered by the Event Level API. However, it seems to me that we cannot do manual optimization on other variables using the Event Level API data, unless we use the 64 bits as an impression_id and join on the corresponding impression-level data; but that does not seem to be the Privacy Sandbox vision in the long term.
(other variables = site / domain, creative, region, postal code, browser, etc.)

❓ Q: Does Machine optimization here mean "train ML models to optimize for X"?
❓ Q: If yes, what is typically X? A total purchase value?

X can be different things according to the objectives of the campaign:

  • purchase value
  • conversion (number of purchases)
  • views on website
  • clicks
  • etc...

❓ Q: With this in mind, do Event-Level reports cover billing use cases? (In particular, is this coarse conversion-side information sufficient for billing?)

I am not sure I can answer this. It would be interesting to have the input of a Media agency on this.

❓ Q: Do you need the reports to monitor your budget?
❓ Q: IIUC, you don't, but you would use the attributionsourceventids to keep track of what you're spending; is it how you're envisioning budget monitoring?

In fact, what would be needed here is an "event level reporting" on the impression data and the associated bid price.
It does not really concern this repo though.

@lbdvt
Contributor

lbdvt commented Jul 7, 2021

Hi,

❓Q: With this in mind, do Event-Level reports cover billing use cases? (In particular, is this coarse conversion-side information sufficient for billing?)

I can provide some info related to this.

On the open web, publishers (i.e. sites that display ads) usually earn ad revenues by charging a price for each ad displayed ("CPM" price, defined through RTB auction).

Advertisers (i.e. sites that sell goods/services) pay for ad campaigns:

  • for each ad displayed ("CPM" model)
  • for each ad clicked ("CPC" model)
  • for each sale (or, more generically, each "conversion"), as a % of the sale amount. This is often referred to as affiliate marketing.

Between the publishers and the advertisers, there are often adtech intermediaries (SSPs, exchanges, DSPs), who take a share of the revenue in exchange for the services they provide. They also bridge the gap between publisher and advertiser billing models. For example, Criteo bills its advertisers on a CPC basis, but pays its publishers on a CPM basis.
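As a made-up illustration of that bridging, here is a sketch of the intermediary's margin when it bills per click but pays per thousand impressions (the helper name and all numbers are invented, not from any real system):

```python
# Illustrative sketch of bridging billing models, as described above:
# an intermediary bills the advertiser per click (CPC) but pays the
# publisher per thousand impressions (CPM). All figures are made up.
def intermediary_margin(clicks: int, cpc: float, impressions: int, cpm: float) -> float:
    revenue = clicks * cpc            # billed to the advertiser (CPC model)
    cost = impressions / 1000 * cpm   # paid to the publisher (CPM model)
    return revenue - cost
```

E.g. 50 clicks billed at a 0.40 CPC against 10,000 impressions paid at a 1.50 CPM leaves a margin of 5.0.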

Each participant (advertiser, publisher, adtech intermediary) has its own event log (displays, clicks, conversions) and uses this log to validate the bills it sends or receives. For example, an adtech intermediary will always validate a bill from a publisher against its own records of events.

Bills are usually sent on the first day of the month, covering the whole past month, but billing on other specific dates is not unusual either.

So in a nutshell, web advertising billing is based on:

  • Events such as ad display, ad click, conversion (including sale amount in case of the sale)
  • Low latency data (below a few hours)
  • Multiple parties having access to independent and auditable data

@vincent-grosbois

Hi @maudnals

❓Q: With this in mind, do Event-Level reports cover billing use cases? (In particular, is this coarse conversion-side information sufficient for billing?)
I don't think event-level reports cover the billing use case at all:

  • most billing is done on a per-display or per-click basis, whereas event-level reports seem designed to report "conversion" events, or at least events that occur on the "destination" website. AFAIK there are not many models where billing is based directly on these events.
  • a billing system needs at least to encode a price per billable event, which I don't see as easily possible here, unless by fitting it into the 64 bits of "event" data
  • billing data needs to be received by some parties as soon as possible for basic checks, even if the actual billing (the money transfer) is probably done in bulk at the end of the month
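For the second bullet, a hypothetical sketch of what "fitting a price into the 64 bits" could look like, reserving some bits for a per-impression price. The 44/20 field split and the micro-currency unit are assumptions for illustration only:

```python
# Hypothetical workaround sketch: reserving part of the 64-bit source
# event ID for a per-impression price. The 44-bit impression ID and
# 20-bit price-in-micros fields are illustrative assumptions.
PRICE_BITS = 20
IMPRESSION_BITS = 44  # 44 + 20 = 64 bits total

def pack_event_id_with_price(impression_id: int, price_micros: int) -> int:
    """Encode an impression ID plus a (capped) price into 64 bits."""
    assert impression_id < (1 << IMPRESSION_BITS)
    price = min(price_micros, (1 << PRICE_BITS) - 1)  # cap to the field width
    return (impression_id << PRICE_BITS) | price

def unpack_event_id_with_price(event_id: int):
    """Recover the impression ID and the (possibly capped) price."""
    return event_id >> PRICE_BITS, event_id & ((1 << PRICE_BITS) - 1)
```

Note the trade-off this implies: every bit spent on the price is a bit no longer available for the impression ID or other ad-side dimensions.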

@maudnals
Contributor

Great, thank you for all the details @ALamraniAlaouiScibids @lbdvt.

Billing and budget monitoring

Events such as ad display, ad click, conversion (including sale amount in case of the sale)

(Note: access to the sale amount can be provided by aggregate reports).

In fact, what would be needed here is an "event level reporting" on the impression data and the associated bid price.
It does not really concern this repo though.

low-latency data (below a few hours)

Bills are usually sent the first day of the month, for the whole past month, but billing at specific dates is not unusual too.

Q for @csharrison @johnivdel: What are your thoughts on the billing use case, and on the (real-time) budget monitoring use case?
There was a past conversation on affiliate commissions in which aggregate reports were mentioned.

Optimization

I understand that the campaign_ID use case is taken into account in the Event Level API. However it seems to me that we cannot do manual optimization on other variables using the Event Level API data. Unless we use the 64 bits as an impression_id and join on the corresponding impression level data, but it does not seem to be the privacy sandbox vision in the long term.
(other variables = site / domain, creative, region, postal_code, browser etc....)

You can use the 64 bits as an impression_id and join on the corresponding impression level data.
One key aspect of the privacy sandbox vision is preventing the joinability of 1-P identities across sites (see privacy model). An adtech accessing detailed ad-side information is compatible with the privacy sandbox vision, as long as this can't be used to track an individual user across sites. For event-level reports, the 3-bit or 1-bit conversion data limitation creates the needed protection.
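A minimal sketch of that join, using the 64-bit source event ID as an impression ID and attaching first-party impression-context data to each event-level report (all field names and values are illustrative):

```python
# Sketch of the join described above: event-level reports carry a
# 64-bit source event ID, which the adtech can use as an impression ID
# to look up its own first-party impression-context log.
# Field names and values here are illustrative.
impression_log = {
    812364: {"site_id": 123, "creative": "A", "region": "FR"},
    812365: {"site_id": 456, "creative": "B", "region": "DE"},
}

def enrich(reports):
    """Attach impression context to each event-level report."""
    return [
        {**r, **impression_log.get(r["source_event_id"], {})}
        for r in reports
    ]

reports = [{"source_event_id": 812364, "trigger_data": 4}]
```

The conversion side stays coarse (the 3-bit or 1-bit trigger data), while the ad side can be arbitrarily detailed, which is what preserves the cross-site privacy property described above.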

With this in mind, manual optimization of campaign parameters for a certain coarse dimension of conversion data is supported by event-level reports. E.g. if your conversion-side mapping is a conversion type (an integer between 0 and 7, used as a conversion type: signup, checkout, ...), you can use event-level reports to optimize for a certain conversion type, keeping in mind the delay with which event-level reports are sent.

Now, AFAIK, optimization for any more granular conversion-side data (e.g. optimize for purchase value) is an area of active discussion and research. Maybe this is something that could be listed as such, in an "Open questions / Areas of active research" section.

A few questions for you @ALamraniAlaouiScibids:

  • Q: How relevant is the distinction of machine vs. manual optimization at this stage, in the context of listing use cases in the README? Does this need to be a top level categorization? To take an example, optimizing for a certain conversion type (thanks to event-level reports) is something that would be supported manually, but could also be done via ML, AFAIU. @ALamraniAlaouiScibids @johnivdel @csharrison WDYT?
  • Q: Optimization of campaign parameters for a certain coarse dimension of conversion data is supported by event-level reports. The 64 bits can actually be used to optimize on any ad-side parameter. Do you think that the term "optimization of campaign parameters" captures this well? Or is it too reductive?
  • Q: Taking a step back, why are optimization of campaign parameters and optimization of bidding models listed separately in your initial proposal, what are the main differences? Is the goal (as in the X in Optimize for X) different? Are the parameters (as in the Y in tweak Y to reach optimization) different? Is the complexity (multi vs monodimensional) different?

@ALamraniAlaouiScibids
Author

Thank you for the explanations @maudnals.

Q: Taking a step back, why are optimization of campaign parameters and optimization of bidding models listed separately in your initial proposal, what are the main differences? Is the goal (as in the X in Optimize for X) different? Are the parameters (as in the Y in tweak Y to reach optimization) different? Is the complexity (multi vs monodimensional) different?

I made the distinction between manual and machine optimization because the multi-dimensional "machine optimization" seemed more complex (the noise level here needs to be quite low to guarantee that a usable ML model can be built on multidimensional data).
But I guess we could drop the "machine vs. manual optimization" distinction for simplicity in this first proposal and iterate on it if needed.

Q: Optimization of campaign parameters for a certain coarse dimension of conversion data is supported by event-level reports. The 64 bits can actually be used to optimize on any ad-side parameter. Do you think that the term "optimization of campaign parameters" captures this well? Or is it too reductive?

Maybe we can use a more general term such as "Campaign Optimization".

At this stage, it seems that advertisers would use either the event level API or the aggregate level API according to:

  • their need for real-time optimization (there is a delay to get the data in the event level API)
  • their tolerance for noise (is the aggregate API noisier? I guess it will depend on the choice of the parameters L1 and epsilon versus the 5% noise applied in the event level API)
  • the type of conversions (post-view versus post-click), as they are not handled the same way in the event level API (more restrictions for post-view conversions)
  • the granularity of the data needed on the conversion side (if the advertiser is doing value-based optimization such as ROI, the event level API would not be able to encode the data)

Does this new distinction seem more relevant?
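Regarding the L1/epsilon point above: differentially private aggregates are commonly noised with Laplace noise of scale L1/epsilon. Here is a hedged sketch of that mechanism; the parameter values are illustrative and are not the API's actual ones, which are still being discussed:

```python
import random

# Hedged sketch of the kind of noise discussed for aggregate reports:
# Laplace noise with scale L1 / epsilon is a standard differential-
# privacy mechanism. The parameter values below are illustrative
# assumptions, not the API's actual (still-evolving) parameters.
def noised_aggregate(true_value: float, l1: float = 65536, epsilon: float = 1.0) -> float:
    scale = l1 / epsilon
    # The difference of two independent Exponential(mean=scale) draws
    # is distributed as Laplace(0, scale).
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_value + noise
```

The noise is zero-mean, so averaging over many aggregates recovers the true value; the tension discussed above is that a smaller epsilon (more privacy) means a larger scale and therefore noisier per-bucket values.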

@maudnals
Contributor

maudnals commented Aug 6, 2021

(Sorry for the slow reply, I was OOO. Thanks for the details!)

I made the distinction between manual and machine optimization because the multi-dimensional "machine optimization" seemed more complex (the noise level here needs to be quite low to guarantee that a usable ML model can be built on multidimensional data).
But I guess we could drop the "machine vs. manual optimization" distinction for simplicity in this first proposal and iterate on it if needed.

Sounds good, happy to iterate when needed.

Maybe we can use a more general term such as "Campaign Optimization".

Q: would this term cover both "optimize for purchase value" and "optimize for number of conversions"? And be understood as such by adtech companies, publishers, and advertisers? cc @johnivdel @csharrison

either the event level API or aggregate level API

(Nit on terminology: so far, event and aggregate reports have been described as features / report types within one single API)

advertisers would use either the event level API or aggregate level API according to

  1. their need for real-time optimization (there is a delay to get the data in the event level API)
  2. their tolerance for noise (is the aggregate API noisier? I guess it will depend on the choice of the parameters L1 and epsilon versus the 5% noise applied in the event level API)
  3. the type of conversions (post-view versus post-click), as they are not handled the same way in the event level API (more restrictions for post-view conversions)
  4. the granularity of the data needed on the conversion side (if the advertiser is doing value-based optimization such as ROI, the event level API would not be able to encode the data)

To the best of my knowledge: (@csharrison @johnivdel WDYT?)

  • 4 is likely the most salient point in this list
  • The levels of noise are WIP, so 2 remains an open question / an area of active research
  • Timing (1) is definitely a difference between the report types. One nit: even though aggregate reports won't be delayed to the same extent as event-level reports, "real-time" may be a stretch, and that term isn't used in the spec ("It may be possible to send these with as little delay as ~0-1 hour", see the Aggregate proposal).
  • IIUC you're suggesting in 3 that aggregate reports treat views and clicks in the same way, whereas event-level reports apply different constraints to clicks vs. views. Q: Are you saying that this would be decisive ("I need aggregate reports because I want click and view data to be treated in the same way")? Or are you simply describing a difference between the report types?

One suggestion:
Keeping in mind the desired outcome for this issue (adding a list of [EDIT] potential use cases to the repo), maybe the discussion has reached a point where we're ready to start drafting some text?
In a different context, for a developer blogpost, we (web Developer Relations) had put together a list of use cases. Maybe that text can be used here as a starter / basis. It includes the notion of "Reporting" (which I think we haven't discussed here, but it may be helpful for describing non-optimization use cases), and I'm sure that text could be further fleshed out and benefit from the discussion above.
I've created a doc with that starter text; @ALamraniAlaouiScibids and anyone interested, let's collaborate on that doc?

@maudnals
Contributor

maudnals commented Aug 18, 2021

Thank you @ALamraniAlaouiScibids for your comment on the doc.
Do you have additional suggestions?

@ALamraniAlaouiScibids
Author

ALamraniAlaouiScibids commented Aug 30, 2021

Hello @maudnals,
(Sorry for the late reply, I was OOO.)
SGTM for the document! 👍
If it can help, here is a table with the advantages and drawbacks for each report type:
summary table

@maudnals, do not hesitate to reach out if you need anything from me.

@maudnals
Contributor

maudnals commented Sep 8, 2021

Thanks for sharing this! I had a look and left a few comments.
@csharrison and @johnivdel may have a few suggestions / corrections?

@maudnals
Contributor

Hi @ALamraniAlaouiScibids, would you consider presenting your table and asking for feedback on it next Monday during the call?
That could be a good place to get some extra input on this.

@ALamraniAlaouiScibids
Author

Yes, sure, I could do that.
Nice idea!

@maudnals
Contributor

Nice!
In this case, please add your item to Monday's agenda in this doc:
https://docs.google.com/document/d/1zUSm9nX2nUsCa_fbI96UJoRCEr3eAPwWLU7HmClhIJk/edit

@maudnals
Contributor

Thanks for presenting today!
As mentioned during the meeting, @ALamraniAlaouiScibids feel free to create a follow-up issue regarding the limitation on views (and any other topic you'd like to discuss further).

@ALamraniAlaouiScibids
Author

Thanks for your help @maudnals.
I have created the follow-up issues:
#229 and #230

@maudnals
Contributor

Great, thanks @ALamraniAlaouiScibids.
Do you feel comfortable closing this ticket or do you think there is more to discuss?

Also, FYI: #219

@ALamraniAlaouiScibids
Author

Perfect, thanks for the PR.
Will close the issue.
