Add a mechanism to support breakdowns by trigger data #55
Comments
Thanks for filing this issue @csharrison. I completely agree with you that there are multiple important and valuable use cases that a trigger-side “breakdown key” could satisfy. One that immediately comes to mind is publishers who log a “conversion funnel” such as “add to cart”, followed by “initiate checkout”, followed by “purchase”. Aggregate statistics about the full funnel would be very useful. I would love to hear suggestions for how you’d imagine this being integrated; for example, do you see this as a Cartesian product (breakdown by source-breakdown crossed with trigger-breakdown) or something else?
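As a rough illustration of the Cartesian-product option (the field names here are hypothetical, not actual IPA or ARA fields), each attributed conversion would carry both a source-side and a trigger-side breakdown key, and the output histogram would be keyed by the pair:

```python
from collections import Counter

# Hypothetical attributed conversions; "source_breakdown" and "trigger_breakdown"
# are illustrative field names, not actual IPA/ARA fields.
attributed = [
    {"source_breakdown": "campaign_A", "trigger_breakdown": "add_to_cart", "value": 1},
    {"source_breakdown": "campaign_A", "trigger_breakdown": "purchase", "value": 1},
    {"source_breakdown": "campaign_B", "trigger_breakdown": "purchase", "value": 1},
]

# Cartesian-product breakdown: the output histogram is keyed by the
# (source_breakdown, trigger_breakdown) pair.
histogram = Counter()
for event in attributed:
    histogram[(event["source_breakdown"], event["trigger_breakdown"])] += event["value"]

print(histogram)
```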
Unfortunately I don't think there's necessarily a one-size-fits-all solution here. In ARA we allow some binary OR mixing plus a query with pre-specified keys. However, we are thinking about some general enhancements to support use cases like key discovery, where it's really not practical to enumerate all possible keys and which typically involves a thresholding step. We don't have anything published on this (yet), but the high-level idea is to have a "threshold" parameter which lets you interpolate between always outputting the pre-declared keys (no thresholding; 100% recall, assuming your pre-declared space covers everything relevant) and a thresholding mechanism where you have no false positives / 100% precision but reduced recall due to the thresholding. It's also worth noting that introducing a thresholding mechanism like this pushes us into the "approximate DP" regime with a delta term.
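This is not the actual ARA proposal, just a sketch of the interpolation described above under standard DP-histogram assumptions (each user contributes at most 1 to a single bucket): every bucket is noised, pre-declared keys are always released, and undeclared keys are released only if their noised count clears the threshold, which is the step that introduces the delta term.

```python
import random

def laplace(scale):
    # Difference of two exponentials is Laplace-distributed with the given scale.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def release(counts, declared_keys, epsilon, threshold):
    """Interpolate between "declared keys only" and thresholded key discovery.

    Assumes each user contributes at most 1 to a single bucket (sensitivity 1).
    Declared keys are always released (100% recall on the declared space);
    undeclared keys are released only if their noised count clears `threshold`.
    The chance that a single user's key slips past the threshold is what
    introduces the delta term.
    """
    scale = 1.0 / epsilon
    released = {}
    for key, count in counts.items():
        noised = count + laplace(scale)
        if key in declared_keys or noised >= threshold:
            released[key] = noised
    # Declared keys with no contributions still produce (pure-noise) outputs,
    # so the presence of a declared key reveals nothing.
    for key in declared_keys:
        if key not in counts:
            released[key] = laplace(scale)
    return released
```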
I was thinking about the privacy of the Cartesian-product output of trigger and source breakdown keys. It seems to me that since in IPA we have per-user (matchkey) capping, we don't need to limit the number of breakdown keys on either the trigger or source side. But let me know if this seems correct.
The natural privacy concern that arises is that a value in the trigger or source breakdown could be unique to a single user, but the per-user cap already bounds that user's contribution before noise is added. This is unlike ARA event-level reporting, which doesn't cap across all events of a user and thus needs to limit the size of the trigger breakdown set. Otherwise, you could have several events for the same user using a unique trigger breakdown and several events with a unique source breakdown, and have no way to cap that user's contribution to the final breakdown and protect them with the DP noise.
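A minimal sketch of how per-matchkey capping bounds sensitivity regardless of how many distinct breakdown keys exist (the field names and the simple greedy clipping below are illustrative assumptions, not IPA's actual mechanism):

```python
from collections import Counter, defaultdict

def capped_histogram(events, cap):
    """Clip each matchkey's total contribution across all of its events, then
    aggregate into (source_breakdown, trigger_breakdown) buckets. Because the
    per-user total is bounded by `cap`, the sensitivity of the histogram does
    not depend on how many distinct breakdown keys exist."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e["matchkey"]].append(e)

    histogram = Counter()
    for user_events in by_user.values():
        budget = cap
        for e in user_events:
            contribution = min(e["value"], budget)
            budget -= contribution
            if contribution > 0:
                histogram[(e["source_breakdown"], e["trigger_breakdown"])] += contribution
    return histogram
```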
@bmcase I agree with you. The sizes of the trigger and source breakdown key sets shouldn't need to be limited, given the per-user capping. Side note: for ARA event-level reports, the privacy mechanism is k-ary randomized response, which is technically robust to large domains; it just causes the noise to dominate.
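To illustrate that point, here is a small sketch of k-ary randomized response (the epsilon value is arbitrary): the mechanism is epsilon-DP for any domain size k, but the probability of reporting the true value shrinks as k grows, so for very large trigger-data domains the output is almost entirely noise.

```python
import math
import random

def k_rr(true_value, k, epsilon):
    """k-ary randomized response over the domain {0, ..., k-1}: report the true
    value with probability e^eps / (e^eps + k - 1), otherwise a uniform random
    other value. Epsilon-DP for any k, but the signal shrinks as k grows."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_true:
        return true_value
    other = random.randrange(k - 1)
    return other if other < true_value else other + 1

# Probability of reporting the true value for a 3-bit domain vs. a huge one:
for k in (8, 2**20):
    p = math.exp(2.0) / (math.exp(2.0) + k - 1)
    print(f"k={k}: true value reported with probability {p:.6f}")
```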
Thanks for clarifying. That makes sense for event-level reports. How about aggregatable reports in ARA? You mention the trigger breakdown space is large there; is this a concern without user-level capping?
No, the output space is not a concern for aggregate ARA, for similar reasons to IPA. We ask the developer to pre-specify all possible keys prior to aggregation and only consider those keys in the privacy mechanism. This ensures the key space is public. If we didn't ask the developer to pre-specify, we would have to output 2^128 buckets to be private, which is not feasible.
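A sketch of why pre-declared keys keep the key space public (this is not the actual aggregation service implementation; the per-user cap and Laplace noise are standard assumptions): every declared bucket receives noise even when no report contributed to it, and contributions to undeclared keys are simply dropped.

```python
import random

def laplace(scale):
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def aggregate_declared_keys(reports, declared_keys, epsilon, cap):
    """Only pre-declared keys are considered. Every declared bucket is noised,
    even with a zero count, so the set of output buckets is public and reveals
    nothing about which keys users actually contributed to."""
    sums = {key: 0 for key in declared_keys}
    for key, value in reports:
        if key in sums:          # contributions to undeclared keys are dropped
            sums[key] += value
    scale = cap / epsilon        # sensitivity bounded by the per-user cap
    return {key: total + laplace(scale) for key, total in sums.items()}
```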
FYI: we recently published https://github.com/WICG/attribution-reporting-api/blob/main/aggregate_key_discovery.md which is relevant for this topic. It proposes a flexible solution between "declare every possible output bucket" and "use thresholding to ensure 100% precision without declaring any buckets". |
It is a very common use-case to query conversion aggregates broken down by information embedded in the trigger/conversion associated with a given attributed impression, for example a breakdown by conversion type such as “add to cart”, “initiate checkout”, or “purchase”.
Supporting this use-case will bring more value to IPA, and we can do it without impacting the privacy bar, so we should consider adding support for this in IPA.
We have support for this in ARA in two places: event-level reports, where trigger data is embedded directly in the report, and aggregatable reports, where the trigger contributes to the aggregation key.
One caveat here is that this kind of “conversion label” is sometimes a function of both source- and trigger-side data. For example, Google Ads has a “selective optimization” feature that allows labeling a conversion as eligible for bidding optimization based on the campaign. This is a useful technique for reducing the dimensionality of the output (and thus saving privacy budget).
There are many techniques to explore to satisfy this use case, but at a high level, this requires two things:
More work would be needed to generate trigger breakdown keys as a function of source-side data. For reference on how ARA solves this, see our event explainer, whose mechanism works similarly for generating aggregatable reports. At a high level, we designed a simple DSL that allows selecting trigger data / aggregation keys based on source-side information.
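For a concrete picture of the binary OR mixing mentioned earlier, this is roughly how an ARA aggregation key combines a source-side key piece with a trigger-side key piece (the specific bit layout below is an arbitrary example): the two sides typically occupy disjoint bit ranges of the 128-bit key and are OR'd together.

```python
# Sketch of ARA-style aggregation key construction via binary OR mixing:
# the source registration declares one key piece, the trigger declares another,
# and the final 128-bit aggregation key is their bitwise OR. The bit layout
# below is an arbitrary example.
source_key_piece = 0x159 << 64   # e.g. campaign information in the high bits
trigger_key_piece = 0x5          # e.g. conversion type in the low bits

aggregation_key = source_key_piece | trigger_key_piece
print(hex(aggregation_key))
```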