How does the Report Collector know what reports to send with each query? #91

nunesgh · 2024-07-23T07:02:40Z

According to IPA-End-to-End, Case2023, and report_collectors, the context, i.e. the website or app that generates a report, is only ever specified in the encrypted Match Key associated data,¹ so the stored value can be seen in the clear only by the Helper Parties, never by the Report Collector.²

But even if the Report Collector could see the context of each report in the clear, hence being able to start building a query for a context, e.g. context A, how could it do so using the relevant reports from contexts B, C, etc.?

For instance, the Report Collector might have 10 trigger reports from context A and 2,000 source reports from context B, of which only 100 were actually related to impressions for ads from context A. Is the Report Collector capable of creating a query with only the 10 trigger reports from A and the 100 source reports from B? Or would the Report Collector always need to send to the Helper Party Network all the records it has stored, from all contexts, for every query?

In summary: 1) How does the Report Collector know which reports belong to any context, e.g. context A, in order to start building a query? 2) Can the Report Collector distinguish what reports from other contexts are relevant to context A?

¹ Encrypt(pk_i, data_i), where data_i = (mk_i, report_collector, current_website/app, match_key_provider) and current_website/app would be the context.
² The Report Collector could save reports from each context separately, e.g. in distinct Lists of Events (Def. 7, Case2023) per context, hence keeping the relationship between reports and their originating contexts, but that is not explicitly stated anywhere and does not seem to be the case.


danielmasny · 2024-08-13T22:00:07Z

In IPA, the actual user identity, i.e. the match key, is sensitive and hidden from the Report Collector, even though sites with a login will know a lot about the user. We are not trying to hide that type of information from sites or Report Collectors, only the match key, which allows linking a user across websites. Report Collectors therefore have a lot of information with which to preselect events for a query and assign breakdown keys to them. Nevertheless, there will always be many source events that do not relate to any trigger event, and this gets worse when a Report Collector has little meta information with which to preselect source events. However, this affects only the cost of the query, not its outcome.

We have not quantified this issue, mostly because the availability of meta information is company-specific and very few public datasets are available, since it is difficult to publish datasets with sensitive user information. It is therefore hard to say whether this is a problem in practice. We haven't received any concerns from advertisers, publishers, or adtechs so far.
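As a rough illustration of the preselection described above, here is a minimal Python sketch using the numbers from the question (10 trigger reports from context A, 2,000 source reports from context B, 100 of them for ads from A). It is not part of the IPA specification or of any IPA codebase. The `Report` fields `site`, `advertiser`, and `campaign_id`, the `build_query` helper, and the breakdown-key mapping are all hypothetical stand-ins for whatever clear-text meta information a given Report Collector happens to hold; the match key stays an opaque ciphertext throughout.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Report:
    encrypted_match_key: bytes   # opaque to the Report Collector
    is_trigger: bool
    site: str                    # hypothetical: site/app the Collector received the report from
    advertiser: Optional[str]    # hypothetical: advertiser of the shown ad (source reports only)
    campaign_id: Optional[int]   # hypothetical: campaign metadata (source reports only)


def build_query(
    reports: List[Report],
    advertiser_site: str,
    breakdown_keys: Dict[int, int],
) -> List[Tuple[Report, Optional[int]]]:
    """Preselect reports for one advertiser using clear-text metadata only.

    Trigger reports are kept if they come from the advertiser's site.
    Source reports are kept if their ad belongs to that advertiser, and
    each one is paired with a breakdown key derived from its campaign.
    """
    selected: List[Tuple[Report, Optional[int]]] = []
    for r in reports:
        if r.is_trigger and r.site == advertiser_site:
            selected.append((r, None))
        elif not r.is_trigger and r.advertiser == advertiser_site:
            # Unknown campaigns fall back to breakdown key 0 (an assumption).
            selected.append((r, breakdown_keys.get(r.campaign_id, 0)))
    return selected


# 10 trigger reports from context A.
reports = [Report(b"ct", True, "A", None, None) for _ in range(10)]
# 2,000 source reports from context B; only the first 100 show ads from A.
reports += [
    Report(b"ct", False, "B", "A" if i < 100 else "C", i % 3)
    for i in range(2000)
]

query = build_query(reports, "A", {0: 1, 1: 2, 2: 3})
print(len(query))
```

If the Collector had no such metadata, it would have to submit all 2,000 source reports; per the answer above, that would raise the cost of the query but not change its outcome.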
