How does the Report Collector know what reports to send with each query? #91

Open
nunesgh opened this issue Jul 23, 2024 · 1 comment

@nunesgh
nunesgh commented Jul 23, 2024

According to IPA-End-to-End, Case2023, and report_collectors, the context, i.e. the website or app that generates a report, is only ever specified in the encrypted Match Key associated data,[1] so the stored value could only be seen in the clear by the Helper Parties, never by the Report Collector.[2]

But even if the Report Collector could see the context of each report in the clear, hence being able to start building a query for a context, e.g. context A, how could it do so using the relevant reports from contexts B, C, etc.?

For instance, the Report Collector might have 10 trigger reports from context A and 2,000 source reports from context B, of which only 100 were actually related to impressions for ads from context A. Is the Report Collector capable of creating a query with only the 10 trigger reports from A and the 100 source reports from B? Or would the Report Collector always need to send to the Helper Party Network all the records it has stored, from all contexts, for every query?

In summary: 1) How does the Report Collector know which reports belong to any context, e.g. context A, in order to start building a query? 2) Can the Report Collector distinguish what reports from other contexts are relevant to context A?

Footnotes

  1. Encrypt(pk_i, data_i), where data_i = (mk_i, report_collector, current_website/app, match_key_provider) and current_website/app would be the context.

  2. The Report Collector could save reports from each context separately, e.g. in distinct Lists of Events (Def. 7, Case2023) per context, hence keeping the relationship between reports and their originating contexts, but that is not explicitly stated anywhere and does not seem to be the case.
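The per-context storage idea from footnote 2 could look roughly like the sketch below. It is only an illustration: `StoredReport`, `PerContextStore`, and the assumption that the Report Collector learns the context from how a report arrives (rather than from the ciphertext) are all hypothetical and not taken from the IPA documents.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredReport:
    # Opaque ciphertext Encrypt(pk_i, data_i); the collector cannot read it.
    encrypted_match_key: bytes
    is_trigger: bool


class PerContextStore:
    """Hypothetical store keeping one list of events per originating context."""

    def __init__(self):
        self._events = defaultdict(list)

    def add(self, context, report):
        # Assumption: `context` is known from the delivery channel,
        # not decrypted from the report itself.
        self._events[context].append(report)

    def events_for(self, context):
        # Return a copy so callers cannot mutate the stored list.
        return list(self._events.get(context, []))
```

With such a store, building a query for context A would start from `events_for("A")`, but it still would not say which reports from other contexts are relevant to A.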

@danielmasny
Contributor

In IPA, the actual user identity, i.e. the match key, is sensitive and hidden from the Report Collector, even though sites with a login will know a lot about the user. We are not trying to hide this type of information from sites or Collectors, only the match key, which allows linking a user across websites. Therefore, Report Collectors have a lot of information with which to preselect events for a query and assign breakdown keys to them. Nevertheless, there will always be many source events that do not relate to any trigger event. When a Report Collector does not have much meta information to preselect source events, this will be worse. However, this will not affect the outcome of the query, only its cost. We have not quantified this issue, mostly because the availability of meta information is company-specific and very few public datasets are available: it is quite difficult to publish datasets with sensitive user information. It is therefore hard to tell whether this is an issue in practice. We haven't received any concerns from advertisers, publishers or adtechs so far.
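As a rough illustration of preselection using the numbers from the question (10 trigger reports, 2,000 source reports, 100 of them relevant), a minimal sketch might look like this. The `Event` type, the `campaign_id` metadata field, and the `preselect` function are hypothetical names chosen for the example, not part of IPA. Pairing trigger events with a `None` breakdown key is likewise just a convention of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    encrypted_match_key: bytes   # opaque to the Report Collector
    is_trigger: bool
    campaign_id: Optional[str]   # hypothetical cleartext metadata; None if unknown


def preselect(events, campaign_to_breakdown):
    """Keep every trigger event, plus source events whose metadata maps to a breakdown key."""
    selected = []
    for ev in events:
        if ev.is_trigger:
            # Sketch convention: trigger events carry no breakdown key.
            selected.append((ev, None))
        elif ev.campaign_id in campaign_to_breakdown:
            selected.append((ev, campaign_to_breakdown[ev.campaign_id]))
    return selected


events = (
    [Event(b"ct", True, None) for _ in range(10)]
    + [Event(b"ct", False, "ads-for-A") for _ in range(100)]
    + [Event(b"ct", False, "other") for _ in range(1900)]
)
query = preselect(events, {"ads-for-A": 1})
print(len(query))  # 110
```

Without the `campaign_id` metadata, all 2,000 source events would have to go into the query; the result would be the same, only the cost would be higher.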
