From a28ac1a5cfb0573364967898af690798d3932ed0 Mon Sep 17 00:00:00 2001 From: Erik Taubeneck Date: Thu, 13 Oct 2022 16:21:35 -0700 Subject: [PATCH 1/8] rename threat-model-draft to threat-model/readme.md for automatic github rendering --- threat-model/{Threat-Model-Draft-2022-08-09.md => readme.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename threat-model/{Threat-Model-Draft-2022-08-09.md => readme.md} (100%) diff --git a/threat-model/Threat-Model-Draft-2022-08-09.md b/threat-model/readme.md similarity index 100% rename from threat-model/Threat-Model-Draft-2022-08-09.md rename to threat-model/readme.md From 4f71178057d98fb43a66c667a9e10f7b9df59945 Mon Sep 17 00:00:00 2001 From: Erik Taubeneck Date: Sat, 15 Oct 2022 23:07:20 -0700 Subject: [PATCH 2/8] add in details about MPC and TEEs --- threat-model/readme.md | 92 +++++++++++++++++++++++++----------------- 1 file changed, 54 insertions(+), 38 deletions(-) diff --git a/threat-model/readme.md b/threat-model/readme.md index 6dfb994..d4d6b89 100644 --- a/threat-model/readme.md +++ b/threat-model/readme.md @@ -10,16 +10,18 @@ This document is currently a **draft**, submitted to the PATCG (and eventually t In this document, we outline the security considerations for proposed purpose-constrained APIs for the web platform (that is, within browsers, mobileOSs, and other user-agents) specified by the Private Advertising Technologies Working Group (PATWG). -We assume that an active attacker can control the network and has the ability to corrupt any number of clients, the parties who call the proposed APIs, and some (proposal specific) subset of aggregators. +Many of these proposals attempt to leverage the concept of _private computation_ as a component of these purpose-constrained APIs. 
An ideal private computation environment would allow for the evaluation of a predefined function (i.e., the constrained purpose,) without revealing any new information to any party beyond the output of that predefined function to the specified parties. This is commonly used to perform aggregation over inputs which, individually, must not be revealed. It is also often used to then apply differentially private noise to those aggregates. -In the presence of this adversary, APIs should aim to achieve the following goals: +Private computation has various constructions, each with different assumptions. The two primary forms considered by existing proposals are _multi-party computation_ (MPC) and _trusted execution environments_ (TEEs.) MPC relies on distinct parties, or _aggregators_, who perform a cryptographic protocol, while TEEs rely on specialized hardware that provides isolation for computation on sensitive data. +For our threat model, we assume that an active attacker can control the network and has the ability to corrupt any number of clients, the parties who call the proposed APIs, and some subset of aggregators, when used. +In the presence of this adversary, APIs should aim to achieve the following goals: 1. **Privacy**: Clients (and, more specifically, the vendors who distribute the clients) trust that (within the threat models), the API is purpose constrained. That is, all parties learn nothing beyond the intended result (e.g., a differentially private aggregation function computed over the client inputs.) 2. **Correctness:** Parties receiving the intended result trust that the protocol is executed correctly. Moreover, the amount that a result can be skewed by malicious input is bounded and known. -Specific proposed purpose constrained APIs will provide their own analysis about how they achieve these properties. +Specific proposed purpose constrained APIs will provide their own analysis about how they achieve these properties. 
Moreover, this threat model does not address the specific threats to different configurations of aggregators participating in an MPC, nor the threats to TEEs. These configurations enable varying depth of defense against such an attacker, and are instead left to web platform vendors to decide what level is appropriate for their APIs and users. This is explored further in [Section 4: Private Computation Configurations](#4-private-computation-configurations). ## 1. Threat Model @@ -29,19 +31,14 @@ In this section, we enumerate the potential actors that may participate in a pro ### 1.1. Client/User - #### 1.1.1. Assets - - 1. Original inputs provided to client APIs. Clients expose these APIs to other actors below, which can modify the client’s assets, but should not reveal them. 2. Unencrypted input shares, for systems which rely on secret sharing among aggregators. #### 1.1.2. Capabilities - - 1. Individuals can reveal their own input and compromise their own privacy. 2. Clients (user agent software) can compromise privacy by leaking their assets (beyond specified aggregators, when used.) 3. Clients may affect correctness of a protocol by reporting false input. @@ -49,8 +46,6 @@ In this section, we enumerate the potential actors that may participate in a pro #### 1.1.3. Mitigations - - 1. The client/user may be able to provide certain forms of validation to mitigate either individual client’s or coalitions of clients' ability to compromise correctness. These include (but are not limited to): 1. Bound checking via zero-knowledge proof that an oblivious representation of a value lies within certain bounds. This would provide the client/user the ability to lie or misrepresent a value, but constrains that ability. 2. Anonymous assertion via tokens that provide some form of authorization that can be tied to a specific input. 
Such tokens may be provided by other parties, to assert the authenticity of a report, or reflect that a user has been recognized as an authenticated individual user in another context. These tokens must be anonymous, meaning that they cannot be attributed to a specific user (e.g. by using a cryptographic technique like blind signatures.) @@ -58,11 +53,8 @@ In this section, we enumerate the potential actors that may participate in a pro ### 1.2. First Party Site/App - #### 1.2.1. Assets - - 1. Unencrypted inputs into the proposed API. 2. First party context (e.g. first party cookies / information in first party storage.) 3. User-agent provided information (e.g. user agent string, device information, user configurations.) @@ -72,16 +64,12 @@ In this section, we enumerate the potential actors that may participate in a pro #### 1.2.2. Capabilities - - 1. First parties can modify client assets through the proposed APIs. 2. First parties can choose which encrypted outputs from the proposed API are provided to aggregators. #### 1.2.3. Mitigations - - 1. Modification of client assets should be limited by the API interface to only allow for intended modifications. 2. Use of differential privacy (see [section 3. Aggregation and Anonymization](#3-Aggregation-and-Anonymization)) should be used to prevent @@ -99,24 +87,18 @@ See [section 2. First Parties, Embedded Parties and Delegated Parties](#2-First- #### 1.3.1. Assets - - 1. All the same assets as First Parties ([section 1.2.1](#121-Assets)) 1. Exception that 1.2.1.2 would be partitioned delegated party context. #### 1.3.2. Capabilities - - 1. The same capabilities as First Parties ([section 1.2.2](#122-Capabilities)) 2. Delegated parties can modify client assets through the proposed APIs, which may include overwriting/modifying data provided by the first party or other delegated parties. #### 1.3.3. Mitigations - - 1. The same mitigations as First Parties ([section 1.2.3](#123-Mitigations)) 2. 
The client should prevent the ability of a delegated party to overwrite/modify data provided by the first party or other delegated parties. If this is not possible, the client should allow the first party to control and limit a delegated party's ability in this manner. (See [section 2. First Parties, Embedded Parties and Delegated Parties](#2-First-Parties-Embedded-Parties-and-Delegated-Parties) for more details.) @@ -128,8 +110,6 @@ An aggregator is an individual party which participates in an aggregation protoc #### 1.4.1. Assets - - 1. Unencrypted individual share. 2. All the assets from first and third parties. 3. Shares of the output of the aggregation protocol. @@ -138,8 +118,6 @@ An aggregator is an individual party which participates in an aggregation protoc #### 1.4.2. Capabilities - - 1. Aggregators may defeat correctness by emitting bogus output shares. 2. Aggregators (who collude with a first or third party) may learn information about the clients/users which contribute to the protocol. 3. Aggregators may compromise the availability of the system by refusing to participate in the protocol. @@ -148,8 +126,6 @@ An aggregator is an individual party which participates in an aggregation protoc #### 1.4.3. Mitigations - - 1. The secret sharing scheme used to provide inputs to the aggregators must ensure privacy as long as the (proposal specific) subset of aggregators do not reveal their shares. 2. The aggregation protocol should provide robust analysis that it is in fact a differentially private function (see [section 3. Aggregation and Anonymization](#3-Aggregation-and-Anonymization).) 3. Bogus inputs can be generated that encode “null” or “noop” shares, designed to not affect the aggregation protocol, but can mask the total number of true inputs. @@ -159,6 +135,8 @@ An aggregator is an individual party which participates in an aggregation protoc If enough aggregators (beyond the proposal specific subset) collude e.g. 
by sharing unencrypted input shares, then none of the properties of the system hold. Such scenarios are outside the threat model. +However, we do assume that an attacker can always control at least one aggregator (i.e. there are no perfectly trusted aggregators.) + ### 1.6. Attacker on the network @@ -167,8 +145,6 @@ We assume the existence of attackers on the network links between various partie #### 1.6.1. Capabilities - - 1. Observation of network traffic. Attackers may observe messages exchanged between parties at the IP layer. 1. Time of transmission by clients could reveal information about user activity. 2. Observation of message size could allow the attacker to learn how much input is being submitted by a specific user. @@ -177,8 +153,6 @@ We assume the existence of attackers on the network links between various partie #### 1.6.2. Mitigations - - 1. All messages exchanged between parties should be encrypted / use TLS / use HTTPS. 2. All messages between aggregators and to/from first/third parties should be mutually authenticated to prevent impersonation. 3. Timing of messages should be carefully considered so as not to leak anything beyond what the attacker would learn absent the proposed API. @@ -230,11 +204,53 @@ When managing a privacy budget, that budget is assigned to some party who can use ### 3.3 Privacy Parameters -This results in four parameters which need to reach consensus in the PATWG: - +This results in four parameters: - -1. _Aggregation minimum threshold_: the number of clients/users that need to be included in a given aggregation. +1. _Unit of privacy_: The set of parties used to manage the privacy budget. 2. _Epoch_: The amount of time over which a differential privacy budget should be managed. 3. ε: The differential privacy parameter which measures the amount of individual differential information leakage allowed in each epoch. -1. _Unit of privacy_: The set of parties used to manage the privacy budget. +4. 
_Aggregation minimum threshold_: the number of clients/users that need to be included in a given aggregation. + +We can divide these parameters into two groups: parameters which should be part of the standard (and thus require consensus) and parameters which can be configured by the web platform vendor. The first two parameters, _unit of privacy_ and _epoch_, should likely be part of the standard. However, the other two parameters, ε and _aggregation minimum threshold_, could differ across web platform vendors without compromising interoperability. + +Let's first address ε. Suppose that we have _Browser A_ and _Mobile OS B_ which, respectively, decide that the appropriate budget is "1 unit/epoch" and "5 unit/epoch". Luckily, in the worst case, privacy budgets are additive, and thus can be split into smaller pieces. A user of this API could then decide to use "1 unit/epoch" on the API from both _Browser A_ and _Mobile OS B_, and then continue to use the remaining "4 unit/epoch" on the API from _Mobile OS B_. + +Secondly, let's address _aggregation minimum threshold_. Suppose that the same _Browser A_ and _Mobile OS B_ have also, respectively, decided that the _aggregation minimum threshold_ should be "X people" and "2X people". Because this is a minimum, when using the API from both, the minimum can be set to 2X, and satisfy both constraints. However, when using the API from only _Browser A_, the minimum could be reduced to X. + +## 4. Private Computation Configurations + +Many of these proposals aim to leverage the idea of _private computation_, touched on briefly in the introduction. In its ideal form, a private computation environment would allow for the evaluation of a predefined function (i.e., the constrained purpose,) without revealing any new information to any party beyond the output of that predefined function. This is commonly used to perform aggregation over inputs which, individually, must not be revealed. 
It is also often used to then apply differentially private noise to those aggregates. + +There are currently two proposed constructions of a private computation environment: _multi-party computation_ (MPC) and _trusted execution environments_ (TEEs). The following two subsections outline how these can be used to create a private computation environment, how they fit into the threat model, and the assumptions required to achieve our goal of assuring that the inputs are not revealed. + +### 4.1 Multi-party Computation (MPC) + +Multi-party computation is a cryptographic protocol in which distinct parties can collectively operate on data which remains oblivious to any individual party throughout the computation, but allows for joint evaluation of a predefined function. + +These protocols typically work with data which is _secret shared_. For example, a three way secret share of a value v = s1 + s2 + s3 can be constructed by generating two random values for s1 and s2, and then computing s3 = v - s1 - s2. At this point, each value si individually appears random, and thus v remains oblivious even when a party learns all but one si. Note that secret shares can also be generated using XOR with random byte strings. + +In terms of our threat model, these parties are aggregators and we assume that an attacker can control some subset of those aggregators. That exact threshold may be different for a given proposal, for example, we may assume that an attacker can only control one out of three aggregators. This would enable, in cryptographic terms, a maliciously secure, honest-majority (two out of three) MPC. + +Given these aggregators, a _client/user_ is able to generate a secret sharing of their input data, and then securely communicate one secret share to each aggregator. This allows the aggregators to then perform the MPC protocol to compute the predefined function. 
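A minimal sketch of the additive scheme described above (the modulus and three-party split here are illustrative choices, not the parameters of any specific proposal):

```python
import secrets

MODULUS = 2**61 - 1  # an arbitrary prime modulus, chosen only for illustration

def share(v, n=3):
    """Split v into n additive secret shares such that v = s1 + ... + sn (mod MODULUS)."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((v - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    return sum(shares) % MODULUS

# Each aggregator receives one share; any n-1 shares are uniformly random.
# Because sharing is linear, aggregators can add shares of many inputs
# locally, so that only the reconstructed total is ever revealed.
shares_a, shares_b = share(12), share(30)
share_totals = [(x + y) % MODULUS for x, y in zip(shares_a, shares_b)]
assert reconstruct(share_totals) == 42
```

The XOR-based sharing mentioned above works the same way, with `^` over byte strings in place of modular addition.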
The _client/user_ is assuming that an attacker cannot control enough aggregators to violate that protocol, which implies that they only perform the expected function. + +### 4.2 Trusted Execution Environments (TEEs) + +Trusted execution environments are specialized hardware where encrypted data can be sent "in" to an enclave where it is decrypted and operated on, but cannot be otherwise accessed. It also produces an "attestation" that provides a means of verifying that the code run on that data is the code that was intended. Different hardware manufacturers offer different types of TEEs, and there are various assumptions which need to be made about the manufacturer, hardware operator, and other tenants on the hardware in order to achieve our ideal of input data remaining oblivious. There are documented attacks for many of these types of hardware, and while we don't expect all web platform vendors to support this construction, some vendors have expressed comfort with the required assumptions. + +In order to utilize a TEE to achieve private computation, we need two properties. First, the data must only be able to be decrypted within the TEE, and second, we need to assure that the TEE only executes code which evaluates the predefined function. + +In a very simplified form, we could imagine there is one TEE which is used by every _first party_/_delegated party_ to run the predefined function. In this case, the _user/client_ could encrypt their data with the public key of the TEE. However, this ignores the second property: assuring that the running code is the code that was intended. The TEE produces an attestation of the code it's running; however, the _client/user_ needs that attestation to be verified before their input data is decrypted. + +This can be achieved by utilizing an aggregator who verifies the TEE attestation on behalf of the _client/user_. 
In this case, the _client/user_ encrypts the data towards the aggregator, and the aggregator provides the decryption key into the TEE if and only if the attestation verifies the TEE is only running the expected code. This still has one problem, however, as we assume that our attacker can compromise at least one aggregator. + +We can extend this one step further, by utilizing not one, but multiple aggregators. Each aggregator can hold part of a _threshold key_, which allows data to be encrypted such that it requires all (or a predefined subset) of keys to decrypt. Each aggregator verifies the attestation and provides their decryption key into the TEE if and only if the attestation verifies the TEE is only running the expected code. Thus, only the TEE is ever able to decrypt the data, so long as the attacker is only able to control a subset of aggregators. + +### 4.3 Abstracting Private Computation + +Both of these constructions rely on a set of aggregators, where we assume that an attacker is unable to control all aggregators. Let a specific set of aggregators be called an aggregator network. Each web platform vendor will need to make a judgment as to if it's reasonable to assume that an aggregator network can be _trusted_, e.g. that attackers would be unable to compromise all aggregators within that specific aggregator network. + +To make this judgment, web platform vendors are likely to consider various properties about helper party networks such as diversity in ownership of the company/organization operating the aggregators, diversity in cloud provider (if relevant) used by the aggregators, diversity in jurisdictions in which the aggregators operate, etc. + +It's unlikely that all web vendors will arrive at the same decision as to which aggregator networks can be assumed to be trusted. 
Our aim, however, is for many such aggregator networks to exist that are trusted by most web platforms, allowing for choice and competition for the first/delegated parties that leverage these APIs. Moreover, as we may have web platform vendors providing different privacy budgets (see section [3.3 Privacy Parameters](#33-privacy-parameters),) the first/delegated parties may use multiple aggregator networks to utilize those different budgets. + +Just as web vendors are unlikely to fully agree on the sets of trusted aggregator networks, they may also reach different conclusions as to which constructions of private computation can be utilized. If we consider an aggregator network to be the pairing of both the aggregator parties and the private computation construction, it's reasonably straightforward for the standard to be unopinionated about the construction. Instead, individual web platform vendors can include the private computation construction when considering which aggregator networks are trusted. From f349da673ad3d6db0ffbc6f4437b7692198a8d8a Mon Sep 17 00:00:00 2001 From: Erik Taubeneck Date: Tue, 18 Oct 2022 10:46:24 -0700 Subject: [PATCH 3/8] Apply suggestions from Martin's review Co-authored-by: Martin Thomson --- threat-model/readme.md | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/threat-model/readme.md b/threat-model/readme.md index d4d6b89..b971dd9 100644 --- a/threat-model/readme.md +++ b/threat-model/readme.md @@ -10,7 +10,7 @@ This document is currently a **draft**, submitted to the PATCG (and eventually t In this document, we outline the security considerations for proposed purpose-constrained APIs for the web platform (that is, within browsers, mobileOSs, and other user-agents) specified by the Private Advertising Technologies Working Group (PATWG). -Many of these proposals attempt to leverage the concept of _private computation_ as a component of these purpose-constrained APIs. 
An ideal private computation environment would allow for the evaluation of a predefined function (i.e., the constrained purpose,) without revealing any new information to any party beyond the output of that predefined function to the specified parties. This is commonly used to perform aggregation over inputs which, individually, must not be revealed. It is also often used to then apply differentially private noise to those aggregates. +Many of these proposals attempt to leverage the concept of _private computation_ as a component of these purpose-constrained APIs. An ideal private computation system would allow for the evaluation of a predefined function (i.e., the constrained purpose,) without revealing any new information to any party beyond the output of that predefined function. Private computation can be used to perform aggregation over inputs which, individually, must not be revealed. Private computation has various constructions, each with different assumptions. The two primary forms considered by existing proposals are _multi-party computation_ (MPC) and _trusted execution environments_ (TEEs.) MPC relies on distinct parties, or _aggregators_, who perform a cryptographic protocol, while TEEs rely on specialized hardware that provides isolation for computation on sensitive data. @@ -21,7 +21,7 @@ In the presence of this adversary, APIs should aim to achieve the following goal 1. **Privacy**: Clients (and, more specifically, the vendors who distribute the clients) trust that (within the threat models), the API is purpose constrained. That is, all parties learn nothing beyond the intended result (e.g., a differentially private aggregation function computed over the client inputs.) 2. **Correctness:** Parties receiving the intended result trust that the protocol is executed correctly. Moreover, the amount that a result can be skewed by malicious input is bounded and known. 
-Specific proposed purpose constrained APIs will provide their own analysis about how they achieve these properties. Moreover, this threat model does not address the specific threats to different configurations of aggregators participating in an MPC, nor the threats to TEEs. These configurations enable varying depth of defense against such an attacker, and are instead left to web platform vendors to decide what level is appropriate for their APIs and users. This is explored further in [Section 4: Private Computation Configurations](#4-private-computation-configurations). +Specific proposed purpose constrained APIs will provide their own analysis about how they achieve these properties. This threat model does not address aspects specific to particular private computation designs or configurations. Each private computation approach provides different options for defense against attacks. Web platform vendors can decide what configurations produce adequate safeguards for their APIs and users. This is explored further in [Section 4: Private Computation Configurations](#4-private-computation-configurations). ## 1. Threat Model @@ -215,27 +215,35 @@ We can divide these parameters into two groups: parameters which should be part Let's first address ε. Suppose that we have _Browser A_ and _Mobile OS B_ which, respectively, decide that the appropriate budget is "1 unit/epoch" and "5 unit/epoch". Luckily, in the worst case, privacy budgets are additive, and thus can be split into smaller pieces. A user of this API could then decide to use "1 unit/epoch" on the API from both _Browser A_ and _Mobile OS B_, and then continue to use the remaining "4 unit/epoch" on the API from _Mobile OS B_. -Secondly, let's address _aggregation minimum threshold_. Suppose that the same _Browser A_ and _Mobile OS B_ have also, respectively, decided that the _aggregation minimum threshold_ should be "X people" and "2X people". 
Because this is a minimum, when using the API from both, the minimum can be set to 2X, and satisfy both constraints. However, when using the API from only _Browser A_, the minimum could be reduced to X. +This might be achieved by using a system that distributes information from both Browser A and Mobile OS B to a private computation entity that is configured to consume 1 unit per epoch. Information from Mobile OS B is additionally distributed to another entity that is configured to consume the remaining 4 units. As a practical matter, the two logical private computation entities here could be operated by the same organizations, but they would use different configurations. For instance, each configuration might use different keying material. In this case, Browser A is able to limit information release to 1 unit per epoch by choosing where information can be sent. + +Secondly, let's address _aggregation minimum threshold_. Suppose that the same _Browser A_ and _Mobile OS B_ have also, respectively, decided that the _aggregation minimum threshold_ should be "X people" and "2X people". Because this is a minimum, when using the API from both, the minimum can be set to 2X, and satisfy both constraints. However, when using data that comes from _Browser A_ only, the minimum could be reduced to X. ## 4. Private Computation Configurations Many of these proposals aim to leverage the idea of _private computation_, touched on briefly in the introduction. In its ideal form, a private computation environment would allow for the evaluation of a predefined function (i.e., the constrained purpose,) without revealing any new information to any party beyond the output of that predefined function. This is commonly used to perform aggregation over inputs which, individually, must not be revealed. It is also often used to then apply differentially private noise to those aggregates. 
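The differentially private noise mentioned above is typically added with a mechanism such as Laplace noise scaled to the budget ε. A generic sketch of that step — the ε, sensitivity, and data here are illustrative, not any vendor's chosen parameters:

```python
import math
import random

def laplace_sample(scale, rng=random):
    """Draw from Laplace(0, scale) by inverting its CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_sum(values, epsilon, sensitivity=1.0):
    """Release sum(values) with epsilon-differential privacy, assuming each
    user contributes at most `sensitivity` to the sum."""
    return sum(values) + laplace_sample(sensitivity / epsilon)

random.seed(1)
noisy = dp_sum([1] * 100, epsilon=1.0)  # true sum is 100; released value is 100 plus Laplace(1) noise
assert abs(noisy - 100) < 25
```

A smaller ε (a tighter budget) means a larger noise scale, which is why the budget-splitting arithmetic above trades accuracy for the ability to use the API on multiple platforms.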
-There are currently two proposed constructions of a private computation environment: _multi-party computation_ (MPC) and _trusted execution environments_ (TEEs). The following two subsections outline how these can be to create a private computation environment, how they fit into the threat model, and the assumptions required to achieve our goal of assuring that the inputs are not revealed. +There are currently two proposed constructions of a private computation environment under consideration: _multi-party computation_ (MPC) and _trusted execution environments_ (TEEs). The following two subsections outline how these can be to create a private computation environment, how they fit into the threat model, and the assumptions required to achieve our goal of assuring that the inputs are not revealed. -### 4.1 Multi-party Computation (MPC) +### 4.1 Multi-party Computation Multi-party computation is a cryptographic protocol in which distinct parties can collectively operate on data which remains oblivious to any individual party throughout the computation, but allows for joint evaluation of a predefined function. -These protocols typically work with data which is _secret shared_. For example, a three way secret share of a value v = s1 + s2 + s3 can be constructed by generating two random values for s1 and s2, and then computing s3. At this point, each value si individually appears random, and thus v remains oblivious even when a party learns all by one si. Note that secret shares can also be generated using XOR with random byte strings. +These protocols typically work with data which is _secret shared_. For example, a three way _additive_ secret share of a value v = s1 + s2 + s3 can be constructed by generating two random values for s1 and s2, and then computing s3 = v - s1 - s2. At this point, each value si individually appears random, and thus v remains oblivious as long as no single entity learns all values of si. 
A similar secret sharing scheme uses XOR in place of addition; alternatively, Shamir's secret sharing uses polynomial interpolation. In terms of our threat model, these parties are aggregators and we assume that an attacker can control some subset of those aggregators. That exact threshold may be different for a given proposal, for example, we may assume that an attacker can only control one out of three aggregators. This would enable, in cryptographic terms, a maliciously secure, honest-majority (two out of three) MPC. -Given these aggregators, a _client/user_ is able to generate a secret sharing of their input data, and then securely communicate one secret share to each aggregator. This allows the aggregators to then perform the MPC protocol to compute the predefined function. The _client/user_ is assuming that an attacker cannot control enough aggregators to violate that protocol, which implies that they only perform the expected function. +Given these aggregators, a _client/user_ is able to generate a secret sharing of their input data, and then securely communicate one secret share to each aggregator. This allows the aggregators to then perform the MPC protocol to compute the predefined function. The _client/user_ trusts that an attacker cannot control enough aggregators to violate that protocol. + +If the protocol is faithfully executed, the aggregate result can then be reported to the intended recipient without the entities that performed the computation needing to witness the answer. + +This threat model does not require that the MPC protocol provide safeguards against an aggregator spoiling the answers from the system. Though a spoiled answer is possible in many protocols, the primary threat that a compromised aggregator presents is to the privacy of _clients/users_. 
We assume that there is some contractual relationship between those requesting aggregation and the entities that perform that aggregation such that the aggregators lack significant incentive to spoil results. ### 4.2 Trusted Execution Environments (TEEs) -Trusted execution environments are specialized hardware where encrypted data can be sent "in" to an enclave where it is decrypted and operated on, but cannot be otherwise accessed. It also produces an "attestation" that provides a means of verifying that the code run on that data is the code that was intended. Different hardware manufacturers offer different types of TEEs, and there are various assumptions which need to be made about the manufacturer, hardware operator, and other tenants on the hardware in order to achieve our ideal of input data remaining oblivious. There are documented attacks for many of these types of hardware, and while we don't expect all web platform vendors to support this construction, some vendors have expressed comfort with the required assumptions. +Trusted execution environments are specialized hardware where encrypted data can be sent "in" to an enclave where it is decrypted and operated on, but cannot be otherwise accessed. A TEE can produce an "attestation" that acts as a claim - backed by the vendor of the TEE - that only specific code is actively running. + +Different hardware manufacturers offer different types of TEEs, and there are various assumptions which need to be made about the manufacturer, hardware operator, and other tenants on the hardware in order to achieve our ideal of input data remaining oblivious. There are documented attacks for many of these types of hardware, and while we don't expect all web platform vendors to support this construction, some vendors have expressed comfort with the required assumptions. In order to utilize a TEE to achieve private computation, we need two properties. 
First, the data must only be able to be decrypted within the TEE, and second, we need to assure that the TEE only executes code which evaluates the predefined function. @@ -247,7 +255,7 @@ We can extend this one step further, by utilizing not one, but multiple aggregat ### 4.3 Abstracting Private Computation -Both of these constructions rely on a set of aggregators, where we assume that an attacker is unable to control all aggregators. Let a specific set of aggregators be called an aggregator network. Each web platform vendor will need to make a judgment as to if it's reasonable to assume that an aggregator network can be _trusted_, e.g. that attackers would be unable to compromise all aggregators within that specific aggregator network. +Both of these constructions rely on a set of aggregators, where we assume that an attacker is unable to control all aggregators. Let a specific set of aggregators be called an aggregator network. Each web platform vendor will need to make a judgment as to whether it's reasonable to assume that an aggregator network can be _trusted_. That is, that attackers would be unable to compromise any subset of the aggregators within that specific aggregator network such that the privacy guarantees of the system could not be met. To make this judgment, web platform vendors are likely to consider various properties about helper party networks such as diversity in ownership of the company/organization operating the aggregators, diversity in cloud provider (if relevant) used by the aggregators, diversity in jurisdictions in which the aggregators operate, etc.
From 6cd8b4fc9c8e74d732fed3bf7b2379efb7bac564 Mon Sep 17 00:00:00 2001 From: Erik Taubeneck Date: Wed, 19 Oct 2022 13:48:30 -0700 Subject: [PATCH 4/8] Apply more suggestions from review Co-authored-by: Martin Thomson --- threat-model/readme.md | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/threat-model/readme.md b/threat-model/readme.md index b971dd9..279c2a8 100644 --- a/threat-model/readme.md +++ b/threat-model/readme.md @@ -12,7 +12,13 @@ In this document, we outline the security considerations for proposed purpose-co Many of these proposals attempt to leverage the concept of _private computation_ as a component of these purpose-constrained APIs. An ideal private computation system would allow for the evaluation of a predefined function (i.e., the constrained purpose,) without revealing any new information to any party beyond the output of that predefined function. Private computation can be used to perform aggregation over inputs which, individually, must not be revealed. -Private computation has various constructions, each with different assumptions. The two primary forms considered by existing proposals are _multi-party computation_ (MPC) and _trusted execution environments_ (TEEs.) MPC relies on distinct parties, or _aggregators_, who perform a cryptographic protocol, while TEEs rely on specialized hardware that provides isolation for computation on sensitive data. +Private computation can be instantiated using several technologies: + +* Multi-party computation (MPC) distributes information between multiple independent entities using secret sharing. +* A trusted execution environment (TEE) isolates computation and its state by using specialized hardware. +* Fully homomorphic encryption (FHE) enables computation on the ciphertext of encrypted inputs. 
+ +Though the implementation details differ for each technology, ultimately they all rely on finding two entities - or _aggregators_ - that can be trusted not to conspire to reveal private inputs. The forms considered by existing attribution proposals are MPC and TEEs. For our threat model, we assume that an active attacker can control the network and has the ability to corrupt any number of clients, the parties who call the proposed APIs, and some subset of aggregators, when used. @@ -135,7 +141,7 @@ An aggregator is an individual party which participates in an aggregation protoc If enough aggregators (beyond the proposal specific subset) collude (e.g., by sharing unencrypted input shares), then none of the properties of the system hold. Such scenarios are outside the threat model. -However, we do assume that an attacker can always control at least one aggregator (i.e. there are no perfectly trusted aggregators.) +However, we do assume that an attacker can always control at least one aggregator (i.e., there are no perfectly trusted aggregators.) ### 1.6. Attacker on the network @@ -233,6 +239,8 @@ These protocols typically work with data which is _secret shared_. For example, In terms of our threat model, these parties are aggregators and we assume that an attacker can control some subset of those aggregators. That exact threshold may be different for a given proposal, for example, we may assume that an attacker can only control one out of three aggregators. This would enable, in cryptographic terms, a maliciously secure, honest two out of three majority MPC. +Another security model for MPC is _honest but curious_, where the input data remains oblivious so long as aggregators do not deviate from the protocol. Since we are assuming that an attacker can control some subset of the aggregators, it would be able to deviate from the protocol, and thus the _honest but curious_ model is not suitable.
+ Given these aggregators, a _client/user_ is able to generate a secret sharing of their input data, and then securely communicate one secret share to each aggregator. This allows the aggregators to then perform the MPC protocol to compute the predefined function. The _client/user_ trusts that an attacker cannot control enough aggregators to violate that protocol. If the protocol is faithfully executed, the aggregate result can then be reported to the intended recipient without the entities that performed the computation needing to witness the answer. From 9db8f3b83a8af4cf1f94ef87e756316fa724c452 Mon Sep 17 00:00:00 2001 From: Erik Taubeneck Date: Thu, 20 Oct 2022 11:03:46 -0700 Subject: [PATCH 5/8] add more distinct parties --- threat-model/readme.md | 120 +++++++++++++++++++++++++++++++++++------ 1 file changed, 104 insertions(+), 16 deletions(-) diff --git a/threat-model/readme.md b/threat-model/readme.md index 279c2a8..3b63f88 100644 --- a/threat-model/readme.md +++ b/threat-model/readme.md @@ -109,20 +109,29 @@ See [section 2. First Parties, Embedded Parties and Delegated Parties](#2-First- 2. The client should prevent the ability of a delegated party to overwrite/modify data provided by the first party or other delegated parties. If this is not possible, the client should allow the first party to control and limit a delegated party's ability in this manner. (See [section 2. First Parties, Embedded Parties and Delegated Parties](#2-First-Parties-Embedded-Parties-and-Delegated-Parties) for more details.) -### 1.4. Aggregator +### 1.4 Helper Parties and Helper Party Networks -An aggregator is an individual party which participates in an aggregation protocol. We assume that some (proposal specific) subset of the aggregators act honestly and do not collude with other aggregators or any other parties. Here we outline assets and capabilities assuming that an aggregator colludes up to, but not exceeding, that threshold. 
+Helper parties are a class of party (i.e., companies, organizations) who participate in a protocol in order to instantiate a private computation system. There are currently two types of helper parties proposed, _aggregators_ and _coordinators_. We outline the properties on each type directly (as opposed to on the class itself.) +We define a _helper party network_ as a collection of specific parties who are acting as helper parties. We assume that an attacker can control some (proposal specific) subset of the helper party network, and that the remaining helper parties act honestly (e.g., they do not collude with other helper parties or any other parties.) Here we outline assets and capabilities in the presence of this attacker. -#### 1.4.1. Assets +All parties in a helper party network should be known a priori and web platform vendors should be able to evaluate the risk of an attacker that is more powerful than our assumption, e.g., the attacker is able to control more than the (protocol specific) subset of helper parties in the network. -1. Unencrypted individual share. + +### 1.5. Aggregator Helper Party + +An aggregator is a type of helper party which participates in a helper party network to instantiate an MPC-based private computation system. Aggregators receive secret shares of the input data and interact with the other aggregators in the helper party network to perform the aggregation. See [section 4.1 Multi-party Computation](#41-Multi-party-Computation) for more details. + + +#### 1.5.1. Assets + +1. Unencrypted individual share of a secret share. 2. All the assets from first and third parties. 3. Shares of the output of the aggregation protocol. -4. Identity of other aggregators. +4. Identity of other aggregators in the helper party network. -#### 1.4.2. Capabilities +#### 1.5.2. Capabilities 1. Aggregators may defeat correctness by emitting bogus output shares. 2.
Aggregators (who collude with a first or third party) may learn information about the clients/users which contribute to the protocol. @@ -130,26 +139,105 @@ An aggregator is an individual party which participates in an aggregation protoc -#### 1.4.3. Mitigations +#### 1.5.3. Mitigations -1. The secret sharing scheme used to provide inputs to the aggregators must ensure privacy as long as the (proposal specific) subset of aggregators do not reveal their shares. +1. The secret sharing scheme used to provide inputs to the aggregators must ensure privacy in the presence of the attacker. 2. The aggregation protocol should provide robust analysis that it is in fact a differentially private function (see [section 3. Aggregation and Anonymization](#3-Aggregation-and-Anonymization).) -3. Bogus inputs can be generated that encode “null” or “noop” shares, designed to not affect the aggregation protocol, but can mask the total number of true inputs. +3. Bogus inputs can be generated that encode “null” or “noop” shares, designed to mask the total number of true inputs without compromising correctness. + + +### 1.6 Coordinator Helper Party + +A coordinator is a type of helper party which participates in a helper party network to instantiate a TEE-based private computation system. Coordinators hold a partial decryption key for the input data, which is provided into a specific instance of a TEE if and only if an attestation verifies the TEE is running in the expected state. See [section 4.2 Trusted Execution Environments](#42-Trusted-Execution-Environments) for more details. + + +#### 1.6.1 Assets + +1. A partial decryption key for the input data. +2. All the assets of the first and third parties. +3. Identity of the other coordinators. +#### 1.6.2 Capabilities + +1.
Coordinators may compromise the availability of the system by refusing to validate the attestation and refusing to supply their key share. + + +#### 1.6.3 Mitigations + +1. Helper party networks with a larger number of coordinators could utilize a threshold key which requires fewer than all key shares to decrypt the data, preventing an individual coordinator from compromising availability. This comes at the cost of more complicated analysis about the ability of an attacker to control different subsets of that helper party network. + + +### 1.7. Helper party collusion + +If enough helper parties collude (beyond the proposal specific subset which an attacker is assumed to control), then none of the properties of the system hold. Such scenarios are outside the threat model. + +However, we do assume that an attacker can always control at least one helper party (i.e., there are no perfectly trusted helper parties.) + + +### 1.8 Cloud Providers for Helper Parties + +Helper parties may run either on physical machines owned directly by the helper party or (more commonly) on machines subcontracted from a cloud provider. We assume that an attacker can control some subset of cloud providers. + + +#### 1.8.1 Assets + +1. All the assets of the helper party(ies) utilizing the cloud provider. +2. All the assets of the first and third parties. +3. Identity of the helper parties. +#### 1.8.2 Capabilities + +1. Cloud providers have all the capabilities of the helper parties utilizing that cloud provider. +2. If a sufficient number of helper parties utilize a common cloud provider, + a. for aggregator networks, it can reconstruct the client/user assets; + b. for coordinator networks, it can reconstruct the decryption key. + + +#### 1.8.3 Mitigations + +1. Helper party networks should utilize sufficiently distinct cloud providers beyond the proposal specific subset which the attacker is assumed to control.
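The threshold-key mitigation in section 1.6.3 (fewer than all key shares suffice, so one unavailable coordinator cannot block decryption) is commonly instantiated with a scheme like Shamir's secret sharing. The sketch below is illustrative only; the prime field, key size, and 2-of-3 parameters are assumptions, not values from any proposal:

```python
import secrets

PRIME = 2**127 - 1  # illustrative prime field for the key shares

def make_shares(secret, threshold, n):
    # Random polynomial of degree threshold-1 with constant term = secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the constant term.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = secrets.randbelow(PRIME)
shares = make_shares(key, threshold=2, n=3)
assert reconstruct(shares[:2]) == key  # any 2 of 3 coordinators suffice
assert reconstruct(shares[1:]) == key
```

With a 2-of-3 split, a single coordinator withholding its share cannot compromise availability, which is exactly the trade-off the mitigation describes: better availability at the cost of more complex collusion analysis.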
+ + +### 1.9 Operators of TEEs + +As a piece of hardware, TEEs will have an operator with access to the machine. Most commonly, this will be a cloud provider. Depending on the specific hardware, there may be known vulnerabilities in which an attacker who only controls the operator can violate the obliviousness of client/user data. These attacks are outside this threat model, but are likely to inform specific web platform decisions about which instantiations of private computation to support. + +#### 1.9.1 Assets +TODO + + +#### 1.9.2 Capabilities +TODO + + +#### 1.9.3 Mitigations +TODO + + +### 1.10 TEE Manufacturers + +TEEs can provide "attestation" which verifies that the TEE is running in the expected state and running the expected code. These attestations are typically produced with an asymmetric key, where the private key is physically embedded into the TEE, and the public key is published via a system similar to a certificate authority. +#### 1.10.1 Assets + +1. Private keys embedded in TEEs and their corresponding public keys. +#### 1.10.2 Capabilities +1. If an attacker controls both the cloud provider and the TEE manufacturer, they can decrypt all data within the TEE. -### 1.5. Aggregator collusion -If enough aggregators (beyond the proposal specific subset) collude e.g. by sharing unencrypted input shares), then none of the properties of the system hold. Such scenarios are outside the threat mode. +#### 1.10.3 Mitigations -However, we do assume that an attacker can always control at least one aggregator (i.e., there are no perfectly trusted aggregators.) +1. Pick a configuration of TEE manufacturer and cloud operator where it can be assumed that an attacker cannot control both. -### 1.6. Attacker on the network +### 1.11. Attacker on the network We assume the existence of attackers on the network links between various parties. -#### 1.6.1. Capabilities +#### 1.11.1. Capabilities 1. Observation of network traffic.
Attackers may observe messages exchanged between parties at the IP layer. 1. Time of transmission by clients could reveal information about user activity. @@ -157,7 +245,7 @@ We assume the existence of attackers on the network links between various partie 3. Tampering with network traffic. Attackers may drop messages or inject new messages into communication between parties. -#### 1.6.2. Mitigations +#### 1.11.2. Mitigations 1. All messages exchanged between parties should be encrypted / use TLS / use HTTPS. 2. All messages between aggregators and to/from first/third parties should be mutually authenticated to prevent impersonation. @@ -247,7 +335,7 @@ If the protocol is faithfully executed, the aggregate result can then be reporte This threat model does not require that the MPC protocol provide safeguards against an aggregator spoiling the answers from the system. Though a spoiled answer is possible in many protocols, the primary threat that a compromised aggregator presents is to the privacy of _clients/users_. We assume that there is some contractual relationship between those requesting aggregation and the entities that perform that aggregation such that the aggregators lack significant incentive to spoil results. -### 4.2 Trusted Execution Environments (TEEs) +### 4.2 Trusted Execution Environments Trusted execution environments are specialized hardware where encrypted data can be sent "in" to an enclave where it is decrypted and operated on, but cannot be otherwise accessed. A TEE can produce an "attestation" that acts as a claim - backed by the vendor of the TEE - that only specific code is actively running.
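The attestation-gated key release that the TEE sections describe can be modeled as a toy in Python. This is a deliberately simplified sketch: a shared `MANUFACTURER_KEY` and HMAC stand in for the asymmetric key pair a real manufacturer embeds in the hardware (real attestations are signatures verified against a published public key), and all names here are hypothetical:

```python
import hashlib, hmac, secrets

# Toy stand-in for the manufacturer-embedded asymmetric attestation key.
MANUFACTURER_KEY = secrets.token_bytes(32)

def measure(code):
    """Hash ("measurement") of the code loaded into the enclave."""
    return hashlib.sha256(code).digest()

def attest(code):
    """Attestation the TEE produces over its current code measurement."""
    return hmac.new(MANUFACTURER_KEY, measure(code), hashlib.sha256).digest()

def release_partial_key(attestation, expected_code, partial_key):
    """A coordinator supplies its key share only if the attestation matches
    the measurement of the code it expects the TEE to be running."""
    expected = hmac.new(MANUFACTURER_KEY, measure(expected_code),
                        hashlib.sha256).digest()
    return partial_key if hmac.compare_digest(attestation, expected) else None

agreed = b"predefined aggregation function"
share = secrets.token_bytes(16)
assert release_partial_key(attest(agreed), agreed, share) == share
assert release_partial_key(attest(b"tampered code"), agreed, share) is None
```

The point of the sketch is the control flow, not the cryptography: decryption material is released only when the claimed code measurement matches the code all parties agreed on.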
From 1b7b1134c40750d0bf1fd52426bd5bc789d83790 Mon Sep 17 00:00:00 2001 From: Erik Taubeneck Date: Thu, 20 Oct 2022 11:43:11 -0700 Subject: [PATCH 6/8] address feedback on TEE and layering sections --- threat-model/readme.md | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/threat-model/readme.md b/threat-model/readme.md index 3b63f88..f063d2a 100644 --- a/threat-model/readme.md +++ b/threat-model/readme.md @@ -27,7 +27,7 @@ In the presence of this adversary, APIs should aim to achieve the following goal 1. **Privacy**: Clients (and, more specifically, the vendors who distribute the clients) trust that (within the threat models), the API is purpose constrained. That is, all parties learn nothing beyond the intended result (e.g., a differentially private aggregation function computed over the client inputs.) 2. **Correctness:** Parties receiving the intended result trust that the protocol is executed correctly. Moreover, the amount that a result can be skewed by malicious input is bounded and known. -Specific proposed purpose constrained APIs will provide their own analysis about how they achieve these properties. This threat model does not address aspects that are specific to specific private computation designs or configurations. Each private computation option provides different options for defense against attacks. Web platform vendors can decide what configurations produce adequate safeguards for their APIs and users. This is explored further in [Section 4: Private Computation Configurations](#4-private-computation-configurations). +Specific proposed purpose constrained APIs will provide their own analysis about how they achieve these properties. This threat model does not address aspects that are specific to specific private computation designs or configurations. Each private computation option provides different options for defense against attacks. 
Web platform vendors can decide what configurations produce adequate safeguards for their APIs and users. This is explored further in [section 4. Private Computation Configurations](#4-private-computation-configurations). ## 1. Threat Model @@ -260,8 +260,6 @@ Two sites/apps are distinct first parties ([cross-site](https://tess.oconnor.cx/ This comes with certain considerations that need to be made, particularly because a first party may embed multiple delegated parties into their site/app. As such, proposed APIs must take into consideration interactions between delegated parties, such as (but not limited to): - - 1. The ability for one delegated party to corrupt the results of another delegated party. 2. The ability for one delegated party to disable functionality of another delegated party, such as exhausting various limits put in place. 3. The ability to delegate to multiple parties and bypass privacy budget restrictions. @@ -276,8 +274,6 @@ Any output from the proposed APIs must avoid leaking information about any indiv A differentially private output limits the inference that can be made from two queries which differ by an individual input. For example, calculating typical aggregation functions like _sum_ and _mean_ across two sets which differ by a single element will tell you the exact value of that element (making those functions entirely non-differentially private.) Making a function differentially private typically requires two factors: - - 1. Sensitivity of the function (the amount the aggregate can be influenced by a single input) needs to be bounded, and 2. Random noise, proportional to the sensitivity, needs to be added. @@ -325,7 +321,7 @@ Multi-party computation is a cryptographic protocol in which distinct parties ca These protocols typically work with data which is _secret shared_.
For example, a three way _additive_ secret share of a value v = s1 + s2 + s3 can be constructed by generating two random values for s1 and s2, and then computing s3 = v - s1 - s2. At this point, each value si individually appears random, and thus v remains oblivious as long as no single entity learns all values of si. A similar secret sharing scheme uses XOR in place of addition; alternatively, Shamir's secret sharing uses polynomial interpolation. -In terms of our threat model, these parties are aggregators and we assume that an attacker can control some subset of those aggregators. That exact threshold may be different for a given proposal, for example, we may assume that an attacker can only control one out of three aggregators. This would enable, in cryptographic terms, a maliciously secure, honest two out of three majority MPC. +In terms of our threat model, MPC uses a helper party network composed of aggregators and we assume that an attacker can control some subset of those aggregators. That exact threshold may be different for a given proposal, for example, we may assume that an attacker can only control one out of three aggregators. This would enable, in cryptographic terms, a maliciously secure, honest two out of three majority MPC. Another security model for MPC is _honest but curious_, where the input data remains oblivious so long as aggregators do not deviate from the protocol. Since we are assuming that an attacker can control some subset of the aggregators, it would be able to deviate from the protocol, and thus the _honest but curious_ model is not suitable. @@ -343,18 +339,18 @@ Different hardware manufacturers offer different types of TEEs, and there are va In order to utilize a TEE to achieve private computation, we need two properties.
-In a very simplified form, we could imagine there is one TEE which is used by every _first party_/_delegated party_ to run the predefined function. In this case, the _user/client_ could encrypt their data with the public key of the TEE. However, this ignores assuring the code running is that that's intended. The TEE produces an attestation of the code it's running, however the _client/user_ needs that attestation to be verified before their input data is decrypted. +The currently proposed system utilizes a helper party network composed of coordinators. This helper party network participate in a key exchange protocol to generate a _threshold key pair_, with a public key for encryption and several partial private keys such that each coordinator has a single partial private key, and all (or a predefined subset) are required for decryption. This allows the _client/user_ to encrypt the data towards the helper party network. -This can be achieved by utilizing an aggregator who verifies the TEE attestation on behalf of the _client/user_. In this case, the _client/user_ encrypts the data towards the aggregator, and the aggregator provides the decryption key into the TEE if and only if the attestation verifies the TEE is only running the expected code. This still has one problem, however, as we assume that our attacker can compromise at least one aggregator. +When a TEE is instantiated to perform an aggregation, each coordinator in the helper party network can validate an attestation from the TEE and provide their partial private key if and only if the attestation verifies that the TEE is only running the expected code. Thus, only the TEE is ever able to decrypt the data, so long as the attacker is only able to control a subset of coordinators in the helper party network. -We can extend this one step further, by utilizing not one, but multiple aggregators. 
Each aggregator can hold part of a _threshold key_, which allows data to be encrypted such that it requires all (or a predefined subset) of keys to decrypt. Each aggregator verifies the attestation and provides their decryption key into the TEE if and only if the attestation verifies the TEE is only running the expected code. Thus, only the TEE is ever able to decrypt the data, so long as the attacker is only able to control a subset of aggregators. +Note that in [section 1.7. Helper party collusion](#17-Helper-party-collusion), we assume that at least one helper party in a network can be controlled by the attacker, thus requiring at least two coordinators in the helper party network. -### 4.3 Abstracting Private Computation +### 4.3 Layering Private Computation -Both of these constructions rely on a set of aggregators, where we assume that an attacker is unable to control all aggregators. Let a specific set of aggregators be called an aggregator network. Each web platform vendor will need to make a judgment as to if it's reasonable to assume that an aggregator network can be _trusted_. That is, that attackers would be unable to compromise any subset of the aggregators within that specific aggregator network such that the privacy guarantees of the system could not be met. +Both of these constructions rely on helper party networks, where we assume that an attacker is unable to control all parties within a given helper party network. Each web platform vendor will need to make a judgment as to whether it's reasonable to assume that a helper party network can be _trusted_. That is, that attackers would be unable to compromise a sufficient subset of the helper parties within that specific helper party network such that the privacy guarantees of the system could not be met.
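The "all partial private keys required for decryption" case from the threshold key pair description can be sketched as an n-of-n combination. This toy uses XOR to combine independently sampled partials and is only an illustration of the trust structure; a real system would run a distributed key-generation protocol producing a public encryption key, which this sketch does not capture:

```python
import secrets
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def dealerless_keygen(n, size=32):
    """Each coordinator samples its own partial key; the effective key is
    the XOR of all partials and is never held by any single party."""
    partials = [secrets.token_bytes(size) for _ in range(n)]
    full_key = reduce(xor_bytes, partials)
    return partials, full_key

partials, full_key = dealerless_keygen(2)
# All partials together yield the key; any proper subset is
# statistically independent of it.
assert reduce(xor_bytes, partials) == full_key
```

An n-of-n structure like this is why at least two coordinators are needed: with one coordinator there is no partial key an attacker would be missing.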
-To make this judgment, web platform vendors are likely to consider various properties about helper party networks such as diversity in ownership of the company/organization operating the aggregators, diversity in cloud provider (if relevant) used by the aggregators, diversity in jurisdictions in which the aggregators operate, etc. +To make this judgment, web platform vendors are likely to consider various properties about helper party networks such as diversity in ownership of the company/organization operating the helper parties, diversity in cloud provider (if relevant) used by the helper parties, diversity in jurisdictions in which the helper parties, cloud providers, and TEE operators operate, etc. The specific instantiation of private computation will also be a factor in this decision. -It's unlikely that all web vendors will arrive at the same decision as to which aggregator networks can be assumed to be trusted. Our aim, however, is for many such aggregator networks to exist that are trusted by most web platforms, allowing for choice and competition for the first/delegated parties that leverage these APIs. Moreover, as we may have web platform vendors providing different privacy budgets (see section [3.3 Privacy Parameters](#33-privacy-parameters),) the first/delegated parties may use multiple aggregator networks to utilize those different budgets. +Arbitrary helper party networks will not be able to automatically participate in these protocols, as web platform vendors will need to provide some form of authorization. As such, our goal is to reach consensus on a core set of methods and helper party networks so that first/delegated parties can get uniform service across web platform vendors. -Just as web vendors unlikely to fully agree on the sets of trusted aggregator network, they may also reach different conclusions as to which constructions of private computation can be utilized. 
If we consider an aggregator network to be the pairing of both the aggregator parties and the private computation construction, it's reasonably straight forward to for the standard to be unopinionated about the construction. Instead, individual web platform vendors can include the private computation construction when considering which aggregator networks are trusted. +Beyond this core set of methods and operators, we aim to make these APIs interoperable such that individual web platform vendors may offer increments over that, by authorizing a large set of helper party networks, certain instantiations of private computation, and extended privacy budgets. This would allow first/delegated parties to layer their usage of these APIs, utilizing that core set of methods and helper party networks for uniform service, and layering on incremental service where it's available. From 4ca88afe6006237d463d7991e77fedcb7d19d00b Mon Sep 17 00:00:00 2001 From: Erik Taubeneck Date: Thu, 20 Oct 2022 16:02:18 -0700 Subject: [PATCH 7/8] recover edits lost in merge --- threat-model/readme.md | 220 +++++++++++++++++++++++++++++++---------- 1 file changed, 168 insertions(+), 52 deletions(-) diff --git a/threat-model/readme.md b/threat-model/readme.md index 6dfb994..f063d2a 100644 --- a/threat-model/readme.md +++ b/threat-model/readme.md @@ -10,16 +10,24 @@ This document is currently a **draft**, submitted to the PATCG (and eventually t In this document, we outline the security considerations for proposed purpose-constrained APIs for the web platform (that is, within browsers, mobileOSs, and other user-agents) specified by the Private Advertising Technologies Working Group (PATWG). -We assume that an active attacker can control the network and has the ability to corrupt any number of clients, the parties who call the proposed APIs, and some (proposal specific) subset of aggregators. 
+Many of these proposals attempt to leverage the concept of _private computation_ as a component of these purpose-constrained APIs. An ideal private computation system would allow for the evaluation of a predefined function (i.e., the constrained purpose,) without revealing any new information to any party beyond the output of that predefined function. Private computation can be used to perform aggregation over inputs which, individually, must not be revealed. -In the presence of this adversary, APIs should aim to achieve the following goals: +Private computation can be instantiated using several technologies: + +* Multi-party computation (MPC) distributes information between multiple independent entities using secret sharing. +* A trusted execution environment (TEE) isolates computation and its state by using specialized hardware. +* Fully homomorphic encryption (FHE) enables computation on the ciphertext of encrypted inputs. +Though the implementation details differ for each technology, ultimately they all rely on finding two entities - or _aggregators_ - that can be trusted not to conspire to reveal private inputs. The forms considered by existing attribution proposals are MPC and TEEs. +For our threat model, we assume that an active attacker can control the network and has the ability to corrupt any number of clients, the parties who call the proposed APIs, and some subset of aggregators, when used. + +In the presence of this adversary, APIs should aim to achieve the following goals: 1. **Privacy**: Clients (and, more specifically, the vendors who distribute the clients) trust that (within the threat models), the API is purpose constrained. That is, all parties learn nothing beyond the intended result (e.g., a differentially private aggregation function computed over the client inputs.) 2. **Correctness:** Parties receiving the intended result trust that the protocol is executed correctly. 
Moreover, the amount that a result can be skewed by malicious input is bounded and known. -Specific proposed purpose constrained APIs will provide their own analysis about how they achieve these properties. +Specific proposed purpose constrained APIs will provide their own analysis about how they achieve these properties. This threat model does not address aspects that are specific to specific private computation designs or configurations. Each private computation option provides different options for defense against attacks. Web platform vendors can decide what configurations produce adequate safeguards for their APIs and users. This is explored further in [section 4. Private Computation Configurations](#4-private-computation-configurations). ## 1. Threat Model @@ -29,19 +37,14 @@ In this section, we enumerate the potential actors that may participate in a pro ### 1.1. Client/User - #### 1.1.1. Assets - - 1. Original inputs provided to client APIs. Clients expose these APIs to other actors below, which can modify the client’s assets, but should not reveal them. 2. Unencrypted input shares, for systems which rely on secret sharing among aggregators. #### 1.1.2. Capabilities - - 1. Individuals can reveal their own input and compromise their own privacy. 2. Clients (user agent software) can compromise privacy by leaking their assets (beyond specified aggregators, when used.) 3. Clients may affect correctness of a protocol by reporting false input. @@ -49,8 +52,6 @@ In this section, we enumerate the potential actors that may participate in a pro #### 1.1.3. Mitigations - - 1. The client/user may be able to provide certain forms of validation to mitigate either individual client’s or coalitions of clients' ability to compromise correctness. These include (but are not limited to): 1. Bound checking via zero-knowledge proof that an oblivious representation of a value lies within certain bounds. 
This would provide the client/user the ability to lie or misrepresent a value, but constrains that ability. 2. Anonymous assertion via tokens that provide some form of authorization that can be tied to a specific input. Such tokens may be provided by other parties, to assert the authenticity of a report, or reflect that a user has been recognized as an authenticated individual user in another context. These tokens must be anonymous, meaning that they cannot be attributed to a specific user (e.g. by using a cryptographic technique like blind signatures.) @@ -58,11 +59,8 @@ In this section, we enumerate the potential actors that may participate in a pro ### 1.2. First Party Site/App - #### 1.2.1. Assets - - 1. Unencrypted inputs into the proposed API. 2. First party context (e.g. first party cookies / information in first party storage.) 3. User-agent provided information (e.g. user agent string, device information, user configurations.) @@ -72,16 +70,12 @@ In this section, we enumerate the potential actors that may participate in a pro #### 1.2.2. Capabilities - - 1. First parties can modify client assets through the proposed APIs. 2. First parties can choose which encrypted outputs from the proposed API are provided to aggregators. #### 1.2.3. Mitigations - - 1. Modification of client assets should be limited by the API interface to only allow for intended modifications. 2. Use of differential privacy (see [section 3. Aggregation and Anonymization](#3-Aggregation-and-Anonymization)) should be used to prevent
Delegated parties can modify client assets through the proposed APIs, which may include overwriting/modifying data provided by the first party or other delegated parties. #### 1.3.3. Mitigations - - 1. The same mitigations as First Parties ([section 1.2.3](#123-Mitigations)) 2. The client should prevent the ability of a delegated party to overwrite/modify data provided by the first party or other delegated parties. If this is not possible, the client should allow the first party to control and limit a delegated party's ability in this manner. (See [section 2. First Parties, Embedded Parties and Delegated Parties](#2-First-Parties-Embedded-Parties-and-Delegated-Parties) for more details.) -### 1.4. Aggregator +### 1.4 Helper Parties and Helper Party Networks -An aggregator is an individual party which participates in an aggregation protocol. We assume that some (proposal specific) subset of the aggregators act honestly and do not collude with other aggregators or any other parties. Here we outline assets and capabilities assuming that an aggregator colludes up to, but not exceeding, that threshold. +Helper parties are a class of party (i.e., companies, organizations) who participate in a protocol in order to instantiate a private computation system. There are currently two types of helper parties proposed, _aggregators_ and _coordinators_. We outline the properties on each type directly (as opposed to on the class itself.) +We define a _helper party network_ as a collection of specific parties who are acting as helper parties. We assume that an attacker can control some (proposal specific) subset of the helper party network, and that the remaining helper parties act honestly (e.g., they do not collude with other helper parties or any other parties.) Here we outline assets and capabilities in the presence of this attacker. -#### 1.4.1.
Assets +All parties in a helper party network should be known a priori and web platform vendors should be able to evaluate risk of an attacker that is more powerful than our assumption, e.g., the attacker is able to control more than the (protocol specific) subset of helper parties in the network. +### 1.5. Aggregator Helper Party + +An aggregator is a type of helper party which participates in a helper party network to instantiate an MPC-based private computation system. Aggregators receive secret shares of the input data and interact with the other aggregators in the helper party network to perform the aggregation. See [section 4.1 Multi-party Computation](#41-Multi-party-Computation) for more details. -1. Unencrypted individual share. -2. All the assets from first and third parties. -3. Shares of the output of the aggregation protocol. -4. Identity of other aggregators. -#### 1.4.2. Capabilities +#### 1.5.1. Assets +1. Unencrypted individual share of a secret share. +2. All the assets from first and third parties. +3. Shares of the output of the aggregation protocol. +4. Identity of other aggregators in the helper party network. +#### 1.5.2. Capabilities 1. Aggregators may defeat correctness by emitting bogus output shares. 2. Aggregators (who collude with a first or third party) may learn information about the clients/users which contribute to the protocol. @@ -146,28 +139,105 @@ An aggregator is an individual party which participates in an aggregation protoc 4. Aggregators can count the total number of inputs to the system, and, depending on the protocol, counts of shares at any point during the protocol. -#### 1.4.3. Mitigations +#### 1.5.3. Mitigations +1. The secret sharing scheme used to provide inputs to the aggregators must ensure privacy in the presence of the attacker. +2. The aggregation protocol should provide robust analysis that it is in fact a differentially private function (see [section 3.
Aggregation and Anonymization](#3-Aggregation-and-Anonymization).) +3. Bogus inputs can be generated that encode “null” or “noop” shares, designed mask the total number of true inputs without compromising correctness. -1. The secret sharing scheme used to provide inputs to the aggregators must ensure privacy as long as the (proposal specific) subset of aggregators do not reveal their shares. -2. The aggregation protocol should provide robust analysis that it is in fact a differentially private function (see [section 3. Aggregation and Anonymization](#3-Aggregation-and-Anonymization).) -3. Bogus inputs can be generated that encode “null” or “noop” shares, designed to not affect the aggregation protocol, but can mask the total number of true inputs. +### 1.6 Coordinator Helper Party +A coordinator is a type of helper party which participates in a helper party network to instantiate a TEE-based private computation system. Coordinators hold a partial decryption key for the input data, which is provided into a specific instance of a TEE if and only if an attestation verifies the TEE is running in the expected state. See [section 4.2 Trusted Execution Environments](#42-Trusted-Execution-Environments) for more details. -### 1.5. Aggregator collusion -If enough aggregators (beyond the proposal specific subset) collude e.g. by sharing unencrypted input shares), then none of the properties of the system hold. Such scenarios are outside the threat mode. +#### 1.6.1 Assets +1. A partial decryption key for the input data. +2. All the assets from the first and third parties. +3. Identity of the other coordinators. -### 1.6. Attacker on the network -We assume the existence of attackers on the network links between various parties. +#### 1.6.2 Capabilities +1. Coordinators may compromise the availability of the system by refusing to validate the attestation and refusing to supply their key share. + + +#### 1.6.3 Mitigations + +1.
Helper party networks with a larger number of coordinators could utilize a threshold key which requires fewer than all key shares to decrypt the data, preventing an individual coordinator from compromising availability. This comes at the cost of more complicated analysis about the ability of an attacker to control different subsets of that helper party network. + + +### 1.7. Helper party collusion + +If enough helper parties collude (beyond the proposal specific subset which an attacker is assumed to control), then none of the properties of the system hold. Such scenarios are outside the threat model. + +However, we do assume that an attacker can always control at least one helper party (i.e., there are no perfectly trusted helper parties.) + + +### 1.8 Cloud Providers for Helper Parties + +Helper parties may run either on physical machines owned directly by the aggregator or (more commonly) on machines subcontracted from a cloud provider. We assume that an attacker can control some subset of cloud providers. + + +#### 1.8.1 Assets + +1. All the assets of the helper party(ies) utilizing the cloud provider. +2. All the assets from the first and third parties. +3. Identity of the helper parties. +#### 1.8.2 Capabilities +1. Cloud providers have all the capabilities of the helper parties utilizing that cloud provider. +2. If a sufficient number of helper parties utilize a common cloud provider, + a. for aggregator networks, it can reconstruct the client/user assets; + b. for coordinator networks, it can reconstruct the decryption key. + + +#### 1.8.3 Mitigations + +1. Helper party networks should utilize sufficiently distinct cloud providers beyond the proposal specific subset which the attacker is assumed to control. + + +### 1.9 Operators of TEEs + +As a piece of hardware, TEEs will have an operator with access to the machine. Most commonly, this will be a cloud provider.
Depending on the specific hardware, there may be known vulnerabilities in which an attacker who only controls the operator can violate the obliviousness of client/user data. These attacks are outside this threat model, but are likely to inform specific web platform decisions about which instantiations of private computation to support. +#### 1.9.1 Assets +TODO -#### 1.6.1. Capabilities +#### 1.9.2 Capabilities +TODO +#### 1.9.3 Mitigations +TODO + + +### 1.10 TEE Manufacturers + +TEEs can provide "attestation" which verifies that the TEE is running in the expected state and running the expected code. These attestations are typically produced with an asymmetric key, where the private key is physically embedded into the TEE, and the public key is published via a system similar to a certificate authority. +#### 1.10.1 Assets + +1. Private keys embedded in TEEs and their corresponding public keys. +#### 1.10.2 Capabilities + +1. If an attacker controls both the cloud provider and the TEE manufacturer, it can decrypt all data within the TEE. + + +#### 1.10.3 Mitigations + +1. Pick a configuration of TEE manufacturer and cloud operator where it can be assumed that an attacker cannot control both. + + +### 1.11. Attacker on the network + +We assume the existence of attackers on the network links between various parties. + + +#### 1.11.1. Capabilities 1. Observation of network traffic. Attackers may observe messages exchanged between parties at the IP layer. 1. Time of transmission by clients could reveal information about user activity. @@ -175,9 +245,7 @@ We assume the existence of attackers on the network links between various partie 3. Tampering with network traffic. Attackers may drop messages or inject new messages into communication between parties. -#### 1.6.2. Mitigations - - +#### 1.11.2. Mitigations 1. All messages exchanged between parties should be encrypted / use TLS / use HTTPS. 2.
All messages between aggregators and to/from first/third parties should be mutually authenticated to prevent impersonation. @@ -192,8 +260,6 @@ Two sites/apps are distinct first parties ([cross-site](https://tess.oconnor.cx/ This comes with certain considerations that need to be made, particularly because a first party may embed multiple delegated parties into their site/app. As such, proposed APIs must take into consideration interactions between delegated parties, such as (but not limited to): - - 1. The ability for one delegated party to corrupt the results of another delegated party. 2. The ability for one delegated party to disable functionality of another delegated party, such as exhausting various limits put in place. 3. The ability to delegate to multiple parties and bypass privacy budget restrictions @@ -208,8 +274,6 @@ Any output from the proposed APIs must avoid leaking information about any indiv A differentially private output limits the inference that can be made from two queries which differ by an individual input. For example, calculating typical aggregation functions like _sum_ and _mean_ across two sets which differ by a single element will tell you the exact value of that element (making those functions entirely non-differentially private.) Making a function differentially private typically requires two factors: - - 1. Sensitivity of the function (the amount the aggregate can be influenced by a single input) needs to be bounded, and 2. Random noise, proportional to the sensitivity, needs to be added. @@ -230,11 +294,63 @@ When managing a privacy budget, that budget is assigned to some party who can us ### 3.3 Privacy Parameters -This results in four parameters which need to reach consensus in the PATWG: - - +This results in four parameters: -1. _Aggregation minimum threshold_: the number of clients/users that need to be included in a given aggregation. +1. _Unit of privacy_: The set of parties used to manage the privacy budget. 2.
_Epoch_: The amount of time over which a differential privacy budget should be managed. 3. ε: The differential privacy parameter which measures the amount of individual differential information leakage allowed in each epoch. -1. _Unit of privacy_: The set of parties used to manage the privacy budget. +4. _Aggregation minimum threshold_: the number of clients/users that need to be included in a given aggregation. + +We can divide these parameters into two groups: parameters which should be part of the standard (and thus require consensus) and parameters which can be configured by the web platform vendor. The first two parameters, _unit of privacy_ and _epoch_, should likely be part of the standard. However, the other two parameters, ε and _aggregation minimum threshold_, could differ across web platform vendors without compromising interoperability. + +Let's first address ε. Suppose that we have _Browser A_ and _Mobile OS B_ which, respectively, decide that the appropriate budget is "1 unit/epoch" and "5 unit/epoch". Luckily, in the worst case, privacy budgets are additive, and thus can be split into smaller pieces. A user of this API could then decide to use "1 unit/epoch" on the API from both _Browser A_ and _Mobile OS B_, and then continue to use the remaining "4 unit/epoch" on the API from _Mobile OS B_. + +This might be achieved by using a system that distributes information from both Browser A and Mobile OS B to a private computation entity that is configured to consume 1 unit per epoch. Information from Mobile OS B is additionally distributed to another entity that is configured to consume the remaining 4 units. As a practical matter, the two logical private computation entities here could be operated by the same organizations, but they would use different configurations. For instance, each configuration might use different keying material.
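The additivity argument above can be written out as simple per-epoch bookkeeping. The following Python sketch mirrors the hypothetical Browser A / Mobile OS B numbers; the names and the `spend` helper are illustrative assumptions, not part of any proposed API:

```python
# Hypothetical per-epoch privacy budget bookkeeping mirroring the
# Browser A / Mobile OS B example. Budgets are additive, so the joint
# computation can consume the common 1 unit from each platform while
# Mobile OS B's remaining 4 units are consumed separately.
budgets = {"browser_a": 1.0, "mobile_os_b": 5.0}

def spend(platform: str, epsilon: float) -> None:
    """Consume epsilon from a platform's per-epoch budget, refusing overspend."""
    if budgets[platform] < epsilon:
        raise ValueError(f"{platform}: per-epoch privacy budget exhausted")
    budgets[platform] -= epsilon

spend("browser_a", 1.0)    # joint computation consumes 1 unit on Browser A...
spend("mobile_os_b", 1.0)  # ...and the same 1 unit on Mobile OS B
spend("mobile_os_b", 4.0)  # Mobile-OS-only computations use the remaining 4
```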
In this case, Browser A is able to limit information release to 1 unit per epoch by choosing where information can be sent. + +Secondly, let's address _aggregation minimum threshold_. Suppose that the same _Browser A_ and _Mobile OS B_ have also, respectively, decided that the _aggregation minimum threshold_ should be "X people" and "2X people". Because this is a minimum, when using the API from both, the minimum can be set to 2X, and satisfy both constraints. However, when using data that comes from _Browser A_ only, the minimum could be reduced to X. + +## 4. Private Computation Configurations + +Many of these proposals aim to leverage the idea of _private computation_, touched on briefly in the introduction. In it's ideal form, a private computation environment would allow for the evaluation of a predefined function (i.e., the constrained purpose,) without revealing any new information to any party beyond the output of that predefined function. This is commonly used to perform aggregation over inputs which, individually, must not be revealed. It is also often used to then apply differentially private noise to those aggregates. + +There are currently two proposed constructions of a private computation environment under consideration: _multi-party computation_ (MPC) and _trusted execution environments_ (TEEs). The following two subsections outline how these can be used to create a private computation environment, how they fit into the threat model, and the assumptions required to achieve our goal of assuring that the inputs are not revealed. + +### 4.1 Multi-party Computation + +Multi-party computation is a cryptographic protocol in which distinct parties can collectively operate on data which remains oblivious to any individual party throughout the computation, but allows for joint evaluation of a predefined function. + +These protocols typically work with data which is _secret shared_.
For example, a three way _additive_ secret share of a value v = s1 + s2 + s3 can be constructed by generating two random values for s1 and s2, and then computing s3 = v - s1 - s2. At this point, each value si individually appears random, and thus v remains oblivious as long as no single entity learns all values of si. A similar secret sharing schemes uses XOR in place of addition; alternatively, Shamir's secret sharing uses polynomial interpolation. + +In terms of our threat model, MPC uses a helper party network composed of aggregators and we assume that an attacker can control some subset of those aggregators. That exact threshold may be different for a given proposal, for example, we may assume that an attacker can only control one out of three aggregators. This would enable, in cryptographic terms, a maliciously secure, honest two out of three majority MPC. + +Another security model for MPC is _honest but curious_, where the input data remains oblivious so long as aggregators do not deviate from the protocol. Since we are assuming that an attacker can control some subset of the aggregators, it would be able to deviate from the protocol, and thus the _honest but curious_ model is not suitable. + +Given these aggregators, a _client/user_ is able to generate a secret sharing of their input data, and then securely communicate one secret share to each aggregator. This allows the aggregators to then perform the MPC protocol to compute the predefined function. The _client/user_ trusts that an attacker cannot control enough aggregators to violate that protocol. + +If the protocol is faithfully executed, the aggregate result can then be reported to the intended recipient without the entities that performed the computation needing to witness the answer. + +This threat model does not require that the MPC protocol provide safeguards against an aggregator spoiling the answers from the system.
Though a spoiled answer is possible in many protocols, the primary threat that a compromised aggregator presents is to the privacy of _clients/users_. We assume that there is some contractual relationship between those requesting aggregation and the entities that perform that aggregation such that the aggregators lack significant incentive to spoil results. + +### 4.2 Trusted Execution Environments + +Trusted execution environments are specialized hardware where encrypted data can be sent "in" to an enclave where it is decrypted and operated on, but cannot be otherwise accessed. A TEE can produce an "attestation" that acts as a claim - backed by the vendor of the TEE - that only specific code is actively running. + +Different hardware manufacturers offer different types of TEEs, and there are various assumptions which need to be made about the manufacturer, hardware operator, and other tenants on the hardware in order to achieve our ideal of input data remaining oblivious. There are documented attacks for many of these types of hardware, and while we don't expect all web platform vendors to support this construction, some vendors have expressed comfort with the required assumptions. + +In order to utilize a TEE to achieve private computation, we need two properties. First, the data must only be able to be decrypted within the TEE, and second, we need to assure that the TEE only executes code which evaluates the predefined function. + +The currently proposed system utilizes a helper party network composed of coordinators. This helper party network participates in a key exchange protocol to generate a _threshold key pair_, with a public key for encryption and several partial private keys such that each coordinator has a single partial private key, and all (or a predefined subset) are required for decryption. This allows the _client/user_ to encrypt the data towards the helper party network.
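The "all partial keys required" property described above can be illustrated with a toy XOR splitting of a symmetric key. Deployed designs would use threshold public-key cryptography produced by a real key exchange; the code below, including all names, is an illustrative assumption only:

```python
# Toy illustration of splitting a key into one XOR share per coordinator,
# such that every share is required to recombine the key. A real system
# would use a threshold key pair from a distributed key generation
# protocol; this sketch only shows why no proper subset of coordinators
# can recover the decryption capability on its own.
import secrets
from functools import reduce
from operator import xor

def split_key(key: bytes, n: int) -> list[bytes]:
    """Split key into n shares such that all n are required to recombine."""
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    last = bytes(reduce(xor, chunk) for chunk in zip(key, *shares))
    return shares + [last]

def combine(shares: list[bytes]) -> bytes:
    return bytes(reduce(xor, chunk) for chunk in zip(*shares))

key = secrets.token_bytes(16)
parts = split_key(key, 3)         # one partial key per coordinator
assert combine(parts) == key      # all coordinators together recover the key
assert combine(parts[:2]) != key  # a proper subset recovers only noise
```

Note the trade-off mentioned in section 1.6.3: an all-shares scheme like this one lets any single coordinator block decryption, which is why a threshold (fewer-than-all) scheme may be preferred at the cost of a more complex collusion analysis.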
When a TEE is instantiated to perform an aggregation, each coordinator in the helper party network can validate an attestation from the TEE and provide their partial private key if and only if the attestation verifies that the TEE is only running the expected code. Thus, only the TEE is ever able to decrypt the data, so long as the attacker is only able to control a subset of coordinators in the helper party network. + +Note that in [section 1.7. Helper party collusion](#17-Helper-party-collusion), we assume that at least one helper party in a network can be controlled by the attacker, thus requiring at least two coordinators in the helper party network. + +### 4.3 Layering Private Computation + +Both of these constructions rely on helper party networks, where we assume that an attacker is unable to control all parties within a given helper party network. Each web platform vendor will need to make a judgment as to whether it's reasonable to assume that a helper party network can be _trusted_. That is, that attackers would be unable to compromise enough of the helper parties within that specific helper party network to violate the privacy guarantees of the system. + +To make this judgment, web platform vendors are likely to consider various properties about helper party networks such as diversity in ownership of the company/organization operating the helper parties, diversity in cloud provider (if relevant) used by the helper parties, diversity in jurisdictions in which the helper parties, cloud providers, and TEE operators operate, etc. The specific instantiation of private computation will also be a factor in this decision. + +Arbitrary helper party networks will not be able to automatically participate in these protocols, as web platform vendors will need to provide some form of authorization.
As such, our goal is to reach consensus on a core set of methods and helper party networks so that first/delegated parties can get uniform service across web platform vendors. + +Beyond this core set of methods and operators, we aim to make these APIs interoperable such that individual web platform vendors may offer increments over that, by authorizing a large set of helper party networks, certain instantiations of private computation, and extended privacy budgets. This would allow first/delegated parties to layer their usage of these APIs, utilizing that core set of methods and helper party networks for uniform service, and layering on incremental service where it's available. From 48b57a30e415ced4a927bc95662c3dccbaf18676 Mon Sep 17 00:00:00 2001 From: Erik Taubeneck Date: Fri, 21 Oct 2022 12:18:38 -0700 Subject: [PATCH 8/8] Apply suggestions from code review Adding suggestions from @martinthomson and @csharrison. Thanks both! Co-authored-by: Charlie Harrison Co-authored-by: Martin Thomson --- threat-model/readme.md | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/threat-model/readme.md b/threat-model/readme.md index f063d2a..588439a 100644 --- a/threat-model/readme.md +++ b/threat-model/readme.md @@ -18,7 +18,7 @@ Private computation can be instantiated using several technologies: * A trusted execution environment (TEE) isolates computation and its state by using specialized hardware. * Fully homomorphic encryption (FHE) enables computation on the ciphertext of encrypted inputs. -Though the implementation details differ for each technology, ultimately they all rely on finding two entities - or _aggregators_ - that can be trusted not to conspire to reveal private inputs. The forms considered by existing attribution proposals are MPC and TEEs. 
+Though the implementation details differ for each technology, ultimately they all rely on finding at least two entities - or _aggregators_ - that can be trusted not to conspire to reveal private inputs. The forms considered by existing attribution proposals are MPC and TEEs. For our threat model, we assume that an active attacker can control the network and has the ability to corrupt any number of clients, the parties who call the proposed APIs, and some subset of aggregators, when used. @@ -27,7 +27,7 @@ In the presence of this adversary, APIs should aim to achieve the following goal 1. **Privacy**: Clients (and, more specifically, the vendors who distribute the clients) trust that (within the threat models), the API is purpose constrained. That is, all parties learn nothing beyond the intended result (e.g., a differentially private aggregation function computed over the client inputs.) 2. **Correctness:** Parties receiving the intended result trust that the protocol is executed correctly. Moreover, the amount that a result can be skewed by malicious input is bounded and known. -Specific proposed purpose constrained APIs will provide their own analysis about how they achieve these properties. This threat model does not address aspects that are specific to specific private computation designs or configurations. Each private computation option provides different options for defense against attacks. Web platform vendors can decide what configurations produce adequate safeguards for their APIs and users. This is explored further in [section 4. Private Computation Configurations](#4-private-computation-configurations). +Specific proposed purpose constrained APIs will provide their own analysis about how they achieve these properties. This threat model does not address aspects that are specific to particular private computation designs or configurations. Each private computation instantiation provides different options for defense against attacks.
Web platform vendors can decide which configurations produce adequate safeguards for their APIs and users. This is explored further in [section 4. Private Computation Configurations](#4-private-computation-configurations). ## 1. Threat Model @@ -113,7 +113,7 @@ Helper parties are a class of party (i.e., companies, organizations) who particip -We define a _helper party network_ as a collection of specific parties who are acting as helper parties. We assume that an attacker can control some (proposal specific) subset of the helper party network, and that the remaining helper parties act honestly (e.g., they do not collude with other helper parties or any other parties.) Here we outline assets and capabilities in the presence of this attacker. +We define a _helper party network_ as a group of helper parties. We assume that an attacker can control some (proposal specific) subset of the helper parties that participate in a helper party network, but that the remaining helper parties act honestly (e.g., they do not collude with other helper parties or any other parties.) Here we outline assets and capabilities in the presence of this attacker. All parties in a helper party network should be known a priori and web platform vendors should be able to evaluate risk of an attacker that is more powerful than our assumption, e.g., the attacker is able to control more than the (protocol specific) subset of helper parties in the network. @@ -143,7 +143,7 @@ An aggregator is type of helper party which participates in a helper party networ 1. The secret sharing scheme used to provide inputs to the aggregators must ensure privacy in the presence of the attacker.
2. The aggregation protocol should provide robust analysis that it is in fact a differentially private function (see [section 3. Aggregation and Anonymization](#3-Aggregation-and-Anonymization).) -3. Bogus inputs can be generated that encode “null” or “noop” shares, designed mask the total number of true inputs without compromising correctness. +3. Bogus inputs can be generated that encode “null” or “noop” shares, which could be designed to mask the total number of true inputs without compromising correctness. ### 1.6 Coordinator Helper Party @@ -171,7 +171,7 @@ An coordinator is type of helper party which participates in a helper party netw If enough helper parties collude (beyond the proposal specific subset which an attacker is assumed to control), then none of the properties of the system hold. Such scenarios are outside the threat model. -However, we do assume that an attacker can always control at least one helper party (i.e., there are no perfectly trusted helper parties.) +However, we do assume that an attacker can always control at least one helper party. That is, there can be no perfectly trusted helper parties. ### 1.8 Cloud Providers for Helper Parties @@ -181,7 +181,7 @@ Helper parties may run either on physical machines owned by directly by the aggr #### 1.8.1 Assets -1. All the assets of the helper party(ies) utilizing the cloud provider. +1. All the assets of the helper party (or parties) using the cloud provider. 2. All the assets from the first and third parties. 3. Identity of the helper parties. @@ -305,13 +305,13 @@ We can divide these parameters into two groups: parameters which should be part Let's first address ε. Suppose that we have _Browser A_ and _Mobile OS B_ which, respectively, decide that the appropriate budget is "1 unit/epoch" and "5 unit/epoch". Luckily, in the worst case, privacy budgets are additive, and thus can be split into smaller pieces.
A user of this API could then decide to use "1 unit/epoch" on the API from both _Browser A_ and _Mobile OS B_, and then continue to use the remaining "4 unit/epoch" on the API from _Mobile OS B_. -This might be achieved by using a system that distributes information from both Browser A and Mobile OS B to a private computation entity that is configured to consume 1 unit per epoch. Information from Mobile OS B is additionally distributed to another entity that is configured to consume the remaining 4 units. As a practical matter, the two logical private computation entities here could be operated by the same organizations, but they would use different configurations. For instance, each configuration might use different keying material. In this case, Browser A is able to limit information release to 1 unit per epoch by choosing where information can be sent. +This might be achieved by using a system that distributes information from both Browser A and Mobile OS B to a private computation entity that is configured to consume 1 unit per epoch. Information from Mobile OS B is additionally distributed to another entity that is configured to consume the remaining 4 units. As a practical matter, the two logical private computation entities here could be operated by the same organizations, but they would use different configurations. These configurations could be chosen via browser choice of keying material, or directly specified in individual user input. Secondly, let's address _aggregation minimum threshold_. Suppose that the same _Browser A_ and _Mobile OS B_ have also, respectively, decided that the _aggregation minimum threshold_ should be "X people" and "2X people". Because this is a minimum, when using the API from both, the minimum can be set to 2X, and satisfy both constraints. However, when using data that comes from _Browser A_ only, the minimum could be reduced to X. ## 4. 
Private Computation Configurations -Many of these proposals aim to leverage the idea of _private computation_, touched on briefly in the introduction. In it's ideal form, a private computation environment would allow for the evaluation of a predefined function (i.e., the constrained purpose,) without revealing any new information to any party beyond the output of that predefined function. This is commonly used to perform aggregation over inputs which, individually, must not be revealed. It is also often used to then apply differentially private noise to those aggregates. +Many of these proposals aim to leverage the idea of _private computation_, touched on briefly in the introduction. In it's ideal form, a private computation environment would allow for the evaluation of a predefined function (i.e., the constrained purpose,) without revealing any new information to any party beyond the output of that predefined function. This is commonly used to run a privacy mechanism across user input, which individually must not be revealed. There are currently two proposed constructions of a private computation environment under consideration: _multi-party computation_ (MPC) and _trusted execution environments_ (TEEs). The following two subsections outline how these can be to create a private computation environment, how they fit into the threat model, and the assumptions required to achieve our goal of assuring that the inputs are not revealed. @@ -319,9 +319,9 @@ There are currently two proposed constructions of a private computation environm Multi-party computation is a cryptographic protocol in which distinct parties can collectively operate on data which remains oblivious to any individual party throughout the computation, but allows for joint evaluation of a predefined function. -These protocols typically work with data which is _secret shared_. 
For example, a three way _additive_ secret share of a value v = s1 + s2 + s3 can be constructed by generating two random values for s1 and s2, and then computing s3 = v - s1 - s2. At this point, each value si individually appears random, and thus v remains oblivious as long as no single entity learns all values of si. A similar secret sharing schemes uses XOR in place of addition; alternatively, Shamir's secret sharing uses polynomial interpolation. +These protocols typically work with data which is _secret shared_. For example, a three way _additive_ secret share of a value v = s1 + s2 + s3 can be constructed by generating two random values for s1 and s2, and then computing s3 = v - s1 - s2. At this point, each value si individually appears random, and thus v remains oblivious as long as no single entity learns all values of si. A similar secret sharing schemes uses XOR in place of addition; alternatively, [Shamir's secret sharing](https://web.mit.edu/6.857/OldStuff/Fall03/ref/Shamir-HowToShareASecret.pdf) uses polynomial interpolation. -In terms of our threat model, MPC uses a helper party network composed of aggregators and we assume that an attacker can control some subset of those aggregators. That exact threshold may be different for a given proposal, for example, we may assume that an attacker can only control one out of three aggregators. This would enable, in cryptographic terms, a maliciously secure, honest two out of three majority MPC. +In terms of our threat model, MPC uses a helper party network composed of aggregators and we assume that an attacker can control some subset of those aggregators. That exact threshold may be different for a given proposal, for example, we may assume that an attacker can only control one out of three aggregators. 
This would enable, in cryptographic terms, a _maliciously secure_, honest two out of three majority MPC, where the input data remains oblivious even in the fact of an attacker who controls one of the three aggregators and deviates from the protocol. The protocol would always detect this attacker, and typically aborts the computation. Another security model for MPC is _honest but curious_, where the input data remains oblivious so long as aggregators do not deviate from the protocol. Since we are assuming that an attacker can control some subset of the aggregators, it would be able to deviate from the protocol, and thus the _honest but curious_ is not suitable. @@ -329,19 +329,19 @@ Given these aggregators, a _client/user_ is able to generate a secret sharing of If the protocol is faithfully executed, the aggregate result can then be reported to the intended recipient without the entities that performed the computation needing to witness the answer. -This threat model does not require that the MPC protocol provide safeguards against an aggregator spoiling the answers from the system. Though a spoiled answer is possible in many protocols, the primary threat that a compromised aggregator presents is to the privacy of _clients/users_. We assume that there is some contractual relationship between those requesting aggregation and the entities that perform that aggregation such that the aggregators lack significant incentive to spoil results. +This threat model does not require that the MPC protocol provide safeguards against an aggregator spoiling the answers from the system. Though a spoiled answer is possible in many protocols, the primary threat that a compromised aggregator presents is to the privacy of _clients/users_. We assume that there is some relationship between those requesting aggregation and the entities that perform that aggregation such that the aggregators lack significant incentive to spoil results. 
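The additive scheme described above is easy to sketch. The following toy Python snippet is purely illustrative (the modulus and function names are assumptions, not any proposal's actual wire format); it shows both the sharing step a client would perform and the share-wise addition that lets each aggregator sum the shares it holds without ever seeing an individual value:

```python
import secrets

MODULUS = 2**32  # illustrative; a real protocol fixes its own ring or field


def share(v: int, n: int = 3) -> list[int]:
    """Split v into n additive shares with v = (s1 + ... + sn) mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    shares.append((v - sum(shares)) % MODULUS)  # s_n = v - s1 - ... - s_{n-1}
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recovering v requires every share; any proper subset appears uniform."""
    return sum(shares) % MODULUS


# Each aggregator adds up the shares it holds across many inputs, producing
# shares of the aggregate without learning any individual input.
inputs = [3, 1, 4, 1, 5]
rows = [share(v) for v in inputs]                      # one row of shares per input
sums = [sum(col) % MODULUS for col in zip(*rows)]      # share-wise column sums
assert reconstruct(sums) == sum(inputs)
```

Note that sharing v = 0 yields a valid "null" input that leaves the aggregate unchanged, which is how the bogus inputs mentioned in section 1.5 can mask the true number of contributions without compromising correctness.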
 ### 4.2 Trusted Execution Environments

-Trusted execution environments are specialized hardware where encrypted data can be sent "in" to an enclave where it is decrypted and operated on, but cannot be otherwise accessed. A TEE can produce an "attestation" that acts as a claim - backed by the vendor of the TEE - that only specific code is actively running.
+Trusted execution environments are specialized hardware where encrypted data can be sent "in" to a computing environment where it is decrypted and operated on, but cannot be otherwise accessed. A TEE can produce an "attestation" that acts as a claim - backed by the vendor of the TEE - that only specific code is actively running.

-Different hardware manufacturers offer different types of TEEs, and there are various assumptions which need to be made about the manufacturer, hardware operator, and other tenants on the hardware in order to achieve our ideal of input data remaining oblivious. There are documented attacks for many of these types of hardware, and while we don't expect all web platform vendors to support this construction, some vendors have expressed comfort with the required assumptions.
+Different hardware manufacturers offer different types of TEEs, and there are various assumptions which need to be made about the manufacturer, hardware operator, and other tenants on the hardware in order to achieve our ideal of input data remaining oblivious. There are documented attacks against many of these types of hardware, so different vendors will need to make their own choices about which instantiations (if any) to trust.

 In order to utilize a TEE to achieve private computation, we need two properties. First, the data must only be able to be decrypted within the TEE, and second, we need to assure that the TEE only executes code which evaluates the predefined function.

 The currently proposed system utilizes a helper party network composed of coordinators. This helper party network participate in a key exchange protocol to generate a _threshold key pair_, with a public key for encryption and several partial private keys such that each coordinator has a single partial private key, and all (or a predefined subset) are required for decryption. This allows the _client/user_ to encrypt the data towards the helper party network.

-When a TEE is instantiated to perform an aggregation, each coordinator in the helper party network can validate an attestation from the TEE and provide their partial private key if and only if the attestation verifies that the TEE is only running the expected code. Thus, only the TEE is ever able to decrypt the data, so long as the attacker is only able to control a subset of coordinators in the helper party network.
+When a TEE is instantiated to perform an aggregation, each coordinator in the helper party network can validate an attestation from the TEE and provide their partial private key if and only if the attestation verifies that the TEE is only running the expected code. Thus, decrypting data outside the TEE is possible only if the attacker is able to control a proposal-specific subset of coordinators in the helper party network.

 Note that in [section 1.7. Helper party collusion](#17-Helper-party-collusion), we assume that at least one helper party in a network can be controlled by the attacker, thus requiring at least two coordinators in the helper party network.

@@ -351,6 +351,6 @@ Both of these constructions rely on helper party networks, where we assume that

 To make this judgment, web platform vendors are likely to consider various properties about helper party networks such as diversity in ownership of the company/organization operating the helper parties, diversity in cloud provider (if relevant) used by the helper parties, diversity in jurisdictions in which the helper parties, cloud providers, and TEE operators operate, etc. The specific instantiation of private computation will also be a factor in this decision.

-Arbitrary helper party networks will not be able to automatically participate in these protocols, as web platform vendors will need to provide some form of authorization. As such, our goal is to reach consensus on a core set of methods and helper party networks so that first/delegated parties can get uniform service across web platform vendors.
+Arbitrary helper party networks will not be able to automatically participate in these protocols, as web platform vendors will need to provide some form of authorization. As such, our goal is to reach consensus on a core set of methods and helper party networks so that first/delegated parties can get uniform service across web platform vendors. This will in turn allow sites/apps to access a basic set of standardized capabilities across all participating platforms.

-Beyond this core set of methods and operators, we aim to make these APIs interoperable such that individual web platform vendors may offer increments over that, by authorizing a large set of helper party networks, certain instantiations of private computation, and extended privacy budgets. This would allow first/delegated parties to layer their usage of these APIs, utilizing that core set of methods and helper party networks for uniform service, and layering on incremental service where it's available.
+Beyond this core set of methods and operators, we aim to design these APIs in a way that allows individual web platform vendors to offer increments over that core: by authorizing a larger set of helper party networks, by accepting alternative instantiations of private computation, or by extending privacy budgets. This would allow first/delegated parties to layer their usage of these APIs, utilizing that core set of methods and helper party networks for uniform service, and layering on incremental service where it's available.
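The coordinator key-release flow described in section 4.2 can be sketched with a toy. Here the decryption key is XOR-split across coordinators as a stand-in for a real threshold key pair, and the `Coordinator` class and measurement string are hypothetical; an actual deployment would use threshold public-key cryptography and hardware-backed attestation verification rather than string comparison:

```python
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class Coordinator:
    """Holds one partial key; releases it only for an approved attestation."""

    def __init__(self, partial_key: bytes, expected_measurement: str):
        self._partial = partial_key
        self._expected = expected_measurement

    def release_key(self, attestation: str) -> bytes:
        if attestation != self._expected:
            raise PermissionError("attestation does not match approved code")
        return self._partial


# The full key is the XOR of all partial keys, so every coordinator's
# contribution is required to reconstruct it (no single party can decrypt).
EXPECTED = "sha256:approved-aggregation-binary"  # hypothetical measurement
partials = [secrets.token_bytes(32) for _ in range(3)]
full_key = b"\x00" * 32
for p in partials:
    full_key = xor(full_key, p)

coordinators = [Coordinator(p, EXPECTED) for p in partials]

# A TEE presenting the expected attestation recovers the key...
recovered = b"\x00" * 32
for c in coordinators:
    recovered = xor(recovered, c.release_key(EXPECTED))
assert recovered == full_key

# ...while any other code measurement is refused by every coordinator.
try:
    coordinators[0].release_key("sha256:tampered-binary")
except PermissionError:
    pass
```

As in section 1.7, the attacker would need to control every coordinator holding a required partial key (or the proposal-specific subset) before data could be decrypted outside the attested environment.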