Hybrid padding #1381

Merged
merged 6 commits into from
Oct 29, 2024
Changes from 4 commits
21 changes: 17 additions & 4 deletions ipa-core/src/protocol/hybrid/mod.rs
@@ -1,5 +1,7 @@
pub(crate) mod step;

use step::HybridStep as Step;

use crate::{
error::Error,
ff::{
@@ -9,12 +11,14 @@ use crate::{
helpers::query::DpMechanism,
protocol::{
context::{ShardedContext, UpgradableContext},
-ipa_prf::{oprf_padding::PaddingParameters, shuffle::Shuffle},
+ipa_prf::{
+oprf_padding::{apply_dp_padding, PaddingParameters},
+shuffle::Shuffle,
+},
},
report::hybrid::IndistinguishableHybridReport,
secret_sharing::replicated::semi_honest::AdditiveShare as Replicated,
};

// In theory, we could support (runtime-configured breakdown count) ≤ (compile-time breakdown count)
// ≤ 2^|bk|, with all three values distinct, but at present, there is no runtime configuration and
// the latter two must be equal. The implementation of `move_single_value_to_bucket` does support a
@@ -61,10 +65,10 @@ impl BreakdownKey<256> for BA8 {}
/// # Panics
/// Propagates errors from config issues or while running the protocol
pub async fn hybrid_protocol<'ctx, C, BK, V, HV, const SS_BITS: usize, const B: usize>(
-_ctx: C,
+ctx: C,
input_rows: Vec<IndistinguishableHybridReport<BK, V>>,
_dp_params: DpMechanism,
-_dp_padding_params: PaddingParameters,
+dp_padding_params: PaddingParameters,
) -> Result<Vec<Replicated<HV>>, Error>
where
C: UpgradableContext + 'ctx + Shuffle + ShardedContext,
@@ -75,5 +79,14 @@ where
if input_rows.is_empty() {
return Ok(vec![Replicated::ZERO; B]);
}

// Apply DP padding for OPRF
let _padded_input_rows = apply_dp_padding::<_, IndistinguishableHybridReport<BK, V>, B>(
ctx.narrow(&Step::PaddingDp),
input_rows,
&dp_padding_params,
)
.await?;

unimplemented!("protocol::hybrid::hybrid_protocol is not fully implemented")
}
2 changes: 2 additions & 0 deletions ipa-core/src/protocol/hybrid/step.rs
@@ -3,4 +3,6 @@ use ipa_step_derive::CompactStep;
#[derive(CompactStep)]
pub(crate) enum HybridStep {
ReshardByTag,
#[step(child = crate::protocol::ipa_prf::oprf_padding::step::PaddingDpStep, name="padding_dp")]
PaddingDp,
}
164 changes: 164 additions & 0 deletions ipa-core/src/protocol/ipa_prf/oprf_padding/mod.rs
@@ -2,6 +2,8 @@ pub(crate) mod distributions;
pub mod insecure;
pub mod step;

use std::iter::{repeat, repeat_with};

#[cfg(any(test, feature = "test-fixture", feature = "cli"))]
pub use insecure::DiscreteDp as InsecureDiscreteDp;
use rand::Rng;
@@ -28,6 +30,7 @@ use crate::{
},
RecordId,
},
report::hybrid::IndistinguishableHybridReport,
secret_sharing::{
replicated::{semi_honest::AdditiveShare, ReplicatedSecretSharing},
SharedValue,
@@ -130,6 +133,64 @@ pub trait Paddable {
Self: Sized;
}

impl<BK, V> Paddable for IndistinguishableHybridReport<BK, V>
Collaborator

This trait could be made better...

Member Author

Can you take a look and see if this impl is cleaner?

where
BK: BooleanArray + U128Conversions,
V: BooleanArray,
{
fn add_padding_items<VC: Extend<Self>, const B: usize>(
eriktaubeneck marked this conversation as resolved.
direction_to_excluded_helper: Direction,
padding_input_rows: &mut VC,
padding_params: &PaddingParameters,
rng: &mut InstrumentedSequentialSharedRandomness,
) -> Result<u32, Error> {
let mut total_number_of_fake_rows = 0;
match padding_params.oprf_padding {
OPRFPadding::NoOPRFPadding => {}
OPRFPadding::Parameters {
oprf_epsilon,
oprf_delta,
matchkey_cardinality_cap,
oprf_padding_sensitivity,
} => {
let oprf_padding =
Collaborator

Do we still plan to use OPRFPaddingDp, or will it be replicated for hybrid as well?

Member Author

This gets somewhat confusing, because we use the term "OPRF" to name both the entire IPA v2 protocol and the pseudorandom value that the match key is converted into.

In this instance, we are padding values so that the process of converting match keys into OPRF values is differentially private (same as previously), as opposed to padding values for aggregation or breakdown key reveal. In that sense, I think this naming still makes sense, and "HybridPadding" would actually be less clear (at least once the old protocol is purged).

Happy for input on this, but that was my thinking. I can also add a comment here to make that clearer.

Member Author

Looking at this more closely, this is actually just poor naming of the existing OPRFPaddingDp struct, which itself isn't tied specifically to the OPRF, and is used for aggregation padding as well. I'll open a new PR to make that naming more generic. It still makes sense to me to use the OPRFPadding struct here since we're still using it to provide DP for the match_key to OPRF conversion.

OPRFPaddingDp::new(oprf_epsilon, oprf_delta, oprf_padding_sensitivity)?;
for cardinality in 1..=matchkey_cardinality_cap {
Member Author

I attempted to get this all into a single padding_input_rows.extend, but I ran into a conflict: the inner closure needs move so that it doesn't outlive its borrow of cardinality, and the mutable rng doesn't work with that move.
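
The loop shape that this comment settles on can be tried outside the codebase. The following is a minimal sketch under stated assumptions, not the PR's code: `ToyRng` and `pad_keys` are hypothetical names, a small LCG stands in for the shared randomness, plain `u64` values stand in for secret-shared match keys, and the per-cardinality sample counts are passed in rather than drawn from the DP distribution.

```rust
use std::iter::{repeat, repeat_with};

/// Hypothetical stand-in for the shared randomness source: a tiny LCG.
struct ToyRng(u64);

impl ToyRng {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

/// For each cardinality `c` (1-based index into `samples`), appends
/// `samples[c - 1]` fresh keys, each repeated `c` times.
/// Returns the total number of rows appended.
fn pad_keys(samples: &[u32], rng: &mut ToyRng, out: &mut Vec<u64>) -> u32 {
    let mut total = 0;
    for (i, &sample) in samples.iter().enumerate() {
        let cardinality = i + 1;
        total += sample * cardinality as u32;
        out.extend(
            // The closure is created anew on each loop iteration, so it only
            // borrows `rng` and `cardinality` for this one `extend` call.
            repeat_with(|| {
                let key = rng.next_u64();
                repeat(key).take(cardinality)
            })
            // `sample` distinct keys, each with `cardinality` copies
            .take(sample as usize)
            .flatten(),
        );
    }
    total
}
```

Because the closure never has to escape the loop body, no `move` is needed here, which is presumably why the loop form sidesteps the conflict described above.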

let sample = oprf_padding.sample(rng);
total_number_of_fake_rows += sample * cardinality;

padding_input_rows.extend(
repeat_with(|| {
let dummy_mk: BA64 = rng.gen();
repeat(IndistinguishableHybridReport::from(
AdditiveShare::new_excluding_direction(
dummy_mk,
direction_to_excluded_helper,
),
))
.take(cardinality as usize)
})
// this means there will be `sample` many unique
// matchkeys to add each with cardinality = `cardinality`
.take(sample as usize)
.flatten(),
);
}
}
}
Ok(total_number_of_fake_rows)
}

fn add_zero_shares<VC: Extend<Self>>(
padding_input_rows: &mut VC,
total_number_of_fake_rows: u32,
) {
padding_input_rows.extend(
repeat(IndistinguishableHybridReport::ZERO).take(total_number_of_fake_rows as usize),
);
}
}

impl<BK, TV, TS> Paddable for OPRFIPAInputRow<BK, TV, TS>
where
BK: BooleanArray + U128Conversions,
@@ -426,6 +487,7 @@ mod tests {
},
RecordId,
},
report::hybrid::IndistinguishableHybridReport,
secret_sharing::replicated::semi_honest::AdditiveShare,
test_fixture::{Reconstruct, Runner, TestWorld},
};
@@ -451,6 +513,31 @@ mod tests {
Ok(input)
}

pub async fn set_up_apply_dp_padding_pass_for_indistinguishable_reports<
Collaborator

The formatting on this function is crazy. Is it because the name of the function is kind of long? Does it make sense to move this to a separate module? That way you could shorten the function name, since it would have a more specific context in which it is used.

Member Author

Just the standard formatter. I don't think it makes sense to move this to a separate module; all the padding functions live in this module.

C,
BK,
V,
const B: usize,
>(
ctx: C,
padding_params: PaddingParameters,
) -> Result<Vec<IndistinguishableHybridReport<BK, V>>, Error>
where
C: Context,
BK: BooleanArray + U128Conversions,
V: BooleanArray,
{
let mut input: Vec<IndistinguishableHybridReport<BK, V>> = Vec::new();
input = apply_dp_padding_pass::<C, IndistinguishableHybridReport<BK, V>, B>(
ctx,
input,
Role::H3,
&padding_params,
)
.await?;
Ok(input)
}

#[tokio::test]
pub async fn oprf_noise_in_dp_padding_pass() {
type BK = BA8;
@@ -525,6 +612,83 @@ mod tests {
}
}

#[tokio::test]
pub async fn indistinguishable_report_noise_in_dp_padding_pass() {
// Note: this is a close copy of the test `oprf_noise_in_dp_padding_pass`,
// which will make it easier to delete the former test
// when we remove the oprf protocol.
type BK = BA8;
type V = BA3;
const B: usize = 256;
let world = TestWorld::default();
let oprf_epsilon = 1.0;
let oprf_delta = 1e-6;
let matchkey_cardinality_cap = 10;
let oprf_padding_sensitivity = 2;

let result = world
.semi_honest((), |ctx, ()| async move {
let padding_params = PaddingParameters {
oprf_padding: OPRFPadding::Parameters {
oprf_epsilon,
oprf_delta,
matchkey_cardinality_cap,
oprf_padding_sensitivity,
},
aggregation_padding: AggregationPadding::NoAggPadding,
};
set_up_apply_dp_padding_pass_for_indistinguishable_reports::<_, BK, V, B>(
ctx,
padding_params,
)
.await
})
.await
.map(Result::unwrap);
// check that all three helpers added the same number of dummy shares
assert!(result[0].len() == result[1].len() && result[0].len() == result[2].len());

let result_reconstructed = result.reconstruct();
// check that all fields besides the matchkey are zero and matchkey is not zero
let mut match_key_counts: HashMap<u64, u32> = HashMap::new();
for row in result_reconstructed {
assert!(row.value == 0);
assert!(row.breakdown_key == 0); // since we set AggregationPadding::NoAggPadding
assert!(row.match_key != 0);

let count = match_key_counts.entry(row.match_key).or_insert(0);
*count += 1;
}
// Now look at how many times a match_key occurred
let mut sample_per_cardinality: BTreeMap<u32, u32> = BTreeMap::new();
for cardinality in match_key_counts.values() {
let count = sample_per_cardinality.entry(*cardinality).or_insert(0);
*count += 1;
}
let mut distribution_of_samples: BTreeMap<u32, u32> = BTreeMap::new();

for (cardinality, sample) in sample_per_cardinality {
println!("{sample} user IDs occurred {cardinality} time(s)");
let count = distribution_of_samples.entry(sample).or_insert(0);
*count += 1;
}

let oprf_padding =
OPRFPaddingDp::new(oprf_epsilon, oprf_delta, oprf_padding_sensitivity).unwrap();

let (mean, std_bound) = oprf_padding.mean_and_std_bound();
let tolerance_bound = 12.0;
assert!(std_bound > 1.0); // bound on the std only holds if this is true.
println!("mean = {mean}, std_bound = {std_bound}");
for (sample, count) in &distribution_of_samples {
println!("An OPRFPadding sample value equal to {sample} occurred {count} time(s)",);
assert!(
(f64::from(*sample) - mean).abs() < tolerance_bound * std_bound,
"OPRF padding sample was not within {tolerance_bound} times the standard deviation bound from what was expected."
);
}
}
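
The two counting passes in the test above (occurrences per match key, then keys per cardinality) can be isolated as a small helper. This is a minimal sketch only: `keys_per_cardinality` is a hypothetical name, and it takes already-reconstructed `u64` match keys rather than the test's report rows.

```rust
use std::collections::{BTreeMap, HashMap};

/// Returns a map from cardinality to the number of distinct match keys
/// that occurred exactly that many times.
fn keys_per_cardinality(match_keys: &[u64]) -> BTreeMap<u32, u32> {
    // First pass: how many times does each match key occur?
    let mut occurrences: HashMap<u64, u32> = HashMap::new();
    for &mk in match_keys {
        *occurrences.entry(mk).or_insert(0) += 1;
    }
    // Second pass: how many keys share each occurrence count?
    let mut per_cardinality: BTreeMap<u32, u32> = BTreeMap::new();
    for &cardinality in occurrences.values() {
        *per_cardinality.entry(cardinality).or_insert(0) += 1;
    }
    per_cardinality
}
```

In the test, each value of this map plays the role of one draw from the padding distribution, which is why the values are then compared against the mean and standard deviation bound.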

pub async fn set_up_apply_dp_padding_pass_for_agg<C, BK, TV, const B: usize>(
ctx: C,
padding_params: PaddingParameters,
32 changes: 29 additions & 3 deletions ipa-core/src/report/hybrid.rs
@@ -353,9 +353,35 @@ where
BK: SharedValue,
V: SharedValue,
{
-match_key: Replicated<BA64>,
-value: Replicated<V>,
-breakdown_key: Replicated<BK>,
+pub match_key: Replicated<BA64>,
+pub value: Replicated<V>,
+pub breakdown_key: Replicated<BK>,
}

impl<BK, V> IndistinguishableHybridReport<BK, V>
where
BK: SharedValue,
V: SharedValue,
{
pub const ZERO: Self = Self {
match_key: Replicated::<BA64>::ZERO,
value: Replicated::<V>::ZERO,
breakdown_key: Replicated::<BK>::ZERO,
};
}

impl<BK, V> From<Replicated<BA64>> for IndistinguishableHybridReport<BK, V>
where
BK: SharedValue,
V: SharedValue,
{
fn from(match_key: Replicated<BA64>) -> Self {
Self {
match_key,
value: Replicated::<V>::ZERO,
breakdown_key: Replicated::<BK>::ZERO,
}
}
}
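
The `ZERO` constant and the `From` impl above follow a pattern that can be shown in isolation: a padding row carries only a match key, and every other field is zero. The sketch below is a simplified model under stated assumptions, not the crate's types: this `SharedValue` trait is a hypothetical one-constant stand-in, `Report` is a hypothetical struct, and plain integers replace replicated shares.

```rust
/// Hypothetical, minimal stand-in for a value type with a zero element.
trait SharedValue: Copy {
    const ZERO: Self;
}

impl SharedValue for u8 {
    const ZERO: Self = 0;
}

impl SharedValue for u32 {
    const ZERO: Self = 0;
}

/// Hypothetical simplified report: integers instead of replicated shares.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Report<BK: SharedValue, V: SharedValue> {
    match_key: u64,
    value: V,
    breakdown_key: BK,
}

impl<BK: SharedValue, V: SharedValue> Report<BK, V> {
    /// All-zero row, as used to fill in for the helper that adds no real padding.
    const ZERO: Self = Self {
        match_key: 0,
        value: V::ZERO,
        breakdown_key: BK::ZERO,
    };
}

impl<BK: SharedValue, V: SharedValue> From<u64> for Report<BK, V> {
    /// Builds a padding row: only the match key is set.
    fn from(match_key: u64) -> Self {
        Self {
            match_key,
            ..Self::ZERO
        }
    }
}
```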

impl<BK, V> From<HybridReport<BK, V>> for IndistinguishableHybridReport<BK, V>
8 changes: 8 additions & 0 deletions ipa-core/src/secret_sharing/replicated/mod.rs
@@ -2,12 +2,20 @@ pub mod malicious;
pub mod semi_honest;

use super::{SecretSharing, SharedValue};
use crate::helpers::Direction;

pub trait ReplicatedSecretSharing<V: SharedValue>: SecretSharing<V> {
fn new(a: V, b: V) -> Self;
fn left(&self) -> V;
fn right(&self) -> V;

fn new_excluding_direction(v: V, direction: Direction) -> Self {
match direction {
Direction::Left => Self::new(V::ZERO, v),
Direction::Right => Self::new(v, V::ZERO),
}
}

fn map<F: Fn(V) -> T, R: ReplicatedSecretSharing<T>, T: SharedValue>(&self, f: F) -> R {
R::new(f(self.left()), f(self.right()))
}
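
To see why `new_excluding_direction` zeroes one half of the share, it helps to model three helpers by hand. The sketch below is illustrative only and rests on assumptions not stated in this diff: helper i's right share equals helper i+1's left share, reconstruction adds the three left shares, and addition over boolean arrays is XOR. `Share` and `reconstruct` are hypothetical stand-ins using `u64`.

```rust
#[derive(Clone, Copy)]
enum Direction {
    Left,
    Right,
}

/// Hypothetical replicated share over u64, where "addition" is XOR.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Share {
    left: u64,
    right: u64,
}

impl Share {
    const ZERO: Self = Self { left: 0, right: 0 };

    /// Puts `v` on the side facing away from the excluded helper and zero
    /// on the side shared with it, mirroring the default method in the diff.
    fn new_excluding_direction(v: u64, direction: Direction) -> Self {
        match direction {
            Direction::Left => Self { left: 0, right: v },
            Direction::Right => Self { left: v, right: 0 },
        }
    }
}

/// Reconstructs the secret from three helpers' shares (assumed convention).
fn reconstruct(shares: [Share; 3]) -> u64 {
    for i in 0..3 {
        // Neighbouring helpers must agree on the share they hold in common.
        assert_eq!(shares[i].right, shares[(i + 1) % 3].left);
    }
    shares[0].left ^ shares[1].left ^ shares[2].left
}
```

With the third helper excluded, the first helper passes `Direction::Left`, the second passes `Direction::Right`, and the third contributes `Share::ZERO`; under the assumptions above the three shares are consistent and reconstruct to the dummy value.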
41 changes: 41 additions & 0 deletions ipa-core/src/test_fixture/hybrid.rs
@@ -1,11 +1,52 @@
use std::collections::{HashMap, HashSet};

use crate::{
ff::{boolean_array::BooleanArray, U128Conversions},
report::hybrid::IndistinguishableHybridReport,
secret_sharing::{replicated::semi_honest::AdditiveShare as Replicated, IntoShares},
test_fixture::sharing::Reconstruct,
};

#[derive(Debug, Clone, PartialEq, PartialOrd, Eq)]
pub enum TestHybridRecord {
TestImpression { match_key: u64, breakdown_key: u32 },
TestConversion { match_key: u64, value: u32 },
}

#[derive(PartialEq, Eq)]
pub struct TestIndistinguishableHybridReport {
pub match_key: u64,
pub value: u32,
pub breakdown_key: u32,
}

impl<BK, V> Reconstruct<TestIndistinguishableHybridReport>
for [&IndistinguishableHybridReport<BK, V>; 3]
where
BK: BooleanArray + U128Conversions + IntoShares<Replicated<BK>>,
V: BooleanArray + U128Conversions + IntoShares<Replicated<V>>,
{
fn reconstruct(&self) -> TestIndistinguishableHybridReport {
let [s0, s1, s2] = self;
eriktaubeneck marked this conversation as resolved.

let match_key = [&s0.match_key, &s1.match_key, &s2.match_key]
.reconstruct()
.as_u128();

let breakdown_key = [&s0.breakdown_key, &s1.breakdown_key, &s2.breakdown_key]
.reconstruct()
.as_u128();

let value = [&s0.value, &s1.value, &s2.value].reconstruct().as_u128();

TestIndistinguishableHybridReport {
match_key: match_key.try_into().unwrap(),
breakdown_key: breakdown_key.try_into().unwrap(),
value: value.try_into().unwrap(),
}
}
}

struct HashmapEntry {
breakdown_key: u32,
total_value: u32,