-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IPA Biweekly Meeting Agenda Request - Additional Sharding Designs #75
Comments
@marianapr and @schoppmp would it work for you to present on the sharding designs for IPA that you've been looking into in next week's PATCG/IPA call (next Tuesday 3pm PT)? Look forward to hearing what you've been looking into and discussing. |
I added an agenda request to present our protocol at the next PATCG meeting: patcg/meetings#125 |
Some quick responses to these:
The model is 3PC with honest majority and malicious parties. We wrote our protocol assuming that inputs (matchkeys and payloads) are encrypted by the user (not the report collector). An alternative way that still works and probably fits IPA's query model better is to assume an MPC functionality
Yes, that's the idea.
The partitioning is shuffling-based and therefore not stable. However, the second variant of our protocol can compute an OPRF with large, sparse output domain, which would allow sorting by matchkey in the clear. The MPC in each partition would then only need to sort by timestamp, which may be more efficient than sorting by the matchkey in MPC.
Shard sizes are made private by inserting dummies, either to all buckets (small OPRF domain), or as in this paper (large OPRF domain). Dummies would have to be filtered out in MPC in each partition. |
Yes, that could be done, but I would be concerned about the cost of doing an HE encryption side the MPC. Encryption by the client might be better but we'd need to see how much more expensive that would be. I think overall the next step should be to sketch out some performance estimates for these different approaches.
I think my concern here is that this paper assumes a sensitivity bound of 1, where as in IPA we may be able to have a reasonable upper bound on the sensitivity but it will be a lot larger than 1. For making shard sizes differentially private, I don't think a larger bound on the sensitivity leads to too much extra dummy records added (as in #35), but if we want to keep the granularity at the matchkey level I'm concerned that the amount of noise needed would be very large to get a decent DP bound. |
I think in the paper we have estimates with epsilon 5 as well. We can get estimates with other epsilons as well (I will be surprised if the number of dummies will be very large). I think the sharing of the payload can be done after the shuffle. |
The IPA team has done some internal brainstorming on this and how we might get to a hard cap on the sensitivity for how many times a matchkey occurs. We have another agenda item for the first 45min tomorrow, but we can use the last 15min to give an update in the call if @schoppmp @marianapr or other interested folks are there. |
Agenda+: What do you want to discuss?
@marianapr and @schoppmp have been thinking about additional ideas for sharding that could benefit IPA scalability. We would like to let them present and get feedback on this direction in a future PATCG/IPA call.
To prepone a few questions about their design
Time
We can probably spend most of a meeting on this
Issue Link
Other Links
A couple other issues related to sharding for IPA are:
#35
#49
The text was updated successfully, but these errors were encountered: