Tolerate empty vectors of shuffle intermediates #1383

andyleiserson · 2024-10-29T17:16:52Z

Tolerating empty input matters most for the sharded shuffle, which for small inputs is reasonably likely to produce an empty output on some shard. This PR adds a new compute_non_empty_hash function with the existing behavior of rejecting empty input, and changes the compute_hash function to accept an empty input. It then changes all of the existing calls to compute_hash, except the one in shuffle::malicious, to call compute_non_empty_hash instead. (I did not analyze the existing uses for others that may be able to tolerate an empty input.)
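For readers skimming the thread, the split described above can be sketched roughly as follows. This is a minimal illustration, not the code from ipa-core: the real functions are generic over serializable values and use SHA-256, while this sketch assumes u64 items and uses the standard library's DefaultHasher as a stand-in. The function names follow the description as written here, before the later name swap.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in hash type (assumption: the real type wraps a SHA-256 digest).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Hash(u64);

/// Hashes every item in order. An empty iterator is accepted and yields
/// the hash of the empty message.
pub fn compute_hash<I>(input: I) -> Hash
where
    I: IntoIterator<Item = u64>,
{
    // DefaultHasher::new() uses fixed keys, so results are deterministic.
    let mut hasher = DefaultHasher::new();
    for item in input {
        hasher.write(&item.to_le_bytes());
    }
    Hash(hasher.finish())
}

/// Same as `compute_hash`, but keeps the old behavior of rejecting
/// empty input.
///
/// ## Panics
/// Panics when the iterator is empty.
pub fn compute_non_empty_hash<I>(input: I) -> Hash
where
    I: IntoIterator<Item = u64>,
{
    // Peek at the first element without consuming it.
    let mut iter = input.into_iter().peekable();
    assert!(iter.peek().is_some(), "cannot hash an empty input");
    compute_hash(iter)
}
```

With this shape, only the call site that can legitimately see an empty shard opts into the tolerant variant; every other caller keeps the panic.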

codecov · 2024-10-29T18:13:06Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.71%. Comparing base (4fd5e2b) to head (e9d8562).
Report is 24 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1383      +/-   ##
==========================================
+ Coverage   93.58%   93.71%   +0.12%     
==========================================
  Files         223      223              
  Lines       37165    37611     +446     
==========================================
+ Hits        34781    35246     +465     
+ Misses       2384     2365      -19

akoshelev · 2024-10-30T02:44:41Z

ipa-core/src/helpers/hashing.rs

+///
+/// ## Panics
+/// Panics when Iterator is empty.
+pub fn compute_non_empty_hash<I, T, S>(input: I) -> Hash


Is it better to return Option<Hash> here and let callers explicitly decide what to do when the input is empty? I feel like treating 0 as a valid hash is dangerous if you accidentally call compute_hash because after that all bets are off

How often do you need an empty hash? Or could those potentially-empty cases be amended with once(foo).chain(input) at the caller?

I think we just saw our first case where some shards ended up with 0 shares after the shuffle. To me, the fact that the hash was never computed feels like something callers need to be aware of.

Option<Hash> would mean adding an expect to most calls, and would also mean doing something like unwrap_or(SHA256_HASH_OF_EMPTY_MESSAGE) for the empty-ok case, since we need to send something to the other helper to signify "I have no data".

I'd rather expose an extra function than prepend a dummy value to force a message to be non-empty, which seems like a kludge.

It just feels unsafe to allow someone to ignore the fact that the hash wasn't computed at all. Yes, it introduces clutter at the call site, but that is a good thing for this use case imo. It slows down an impatient writer and makes them think about what exactly they want to do.

I like Martin's suggestion to flip the naming - it somewhat mitigates the issue, although not entirely imo - someone can still just copy-paste the code w/o thinking. Those things are hard to catch in review
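To make the trade-off in this thread concrete, here is a rough sketch of the Option<Hash> alternative and the two kinds of call site it would produce. Everything here is illustrative: try_compute_hash, hash_required, and hash_or_empty are hypothetical names, the items are assumed to be u64, DefaultHasher stands in for SHA-256, and the placeholder constant does not hold a real digest.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in hash type (assumption: the real type wraps a SHA-256 digest).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Hash(u64);

/// Placeholder for the value sent to signify "I have no data".
/// In real code this would be the SHA-256 digest of the empty message.
pub const HASH_OF_EMPTY_MESSAGE: Hash = Hash(0);

/// Hypothetical variant that returns `None` for empty input, so the
/// caller has to decide what an empty input means.
pub fn try_compute_hash<I>(input: I) -> Option<Hash>
where
    I: IntoIterator<Item = u64>,
{
    let mut iter = input.into_iter().peekable();
    // Return None early if there is nothing to hash.
    iter.peek()?;
    let mut hasher = DefaultHasher::new();
    for item in iter {
        hasher.write(&item.to_le_bytes());
    }
    Some(Hash(hasher.finish()))
}

/// Call site that must never see empty input: the `expect` is the
/// clutter mentioned in the discussion.
pub fn hash_required(values: &[u64]) -> Hash {
    try_compute_hash(values.iter().copied()).expect("input must not be empty")
}

/// Call site that tolerates an empty shard and still needs a value to
/// send to the other helper.
pub fn hash_or_empty(values: &[u64]) -> Hash {
    try_compute_hash(values.iter().copied()).unwrap_or(HASH_OF_EMPTY_MESSAGE)
}
```

The extra call at each site is the cost raised against this option; the benefit argued for it is that the empty case cannot be ignored silently.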

martinthomson · 2024-10-30T04:34:49Z

ipa-core/src/helpers/hashing.rs

@@ -74,9 +70,37 @@ where
        sha.update(&buf);


Is there a risk that this is a bit churn-y? Is there value in writing multiple items before updating the hash, or are we OK with our vectorization code?

My guess is that this can optimize reasonably well. I'm inclined to leave it unless/until it shows up in profiling.

martinthomson · 2024-10-30T04:35:39Z

ipa-core/src/helpers/hashing.rs

+///
+/// ## Panics
+/// Panics when Iterator is empty.
+pub fn compute_non_empty_hash<I, T, S>(input: I) -> Hash


How often do you need an empty hash? Or could those potentially-empty cases be amended with once(foo).chain(input) at the caller?
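As a rough illustration of the once(foo).chain(input) idea, a caller could prepend a fixed marker so that the stream passed to the hash function is never empty. The helper name with_marker and the u64 item type are assumptions made for this sketch; std::iter::once and Iterator::chain are the standard library calls the comment refers to.

```rust
use std::iter::once;

/// Hypothetical helper: prepends `marker` to `input`, so the resulting
/// iterator always yields at least one item.
pub fn with_marker<I>(marker: u64, input: I) -> impl Iterator<Item = u64>
where
    I: IntoIterator<Item = u64>,
{
    once(marker).chain(input)
}
```

An empty input then hashes as the marker alone, which a non-empty-only hash function would accept.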

martinthomson · 2024-10-30T04:36:14Z

ipa-core/src/helpers/hashing.rs

+}
+
+/// Computes Hash of serializable values from an iterator
+pub fn compute_hash<I, T, S>(input: I) -> Hash


I'd flip the naming, so that the (maybe) empty hash requires the special name.

andyleiserson · 2024-10-30T18:22:04Z

Another question is whether empty vs. not is even the right property to be looking at, or if we should be looking at the amount of entropy in the input (i.e. at least 128 bits) instead. On the other hand, I don't know if I necessarily think these properties should be enforced in this routine at all. The callers know more about the input and can check a more precise set of properties. Looking at the places compute_hash gets used, I see three categories: (1) malicious shuffle verification (this PR); (2) input is an array; (3) in validate_three_two_way_sharing_of_zero, which is unused.

andyleiserson added 2 commits October 29, 2024 10:11

Tolerate empty vectors of shuffle intermediates

fc37a43

This is more important for the sharded shuffle, which for small inputs is reasonably likely to produce an empty output on some shard.

Fix test

c764425

andyleiserson force-pushed the empty-hash branch from f104326 to c764425 on October 29, 2024 17:45

akoshelev reviewed Oct 30, 2024

martinthomson reviewed Oct 30, 2024

Swap the names of the compute_hash variants

e9d8562

akoshelev approved these changes Oct 30, 2024

andyleiserson merged commit 6d29275 into private-attribution:main Oct 30, 2024
12 checks passed

andyleiserson deleted the empty-hash branch October 30, 2024 18:28
