-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TestWorld for sharded environments #982
TestWorld for sharded environments #982
Conversation
This change introduces the ability to run very simple circuits on multiple shards in parallel. It is not possible to communicate between shards yet, but it is possible to use the same test infrastructure to create multiple shards and use PRSS inside them as well as provide the input for each shard and consume their output. The next and hopefully final change will bring the ability to communicate across shards.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #982 +/- ##
==========================================
- Coverage 89.43% 89.17% -0.26%
==========================================
Files 163 167 +4
Lines 21897 22805 +908
==========================================
+ Hits 19584 20337 +753
- Misses 2313 2468 +155 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not be necessary to provide type safety for shard index vs. helper identity in the transport layer, since we have the context layer above it to provide safer APIs to protocols.
Do you imagine it will be possible to cast a sharded context into a non sharded context? Because of the potential of shard locality to leak information, I feel like protocols written against a sharded context are more dangerous, and call for more scrutiny, than protocols written against a non-sharded context.
/// Helper trait to bind in-memory request handlers to transport identity. | ||
pub trait IdentityHandlerExt: TransportIdentity { | ||
type Handler: RequestHandler<Self>; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these new traits really need to be different for in-memory vs. network transport?
One strategy to simplify might be to move TransportCallbacks
up a level. Instead of having five special purpose Box<dyn>
handlers in TransportCallbacks
, use Box<dyn Handler>
(or Box<dyn TransportMessageHandler>
for a more specific name) as the type of the callbacks: I::Handler
argument.
The relationship amongst these traits also seems complicated to me. It might be possible to define something like:
trait Transport {
type Identity;
type Handler;
}
impl Transport for ShardTransport {
type Identity = ShardIndex;
type Handler = ...;
}
impl Transport for MpcTransport {
type Identity = HelperIdentity;
type Handler = HelperRequestHandler;
}
(This doesn't by itself do anything about the duplication between net and memory transports, however.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of moving generic parameter to associated type. I also agree that this stuff is complicated, don't see a clear way how to make it easier to follow.
The main idea here was to keep listen
method the same for both shard and MPC in-memory transports. This method handles all incoming requests, but for shard traffic only Addr::Records
request is valid, everything else only applies to MPC-MPC or client-MPC.
The handler stuff does not apply to net transports as callbacks are handled at the higher level. So adding Handler
associated type to Transport
may make HttpTransport
implementation look clunky.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping that using dynamic dispatch might make it easier to make pieces of this the same across transport types (helper, and shard, and ideally HTTP as well).
Maybe something like fn listen
takes Option<&dyn Handler>
as the callbacks argument? (The HTTP transport doesn't have fn listen
, but it does use TransportCallbacks
.) The shard transport can omit the handler entirely (because it only needs to handle records, at least for now), and for all transport types they no longer need to enumerate the management request types ("prepare query", "receive query", etc.), which feels like something that transports don't need to know about, and that shouldn't need to behave differently across the different transport types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed IdentityHandlerExt
in favour of ListenerSetup
- makes it a little less confusing imo.
I am not sure I fully understand your proposal, will sync up with you offline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at ListenerSetup
and I like it 👍.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed it offline and agreed that TransportCallbacks
needs to go away. A better interface for handlers would be similar to this one.
It is still worth to encapsulate the delivery method inside Transport
interface, so messages delivered via HTTP or potentially CF workers channel have the same structure when handled by application logic. We may have a good abstraction for it already - Addr
struct used inside in-memory network.
So our handler interface may look like this
trait Handler<I: TransportIdentity> {
fn handle(&self, _: &Addr<I>) -> Result<Response>
}
HTTP transport can parse HTTP requests into Addr
struct and application will have a unified logic to process requests based on Addr:RouteId
field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue is actually blocking now because prepare_inputs
handler will require all transports to properly initialize the gateway. Naive approach would make MPC transport (that handles callbacks) to depend on Shard transport implementation (just to pass it to Gateway) which would make things way more complicated.
I am thinking that I would implement something like this
flowchart LR
HelperApp --> Transports
HelperApp --> QueryProcessor
HelperApp --> Handler
Handler --> Transports
Handler --> QueryProcessor
ipa-core/src/sharding.rs
Outdated
|
||
/// A unique zero-based index of the helper shard. | ||
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)] | ||
pub struct ShardIndex(u32); | ||
|
||
#[derive(Debug, Copy, Clone)] | ||
pub struct Shard { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might call this Sharded
? I like how SomeType<Sharded>
reads better than SomeType<Shard>
. Right now it's purely shard identity information, but Context<Sharded>
feels like a better home for a get-channel-to-shard method than Context<Shard>
.
(NoSharding
could be renamed to NotSharded
for symmetry, but that's a super nit.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind renaming it, We already have Sharded
struct inside test infrastructure, but it may not be a big deal to have two
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too annoying to have two, renamed Sharded
struct defined inside TestWorld
to WithShards
inner: Inner<'a>, | ||
gate: Gate, | ||
total_records: TotalRecords, | ||
sharding: B, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this make more sense as part of Gateway
, to avoid frequent cloning of static information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was imagining Gateway
to be the same for sharded vs non-sharded world because that struct is complicated enough with the stall detection wrapper around it. It would just reference both transports and provide means to open shard-to-shard and mpc channels.
Depending on the context used, shard-to-shard channels will or will not be be accessible by MPC circuits.
I agree that cloning this stuff often may be less ideal - one way to mitigate that is to move this to Inner
struct - the downside will be that this will have to be done for all context types separately. I am going to add a comment that we may need to do it later (shouldn't be too hard)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I am wondering why the shard identity should be returned out of the context, whereas the role is returned by asking the gateway:
fn role(&self) -> Role {
self.inner.gateway.role()
}
Putting stuff in Inner
doesn't change when it gets cloned -- I'm not sure why Base
and Inner
are two different things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested that this also exist on the gateway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point now. Yea, I think I can just return it from shard transport and when creating a gateway for non-sharded execution provide a panicking shard transport to get a runtime error. Maybe parametrizing Gateway can turn this into a compile error, but for some reason I seem to prefer not to do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started changing Gateway in the next PR, in this change I won't be able to get it, so I'll follow up on it
I tried to re-use the existing functionality that we implemented for in-memory infrastructure and it seems that the only difference between MPC and shard channels is the identities used to authenticate peers, hence the change at the
I agree, that's why I spent quite a few hours modifying the |
I was thinking of something like the |
ipa-core/src/test_fixture/world.rs
Outdated
.iter() | ||
.collect::<Vec<_>>() | ||
.try_into() | ||
.unwrap_or_else(|_| unreachable!()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that just unwrap()
will achieve nearly the same thing. Is this due to a lack of a Debug impl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using .ok().unwrap()
in this situation. (Not worth an edit just to change that, more of a side note.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok().unwrap()
is strictly better, worth changing it
replace it with `ListenerSetup` trait that is hopefully less confusing to use
I am not sure if we want to make that distinction. ShardedContext should be a fully-functional semi-honest/malicious context that allows to run MPC with the same shards on other helpers. I am not sure I see a reason why we want to hide the fact that it is sharded from attribution and intra-shard aggregation steps (note that OPRF resharding also needs sharded context) It feels a bit error prone to change the context types during the execution at least for now when we have the same shard instance running the whole OPRF IPA |
This change introduces the ability to run very simple circuits on multiple shards in parallel. It is not possible to communicate between shards yet, but it is possible to use the same test infrastructure to create multiple shards and use PRSS inside them as well as provide the input for each shard and consume their output.
The next and hopefully final change will bring the ability to communicate across shards.