Implement ACP-118 Aggregator #3394
Signed-off-by: Joshua Kim <[email protected]>
// NewClientWithPeers generates a client to communicate to a set of peers
func NewClientWithPeers(
I can make this a separate PR if requested - but I've gotten feedback in the past about PRs with test utilities where it might be hard to understand why it's needed without corresponding usage.
for nodeID := range nodeIDs {
	network, ok := peerNetworks[nodeID]
	if !ok {
		return fmt.Errorf("%s is not connected", nodeID)
This is the reason for the big testing diff... the test utility now enforces that you're sending requests to a node registered in the peer map. We could also just drop the requests instead of erroring as an alternative.
Shouldn't this drop the requests since normally an error from the sender would be treated as a fatal error?
	sampleable = append(sampleable, v.NodeID)
}

signatures := make([]*bls.Signature, 0, len(sampleable)+1)
Ah I see we do +1 here to account for the original signature
I can leave a comment on this
network/p2p/acp118/aggregator.go
Outdated
if err := s.client.AppRequest(ctx, set.Of(nodeIDCopy), requestBytes, job.HandleResponse); err != nil {
	results <- result{Validator: nodeIDsToValidator[nodeIDCopy], Err: err}
	return
Should we remove the goroutines, since the p2p client is non-blocking? (Added a ref from another PR today: ava-labs/hypersdk#1801 (comment).)
}

failedStakeWeight := uint64(0)
minThreshold := (totalStakeWeight * quorumNum) / quorumDen
Should this match https://github.com/ava-labs/avalanchego/blob/master/vms/platformvm/warp/signature.go#L150 exactly?
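One way the two checks can diverge is rounding. The sketch below compares the floor-divided threshold from the snippet above with a cross-multiplied comparison. It assumes the linked check compares `quorumNum * totalWeight` against `quorumDen * sigWeight`; the function names are hypothetical and overflow is ignored here for brevity.

```go
package main

import "fmt"

// meetsFloorThreshold mirrors the snippet above: the threshold is computed
// with integer division, which rounds down.
func meetsFloorThreshold(sigWeight, totalWeight, quorumNum, quorumDen uint64) bool {
	minThreshold := (totalWeight * quorumNum) / quorumDen
	return sigWeight >= minThreshold
}

// meetsCrossMultiplied checks quorumNum*totalWeight <= quorumDen*sigWeight,
// so no precision is lost to division. (Hypothetical helper; overflow of the
// products is not handled in this sketch.)
func meetsCrossMultiplied(sigWeight, totalWeight, quorumNum, quorumDen uint64) bool {
	return quorumNum*totalWeight <= quorumDen*sigWeight
}

func main() {
	// Total weight 10, quorum 1/3, signed weight 3.
	fmt.Println(meetsFloorThreshold(3, 10, 1, 3))  // true: floor(10/3) = 3
	fmt.Println(meetsCrossMultiplied(3, 10, 1, 3)) // false: 10 > 9
}
```

With a total weight of 10 and a 1/3 quorum, a signed weight of 3 passes the floor-divided check but not the cross-multiplied one, so the two are not interchangeable at the boundary.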
	&warp.BitSetSignature{Signature: [bls.SignatureLen]byte{}},
)
require.NoError(err)
gotMsg, gotNum, gotDen, err := aggregator.AggregateSignatures(
nit: should these be aggregatedSignatureWeight and totalWeight rather than num and den, which suggests numerator/denominator?
Yeah these names make a lot more sense
wantNum := uint64(0)
for _, i := range tt.wantSigners {
	require.True(bitSet.Contains(i))
	wantNum += tt.validators[i].Weight
}

wantDen := uint64(0)
for _, v := range tt.validators {
	wantDen += v.Weight
}

require.Equal(wantNum, gotNum)
require.Equal(wantDen, gotDen)
Same naming comment on numerator/denominator here
{
	name: "aggregates from all validators 1/1",
	peers: map[ids.NodeID]p2p.Handler{
		nodeID0: NewHandler(&testVerifier{}, signer0),
	},
	ctx: context.Background(),
	validators: []Validator{
		{
			NodeID:    nodeID0,
			PublicKey: pk0,
			Weight:    1,
		},
	},
	wantSigners: []int{0},
	quorumNum:   1,
	quorumDen:   1,
},
Super easy to read these test cases ❤️
	quorumDen: 1,
},
{
	name: "aggregates from some validators - 1/3",
Would it make sense to make these names a little more descriptive to the edge case that they're testing?
ex.
- name: "aggregates from some validators - 1/3",
+ name: "aggregates from min threshold - 1/3",
Reading through the rest, this is probably fine as is, since there's already a convention to these names for success/failure, could just be more explicit in the success case
	quorumDen: 3,
},
{
	name: "aggregates from some validators - 2/3",
What's the intended difference for this test case? Just > 1 success rather than > 1 failure but still meeting minimum threshold?
It seems each success test case is meeting the exact required threshold. This is very well tested as is, but could also add cases for reaching greater than minimum threshold.
aggregatedStakeWeight := uint64(0)
totalStakeWeight := uint64(0)
for i, v := range validators {
	totalStakeWeight += v.Weight
We should probably protect against overflow here. (It is possible for this to overflow with a real subnet)
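A minimal sketch of a checked sum, using only the standard library's `math/bits.Add64` to detect the carry. The `sumWeights` helper and `errWeightOverflow` are hypothetical names for illustration, not part of this PR.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"math/bits"
)

// errWeightOverflow is a hypothetical sentinel error for this sketch.
var errWeightOverflow = errors.New("total stake weight overflows uint64")

// sumWeights adds validator weights and reports overflow instead of
// silently wrapping around.
func sumWeights(weights []uint64) (uint64, error) {
	total := uint64(0)
	for _, w := range weights {
		sum, carry := bits.Add64(total, w, 0)
		if carry != 0 {
			return 0, errWeightOverflow
		}
		total = sum
	}
	return total, nil
}

func main() {
	fmt.Println(sumWeights([]uint64{1, 2, 3}))
	fmt.Println(sumWeights([]uint64{math.MaxUint64, 1}))
}
```

Returning an error keeps the caller from computing a quorum against a wrapped total, which would understate the weight required.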
}

failedStakeWeight := uint64(0)
minThreshold := (totalStakeWeight * quorumNum) / quorumDen
totalStakeWeight * quorumNum can overflow
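One way to sidestep the overflow, sketched here with the standard library only, is to compare the two 128-bit products from `math/bits.Mul64` instead of dividing a wrapped 64-bit product. `meetsQuorum` is a hypothetical helper name; `math/big` would work equally well.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// meetsQuorum reports whether sigWeight/totalWeight >= quorumNum/quorumDen
// by comparing the full 128-bit products quorumNum*totalWeight and
// quorumDen*sigWeight, so neither multiplication can wrap.
func meetsQuorum(sigWeight, totalWeight, quorumNum, quorumDen uint64) bool {
	needHi, needLo := bits.Mul64(quorumNum, totalWeight)
	haveHi, haveLo := bits.Mul64(quorumDen, sigWeight)
	if haveHi != needHi {
		return haveHi > needHi
	}
	return haveLo >= needLo
}

func main() {
	total := uint64(math.MaxUint64)
	signed := total / 2

	// The 64-bit product wraps, so the naive threshold is far too low.
	naiveThreshold := (total * 2) / 3
	fmt.Println(signed >= naiveThreshold) // true, although only ~1/2 signed

	fmt.Println(meetsQuorum(signed, total, 2, 3)) // false
}
```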
// Fast-fail if it's not possible to generate a signature that meets the
// minimum threshold
failedStakeWeight += result.Validator.Weight
if totalStakeWeight-failedStakeWeight < minThreshold {
	return nil, 0, 0, ErrFailedAggregation
}
continue
Does this work with hypersdk's expected usage here? I thought that num/den were going to be the maximum weights it would wait for, but that the minimum would be lower than that. If so, and we pass in the max here, we could terminate as soon as we realize we can't reach the maximum, even though we could still have reached the number that hypersdk wanted.
Maybe it makes more sense for us to change the behavior so that this API blocks until all responses come back or we reach the provided num/den threshold, instead of failing.
if !bls.Verify(validator.PublicKey, signature, r.message.UnsignedMessage.Bytes()) {
	r.results <- result{Validator: validator, Err: errFailedVerification}
	return
}
nice
// AggregateSignatures blocks until quorumNum/quorumDen signatures from
// validators are requested to be aggregated into a warp message or the context
// is canceled. Returns the signed message and the amount of stake that signed
// the message. Caller is responsible for providing a well-formed canonical
// validator set corresponding to the signer bitset in the message.
func (s *SignatureAggregator) AggregateSignatures(
	ctx context.Context,
	message *warp.Message,
	justification []byte,
	validators []Validator,
	quorumNum uint64,
	quorumDen uint64,
) (*warp.Message, uint64, uint64, error) {
I don't think this correctly handles the case that BLS public keys are shared across validators.
In Warp, only 1 signature is ever allowed from a BLS key in a warp message. If different nodeIDs have the same BLS key, their weights are aggregated for the BLS key's index
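To illustrate the grouping described above, here is a minimal sketch that merges validators sharing a BLS key into a single entry with summed weight. The `validator` and `keyEntry` types are hypothetical stand-ins (the public key is modeled as a string of serialized bytes), entries keep first-seen order rather than any canonical ordering, and the weight sum is not overflow-checked.

```go
package main

import "fmt"

// validator is a hypothetical stand-in; PublicKey holds the serialized BLS key.
type validator struct {
	NodeID    string
	PublicKey string
	Weight    uint64
}

// keyEntry is one slot per unique BLS key: the summed weight of every node
// sharing that key, plus the IDs of those nodes.
type keyEntry struct {
	PublicKey string
	Weight    uint64
	NodeIDs   []string
}

// groupByPublicKey merges validators that share a BLS key, preserving the
// order in which each key was first seen.
func groupByPublicKey(vdrs []validator) []keyEntry {
	index := make(map[string]int, len(vdrs))
	entries := make([]keyEntry, 0, len(vdrs))
	for _, v := range vdrs {
		i, ok := index[v.PublicKey]
		if !ok {
			i = len(entries)
			index[v.PublicKey] = i
			entries = append(entries, keyEntry{PublicKey: v.PublicKey})
		}
		entries[i].Weight += v.Weight
		entries[i].NodeIDs = append(entries[i].NodeIDs, v.NodeID)
	}
	return entries
}

func main() {
	entries := groupByPublicKey([]validator{
		{NodeID: "node0", PublicKey: "pk0", Weight: 1},
		{NodeID: "node1", PublicKey: "pk1", Weight: 2},
		{NodeID: "node2", PublicKey: "pk0", Weight: 3},
	})
	for _, e := range entries {
		fmt.Println(e.PublicKey, e.Weight, e.NodeIDs)
	}
}
```

With this shape, a signature from any one node sharing a key would count that key's combined weight once, and a bitset index would refer to the key entry rather than to an individual node.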
Co-authored-by: Stephen Buttolph <[email protected]> Signed-off-by: Joshua Kim <[email protected]>
Why this should be merged
Implements p2p client + server logic for signature request handling as described in acp-118 (ref).
How this works
Client:
How this was tested