
Implement ACP-118 Aggregator #3394

Open · wants to merge 39 commits into master from acp-118

Conversation

@joshua-kim (Contributor) commented on Sep 17, 2024

Why this should be merged

Implements p2p client + server logic for signature request handling as described in ACP-118 (ref).

How this works

Client:

  • Introduces an aggregator type that makes signature requests in batches and blocks until a caller-provided threshold of stake has signed the provided warp message (see the sketch below)

Server:

  • Introduces a handler that serves aggregation requests
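
For context, a rough usage sketch of the client-side API. Only the SignatureAggregator type, the AggregateSignatures signature, and the Validator fields appear in this diff; the wrapper function, variable names, and the 2/3 quorum below are illustrative, and the snippet assumes the network/p2p/acp118, vms/platformvm/warp, and context imports.

func aggregateExample(
    ctx context.Context,
    aggregator *acp118.SignatureAggregator,
    msg *warp.Message,
    justification []byte,
    validators []acp118.Validator,
) error {
    // Blocks until 2/3 of the provided stake has signed msg or ctx is canceled.
    signedMsg, aggregatedWeight, totalWeight, err := aggregator.AggregateSignatures(
        ctx, msg, justification, validators, 2, 3,
    )
    if err != nil {
        return err
    }
    // aggregatedWeight and totalWeight report how much of the provided stake signed.
    _ = signedMsg
    _ = aggregatedWeight
    _ = totalWeight
    return nil
}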

How this was tested

  • Added unit tests

@joshua-kim joshua-kim changed the base branch from master to p2p-sync September 17, 2024 16:08
@joshua-kim joshua-kim self-assigned this Sep 17, 2024
Base automatically changed from p2p-sync to master September 25, 2024 16:58
@joshua-kim joshua-kim force-pushed the acp-118 branch 2 times, most recently from 6e6e88f to cf4ecba on October 1, 2024 18:19
@joshua-kim joshua-kim changed the title from Implement ACP-118 Package to Implement ACP-118 Aggregator on Oct 22, 2024
@joshua-kim joshua-kim force-pushed the acp-118 branch 2 times, most recently from b4d7b35 to 60761f8 on October 22, 2024 20:14
@joshua-kim joshua-kim marked this pull request as ready for review November 12, 2024 21:48

// NewClientWithPeers generates a client to communicate to a set of peers
func NewClientWithPeers(
@joshua-kim (Contributor, author) commented on Nov 20, 2024:
I can make this a separate PR if requested, but I've gotten feedback in the past that PRs containing only test utilities can be hard to review when there's no corresponding usage showing why they're needed.

for nodeID := range nodeIDs {
network, ok := peerNetworks[nodeID]
if !ok {
return fmt.Errorf("%s is not connected", nodeID)
@joshua-kim (author) commented:
This is the reason for the big testing diff... the test utility now enforces that you're sending requests to a node registered in the peer map. We could also just drop the requests instead of erroring as an alternative.

A collaborator replied:
Shouldn't this drop the requests since normally an error from the sender would be treated as a fatal error?
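
If dropping is preferred, a minimal sketch of the alternative for the quoted check (not part of this PR):

// Silently drop requests to peers that are not registered in the peer map
// instead of returning an error, since callers typically treat an error from
// the sender as fatal.
network, ok := peerNetworks[nodeID]
if !ok {
    continue
}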

sampleable = append(sampleable, v.NodeID)
}

signatures := make([]*bls.Signature, 0, len(sampleable)+1)
A collaborator commented:
Ah I see we do +1 here to account for the original signature

@joshua-kim (author) replied:
I can leave a comment on this
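
Something like the following would capture it (wording is a suggestion, not taken from the diff):

// +1 reserves room for the signature the original message already carries,
// in addition to one signature per sampleable validator.
signatures := make([]*bls.Signature, 0, len(sampleable)+1)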

Comment on lines 140 to 142
if err := s.client.AppRequest(ctx, set.Of(nodeIDCopy), requestBytes, job.HandleResponse); err != nil {
results <- result{Validator: nodeIDsToValidator[nodeIDCopy], Err: err}
return
A collaborator commented:
Should we remove the goroutines since the p2p client is non-blocking? (Added a ref from another PR today: ava-labs/hypersdk#1801 (comment).)
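
A rough sketch of the non-goroutine form, assuming the surrounding code becomes a plain loop over validators (so continue replaces the goroutine's return) and that AppRequest only enqueues the request, with the response delivered asynchronously to job.HandleResponse:

if err := s.client.AppRequest(ctx, set.Of(nodeID), requestBytes, job.HandleResponse); err != nil {
    results <- result{Validator: nodeIDsToValidator[nodeID], Err: err}
    continue
}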

&warp.BitSetSignature{Signature: [bls.SignatureLen]byte{}},
)
require.NoError(err)
gotMsg, gotNum, gotDen, err := aggregator.AggregateSignatures(
A collaborator commented:
nit: should these be aggregatedSignatureWeight and totalWeight rather than num and den which suggests numerator/denominator?

@joshua-kim (author) replied:
Yeah these names make a lot more sense
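
With that rename, the call site would read roughly as follows (msg and justification stand in for the test's actual inputs):

gotMsg, gotAggregatedSignatureWeight, gotTotalWeight, err := aggregator.AggregateSignatures(
    tt.ctx,
    msg,
    justification,
    tt.validators,
    tt.quorumNum,
    tt.quorumDen,
)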

Comment on lines +348 to +360
wantNum := uint64(0)
for _, i := range tt.wantSigners {
require.True(bitSet.Contains(i))
wantNum += tt.validators[i].Weight
}

wantDen := uint64(0)
for _, v := range tt.validators {
wantDen += v.Weight
}

require.Equal(wantNum, gotNum)
require.Equal(wantDen, gotDen)
A collaborator commented:
Same naming comment on numerator/denominator here

Comment on lines +55 to +71
{
name: "aggregates from all validators 1/1",
peers: map[ids.NodeID]p2p.Handler{
nodeID0: NewHandler(&testVerifier{}, signer0),
},
ctx: context.Background(),
validators: []Validator{
{
NodeID: nodeID0,
PublicKey: pk0,
Weight: 1,
},
},
wantSigners: []int{0},
quorumNum: 1,
quorumDen: 1,
},
A collaborator commented:
Super easy to read these test cases ❤️

quorumDen: 1,
},
{
name: "aggregates from some validators - 1/3",
A collaborator commented:
Would it make sense to make these names a little more descriptive to the edge case that they're testing?

ex.

Suggested change
name: "aggregates from some validators - 1/3",
name: "aggregates from min threshold - 1/3",

The same collaborator added:
Reading through the rest, this is probably fine as is since there's already a naming convention for the success/failure cases; the success cases could just be a bit more explicit.

quorumDen: 3,
},
{
name: "aggregates from some validators - 2/3",
A collaborator commented:
What's the intended difference for this test case? Just > 1 success rather than > 1 failure but still meeting minimum threshold?

It seems each success test case meets exactly the required threshold. This is very well tested as is, but it could also add cases that exceed the minimum threshold.
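
A case that exceeds the minimum threshold could reuse the shape of the existing entries. The values below are illustrative and assume a second test validator (nodeID1, pk1, signer1) defined like the first; whether wantSigners should contain both indices depends on whether the aggregator keeps collecting signatures after the threshold is met:

{
    name: "aggregates more stake than the minimum threshold - 1/3",
    peers: map[ids.NodeID]p2p.Handler{
        nodeID0: NewHandler(&testVerifier{}, signer0),
        nodeID1: NewHandler(&testVerifier{}, signer1),
    },
    ctx: context.Background(),
    validators: []Validator{
        {
            NodeID:    nodeID0,
            PublicKey: pk0,
            Weight:    1,
        },
        {
            NodeID:    nodeID1,
            PublicKey: pk1,
            Weight:    1,
        },
    },
    wantSigners: []int{0, 1},
    quorumNum:   1,
    quorumDen:   3,
},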

aggregatedStakeWeight := uint64(0)
totalStakeWeight := uint64(0)
for i, v := range validators {
totalStakeWeight += v.Weight
A contributor commented:
We should probably protect against overflow here. (It is possible for this to overflow with a real subnet)
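
A minimal sketch of the guard, using an explicit math.MaxUint64 check to stay self-contained (avalanchego's checked-math helpers could be used instead; the rest of the loop body is elided):

for _, v := range validators {
    // Refuse to aggregate if the summed validator weight would overflow uint64,
    // which is possible with a real subnet's weights.
    if totalStakeWeight > math.MaxUint64-v.Weight {
        return nil, 0, 0, fmt.Errorf("validator weight sum overflows uint64")
    }
    totalStakeWeight += v.Weight
}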

}

failedStakeWeight := uint64(0)
minThreshold := (totalStakeWeight * quorumNum) / quorumDen
A contributor commented:
totalStakeWeight * quorumNum can overflow
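
One way to sidestep the multiplication overflow is to do the threshold arithmetic in math/big (a sketch; a checked-multiply helper would work just as well):

// Compute (totalStakeWeight * quorumNum) / quorumDen without overflowing uint64.
threshold := new(big.Int).SetUint64(totalStakeWeight)
threshold.Mul(threshold, new(big.Int).SetUint64(quorumNum))
threshold.Div(threshold, new(big.Int).SetUint64(quorumDen))
// The result fits back into uint64 as long as quorumNum <= quorumDen.
minThreshold := threshold.Uint64()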

Comment on lines +173 to +179
// Fast-fail if it's not possible to generate a signature that meets the
// minimum threshold
failedStakeWeight += result.Validator.Weight
if totalStakeWeight-failedStakeWeight < minThreshold {
return nil, 0, 0, ErrFailedAggregation
}
continue
A contributor commented:
Does this work with hypersdk's expected usage here? I thought that num/den were going to be the maximum weights it would wait for, but that the minimum would be lower than that (meaning that if we are passing in the max here, we could be terminating when we realize we can't get the maximum, even though we actually could have gotten the amount that hypersdk wanted).

@joshua-kim (author) replied on Nov 21, 2024:
Maybe it makes more sense for us to change the behavior so that this API blocks until either all responses come back or we reach the provided num/den threshold, instead of failing early.
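
A rough sketch of that alternative, reusing the names from the quoted snippet (signature collection elided; this is not the PR's current behavior):

// Drain responses until either the threshold is met or every validator has
// answered; only then decide whether aggregation failed.
for i := 0; i < len(validators); i++ {
    result := <-results
    if result.Err != nil {
        failedStakeWeight += result.Validator.Weight
        continue
    }
    aggregatedStakeWeight += result.Validator.Weight
    if aggregatedStakeWeight >= minThreshold {
        break
    }
}
if aggregatedStakeWeight < minThreshold {
    return nil, 0, 0, ErrFailedAggregation
}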

Comment on lines +248 to +251
if !bls.Verify(validator.PublicKey, signature, r.message.UnsignedMessage.Bytes()) {
r.results <- result{Validator: validator, Err: errFailedVerification}
return
}
A contributor commented:
nice

Comment on lines +63 to +75
// AggregateSignatures blocks until quorumNum/quorumDen signatures from
// validators are requested to be aggregated into a warp message or the context
// is canceled. Returns the signed message and the amount of stake that signed
// the message. Caller is responsible for providing a well-formed canonical
// validator set corresponding to the signer bitset in the message.
func (s *SignatureAggregator) AggregateSignatures(
ctx context.Context,
message *warp.Message,
justification []byte,
validators []Validator,
quorumNum uint64,
quorumDen uint64,
) (*warp.Message, uint64, uint64, error) {
A contributor commented:
I don't think this correctly handles the case that BLS public keys are shared across validators.

In Warp, only 1 signature is ever allowed from a BLS key in a warp message. If different nodeIDs have the same BLS key, their weights are aggregated for the BLS key's index
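
A sketch of the kind of pre-processing that implies: collapse validators that share a BLS public key into a single entry before building the signer bitset. The helper name and the use of bls.PublicKeyToCompressedBytes for keying the map are assumptions, not part of this PR:

// dedupeByPublicKey merges validators with the same BLS public key into one
// entry whose weight is the sum of the individual weights, since a warp
// message carries at most one signature per BLS key.
func dedupeByPublicKey(validators []Validator) []Validator {
    indexByKey := make(map[string]int, len(validators))
    out := make([]Validator, 0, len(validators))
    for _, v := range validators {
        key := string(bls.PublicKeyToCompressedBytes(v.PublicKey))
        if i, ok := indexByKey[key]; ok {
            // Overflow of the summed weight should be checked as discussed above.
            out[i].Weight += v.Weight
            continue
        }
        indexByKey[key] = len(out)
        out = append(out, v)
    }
    return out
}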

joshua-kim and others added 4 commits November 21, 2024 12:22
Co-authored-by: Stephen Buttolph <[email protected]>
Signed-off-by: Joshua Kim <[email protected]>