
Unable to sign ValidationUptime message #3556

Open
perfilev-faraway opened this issue Nov 20, 2024 · 12 comments
Labels: bug (Something isn't working)

Comments

@perfilev-faraway

Describe the bug
I'm trying to request signing of a ValidationUptime message. I use the latest signature-aggregator from https://github.com/ava-labs/awm-relayer, avalanchego v1.12.0-initial-poc.9, and subnet-evm at commit ava-labs/subnet-evm@cc414c8.

In the signature-aggregator logs I see Response contained an empty signature","nodeID":"NodeID-D8ookM2Ne9VWvpjCBnrwiniwqqP8YuUNQ. The actual AppResponse is zeroed for all connected validators. Note that signature-aggregator signs the RegisterValidator message successfully.

To Reproduce

  1. Create and deploy a PoS L1 SoV (ACP-77) on etna-devnet; note the blockchain_id
  2. Add one more validator (in addition to the bootstrapped ones); note the validation_id
  3. Craft an UnsignedMessage according to ACP-77 (network_id: 76, blockchain_id: the target blockchain ID, source address of the addressed call: nil); a construction sketch is shown after this list
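
For reference, a minimal sketch of step 3, assuming avalanchego's warp helpers (payload.NewAddressedCall, warp.NewUnsignedMessage) and an inner ValidationUptime payload constructor along the lines of subnet-evm's warp messages package (shown here as messages.NewValidatorUptime; the exact name and fields may differ by version):

package main

import (
	"fmt"
	"log"

	"github.com/ava-labs/avalanchego/ids"
	"github.com/ava-labs/avalanchego/vms/platformvm/warp"
	"github.com/ava-labs/avalanchego/vms/platformvm/warp/payload"
	"github.com/ava-labs/subnet-evm/warp/messages"
)

func main() {
	const networkID = uint32(76) // etna-devnet network ID used in this report

	// Source blockchain ID of the L1 (the blockchain_id noted in step 1).
	blockchainID, err := ids.FromString("2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM")
	if err != nil {
		log.Fatal(err)
	}
	// The validation_id noted in step 2.
	validationID, err := ids.FromString("2fW9co8x6YMshzVDBC2S1jkfxxuVMiMQA8a25bz4VnMQT3oxed")
	if err != nil {
		log.Fatal(err)
	}

	// Inner ValidationUptime payload: the validation ID plus the uptime (in seconds) being attested.
	// Constructor name assumed from subnet-evm's warp messages package.
	uptime, err := messages.NewValidatorUptime(validationID, 0)
	if err != nil {
		log.Fatal(err)
	}

	// Addressed call with a nil source address, as described above.
	addressedCall, err := payload.NewAddressedCall(nil, uptime.Bytes())
	if err != nil {
		log.Fatal(err)
	}

	// Outer unsigned warp message: network ID + source blockchain ID + payload.
	unsigned, err := warp.NewUnsignedMessage(networkID, blockchainID, addressedCall.Bytes())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("unsigned ValidationUptime message: 0x%x\n", unsigned.Bytes())
}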

Expected behavior
A message signed by the validators.

Screenshots
n/a

Logs
signature-aggregator.log
logs.zip - logs from both validators (node1 - bootstrapped, node2 - newly added)

Metrics
n/a

Operating System

Distributor ID: Ubuntu
Description:    Ubuntu 24.04.1 LTS
Release:        24.04
Codename:       noble

Additional context
n/a

To best protect the Avalanche community, security bugs should be reported in accordance with our Security Policy.

@perfilev-faraway added the bug label on Nov 20, 2024
@ceyonur (Contributor) commented Nov 20, 2024

Can you show how you craft the ValidationUptime message? I don't think ACP-77 has any specification for that message type.

An example can be found here.

@perfilev-faraway (Author)

I followed your example.

  1. My original ValidationUptime message is crafted to exactly the same bytes, so there are no issues with my builder.
  2. When I specified my node's URL http://44.218.255.99:9650 as uri, 2fW9co8x6YMshzVDBC2S1jkfxxuVMiMQA8a25bz4VnMQT3oxed as validationId, and 2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM as sourceChainID in the example, I got the following error:
2024/11/26 18:34:02 received app_error: chain_id:"\xb2\xccy\xaaΟT\x0b\xc2\t'{\xcc\xd5\xdb\t\"t\x83\xa7\xe2\x95\xe4!g\xb27C\xc4"  error_code:2  error_message:"failed to get validator for validationID 2fW9co8x6YMshzVDBC2S1jkfxxuVMiMQA8a25bz4VnMQT3oxed: not found"

but I just added this validator:

initializeValidatorRegistration:  0x7e2ef599dbd99e27bfebf0b1591c512173194f21b63e990e9f18fd5f7a49b887
Validation ID:  2fW9co8x6YMshzVDBC2S1jkfxxuVMiMQA8a25bz4VnMQT3oxed
Weight:  3n
registerSubnetValidator:  YgTHdoCzSFnXDBgzxLd5wQFLkQpDoTwkiECyoJxgdkbg1KTFt
completeValidatorRegistration:  0x4eee302589d501ce866ecd2ea124a5d0d8c389fb3bbfccf60b31ad82628585de

@perfilev-faraway (Author)

I just tried the following:

  1. Create a new L1 blockchain with the latest CLI (ava-labs/avalanche-cli@b62b006)
  2. Deploy it to Etna Devnet (PoS Validator)
  3. Add a new PoS validator (both nodes use v1.12.0-fuji, subnet-evm v0.6.12)
  4. Use example from https://github.com/ava-labs/subnet-evm/blob/master/examples/sign-uptime-message/main.go#L48

I got:

2024/11/26 21:12:50 failed to start peer: closed

@ceyonur (Contributor) commented Nov 26, 2024

Hmm, there seems to be buggy behavior: that example creates a peer with the avalanchego version that subnet-evm v0.6.12 depends on, then tries to connect to a node running v1.12.0 and gets rejected. Could you try this version: https://github.com/ava-labs/subnet-evm/blob/v0.6.12-fuji-example/examples/sign-uptime-message/main.go?

For your other issue, could you confirm that the validator is registered via the P-Chain API platform.GetL1Validator? There is a 1-minute delay between the P-Chain validator state and the Subnet-EVM validator state, so it might not be immediately available to Subnet-EVM.

You can also see the current state in Subnet-EVM with https://docs.avax.network/api-reference/subnet-evm-api#validators_getcurrentvalidators
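
As a concrete illustration, here is a sketch of both checks as plain JSON-RPC calls from Go; the endpoint paths and method names follow the docs linked above and may vary by release, and the node URI and IDs are just the ones already mentioned in this thread:

package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"net/http"
)

// post sends a JSON-RPC request body to a node endpoint and returns the raw response.
func post(url, body string) string {
	resp, err := http.Post(url, "application/json", bytes.NewBufferString(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	return string(out)
}

func main() {
	node := "http://44.218.255.99:9650" // node URI from this thread

	// 1) Is the validation ID registered on the P-Chain yet?
	fmt.Println(post(node+"/ext/bc/P", `{
		"jsonrpc": "2.0", "id": 1,
		"method": "platform.getL1Validator",
		"params": {"validationID": "2fW9co8x6YMshzVDBC2S1jkfxxuVMiMQA8a25bz4VnMQT3oxed"}
	}`))

	// 2) What does Subnet-EVM itself currently report for its validators?
	fmt.Println(post(node+"/ext/bc/2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM/validators", `{
		"jsonrpc": "2.0", "id": 1,
		"method": "validators.getCurrentValidators",
		"params": {}
	}`))
}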

@perfilev-faraway (Author) commented Nov 27, 2024

I repeated all my previous steps with your branch of subnet-evm. If I use an uptime greater than 0:

./main                                                          
2024/11/27 15:23:53 received app_error: chain_id:"{O\xb8\x07\x0f\x84\x8b\xfc\xae{\xb8T\xa0\xb7\xecQ\xcf֫\xf0fHGDѣ\x83\xad%n\xaf<" error_code:2 error_message:"current uptime 0 is less than queried uptime 1 for nodeID NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr"

If I set uptime to 0:

2024/11/27 15:25:51 received app_response: chain_id:"{O\xb8\x07\x0f\x84\x8b\xfc\xae{\xb8T\xa0\xb7\xecQ\xcf֫\xf0fHGDѣ\x83\xad%n\xaf<" app_bytes:"\n`\x85\x1bG\x83lZi3\x06\xef\x15Dx\xea63\x02x\xdb;\xf2\x0f\xe2\x08\xc58\xf4\xc1\x11\x95Q\xe4GN\xf5\x9d\x8c\xca\x119`\x04?\x18$P\xea\x98\x14Pj\xc6\x0e\xc1\x81E\x02\xb7:\xff(\xb2\xda\xcc\xc8\x7f\x12\xb7tI\xe5X\xb9Q\x1fz\x9a\x0bM\xaf\xb2 \x161\x9d*\n%\xe7\xb7ݦ\x1b\xd9s\t"

So it looks like signing probably works (I suppose these bytes are the signed message), but there's another issue with uptime tracking.
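
For reference, a minimal sketch for decoding that app_bytes blob, assuming it is an ACP-118 SignatureResponse (avalanchego's sdk proto) wrapping a 96-byte BLS signature; names may differ slightly by version:

package main

import (
	"fmt"
	"log"

	"github.com/ava-labs/avalanchego/proto/pb/sdk"
	"github.com/ava-labs/avalanchego/utils/crypto/bls"
	"google.golang.org/protobuf/proto"
)

func main() {
	var appBytes []byte // fill with the app_bytes field from the app_response above

	var resp sdk.SignatureResponse
	if err := proto.Unmarshal(appBytes, &resp); err != nil {
		log.Fatal(err)
	}
	sig, err := bls.SignatureFromBytes(resp.Signature)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("BLS signature (%d bytes): %x\n", len(resp.Signature), bls.SignatureToBytes(sig))
}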

I'm sure I added my PoS validator; see the logs below:

[11-27|15:15:40.852] INFO <P Chain> validators/logger.go:50 node added to validator set {"subnetID": "2edLNXw85rxZEAZnBtnJkCim59qDpupJwdzXmJP7dkTSNrTssy", "nodeID": "NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr", "publicKey": "0xa4c4a45f2d75e6613aa95ae5953fd0c75a92cbe5b6178210ca471be069880e3f7a617d51e941f3d426e87b9ac87680fd", "txID": "2kyk8VNfrzbQq4LTZQS75Q5qSNx1WdD8u22WBDZXVvfEXDg8nJ", "weight": 3}
[11-27|15:15:40.854] INFO <X Chain> snowman/engine.go:511 starting consensus {"lastAcceptedID": "QX9f1LK6fpFeQXffW6WkECWUzcfx38HCnwGdRBum3W47QLDUy", "lastAcceptedHeight": 1}
[11-27|15:15:40.854] INFO <C Chain> snowman/engine.go:511 starting consensus {"lastAcceptedID": "2Cko9ff7JrKH8btWTqMfsxwJWsa8xkM5FW61uAJxj4mUHs3wsi", "lastAcceptedHeight": 388}
[11-27|15:16:01.665] INFO health/worker.go:261 check started passing {"name": "health", "name": "network", "tags": ["application"]}
[11-27|15:16:01.666] INFO health/worker.go:261 check started passing {"name": "health", "name": "P", "tags": ["11111111111111111111111111111111LpoYY"]}
[11-27|15:16:01.666] INFO health/worker.go:261 check started passing {"name": "health", "name": "X", "tags": ["11111111111111111111111111111111LpoYY"]}
[11-27|15:16:01.666] INFO health/worker.go:261 check started passing {"name": "health", "name": "C", "tags": ["11111111111111111111111111111111LpoYY"]}

I waited ~10-15 minutes before I tried to sign the uptime proof.

UPD:

After restarting the bootstrap node, I see that uptime tracking has started. But it's strange behavior IMO: uptime is not tracked in real time; it looks like it refreshes only after the node restarts.

./main
2024/11/27 15:46:48 received app_error: chain_id:"{O\xb8\x07\x0f\x84\x8b\xfc\xae{\xb8T\xa0\xb7\xecQ\xcf֫\xf0fHGDѣ\x83\xad%n\xaf<" error_code:2 error_message:"current uptime 51 is less than queried uptime 500 for nodeID NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr"

one min later...

./main
2024/11/27 15:47:10 received app_error: chain_id:"{O\xb8\x07\x0f\x84\x8b\xfc\xae{\xb8T\xa0\xb7\xecQ\xcf֫\xf0fHGDѣ\x83\xad%n\xaf<" error_code:2 error_message:"current uptime 51 is less than queried uptime 500 for nodeID NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr"

node restart

2024/11/27 15:47:43 received app_error: chain_id:"{O\xb8\x07\x0f\x84\x8b\xfc\xae{\xb8T\xa0\xb7\xecQ\xcf֫\xf0fHGDѣ\x83\xad%n\xaf<" error_code:2 error_message:"current uptime 62 is less than queried uptime 500 for nodeID NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr"

one min later... (unexpectedly 63 instead of 62)

./main
2024/11/27 15:48:06 received app_error: chain_id:"{O\xb8\x07\x0f\x84\x8b\xfc\xae{\xb8T\xa0\xb7\xecQ\xcf֫\xf0fHGDѣ\x83\xad%n\xaf<" error_code:2 error_message:"current uptime 63 is less than queried uptime 500 for nodeID NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr"

one min later...

 ./main
2024/11/27 15:48:45 received app_error: chain_id:"{O\xb8\x07\x0f\x84\x8b\xfc\xae{\xb8T\xa0\xb7\xecQ\xcf֫\xf0fHGDѣ\x83\xad%n\xaf<" error_code:2 error_message:"current uptime 63 is less than queried uptime 500 for nodeID NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr"

one min later...

 ./main
2024/11/27 15:49:11 received app_error: chain_id:"{O\xb8\x07\x0f\x84\x8b\xfc\xae{\xb8T\xa0\xb7\xecQ\xcf֫\xf0fHGDѣ\x83\xad%n\xaf<" error_code:2 error_message:"current uptime 63 is less than queried uptime 500 for nodeID NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr"

@ceyonur (Contributor) commented Nov 27, 2024

Could it be that NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr was never active (doesn't have enough fees) or was never connected to the node you're trying to query? Could you check this API on the node you're querying: https://docs.avax.network/api-reference/subnet-evm-api#validators_getcurrentvalidators?

This looks like behaviour where the node you're asking to sign that message was never connected to NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr, or that validator was never active. Uptime increases when you restart because the restarting node optimistically assumes NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr was online and active during the offline period.

@perfilev-faraway (Author)

here's my validators output:

{
  "jsonrpc": "2.0",
  "result": {
    "validators": [
      {
        "validationID": "VxhkQPVgphgGNoHUhxaBMzDXQEngWNFR13weqnHpwurrX8N96",
        "nodeID": "NodeID-8XXvVGYhuQTgDpU5xQrAAYHA96o5tNSiR",
        "weight": 100,
        "startTimestamp": 1732702169,
        "isActive": true,
        "isL1Validator": true,
        "isConnected": true,
        "uptimePercentage": 100,
        "uptimeSeconds": 16753
      },
      {
        "validationID": "2kyk8VNfrzbQq4LTZQS75Q5qSNx1WdD8u22WBDZXVvfEXDg8nJ",
        "nodeID": "NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr",
        "weight": 3,
        "startTimestamp": 1732702418,
        "isActive": true,
        "isL1Validator": true,
        "isConnected": false,
        "uptimePercentage": 36.02763,
        "uptimeSeconds": 5946
      }
    ]
  },
  "id": 1
}

@ceyonur (Contributor) commented Nov 27, 2024

That node (NodeID-JyFeprQ8ATipxgqruMcN3RMeZjSCiQGkr) was not connected to the node you're querying; see the isConnected field.

The other one seems connected (or it's the same node).

@perfilev-faraway (Author)

Yep, it looks like the issue is a lack of connection between the two nodes. I was able to fix it by adding the nodes to --bootstrap-ids and --bootstrap-ips.

When is this branch of subnet-evm supposed to be merged or tagged, so I can rely on it as something finalized?

@perfilev-faraway (Author)

And another question: am I right that nodes with a public IP should be discovered without needing to specify bootstrap nodes?

@ceyonur (Contributor) commented Nov 27, 2024

> Yep, it looks like the issue is a lack of connection between the two nodes. I was able to fix it by adding the nodes to --bootstrap-ids and --bootstrap-ips. When is this branch of subnet-evm supposed to be merged or tagged, so I can rely on it as something finalized?

All features/changes are merged into v0.6.12. The v0.6.12-fuji-example tag was created only for this specific example; you don't actually need to upgrade your nodes to that version. The bug is just in the example code and does not need its own release; everything else should work fine with v0.6.12.

> Am I right that nodes with a public IP should be discovered without needing to specify bootstrap nodes?

I see this in your logs: [11-19|07:57:31.289] WARN node/node.go:497 P2P IP is private, you will not be publicly discoverable {"ip": "10.1.26.50"}, so your nodes might not be publicly discoverable.

@perfilev-faraway (Author) commented Nov 27, 2024

@ceyonur, first of all, thank you a lot for your help here! I really appreciate it.

I tried to upgrade my devnet setup (avalanchego to v1.12.0-fuji and subnet-evm to v0.6.12). Then I dropped the db folder and chainData/11111111111111111111111111111111LpoYY (because I saw some error on the P-Chain). Finally, one of my nodes (the bootstrap node with weight 100) was able to boot and work well.

Then I tried to repeat everything with the 2nd node (PoS, weight 3), but I can't sync it with the bootstrap node. The first run ends with:

Nov 27 17:10:00 ip-10-1-26-91 avalanchego[70286]: [11-27|17:10:00.378] INFO <P Chain> validators/logger.go:50 node added to validator set {"subnetID": "vawTks8qap7RurVKhCkBLbhSW8a1Gw5PYR2GuQeq1J9jC2spF", "nodeID": "NodeID-HM2QB24GYQQyfK8BDgfTwmNV1xj4Dp5mz", "publicKey": "0xa25f2b6df9caa137eec77ac44687545b6fe6fd1ede0d186bc5518b2ee77b0d139929deeaddbc52764b1ccc1d03c09f7a", "txID": "2fW9co8x6YMshzVDBC2S1jkfxxuVMiMQA8a25bz4VnMQT3oxed", "weight": 3}

It successfully downloads chainData, but bootstrapping never ends. After a node restart I get the following:

Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.041] INFO <P Chain> validators/logger.go:50 node added to validator set {"subnetID": "vawTks8qap7RurVKhCkBLbhSW8a1Gw5PYR2GuQeq1J9jC2spF", "nodeID": "NodeID-HM2QB24GYQQyfK8BDgfTwmNV1xj4Dp5mz", "publicKey": "0xa25f2b6df9caa137eec77ac44687545b6fe6fd1ede0d186bc5518b2ee77b0d139929deeaddbc52764b1ccc1d03c09f7a", "txID": "2fW9co8x6YMshzVDBC2S1jkfxxuVMiMQA8a25bz4VnMQT3oxed", "weight": 3}
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.707] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> bootstrap/bootstrapper.go:388 starting to fetch blocks {"numKnownBlocks": 0, "numAcceptedBlocks": 2, "numMissingBlocks": 2}
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.736] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> bootstrap/storage.go:194 executing blocks {"numToExecute": 37}
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.736] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> bootstrap/storage.go:186 executed blocks {"numExecuted": 1, "numToExecute": 37, "halted": false, "duration": "234.38µs"}
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.736] FATAL <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> handler/handler.go:312 shutting down chain {"reason": "received an unexpected error", "error": "failed to verify block 2FkFk9Ai7XahyfWU1hfRzKoZQYT3Gaq6WgDj8Z1ZqccS1Ad6mk (height=1, parentID=v6c31JsdJNoYifPXLcedEYcNJkUHqg8zNXzpt2sHNucgdJhmu) in bootstrapping: unexpected proposer block type: expected block v6c31JsdJNoYifPXLcedEYcNJkUHqg8zNXzpt2sHNucgdJhmu to be a snowman.OracleBlock but it's a *chain.BlockWrapper with last accepted v6c31JsdJNoYifPXLcedEYcNJkUHqg8zNXzpt2sHNucgdJhmu (height=0) while processing sync message: ancestors from NodeID-KQi8kBX1xZFanG7n9aWkiN16mrQARGs4P"}
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.736] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> bootstrap/bootstrapper.go:782 shutting down bootstrapper
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.737] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> core/txpool/legacypool/legacypool.go:445 Transaction pool stopped
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.737] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> core/blockchain.go:938 Closing quit channel
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.737] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> core/blockchain.go:941 Stopping Acceptor
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.737] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> core/blockchain.go:944 Acceptor queue drained                   t="1.146µs"
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.737] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> core/blockchain.go:947 Shutting down sender cacher
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.737] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> core/blockchain.go:951 Closing scope
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.737] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> core/blockchain.go:955 Waiting for background processes to complete
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.737] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> core/blockchain.go:974 Shutting down state manager
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.737] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> core/blockchain.go:979 State manager shut down                  t=552ns
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.737] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> core/blockchain.go:984 Blockchain stopped
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.739] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> eth/backend.go:393 Stopped shutdownTracker
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.739] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> eth/backend.go:396 Closed chaindb
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.739] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> eth/backend.go:398 Stopped EventMux
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.739] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> plugin/evm/vm.go:882 Ethereum backend stop completed
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: [11-27|17:13:01.739] INFO <2MkB54t77iAmA2dT4xdkU32JCLaXbiuUV5KC7uc6YkPSMwukDM Chain> plugin/evm/vm.go:884 Subnet-EVM Shutdown completed
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: runtime engine: received shutdown signal: terminated
Nov 27 17:13:01 ip-10-1-26-91 avalanchego[70320]: vm server: graceful termination success

Really sorry if this is a bit off-topic here, but I'm one step away from checking whether the original issue is completely solved for me.

UPD: looks like I missed the upgrade flow and that's the reason. I'll recreate the chain.
