bug(p2p): bootstrap peer connectivity broken in v0.8.7 #647

Open
teslashibe opened this issue Dec 3, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@teslashibe
Contributor

Masa-Oracle version:
Application Version: v0.8.7
Protocol Version: v0.8.4

Environment, CPU architecture, OS, and Version:

Environment: test
Installation type: Local build (make run-api-enabled)

Describe the bug
A network connectivity regression in v0.8.7 prevents nodes from establishing connections with bootstrap peers, resulting in an empty routing table and network isolation. This functionality worked in v0.8.6.

To Reproduce

  1. Build and run masa-oracle v0.8.7:
     make run-api-enabled
  2. Observe logs showing repeated connection failures to bootstrap nodes
  3. Check that the routing table size remains at 0
  4. Verify that protocol advertisement fails

Expected behavior

  • Node should successfully connect to configured bootstrap peers
  • Routing table should populate with discovered peers
  • Protocol advertisement should succeed
  • Node should fully participate in the network as a peer

Logs

ERRO[0030] [-] Failed to connect to bootstrap peer 16Uiu2HAmBcNRvvXMxyj45fCMAmTKD4bkXu92Wtv4hpzRiTQNLTsL: identify failed to complete: context deadline exceeded

ERRO[0030] [-] Connection failed for peer: 16Uiu2HAm7KfNcv3QBPRjANctYjcDnUvcog26QeJnhDN9nazHz9Wi failed to dial: failed to dial 16Uiu2HAm7KfNcv3QBPRjANctYjcDnUvcog26QeJnhDN9nazHz9Wi: no addresses

ERRO[0035] [-] Failed to connect to bootstrap peer 16Uiu2HAm7KfNcv3QBPRjANctYjcDnUvcog26QeJnhDN9nazHz9Wi: failed to dial: failed to dial 16Uiu2HAm7KfNcv3QBPRjANctYjcDnUvcog26QeJnhDN9nazHz9Wi: all dials failed
  * [/ip4/3.213.117.85/udp/4001/quic-v1] timeout: no recent network activity

INFO[0045] [-] Unable to connect to a boot node at this time. Waiting...

DEBU[0115] [-] Failed to advertise protocol with error failed to find any peer in table

INFO[0120] [+] Routing table size: 0
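
The log excerpt above shows three distinct failure modes (an identify timeout, a dial with no known addresses, and a dial where every address failed). As a rough triage aid, the sketch below groups those failures by peer ID using only the Go standard library. The names `classifyFailure` and `summarizeFailures` and the failure labels are hypothetical and not part of masa-oracle; the matched substrings are taken directly from the log lines in this report.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// peerIDPattern matches the peer IDs seen in the logs of this report,
// which all start with "16Uiu2HAm" followed by base58 characters.
var peerIDPattern = regexp.MustCompile(`16Uiu2HAm[1-9A-HJ-NP-Za-km-z]+`)

// classifyFailure (hypothetical helper) maps a log line to a short label
// based on the error substrings that appear in this issue's logs.
// It returns "" for lines that are not one of the known failures.
func classifyFailure(line string) string {
	switch {
	case strings.Contains(line, "identify failed to complete"):
		return "identify-timeout"
	case strings.Contains(line, "no addresses"):
		return "no-addresses"
	case strings.Contains(line, "all dials failed"):
		return "all-dials-failed"
	default:
		return ""
	}
}

// summarizeFailures (hypothetical helper) returns, for each peer ID,
// the failure labels in the order they appear in the input.
func summarizeFailures(lines []string) map[string][]string {
	out := make(map[string][]string)
	for _, line := range lines {
		kind := classifyFailure(line)
		if kind == "" {
			continue
		}
		// The first peer ID on the line is the peer the error refers to.
		id := peerIDPattern.FindString(line)
		if id == "" {
			continue
		}
		out[id] = append(out[id], kind)
	}
	return out
}

func main() {
	logs := []string{
		"ERRO[0030] [-] Failed to connect to bootstrap peer 16Uiu2HAmBcNRvvXMxyj45fCMAmTKD4bkXu92Wtv4hpzRiTQNLTsL: identify failed to complete: context deadline exceeded",
		"ERRO[0030] [-] Connection failed for peer: 16Uiu2HAm7KfNcv3QBPRjANctYjcDnUvcog26QeJnhDN9nazHz9Wi failed to dial: failed to dial 16Uiu2HAm7KfNcv3QBPRjANctYjcDnUvcog26QeJnhDN9nazHz9Wi: no addresses",
		"ERRO[0035] [-] Failed to connect to bootstrap peer 16Uiu2HAm7KfNcv3QBPRjANctYjcDnUvcog26QeJnhDN9nazHz9Wi: failed to dial: failed to dial 16Uiu2HAm7KfNcv3QBPRjANctYjcDnUvcog26QeJnhDN9nazHz9Wi: all dials failed",
		"INFO[0045] [-] Unable to connect to a boot node at this time. Waiting...",
	}
	for id, kinds := range summarizeFailures(logs) {
		fmt.Printf("%s: %s\n", id, strings.Join(kinds, ", "))
	}
}
```

Grouping by peer makes it easier to see whether one bootstrap node is failing in several ways or several nodes are failing in the same way, which may help separate a node-side outage from a client-side regression.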

Additional context
Configuration details:

  • UDP: true
  • TCP: false
  • Port: 4001
  • Bootstrap nodes:
    • /ip4/52.6.77.89/udp/4001/quic-v1/p2p/16Uiu2HAmBcNRvvXMxyj45fCMAmTKD4bkXu92Wtv4hpzRiTQNLTsL
    • /ip4/3.213.117.85/udp/4001/quic-v1/p2p/16Uiu2HAm7KfNcv3QBPRjANctYjcDnUvcog26QeJnhDN9nazHz9Wi
    • /ip4/52.20.183.116/udp/4001/quic-v1/p2p/16Uiu2HAm9Nkz9kEMnL1YqPTtXZHQZ1E9rhquwSqKNsUViqTojLZt
    • /ip4/73.70.162.103/udp/4001/quic-v1/p2p/16Uiu2HAmKA2LQTtyfc44cfgzf5toxA1gJDoTgjX4ezKBJD2XM35T
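
Each bootstrap entry above follows the layout /ip4/<address>/udp/<port>/quic-v1/p2p/<peer ID>. To sanity-check a configured list (for example, to confirm the IP and peer ID pairs against the official node list), a minimal Go sketch using only the standard library might look like the following. The type `bootstrapAddr` and the function `parseBootstrapAddr` are hypothetical names for illustration; a real node would more likely rely on a multiaddr library such as go-multiaddr, and this sketch only handles the exact layout used in this report.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// bootstrapAddr (hypothetical type) holds the pieces of one bootstrap entry.
type bootstrapAddr struct {
	IP     net.IP
	Port   int
	PeerID string
}

// parseBootstrapAddr (hypothetical helper) splits an entry of the form
// /ip4/<ip>/udp/<port>/quic-v1/p2p/<peerID> into its components.
// Any other layout is rejected rather than guessed at.
func parseBootstrapAddr(s string) (bootstrapAddr, error) {
	parts := strings.Split(strings.TrimPrefix(s, "/"), "/")
	if len(parts) != 7 || parts[0] != "ip4" || parts[2] != "udp" ||
		parts[4] != "quic-v1" || parts[5] != "p2p" {
		return bootstrapAddr{}, fmt.Errorf("unsupported address layout: %q", s)
	}
	ip := net.ParseIP(parts[1])
	if ip == nil || ip.To4() == nil {
		return bootstrapAddr{}, fmt.Errorf("invalid IPv4 address: %q", parts[1])
	}
	port, err := strconv.Atoi(parts[3])
	if err != nil || port < 1 || port > 65535 {
		return bootstrapAddr{}, fmt.Errorf("invalid UDP port: %q", parts[3])
	}
	if parts[6] == "" {
		return bootstrapAddr{}, fmt.Errorf("missing peer ID in %q", s)
	}
	return bootstrapAddr{IP: ip, Port: port, PeerID: parts[6]}, nil
}

func main() {
	// The four bootstrap entries listed in this report.
	entries := []string{
		"/ip4/52.6.77.89/udp/4001/quic-v1/p2p/16Uiu2HAmBcNRvvXMxyj45fCMAmTKD4bkXu92Wtv4hpzRiTQNLTsL",
		"/ip4/3.213.117.85/udp/4001/quic-v1/p2p/16Uiu2HAm7KfNcv3QBPRjANctYjcDnUvcog26QeJnhDN9nazHz9Wi",
		"/ip4/52.20.183.116/udp/4001/quic-v1/p2p/16Uiu2HAm9Nkz9kEMnL1YqPTtXZHQZ1E9rhquwSqKNsUViqTojLZt",
		"/ip4/73.70.162.103/udp/4001/quic-v1/p2p/16Uiu2HAmKA2LQTtyfc44cfgzf5toxA1gJDoTgjX4ezKBJD2XM35T",
	}
	for _, e := range entries {
		a, err := parseBootstrapAddr(e)
		if err != nil {
			fmt.Println("skip:", err)
			continue
		}
		fmt.Printf("%s udp/%d -> %s\n", a.IP, a.Port, a.PeerID)
	}
}
```

Printing the entries in this flattened form makes it straightforward to compare the configured IP addresses against whatever list of official nodes the maintainers publish.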

This is a regression as the networking functionality was working correctly in v0.8.4. Investigation should focus on changes in peer connection handling, QUIC/UDP protocol modifications, and DHT implementation changes between v0.8.4 and v0.8.7.

teslashibe added the bug label Dec 3, 2024
@mcamou
Contributor

mcamou commented Dec 4, 2024

@teslashibe 52.6.77.89 and 52.20.183.116 are two of our official nodes, and both are now working (I had to restart 52.20.183.116 because it was locked up). I don't know where 3.213.117.85 and 73.70.162.103 came from; could you please elaborate?

@5u6r054
Contributor

5u6r054 commented Dec 5, 2024

@mcamou @teslashibe I notice this references v0.8.7, but we and the nodes are on v0.8.8 now.
