
Cluster's master node crashed, Server panics and not able to start the node again #116

TheBlockchainDevNeeraj opened this issue Aug 12, 2019 · 8 comments


@TheBlockchainDevNeeraj

I have a three-node cluster. After 7,500+ transactions, the master node suddenly stopped and crashed, and now it will not start at all. The following is the stack trace; please suggest what is wrong:

rpc call eth_coinbase() on http://localhost:22000: Post http://localhost:22000: dial tcp 127.0.0.1:22000: connect: connection refused
rpc call eth_coinbase() on http://localhost:22000: Post http://localhost:22000: dial tcp 127.0.0.1:22000: connect: connection refused
rpc call eth_coinbase() on http://localhost:22000: Post http://localhost:22000: dial tcp 127.0.0.1:22000: connect: connection refused
rpc call eth_blockNumber() on http://localhost:22000: Post http://localhost:22000: dial tcp 127.0.0.1:22000: connect: connection refused
rpc call eth_blockNumber() on http://localhost:22000: Post http://localhost:22000: dial tcp 127.0.0.1:22000: connect: connection refused
rpc call eth_getBlockByNumber() on http://localhost:22000: Post http://localhost:22000: dial tcp 127.0.0.1:22000: connect: connection refused
2019/08/12 13:07:22 http: panic serving 172.17.0.1:60370: runtime error: invalid memory address or nil pointer dereference
goroutine 315 [running]:
net/http.(*conn).serve.func1(0xc420185b80)
/usr/local/go/src/net/http/server.go:1726 +0xd0
panic(0x7e6900, 0xc5edb0)
/usr/local/go/src/runtime/panic.go:502 +0x229
github.com/ybbus/jsonrpc.(*RPCResponse).GetObject(0x0, 0x7af020, 0xc42029a780, 0x8e, 0x0)
/go/src/github.com/ybbus/jsonrpc/jsonrpc.go:609 +0x26
github.com/synechron-finlabs/quorum-maker-nodemanager/client.(*EthClient).GetBlockByNumber(0xc4200619f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/synechron-finlabs/quorum-maker-nodemanager/client/EthClient.go:154 +0x201
github.com/synechron-finlabs/quorum-maker-nodemanager/service.(*NodeServiceImpl).latestBlockDetails(0xc420051250, 0x7fff2ca86e81, 0x16, 0xc4203ccf80, 0xc42004ebb8)
/go/src/github.com/synechron-finlabs/quorum-maker-nodemanager/service/NodeService.go:1026 +0x100
github.com/synechron-finlabs/quorum-maker-nodemanager/service.(*NodeServiceImpl).LatestBlockHandler(0xc420051250, 0x8a7d20, 0xc4202a6460, 0xc4203e4600)
/go/src/github.com/synechron-finlabs/quorum-maker-nodemanager/service/NodeServiceHandler.go:442 +0x51
github.com/synechron-finlabs/quorum-maker-nodemanager/service.(*NodeServiceImpl).LatestBlockHandler-fm(0x8a7d20, 0xc4202a6460, 0xc4203e4600)
/go/src/github.com/synechron-finlabs/quorum-maker-nodemanager/main.go:78 +0x48
net/http.HandlerFunc.ServeHTTP(0xc420051b70, 0x8a7d20, 0xc4202a6460, 0xc4203e4600)
/usr/local/go/src/net/http/server.go:1947 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc420126180, 0x8a7d20, 0xc4202a6460, 0xc42031b600)
/go/src/github.com/gorilla/mux/mux.go:212 +0xcd
net/http.serverHandler.ServeHTTP(0xc420090ea0, 0x8a7d20, 0xc4202a6460, 0xc42031b600)
/usr/local/go/src/net/http/server.go:2694 +0xbc
net/http.(*conn).serve(0xc420185b80, 0x8a80a0, 0xc42006b0c0)
/usr/local/go/src/net/http/server.go:1830 +0x651
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2795 +0x27b
rpc call eth_blockNumber() on http://localhost:22000: Post http://localhost:22000: dial tcp 127.0.0.1:22000: connect: connection refused
rpc call eth_blockNumber() on http://localhost:22000: Post http://localhost:22000: dial tcp 127.0.0.1:22000: connect: connection refused
rpc call eth_getBlockByNumber() on http://localhost:22000: Post http://localhost:22000: dial tcp 127.0.0.1:22000: connect: connection refused
2019/08/12 13:07:22 http: panic serving 172.17.0.1:60796: runtime error: invalid memory address or nil pointer dereference
goroutine 333 [running]:
net/http.(*conn).serve.func1(0xc420300140)
/usr/local/go/src/net/http/server.go:1726 +0xd0
panic(0x7e6900, 0xc5edb0)
/usr/local/go/src/runtime/panic.go:502 +0x229
github.com/ybbus/jsonrpc.(*RPCResponse).GetObject(0x0, 0x7af020, 0xc420175b80, 0x8e, 0x0)
/go/src/github.com/ybbus/jsonrpc/jsonrpc.go:609 +0x26
github.com/synechron-finlabs/quorum-maker-nodemanager/client.(*EthClient).GetBlockByNumber(0xc4200619f0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/synechron-finlabs/quorum-maker-nodemanager/client/EthClient.go:154 +0x201
github.com/synechron-finlabs/quorum-maker-nodemanager/service.(*NodeServiceImpl).latestBlockDetails(0xc420051250, 0x7fff2ca86e81, 0x16, 0xc4203fb080, 0xc42004ebb8)
/go/src/github.com/synechron-finlabs/quorum-maker-nodemanager/service/NodeService.go:1026 +0x100
github.com/synechron-finlabs/quorum-maker-nodemanager/service.(*NodeServiceImpl).LatestBlockHandler(0xc420051250, 0x8a7d20, 0xc420272b60, 0xc42031ba00)
/go/src/github.com/synechron-finlabs/quorum-maker-nodemanager/service/NodeServiceHandler.go:442 +0x51
github.com/synechron-finlabs/quorum-maker-nodemanager/service.(*NodeServiceImpl).LatestBlockHandler-fm(0x8a7d20, 0xc420272b60, 0xc42031ba00)
/go/src/github.com/synechron-finlabs/quorum-maker-nodemanager/main.go:78 +0x48
net/http.HandlerFunc.ServeHTTP(0xc420051b70, 0x8a7d20, 0xc420272b60, 0xc42031ba00)
/usr/local/go/src/net/http/server.go:1947 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc420126180, 0x8a7d20, 0xc420272b60, 0xc42031b800)
/go/src/github.com/gorilla/mux/mux.go:212 +0xcd
net/http.serverHandler.ServeHTTP(0xc420090ea0, 0x8a7d20, 0xc420272b60, 0xc42031b800)
/usr/local/go/src/net/http/server.go:2694 +0xbc
net/http.(*conn).serve(0xc420300140, 0x8a80a0, 0xc42025bf40)
/usr/local/go/src/net/http/server.go:1830 +0x651
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2795 +0x27b
rpc call eth_coinbase() on http://localhost:22000: Post http://localhost:22000: dial tcp 127.0.0.1:22000: connect: connection refused
rpc call eth_coinbase() on http://localhost:22000: Post http://localhost:22000: dial tcp 127.0.0.1:22000: connect: connection refused
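The trace shows `GetObject` being invoked on a nil `*RPCResponse` (the receiver is `0x0`): when geth is unreachable, the RPC call returns no response, and `EthClient.GetBlockByNumber` dereferences it anyway. A minimal Go sketch of the failure pattern and the guard that would avoid the panic (`RPCResponse` and `call` here are hypothetical stand-ins for illustration, not the real ybbus/jsonrpc API):

```go
package main

import (
	"errors"
	"fmt"
)

// RPCResponse is a minimal stand-in for a JSON-RPC response type.
type RPCResponse struct {
	Result interface{}
}

// GetObject dereferences the receiver; calling it on a nil *RPCResponse
// reproduces the nil-pointer panic seen in the stack trace.
func (r *RPCResponse) GetObject(target interface{}) error {
	_ = r.Result
	return nil
}

// call simulates the transport failure from the logs: the connection is
// refused, so the caller gets a nil response and a non-nil error.
func call(method string) (*RPCResponse, error) {
	return nil, errors.New("dial tcp 127.0.0.1:22000: connect: connection refused")
}

func main() {
	resp, err := call("eth_getBlockByNumber")
	// The guard: check both err and resp before calling GetObject.
	if err != nil || resp == nil {
		fmt.Println("rpc failed:", err)
		return
	}
	var block map[string]interface{}
	_ = resp.GetObject(&block)
}
```

Without the `err != nil || resp == nil` check, the `GetObject` call crashes exactly as in the trace above whenever the node is down.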

@TheBlockchainDevNeeraj

I have tried looking into the constellation logs, which read:

gas limit reached **

Is this related?

@zhjzcbm

zhjzcbm commented Aug 21, 2019

Has it been solved?

@TheBlockchainDevNeeraj

No, I am still seeing this issue pop up every time I cross the 6K or 7K block count. Can you think of any reason for this? I suspect it has something to do with node syncing, as it sometimes shows an error syncing the raft ID.

@zhjzcbm

zhjzcbm commented Aug 21, 2019

I used raft.remove() to delete the faulty node and then rejoined it to the network to resolve this problem.

@TheBlockchainDevNeeraj

TheBlockchainDevNeeraj commented Aug 21, 2019

My master node was down but the slaves were working fine, showing 2 nodes in the network. Can I remove the master node from the slaves? Even if it is possible, I still don't know how to do that.
Can you please help me figure out the command to remove the node using that method? Then I will be able to rejoin on my own 😄

@zhjzcbm

zhjzcbm commented Aug 22, 2019

Open the geth console via the geth.ipc file. This file is in your node directory, node/qdata/:

```shell
geth attach ./geth.ipc
```

If you don't have geth installed, you need to build it. Download it from https://github.com/jpmorganchase/quorum.git (you will also need make: `sudo apt install make -y` or `sudo yum install make -y`), then:

```shell
cd quorum
make geth
sudo cp ./build/bin/geth /usr/bin/
```

Once you have entered the console with `geth attach ./geth.ipc`, use the `raft` module to view node information, then run

```
raft.remove(<node ID>)
```

to delete the faulty node. (Note: in recent Quorum releases the management call is `raft.removePeer(raftId)`.)

Do the above on a healthy node.

@fullkomnun

fullkomnun commented Jan 22, 2020

Just ran into the same issue yesterday while running a tessera-based quorum network of 3 nodes generated by quorum-maker (with some amendments to tessera-config.json), as part of integration testing using the test-containers docker-compose module. Running on macOS.

The crash of node3:
Waiting for Node 1 to deploy NetworkManager contract...
{"level":"info","msg":"Node Manager listening on :22004...","time":"2020-01-21T17:03:24Z"}
{"level":"info","msg":"Adding whitelisted IPs","time":"2020-01-21T17:03:28Z"}
rpc call eth_getBlockByNumber() on http://localhost:22000: Post http://localhost:22000: EOF
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x703b86]

goroutine 19 [running]:
github.com/ybbus/jsonrpc.(*RPCResponse).GetObject(0x0, 0x7af020, 0xc420149040, 0x5c, 0x0)
/go/src/github.com/ybbus/jsonrpc/jsonrpc.go:609 +0x26
github.com/synechron-finlabs/quorum-maker-nodemanager/client.(*EthClient).GetBlockByNumber(0xc420061b60, 0xc420272810, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/go/src/github.com/synechron-finlabs/quorum-maker-nodemanager/client/EthClient.go:154 +0x201
github.com/synechron-finlabs/quorum-maker-nodemanager/service.(*NodeServiceImpl).getContracts(0xc420051250, 0x7ffc2cecff21, 0x16)
/go/src/github.com/synechron-finlabs/quorum-maker-nodemanager/service/NodeService.go:1440 +0x736
github.com/synechron-finlabs/quorum-maker-nodemanager/service.(*NodeServiceImpl).ContractCrawler.func1(0xc420360000, 0xc420051250, 0x7ffc2cecff21, 0x16)
/go/src/github.com/synechron-finlabs/quorum-maker-nodemanager/service/NodeService.go:1427 +0xa1
created by github.com/synechron-finlabs/quorum-maker-nodemanager/service.(*NodeServiceImpl).ContractCrawler
/go/src/github.com/synechron-finlabs/quorum-maker-nodemanager/service/NodeService.go:1424 +0x70

This crash happened (somewhat) consistently when running a batch of tests against the same underlying network, but I cannot reproduce it now, for no apparent reason.
Performing the same test with only 2 nodes seems to eliminate the issue.

Any insights regarding the reason for this crash?

@fullkomnun

I think I might have solved this mystery.

The Raft consensus used by quorum-maker is sensitive to clock-sync issues, and there are known time-drift problems on macOS. Making sure the latest version of Docker for Mac is installed and adding a volume binding such as /etc/localtime:/etc/localtime:ro
to docker-compose.yml seems to eliminate these issues.
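For reference, the binding goes under each node service's `volumes` key in docker-compose.yml (the service name `node1` below is just an example; apply it to every quorum-maker node service):

```yaml
services:
  node1:
    # ... existing image/ports configuration ...
    volumes:
      # Mount the host clock read-only so the container cannot drift.
      - /etc/localtime:/etc/localtime:ro
```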

Resources:
https://stackoverflow.com/questions/22800624/will-docker-container-auto-sync-time-with-the-host-machine
https://www.docker.com/blog/addressing-time-drift-in-docker-desktop-for-mac/
