When handleOutChans exits, it blocks lotus chain synchronization #47

firesWu opened this issue Apr 8, 2021 · 2 comments
firesWu commented Apr 8, 2021

log.Warnf("sendRequest failed: %s", err)

lotus.log

2021-04-08T03:13:34.867+0800    WARN    rpc     [email protected]/websocket.go:246        sendRequest failed: writev tcp4  ip -> ip: writev: broken pipe
2021-04-08T03:13:34.867+0800    WARN    rpc     [email protected]/websocket.go:246        sendRequest failed: writev tcp4 ip -> ip: writev: no route to host
2021-04-08T03:13:34.867+0800    WARN    rpc     [email protected]/websocket.go:246        sendRequest failed: writev tcp4  ip -> ip: writev: no route to host
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries
2021-04-08T03:13:34.867+0800    WARN    chainstore      store/store.go:293      head change sub is slow, has 16 buffered entries

When the miner calls the lotus ChainNotify API, lotus executes SubHeadChanges. But if the miner crashes, lotus's sendRequest fails, so the handleOutChans method exits and can no longer consume the 'headchange' topic.
This eventually causes the takeHeaviestTipSet method to block.

func (cs *ChainStore) takeHeaviestTipSet(ctx context.Context, ts *types.TipSet) error {
	....

	if cs.heaviest != nil { // buf
		if len(cs.reorgCh) > 0 {
			log.Warnf("Reorg channel running behind, %d reorgs buffered", len(cs.reorgCh))
		}

		// this send blocks once reorgCh fills up and nothing drains it
		cs.reorgCh <- reorg{
			old: cs.heaviest,
			new: ts,
		}
	} else {
		log.Warnf("no heaviest tipset found, using %s", ts.Cids())
	}

	....
}
@ricktian1226

Similar problem...

2021-04-23T00:57:50.921+0800 WARN storageminer storage/wdpost_sched.go:121 window post scheduler notifs channel closed

When the miner logged "window post scheduler notifs channel closed" and retried api.ChainNotify, the call blocked for 54 minutes:
goroutine 51840 [select, 54 minutes]:

github.com/filecoin-project/go-jsonrpc.websocketClient.func2(0x3fdbf40, 0xc05dbcab70, 0x3a434ad, 0x3, 0xc0852a46f0, 0xc061c06820, 0x14, 0xc05dbcab10, 0x1, 0x1, ...)
/opt/lotus/extern/go-jsonrpc/client.go:233 +0x23e
github.com/filecoin-project/go-jsonrpc.(*client).sendRequest(0xc0005f1170, 0x3fdbf40, 0xc05dbcab70, 0x3a434ad, 0x3, 0xc0852a46f0, 0xc061c06820, 0x14, 0xc05dbcab10, 0x1, ...)
/opt/lotus/extern/go-jsonrpc/client.go:423 +0x178
github.com/filecoin-project/go-jsonrpc.(*rpcFunc).handleRpcCall(0xc0000514a0, 0xc05dbcaae0, 0x2, 0x2, 0x0, 0x0, 0x0)
/opt/lotus/extern/go-jsonrpc/client.go:535 +0x5d8
reflect.callReflect(0xc00260e600, 0xc01c829da0, 0xc01c829d88)
/usr/local/go/src/reflect/value.go:565 +0x32a
reflect.makeFuncStub(0x3fdbe80, 0xc003611900, 0xc0852a46e8, 0x8, 0x6563060, 0x35ac8e0, 0xc0852a46e0, 0xc01c829fb8, 0x1b66f83, 0xc001050e00, ...)
/usr/local/go/src/reflect/asm_amd64.s:20 +0x42
github.com/filecoin-project/lotus/api/apistruct.(*FullNodeStruct).ChainNotify(0xc001050e00, 0x3fdbe80, 0xc003611900, 0xc0852a46e8, 0x8, 0x0, 0x0, 0x0)
/opt/lotus/api/apistruct/struct.go:985 +0x55
github.com/filecoin-project/lotus/storage.(*WindowPoStScheduler).Run(0xc00368de40, 0x3fdbe80, 0xc003611900)
/opt/lotus/storage/wdpost_sched.go:95 +0x9a3
created by github.com/filecoin-project/lotus/node/modules.StorageMiner.func1.1
/opt/lotus/node/modules/storageminer.go:249 +0x67

And lotus shows these logs on its side:
2021-04-23T00:57:50.922+0800 INFO chainstore store/store.go:275 chain head sub from f0216849 begin
2021-04-23T00:57:58.706+0800 WARN chainstore store/store.go:297 chain head sub from f0216849 unsub
2021-04-23T00:57:58.706+0800 WARN chainstore store/store.go:297 chain head sub from f0216849 unsub
2021-04-23T00:57:58.706+0800 WARN chainstore store/store.go:285 chain head sub from f0216849 exit loop
2021-04-23T00:57:58.706+0800 INFO chainstore store/store.go:286 chain head sub from f0216849 end
2021-04-23T00:57:58.706+0800 WARN chainstore store/store.go:285 chain head sub from f0216849 exit loop
2021-04-23T00:57:58.706+0800 INFO chainstore store/store.go:286 chain head sub from f0216849 end

Restarting the miner was required to resolve this...


ricktian1226 commented Apr 23, 2021

Two questions:

  1. Is it a problem that the miner head sub is 'unsub'ed and 'end'ed twice?
  2. go-jsonrpc's sendRequest can hang for a long time; is there a way to set a timeout so it gives up instead of hanging?
