User Logs & Feedback for issue #10788 (#10907)
-
@RobQuistNL
1 - #10788 (comment):
Lotus Version
Repro Steps
Describe the Bug
$ lotus sync wait
Logging Information
2 - #10788 (comment): Just happened again. It looks like the last block sync happened at 12:56 (I see no "New heaviest tipset" in the logs after that). Here's a bunch of logs:
-
@marshyonline
1 - #10788 (comment): See https://filecoinproject.slack.com/archives/CP50PPW2X/p1682716335088099. Out of 8 nodes, 3 keep getting stuck like this every few hours and require a reboot to get back into sync.
-
@scaseye
1 - #10788 (comment): My Lotus daemon falls more than 5 epochs out of sync every ~4 hours. The most recent time was 8:25 am this morning. The daemon log for 8am-9am is attached, along with 2 more logs covering the two earlier hour-long windows in which Lotus fell out of sync today. Prior to the 1.23.0 upgrade, I had been running splitstore without issue for over a month.
8am-daemon.txt
2 - #10788 (comment): Here are more logs: pmap and -QUIT output from the daemon.
3 - #10788 (comment): I also run my daemon service file with these set:
4 - #10788 (comment): Build settings:
5 - #10788 (comment): My last sync issue was May 3rd at 12:20 am PST, and there have been no issues since. It is as if something changed on the network, because I have not changed anything on my node. It has stayed in sync since I restarted the Lotus daemon at that time.
6 - #10788 (comment): My current daemon systemd setup:
7 - #10788 (comment): Yes, on splitstore.
8 - #10788 (comment): I already posted my working systemd config. It would be interesting to see Stu's non-systemd settings and compare.
-
@marco-storswift
1 - #10788 (comment): #10791, the same case.
2 - #10788 (comment): I found a special case: chain sync was blocked for eight minutes.
3 - #10788 (comment)
4 - #10788 (comment): goroutines.zip
-
@Trevor-K-Smith
1 - #10788 (comment): My configuration is identical to Scaseye's, including the Lotus version, Go version, and overall settings. There isn't much more to mention, aside from persistent issues that are causing significant damage. I have tried various configurations, but none has produced a stable, fast Lotus. It syncs quickly, then gets stuck, and after a restart the cycle repeats. This is quite unstable compared to earlier versions.
-
@stuberman
1 - #10788 (comment): I have also struggled with chain sync issues, even after upgrading to Lotus v1.23.0 and v1.23.1-rc1.
2 - #10788 (comment): I run splitstore, but it only seems to be an issue when using systemd.
3 - #10788 (comment): Here are my splitstore settings and CLI command:
-
@piknikSteven2021
1 - #10788 (comment): Our systemd settings:
We are NOT running splitstore and are seeing syncing issues as well. Block validations are quick, but they seem to be extremely delayed. It appears to me that this issue only occurs on daemons with an active Boost node connected to them.
All Lotus nodes are v1.23.0 and Boost is v1.7.2. We'll apply #10855 and see how it goes with sync.
2 - #10788 (comment): Cherry-picking commit
I'll leave it running overnight and see how it goes.
3 - #10788 (comment)
4 - #10788 (comment)
5 - #10788 (comment): Here is the Lotus pprof goroutines output from an out-of-sync daemon running a custom v1.23.0 Lotus build:
6 - #10788 (comment): out-of-sync-goroutines-2.txt
7 - #10788 (comment): I can confirm that this is only prevalent on nodes connected to miners, and only if they are sealing. For us, it happens whether or not the sectors are snap deals. All sealing causes sync issues.
-
@donkabat
Conversation conducted in Slack 👇
-
@TippyFlitsUK
Conversation conducted in Slack 👇
-
@TippyFlitsUK hello, https://filecoinproject.slack.com/archives/CEGN061C5/p1684498564684429
-
This issue has now been resolved in #10906.
-
User-provided logs and feedback for issue #10788 in the thread below.
Please also refer to the Slack thread for various responses from @arajasek, @magik6k & @ZenGround0 👍