Fork in sidechain block production #1231
So the phenomenon you observed is basically two corner cases combined: due to network latency, worker1 started to produce block

So to answer the questions:
Thanks @clangenb
Sorry for the delayed response here.
On another note, if I analyse the logs correctly, worker1 failed to produce the block because the import of the parentchain block took too long. This must not be the case: #1275.
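For reference, the slot numbers in the logs line up with the usual Aura slot arithmetic (slot = Unix timestamp / slot duration), assuming slots are counted from the Unix epoch and using the 6 s slot duration mentioned in the issue. The helpers below (`unix_secs`, `slot_at`, `slot_window`) are hypothetical and are not taken from the worker code. They show that slot `279942726` spans 11:12:36 to 11:12:42, so worker1's failure logged at `11:12:44Z` falls after the end of its slot.

```rust
/// Slot duration reported in the issue (6 s).
/// Assumption: slot = unix_seconds / SLOT_DURATION_SECS (slots counted from the epoch).
pub const SLOT_DURATION_SECS: u64 = 6;

/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
pub fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y - era * 400; // year of era, 0..=399
    let mp = (month + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + day - 1; // day of year, 0..=365
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Hypothetical helper: Unix seconds for a UTC timestamp such as 2023-03-24T11:12:30Z.
/// Assumes the timestamp is at or after the Unix epoch.
pub fn unix_secs(year: i64, month: i64, day: i64, h: i64, m: i64, s: i64) -> u64 {
    (days_from_civil(year, month, day) * 86_400 + h * 3_600 + m * 60 + s) as u64
}

/// Hypothetical helper: the slot a timestamp falls into.
pub fn slot_at(ts: u64) -> u64 {
    ts / SLOT_DURATION_SECS
}

/// Hypothetical helper: the half-open window [start, end) of a slot, in Unix seconds.
pub fn slot_window(slot: u64) -> (u64, u64) {
    let start = slot * SLOT_DURATION_SECS;
    (start, start + SLOT_DURATION_SECS)
}
```

With these helpers, `slot_at(unix_secs(2023, 3, 24, 11, 12, 30))` yields `279942725` (worker0's slot) and `slot_at(unix_secs(2023, 3, 24, 11, 12, 36))` yields `279942726` (worker1's slot), matching the log lines.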
See litentry/litentry-parachain#1524 for the complete logs.
It doesn't happen very often; in fact, this is the first time we've seen it. But the consequence is severe: the two workers diverge and never get back in sync.
I've investigated it (see litentry/litentry-parachain#1524 (comment)) and the root cause seems to be:
- `[2023-03-24T11:12:30Z]` worker0 claimed aura slot `279942725` and produced sidechain bn `13283` based on parentchain bn `10024`.
- `[2023-03-24T11:12:36Z]` In the last second of the aura slot duration (6 s in our case), worker0 finished composing the block and broadcast it to worker1.
- `[2023-03-24T11:12:36Z]` The broadcast took time, so worker1 started to claim slot `279942726` and built its own version of sidechain bn `13283`, but based on parentchain bn `10025` (the latest parentchain header worker1 had synced).
- `[2023-03-24T11:12:44Z]` worker1 failed to produce the block because it took too long.
- `[2023-03-24T11:12:44Z]` worker1 received the sidechain block from worker0 and attempted to import it. However, it couldn't verify the block because it couldn't find the matching parentchain block header in its parentchain import queue.
- `[2023-03-24T11:12:44Z]` The block was discarded by worker1, and from this point on, the two workers couldn't sync with each other anymore.

My questions:
The queue can't be popped twice (it reminds me of a double free); otherwise, the desired parentchain header can't be peeked. It seems we can easily run into this situation when a worker fails to produce a block after the queue has already been popped: the worker can then no longer import a block that was composed on an older parentchain bn. Do you already have an idea for a solution?
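One possible direction, sketched under assumptions and not the worker's actual design: copy every popped header into a small bounded window of recently imported headers, so a peer block built on a slightly older parentchain head (worker0's block on `10024` after worker1 popped up to `10025`) can still be verified. `Header`, `ImportQueue`, `pop_until`, `find`, and the `keep` window size are all hypothetical names chosen for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical, minimal stand-in for a parentchain header.
#[derive(Clone, Debug, PartialEq)]
pub struct Header {
    pub number: u64,
}

/// Hypothetical import queue: headers wait in `pending` until a block producer
/// pops them. Popped headers are also kept in a bounded `recent` window
/// (at most `keep` entries) so they can still be looked up afterwards.
pub struct ImportQueue {
    pending: VecDeque<Header>,
    recent: VecDeque<Header>,
    keep: usize,
}

impl ImportQueue {
    pub fn new(keep: usize) -> Self {
        Self { pending: VecDeque::new(), recent: VecDeque::new(), keep }
    }

    pub fn push(&mut self, header: Header) {
        self.pending.push_back(header);
    }

    /// Pops every pending header up to and including `number`,
    /// remembering each one in the bounded `recent` window.
    pub fn pop_until(&mut self, number: u64) -> Vec<Header> {
        let mut popped = Vec::new();
        while self.pending.front().map_or(false, |h| h.number <= number) {
            // `front()` just returned Some, so `pop_front()` cannot fail here.
            let header = self.pending.pop_front().expect("front() was Some");
            self.recent.push_back(header.clone());
            if self.recent.len() > self.keep {
                self.recent.pop_front(); // evict the oldest remembered header
            }
            popped.push(header);
        }
        popped
    }

    /// Looks a header up in the pending queue first, then in the recent window.
    pub fn find(&self, number: u64) -> Option<&Header> {
        self.pending.iter().chain(self.recent.iter()).find(|h| h.number == number)
    }
}
```

With `keep = 0` the sketch reproduces the failure described above: once the queue has been popped up to `10025`, header `10024` can no longer be found. The window is bounded so memory does not grow with the chain; how large it would need to be depends on how far a peer's parentchain head can lag behind.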
… `13283` on its slot? Then we would have two valid sidechain blocks backed by different parentchain bns. Worker0 would probably reject the block from worker1 because it has already imported its own block `13283`, but what about worker1? Is the order of block import guaranteed (i.e., that worker1 first imports the block from worker0 and only then its own)? And what happens under very bad network conditions, where worker1 can't get the block from worker0 in time?

… why was that? Shall we use a set instead of vectors for the `peers` or `urls` collections?
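On the last question: if the same peer can end up in `peers` or `urls` twice, a vector keeps both entries (so anything iterating over it, such as a broadcast loop, visits that peer twice), whereas a set stores it once by construction. A minimal sketch with hypothetical helper names (`peers_as_vec`, `peers_as_set`) and hypothetical URLs, not the worker's actual peer-management code:

```rust
use std::collections::BTreeSet;

/// Hypothetical "before": collecting peer URLs into a Vec keeps duplicates,
/// so iterating over it visits the same peer more than once.
pub fn peers_as_vec(urls: &[&str]) -> Vec<String> {
    urls.iter().map(|u| u.to_string()).collect()
}

/// Hypothetical "after": a BTreeSet stores each URL once and iterates
/// in a deterministic (sorted) order.
pub fn peers_as_set(urls: &[&str]) -> BTreeSet<String> {
    urls.iter().map(|u| u.to_string()).collect()
}
```

A `HashSet` would work equally well if iteration order doesn't matter. Note that either set compares plain strings, so `wss://worker1:2000` and `wss://worker1:2000/` would still count as two peers; URLs may need normalizing before insertion.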