Currently, on Westend, we are able to follow the tip of the chain well. However, at a certain point finalization stalls and does not pick back up. The primary goal of this epic is to stabilize finalization when we are following the tip of the chain.
The root cause is that we are not receiving commit messages. Commit messages are propagated throughout the network to announce finalization. We do not receive them because, in order to receive GRANDPA messages from our peers, we need to send them neighbor messages to establish a connection with them, and we do not currently do this.
Thus, this epic covers sending and receiving neighbor messages. Note that neighbor messages are used to gather and store information about our peers' state, and that information is used to decide whether we should send messages to a given peer. For example, if a node has a voter set different from ours, we do not send commit messages to it. This means we also need to implement this peer state caching, which leads to my next point.
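The peer state caching described above could look roughly like the following minimal sketch. It assumes Go, and every name in it (`NeighborMessage`, `PeerStateCache`, `ShouldSendCommit`) is hypothetical. The message fields are modelled loosely on the round, set ID and last-commit height that GRANDPA neighbor packets advertise; only the "different voter set" rule from the example above is implemented.

```go
package main

import (
	"fmt"
	"sync"
)

// NeighborMessage is a hypothetical view of the state a peer advertises.
type NeighborMessage struct {
	Round           uint64
	SetID           uint64
	CommitFinalized uint32 // height of the last block the peer saw a commit for
}

// PeerStateCache keeps the most recent neighbor message from each peer.
type PeerStateCache struct {
	mu    sync.RWMutex
	peers map[string]NeighborMessage
}

func NewPeerStateCache() *PeerStateCache {
	return &PeerStateCache{peers: make(map[string]NeighborMessage)}
}

// Update records (or overwrites) the neighbor message received from peer.
func (c *PeerStateCache) Update(peer string, msg NeighborMessage) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.peers[peer] = msg
}

// ShouldSendCommit reports whether a commit for setID should be gossiped to
// peer. Peers we have no neighbor message from, or peers on a different
// voter set, are skipped.
func (c *PeerStateCache) ShouldSendCommit(peer string, setID uint64) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	state, ok := c.peers[peer]
	return ok && state.SetID == setID
}

func main() {
	cache := NewPeerStateCache()
	cache.Update("peerA", NeighborMessage{Round: 10, SetID: 3, CommitFinalized: 1200})
	cache.Update("peerB", NeighborMessage{Round: 4, SetID: 2, CommitFinalized: 900})

	fmt.Println(cache.ShouldSendCommit("peerA", 3)) // true: same voter set
	fmt.Println(cache.ShouldSendCommit("peerB", 3)) // false: different voter set
	fmt.Println(cache.ShouldSendCommit("peerC", 3)) // false: no neighbor message yet
}
```

A read/write mutex is used because the cache is likely read far more often (once per peer per outgoing message) than it is written.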
As described above, peers use this cached state to decide what to send, and they will not send GRANDPA messages to peers who are behind. This means that if we are behind and trying to catch up to the current finalized block, simply sending neighbor messages and listening for commit messages will not be enough, since our peers will see that we are behind and will not send us commit messages.
This highlights the second major requirement for this epic: catch-up logic. We need to be able to send catch-up requests and process catch-up responses in order to bring our node up to our peers' current state.
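A minimal sketch of the catch-up decision and response handling follows, again in Go with hypothetical names (`LocalView`, `CatchUpRequest`, `CatchUpResponse`, `MaybeCatchUp`, `ApplyCatchUp`). The threshold of two rounds is an illustrative assumption, and a real catch-up response carries prevotes and precommits that must be verified before it is applied; that verification is deliberately left out here.

```go
package main

import "fmt"

// LocalView is a hypothetical summary of our own GRANDPA voter state.
type LocalView struct {
	Round uint64
	SetID uint64
}

// CatchUpRequest asks a peer for the state needed to jump to its round.
type CatchUpRequest struct {
	Round uint64
	SetID uint64
}

// CatchUpResponse is simplified: a real response also carries the votes
// that justify the round, which must be checked before applying it.
type CatchUpResponse struct {
	Round      uint64
	SetID      uint64
	BaseNumber uint32
}

// catchUpThreshold is an assumed value: how many rounds a peer must be
// ahead of us before we ask it for a catch-up.
const catchUpThreshold = 2

// MaybeCatchUp builds a request if the peer (as advertised in its neighbor
// message) is on our voter set and far enough ahead of us.
func MaybeCatchUp(local LocalView, peerRound, peerSetID uint64) (CatchUpRequest, bool) {
	if peerSetID != local.SetID {
		return CatchUpRequest{}, false // different voter set: not handled here
	}
	if peerRound < local.Round+catchUpThreshold {
		return CatchUpRequest{}, false // not far enough behind to bother
	}
	return CatchUpRequest{Round: peerRound, SetID: local.SetID}, true
}

// ApplyCatchUp advances our round if the response is for our set and a
// later round. Vote verification is intentionally omitted in this sketch.
func ApplyCatchUp(local *LocalView, resp CatchUpResponse) bool {
	if resp.SetID != local.SetID || resp.Round <= local.Round {
		return false
	}
	local.Round = resp.Round
	return true
}

func main() {
	local := LocalView{Round: 5, SetID: 3}
	if req, ok := MaybeCatchUp(local, 12, 3); ok {
		fmt.Printf("send catch-up request: %+v\n", req)
		resp := CatchUpResponse{Round: 12, SetID: 3, BaseNumber: 1500}
		fmt.Println("applied:", ApplyCatchUp(&local, resp), "now at round", local.Round)
	}
}
```

Requiring a gap of a few rounds avoids issuing catch-up requests for peers that are only momentarily ahead, which normal round progression would resolve on its own.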
The solution looks like this:
When we lag behind, we initiate the catch-up process to bring our state up to date with our peers.
Once we are up to date, our peers will see, via the neighbor message protocol we implemented, that we have caught up, and will start sending us messages.
We should then start receiving commit messages, which we use to finalize blocks, so GRANDPA finalization proceeds again.
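The last step of the flow above, turning a received commit into finalization, might be sketched as follows. This is a hypothetical Go illustration (`Commit`, `Finalizer`, `HandleCommit` are invented names): it assumes equal-weight voters, identifies signers by plain strings instead of verifying signatures, and only applies GRANDPA's rule that a commit must be backed by precommits from more than two thirds of the voter set.

```go
package main

import "fmt"

// Commit is a simplified commit message: the target block plus the IDs of
// the voters whose precommits justify it (signatures are not modelled).
type Commit struct {
	SetID        uint64
	TargetNumber uint32
	TargetHash   string
	Signers      []string
}

// Finalizer tracks the last finalized block for the current voter set.
type Finalizer struct {
	SetID         uint64
	Voters        map[string]bool // assumed equal-weight voter set
	FinalizedNum  uint32
	FinalizedHash string
}

// HandleCommit finalizes the commit target if the commit is for our voter
// set, ahead of what we already finalized, and signed by > 2/3 of voters.
func (f *Finalizer) HandleCommit(c Commit) bool {
	if c.SetID != f.SetID || c.TargetNumber <= f.FinalizedNum {
		return false
	}
	seen := make(map[string]bool)
	for _, s := range c.Signers {
		if f.Voters[s] {
			seen[s] = true // count each known voter at most once
		}
	}
	// Supermajority check: strictly more than two thirds of the voters.
	if 3*len(seen) <= 2*len(f.Voters) {
		return false
	}
	f.FinalizedNum, f.FinalizedHash = c.TargetNumber, c.TargetHash
	return true
}

func main() {
	f := &Finalizer{
		SetID:  3,
		Voters: map[string]bool{"a": true, "b": true, "c": true, "d": true},
	}
	// "0xabc" is a placeholder block hash.
	weak := Commit{SetID: 3, TargetNumber: 1500, TargetHash: "0xabc", Signers: []string{"a", "b"}}
	strong := Commit{SetID: 3, TargetNumber: 1500, TargetHash: "0xabc", Signers: []string{"a", "b", "c"}}
	fmt.Println(f.HandleCommit(weak))   // false: 2 of 4 voters is not > 2/3
	fmt.Println(f.HandleCommit(strong)) // true: 3 of 4 voters is > 2/3
	fmt.Println(f.FinalizedNum)         // 1500
}
```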
This epic can be considered complete when we are able to:
successfully execute the catch-up process when we are behind
send and receive neighbor messages to make us eligible to receive commit messages
receive commit messages so that finalization continues