Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback to bootstrapping #3480

Draft
wants to merge 13 commits into
base: master
Choose a base branch
from

Commits on Oct 16, 2024

  1. Unify engines

    The lifecycle of an avalanchego node can be separated to three main stages:
    
    Stage I - If supported by the VM, then a state sync occurs. Otherwise, this stage is skipped.
    In the X-chain, this stage is entirely substituted by a custom bootstrapping phase that replicates and executes the X-chain DAG.
    
    Stage II - A stage called bootstrapping, which entails replicating the blocks from other avalanchego nodes, and executing them.
    
    Stage III - The final form of an avalanchego node, in which it runs the snowman consensus protocol.
    
    The phases are implemented by components called "engines":
    
    - avalanche bootstrapping
    - snowman
    - state syncer
    - snowman bootstrapper
    
    And the handler in snow/networking/handler/ is responsible to route messages from the network into the correct engine.
    Engines all implement the same common.Engine interface, but the interface consists of the union of all operations across all engines.
    
    Indeed, it is often the case that a message of type `m` dispatched by engine `e`, cannot be processed by a different engine `e'.
    For instance, a Chits message cannot be processed by any engine other than consensus, and a message about a query of state summary is only relevant to the state sync engine.
    
    To that end, each engine simply implements a no-op dispatcher for messages it does not care about.
    
    The biggest problem with the existing aforementioned structure, is that the lifecycle of the engines imposes a strict one way movement across the various stages,
    and there is no single component which consolidates the transition among the stages. The movement between the various stages takes place by a callback given to each
    engine at every stage but the final one (Stage I and Stage II).
    
    The structure therefore makes it difficult to introduce movement from snowman consensus back to bootstrapping / state sync, or to have better control over the message dispatch.
    
    This commit unifies all engines into a single one under snow/engine/unified
    
    As a result, the implementation of the handler in snow/networking/handler/handler.go is now simpler,
    as it only interacts with the unified engine, and it never needs to query the snow.EngineState.
    
    The state transition between the various stages is now taken care of by the unified engine, and since the code to dispatch messages to the right engine
    is now all in the unified engine, it's not only more testable but it lets us move among the stages in the same place where we consider the stage we're in to dispatch
    the message to the correct engine.
    
    Signed-off-by: Yacov Manevich <[email protected]>
    yacovm committed Oct 16, 2024
    Configuration menu
    Copy the full SHA
    d2505c2 View commit details
    Browse the repository at this point in the history

Commits on Oct 17, 2024

  1. Remove unused meteredHandler struct type (ava-labs#3450)

    Signed-off-by: Joshua Kim <[email protected]>
    joshua-kim authored and yacovm committed Oct 17, 2024
    Configuration menu
    Copy the full SHA
    a3a4aaa View commit details
    Browse the repository at this point in the history
  2. Configuration menu
    Copy the full SHA
    d891af5 View commit details
    Browse the repository at this point in the history
  3. Configuration menu
    Copy the full SHA
    d02a375 View commit details
    Browse the repository at this point in the history
  4. Configuration menu
    Copy the full SHA
    c332ec7 View commit details
    Browse the repository at this point in the history
  5. Configuration menu
    Copy the full SHA
    18a3c62 View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    7817c28 View commit details
    Browse the repository at this point in the history
  7. Update versions for v1.11.12 (ava-labs#3470)

    Co-authored-by: Darioush Jalali <[email protected]>
    2 people authored and yacovm committed Oct 17, 2024
    Configuration menu
    Copy the full SHA
    ee6f5bb View commit details
    Browse the repository at this point in the history
  8. Detect if our node is behind the majority

    This commit adds a mechanism that detects that our node is behind the majority of the stake.
    The intent is to later have this mechanism be the trigger for the bootstrapping mechanism.
    Currently, the bootstrapping mechanism is only active upon node boot, but not at a later point.
    
    The mechanism works in the following manner:
    
    - It intercepts the snowman engine's Chits message handling, and upon every reception of the Chits message,
      the mechanism that detects if the node is a straggler (a node with a ledger height behind the rest) may be invoked,
      if it wasn't invoked too recently.
    - The mechanism draws statistics from the validators known to it, and computes the latest accepted block for each validator.
    - The mechanism then proceeds to determine which blocks are pending to be processed (a block pending to be processed was not accepted).
    - The mechanism then collects a snapshot of all blocks it hasn't accepted yet, and the amount of stake that has accepted this block.
    - The mechanism then waits for its next invocation, in order to see if it has accepted blocks correlated with enough stake.
    - If there is too much stake that has accepted blocks by other nodes correlated to it that the node hasn't accepted,
      then the mechanism announces the node is behind, and returns the time period between the two invocations.
    - The mechanism sums the total time it has detected the node is behind, until a sampling concludes it is not behind, and then
      the total time is nullified.
    
    Signed-off-by: Yacov Manevich <[email protected]>
    yacovm committed Oct 17, 2024
    Configuration menu
    Copy the full SHA
    903651e View commit details
    Browse the repository at this point in the history
  9. Rebase on top of unified engine

    Signed-off-by: Yacov Manevich <[email protected]>
    yacovm committed Oct 17, 2024
    Configuration menu
    Copy the full SHA
    8845cf1 View commit details
    Browse the repository at this point in the history
  10. Fallback to bootstrapping when detected we're behind

    Signed-off-by: Yacov Manevich <[email protected]>
    yacovm committed Oct 17, 2024
    Configuration menu
    Copy the full SHA
    31d0184 View commit details
    Browse the repository at this point in the history
  11. Only set VM's state once

    Signed-off-by: Yacov Manevich <[email protected]>
    yacovm committed Oct 17, 2024
    Configuration menu
    Copy the full SHA
    4758bc0 View commit details
    Browse the repository at this point in the history

Commits on Oct 18, 2024

  1. shutdown VMs

    Signed-off-by: Yacov Manevich <[email protected]>
    yacovm committed Oct 18, 2024
    Configuration menu
    Copy the full SHA
    9d8a7b1 View commit details
    Browse the repository at this point in the history