Upstream Updates - Mon Dec 16 00:16:10 UTC 2024 #980

Closed
github-actions bot opened this issue Dec 16, 2024 · 0 comments
Labels
AUTO: Upstream Updates Auto-generated from tracking upstream repos

SIMD

Opened

Closed

Agave Wiki

b71362244f6ac998d989250364793715fd0f76d1^..9bcfb3e0d271ff4e693ee113959ece58ec61c402

diff --git a/2024-12-11-Testnet-Restart.md b/2024-12-11-Testnet-Restart.md
new file mode 100644
index 0000000..97bc186
--- /dev/null
+++ b/2024-12-11-Testnet-Restart.md
@@ -0,0 +1,113 @@
+## Edit
+As of 2024-12-12 18:56 UTC testnet is back online. The instructions below are no longer relevant. Nodes that haven't yet joined the cluster will need to update their shred version and start normally:
+
+    --expected-shred-version 64506 \
+
+***
+This testnet restart is NOT urgent. Follow these instructions when you have time, but don’t skip sleep or disrupt other plans for this.
+
+## Summary
+|Attribute|Value|
+|---------|-----|
+|Validator version|Agave: v2.1.5 <br> Frankendancer: v0.202.20016|
+|Snapshot slot|306450862|
+|Restart slot|306450862|
+|Shred version|64506|
+|Expected bank hash|BiGFLfFewfTB2asBRLjwRL6z7VNfuvYraS3H7RfQNCrf|
+
+
+## Step 1: Stop the validator process if you haven’t already
+
+## Step 2: Install Agave v2.1.5 or Frankendancer v0.202.20016
+This is necessary in order to create the correct snapshot in step 3.
+
+Agave:  `agave-install init v2.1.5`
+
+Frankendancer: Install `v0.202.20016`
+
+## Step 3: Create snapshot
+This command creates a snapshot but removes 3 activated v1.18 feature gate accounts.
+
+    agave-ledger-tool --ledger <ledger-path> create-snapshot \
+    --incremental \
+    --snapshot-archive-path <snapshot-path> \
+    --hard-fork 306450862 \
+    -- 306450862 <snapshot-path>
+
+
+The output should include this at (or near) the end:
+```
+    Successfully created snapshot for slot 306450862, hash BiGFLfFewfTB2asBRLjwRL6z7VNfuvYraS3H7RfQNCrf: /home/sol/ledger-snapshots/incremental-snapshot-<BASE_SLOT>-306450862-<SNAPSHOT_HASH>.tar.zst
+    Shred version: 64506
+```
+
+Note that each operator's snapshot filename may contain a different base slot number and hash, but:
+* the bank hash should be BiGFLfFewfTB2asBRLjwRL6z7VNfuvYraS3H7RfQNCrf
+* the second slot number should be 306450862
+* the shred version should be 64506
+
+Once you have created a snapshot, move all other snapshots to a backup directory so that your snapshot directory contains exactly one full snapshot and one incremental snapshot. The `<BASE_SLOT>` in these two filenames should match:
+
+    snapshot-<BASE_SLOT>-<BASE_SNAPSHOT_HASH>.tar.zst
+    incremental-snapshot-<BASE_SLOT>-306450862-<SNAPSHOT_HASH>.tar.zst
+
+If you fail to create a snapshot, see the appendix for possible fixes.
+
+## Step 4: Update startup config and start your validator
+### Agave
+Add these arguments to your validator startup script:
+
+    --wait-for-supermajority 306450862 \
+    --expected-shred-version 64506 \
+    --expected-bank-hash BiGFLfFewfTB2asBRLjwRL6z7VNfuvYraS3H7RfQNCrf \
+
+
+As it starts, the validator will load the snapshot for slot `306450862` and wait for 80% of the stake to come online before producing/validating new blocks. 
+
+To confirm your restarted validator is correctly waiting for 80% of the stake, look for this periodic log message:
+
+    INFO  solana_core::validator] Waiting for 80% of activated stake at slot 306450862 to be in gossip...
+
+And if you have RPC enabled, ask it for the current slot:
+
+    solana --url http://127.0.0.1:8899 slot
+
+Any number other than `306450862` means you did not complete the steps correctly.
+
+Once started, you should see log entries reporting the “active stake” visible in gossip alongside the “waiting for 80% of stake” message. You can track these to see how the stake progresses.
+
+
+***
+
+## Appendix (use this only if step 3 failed)
+
+If you get an error like this:
+
+    Error: Slot 306450862 is not available
+
+Or this:
+
+    Unable to process blockstore from starting slot <slot> to 306450862; the ending slot is less than the starting slot. The starting slot will be the latest snapshot slot, or genesis if the --no-snapshot flag is specified or if no snapshots are found.
+
+In either case, your snapshots directory contains a snapshot for a slot `>306450862`. If you also have a snapshot for a slot `<=306450862`, move the snapshots for slots `>306450862` to a backup directory and run the `agave-ledger-tool` command again. If you do not have a snapshot for a slot `<=306450862`, you will need to download a snapshot.
+
+If you successfully created a snapshot, resume the instructions above starting at Step 4. If you are unable to create a snapshot, follow the instructions below on downloading a snapshot.
+
+If you couldn’t produce your snapshot locally, follow these steps:
+
+### Step 1: Download a snapshot from a known validator
+
+If you are unable to generate a snapshot locally for slot `306450862`, you will need to download one from a known validator. Add these lines to your startup script:
+
+    --known-validator 5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on \
+    --expected-shred-version 64506 \
+
+Remove the flag `--no-snapshot-fetch` from your startup script if it is present.
+
+### Step 2: After download, restart
+
+Verify that you have a new snapshot in your snapshot directory. Once the snapshot has finished downloading, stop your validator process.
+
+Add the flag `--no-snapshot-fetch` to your startup script.
+
+Resume the instructions above starting at Step 4.
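
The directory layout that Step 3 of the restart instructions asks for (one full and one incremental snapshot, matching base slot, incremental slot equal to the restart slot) can be checked mechanically. The following is a minimal, illustrative sketch, not part of the Agave tooling: the function name `check_restart_snapshots` is hypothetical, and it assumes `.tar.zst` archives named exactly as in Step 3.

```python
import re

# Filename patterns taken from Step 3; other archive extensions are not handled (assumption).
FULL_RE = re.compile(r"^snapshot-(\d+)-[A-Za-z0-9]+\.tar\.zst$")
INCR_RE = re.compile(r"^incremental-snapshot-(\d+)-(\d+)-[A-Za-z0-9]+\.tar\.zst$")


def check_restart_snapshots(filenames, restart_slot):
    """Hypothetical helper: return a list of problems (empty if the layout looks right)."""
    full_slots = []
    incrementals = []  # (base_slot, slot) pairs
    for name in filenames:
        full_match = FULL_RE.match(name)
        incr_match = INCR_RE.match(name)
        if full_match:
            full_slots.append(int(full_match.group(1)))
        elif incr_match:
            incrementals.append((int(incr_match.group(1)), int(incr_match.group(2))))

    problems = []
    if len(full_slots) != 1:
        problems.append(f"expected 1 full snapshot, found {len(full_slots)}")
    if len(incrementals) != 1:
        problems.append(f"expected 1 incremental snapshot, found {len(incrementals)}")
    if len(full_slots) == 1 and len(incrementals) == 1:
        base_slot, slot = incrementals[0]
        if base_slot != full_slots[0]:
            problems.append("incremental base slot does not match the full snapshot slot")
        if slot != restart_slot:
            problems.append(f"incremental snapshot is for slot {slot}, not {restart_slot}")
    return problems
```

Passing it the output of `os.listdir(<snapshot-path>)` and the restart slot `306450862` would give a quick sanity check before starting the validator. It inspects filenames only, so it cannot confirm the bank hash or shred version.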
diff --git a/Feature-Gate-Tracker-Schedule.md b/Feature-Gate-Tracker-Schedule.md
index 99670cd..1ed6b3a 100644
--- a/Feature-Gate-Tracker-Schedule.md
+++ b/Feature-Gate-Tracker-Schedule.md
@@ -7,7 +7,7 @@ The version floor is the current minimum supported software version for a cluste
 
 || Testnet  | Devnet   | Mainnet Beta |
 | :-----: | :------: | :------: | :----------: |
-| Current floor | v2.0.15  | v2.0.9 | v1.18.22 |
+| Current floor | v2.0.15  | v2.0.9 | v2.0.9 |
 | Next expected floor * | -- | -- | -- |
 
 * These dates are tentative. Please keep an eye out for comms as the dates near
@@ -17,30 +17,29 @@ The version floor is the current minimum supported software version for a cluste
 ### Pending Mainnet Beta Activation
 | Key | Version | Testnet | Devnet | Description | Owner |
 |-----|---------|---------|--------|-------------|-------|
-| 7uZBkJXJ1HkuP6R3MJfZs7mLwymBcDbKdqbF51ZWLier | v1.18.2 | 674 | 734 | Enable chained Merkle shreds | behzadnouri |
 | ed9tNscbWLYBooxWA7FE2B5KHWs8A6sxfY8EzezEcoo | v2.0.4 | 707 | 791 | Use verify_strict for signature verification in ed25519 precompile | samkim-crypto |
 | FuS3FPfJDKSNot99ECLXtp3rueq36hMNStJkPJwWodLh | v2.0.0 | 709 | 793 | error invalid curve/op id | samkim-crypto |
 | wLckV1a64ngtcKPRGU4S4grVTestXjmNjxBjaKZrAcn | 1.18 | 710 | 794 | cost model uses number of requested write locks | apfitzge |
 | GDH5TVdbTPUpRnXaRyQqiKUa7uZAbZ28Q2N9bhbKoMLm | v1.14 | 711 | 797 | loosen cpi restrictions | jstarry |
 | 7bTK6Jis8Xpfrs8ZoUfiMDPazTcdPcTWheZFJTA5Z6X4 | v2.0.0 | 712 |  | SIMD0148: MoveStake and MoveLamports | 2501babe |
+| EQUMpNFr7Nacb1sva56xn1aLfBxppEoSBH8RRVdkcD1x | v2.1.1, v2.0.15 | 713 | 800 | Disable account loader special case | Lichtso |
+| zkhiy5oLowR7HY4zogXjCjeMXyruLqBwSWH21qcFtnv | v2.0.0 | 714 | 801 | Enable ZK ElGamal Proof program | samkim-crypto |
+| BtVN7YjDzNE6Dk7kTT7YTDgMNUZTNgiSJgsdzAeTg2jF | v2.0.0 | 715 | 802 | Removing unwanted rounding in fee calculation | tao-stones |
 
 ### Pending Devnet Activation
 | Key | Version | Testnet | Devnet | Description | Owner |
 |-----|---------|---------|--------|-------------|-------|
-| EQUMpNFr7Nacb1sva56xn1aLfBxppEoSBH8RRVdkcD1x | v2.1.1, v2.0.15 | 713 |  | Disable account loader special case | Lichtso |
-| zkhiy5oLowR7HY4zogXjCjeMXyruLqBwSWH21qcFtnv | v2.0.0 | 714 |  | Enable ZK ElGamal Proof program | samkim-crypto |
-| BtVN7YjDzNE6Dk7kTT7YTDgMNUZTNgiSJgsdzAeTg2jF | v2.0.0 | 715 |  | Removing unwanted rounding in fee calculation | tao-stones |
 | 3opE3EzAKnUftUDURkzMgwpNgimBAypW1mNDYH4x4Zg7 | v2.0.0 | 716 |  | Reward full priority fee to validators | tao-stones |
 | CLCoTADvV64PSrnR6QXty6Fwrt9Xc6EdxSJE4wLRePjq | v2.0.0 | 717 |  | SIMD0127: sol_get_sysvar | 2501babe |
 | tSynMCspg4xFiCj1v3TDb4c7crMR5tSBhLz4sF7rrNA | v2.0.0 | 718 |  | Add TowerSync ix | AshwinSekar |
 | 4eohviozzEeivk1y9UbrnekbAFMDQyJz5JjA9Y6gyvky | v2.0.7 | 719 |  | Feature Gate: Programify Feature Gate | buffalojoec |
+| 2Fr57nzzkLYXW695UdDxDeR5fhnZWSttZeZYemrnpGFV | v2.0.7 | 720 |  | Feature Gate: Migrate Config program to Core BPF | buffalojoec |
+| 8U4skmMVnF6k2kMvrWbQuRUT3qQSiTYpSjqmhmgfthZu | v2.0.0 | 722 |  | Add new unwritable reserved accounts | jstarry |
 
 
 ### Pending Testnet Activation
 | Key | Version | Testnet | Devnet | Description | Owner |
 |-----|---------|---------|--------|-------------|-------|
-| 2Fr57nzzkLYXW695UdDxDeR5fhnZWSttZeZYemrnpGFV | v2.0.7 | 720 |  | Feature Gate: Migrate Config program to Core BPF | buffalojoec |
-| 8U4skmMVnF6k2kMvrWbQuRUT3qQSiTYpSjqmhmgfthZu | v2.0.0 | 999999999 |  | Add new unwritable reserved accounts | jstarry |
 | CGB2jM8pwZkeeiXQ66kBMyBR6Np61mggL7XUsmLjVcrw | v2.1.0 | 999999999 |  | skip rent rewrites | jeffwashington |
 | CJzY83ggJHqPGDq8VisV3U91jDJLuEaALZooBrXtnnLU | v2.1 | 999999999 |  | Disable rent fees collection | HaoranYi |
 | sr11RdZWgbHTHxSroPALe6zgaT5A1K9LcE4nfsZS4gi | v2.1.1 | 999999999 |  | Enable secp256r1 precompile | samkim-crypto |
diff --git a/Snapshot-Guide.md b/Snapshot-Guide.md
new file mode 100644
index 0000000..8fa7850
--- /dev/null
+++ b/Snapshot-Guide.md
@@ -0,0 +1,45 @@
+This guide is for operators who have had trouble generating a snapshot in the past, or would like to better understand how snapshots and ledger work together.
+
+## Context
+In order to process a transaction, the validator needs information about the pre-existing state of the blockchain. That state could be determined by starting at genesis and replaying every block prior to the transaction of interest. Replaying that many transactions is impractical, so instead we use snapshots.
+
+The agave-validator process stores several kinds of state on disk, including ledger and snapshots:
+* blockstore - a collection of transactions, packed into blocks. Due to space limitations most nodes only retain the last 1-2 days' worth of transactions in their local ledger.
+  * This data corresponds to the `rocksdb` directory
+* snapshots
+  * full - a complete set of information about a specific block, containing all the state necessary to replay transactions for the next block. These are named `snapshot-<slot>-<hash>.tar.zst`.
+  * incremental - A set of differences that can be applied to a full snapshot to fast-forward to a subsequent block without replaying all the transactions in between. These are named `incremental-snapshot-<base slot>-<slot>-<hash>.tar.zst`.
+
+By default, Agave generates a full snapshot every 25,000 blocks and an incremental snapshot every 100 blocks. 
+
+Transactions can only be replayed going forward, not in reverse, so if you have a snapshot for slot S and a ledger containing the subsequent blocks, you can generate a snapshot for slot S+1, S+2, etc., but not for S-1 or earlier slots.
+
+In order to generate a snapshot for slot X you need:
+* A snapshot for slot S, where S < X
+  * This can be a full snapshot at slot S OR
+  * A full snapshot at slot R along with an incremental snapshot at slot S that is based on the full snapshot at slot R
+* A blockstore containing all the blocks from slots (S, X]
+
+## Common pitfalls
+
+There are three common reasons that might prevent an operator from creating a snapshot at slot X.
+
+### 1. All available snapshots are at some slot T where T > X
+
+**Cause:** This could happen if your validator continues running after slot X. The validator continually makes new snapshots and the newest snapshots are retained (as defined by snapshot retention flags). When new snapshots are created, older snapshots are deleted in FIFO order.
+
+**Solution:** As previously mentioned, it is not possible to replay blocks backwards. Thus, these newer snapshots are incapable of producing a snapshot at the earlier slot X. The solution is to be proactive and ensure your node halts at the appointed time when testnet has planned restarts.
+
+### 2. A suitable snapshot at some slot S < X is available, but the blockstore doesn't contain all the blocks in the range (S, X]
+
+**Cause:** This could happen if the validator goes offline (manual stop, crash, etc) before the cluster reaches slot X.
+
+**Solution:** For planned restarts, the first 33% of stake to halt their nodes usually end up in this situation. For future testnet restarts, Anza and the Solana Foundation will halt their nodes just before the appointed time in hopes of preventing operators from getting into this situation. For an unplanned outage, this is somewhat the luck of the draw, but avoid manually stopping your node until instructed to do so.
+
+### 3. A suitable snapshot at some slot S < X is available and the necessary blocks (S, X] are available in the blockstore, but an additional snapshot exists at slot T where T > X
+
+**Cause:** `agave-ledger-tool` always tries to use the latest snapshot available.
+
+**Solution:** Examine the slot numbers in the snapshot filenames and move snapshots at slot T where T > X to a separate backup directory. This will allow `agave-ledger-tool` to find the correct snapshot at slot S where S < X.
+
+
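
The fix for the third pitfall in the Snapshot Guide comes down to comparing slot numbers embedded in filenames. As a rough sketch under stated assumptions, the helpers below (hypothetical names `snapshot_slot` and `split_by_target`, not part of `agave-ledger-tool`) sort a directory listing into snapshots to keep and snapshots to move aside for a target slot X. They assume the `.tar.zst` filename patterns described in the guide.

```python
import re

# Matches full and incremental snapshot names as described in the guide (assumed .tar.zst only).
SNAPSHOT_RE = re.compile(
    r"^(?:snapshot-(?P<full>\d+)|incremental-snapshot-\d+-(?P<incr>\d+))"
    r"-[A-Za-z0-9]+\.tar\.zst$"
)


def snapshot_slot(filename):
    """Return the slot a snapshot archive represents, or None if the name is not a snapshot."""
    match = SNAPSHOT_RE.match(filename)
    if match is None:
        return None
    # For an incremental snapshot, the second slot number is the slot it advances to.
    return int(match.group("full") or match.group("incr"))


def split_by_target(filenames, target_slot):
    """Split snapshot filenames into (keep, move): 'move' holds snapshots at slots > target_slot."""
    keep, move = [], []
    for name in filenames:
        slot = snapshot_slot(name)
        if slot is None:
            continue  # ignore unrelated files
        if slot > target_slot:
            move.append(name)
        else:
            keep.append(name)
    return keep, move
```

An empty `keep` list corresponds to the first pitfall: no snapshot at or before X is available locally. The sketch cannot detect the second pitfall, because it does not inspect the blockstore for the blocks in (S, X].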
diff --git a/_Sidebar.md b/_Sidebar.md
index fe8e078..0255ee4 100644
--- a/_Sidebar.md
+++ b/_Sidebar.md
@@ -10,6 +10,7 @@
   * [General Debugging](General-Debugging)
   * [Debugging Consensus Failures](Debugging-Consensus-Failures)
   * [Incremental Snapshots](Incremental-Snapshots)
+  * [Snapshot Guide](Snapshot-Guide)
 * **Policy**
   * [Backport Guidelines](Backport-Guidelines)
 * **Schedule**
@@ -26,4 +27,5 @@
   * [2024-08-26 Testnet Restart](2024-08-26-Testnet-Restart)
   * [2024-10-09 Testnet Restart](2024-10-09-Testnet-Rollback-and-Restart)
   * [2024-10-16 Testnet Restart](2024-10-16-Testnet-Rollback-and-Restart)
+  * [2024-12-11 Testnet Restart](2024-12-11-Testnet-Restart)
   
\ No newline at end of file
@github-actions github-actions bot added the AUTO: Upstream Updates Auto-generated from tracking upstream repos label Dec 16, 2024