From 8533cc0bb7177276b878a7ca01a8c383ccaf4803 Mon Sep 17 00:00:00 2001
From: Kyle Harding <kyle@balena.io>
Date: Mon, 4 Nov 2024 14:49:57 -0500
Subject: [PATCH] Update RAID 1 instructions and link to meta-balena docs

Change-type: patch
Signed-off-by: Kyle Harding <kyle@balena.io>
---
 README.md | 74 ++++++++++++++++++++++++-------------------------------
 1 file changed, 32 insertions(+), 42 deletions(-)

diff --git a/README.md b/README.md
index 9b3f1b8..84d0a71 100644
--- a/README.md
+++ b/README.md
@@ -11,8 +11,8 @@ See [github-runner-vm](https://github.com/product-os/github-runner-vm) and [self
 Firecracker allows overprovisioning or oversubscribing of both CPU and memory resources for virtual machines (VMs) running on a host.
 This means that the total vCPUs and memory allocated to the VMs can exceed the actual physical CPU cores and memory available on the host machine.
 
-In order to make the most efficient use of host resources, we want to slightly overprovision the host hardware
-so if/when all allocated resources are consumed by jobs (e.g. yocto) there would be minimal overlap that could lead to performance degredation.
+In order to make the most efficient use of host resources, we want to slightly underprovision the host hardware
+so if/when all allocated resources are consumed by jobs (e.g. yocto) there should be no overlap that could lead to performance degradation.
 
 See the [github-runner-vm](https://github.com/product-os/github-runner-vm) README for more.
 
@@ -24,13 +24,15 @@ See the [github-runner-vm](https://github.com/product-os/github-runner-vm) READM
 
 1. [Order](https://robot.your-server.de/order) a suitable machine in an `ES rack` (remote power controls)
 2. Download balenaOS production image from the target balenaCloud fleet:
-   - x64: https://dashboard.balena-cloud.com/fleets/2123949
-   - ARM64: https://dashboard.balena-cloud.com/fleets/2123948
+   - x64: <https://dashboard.balena-cloud.com/fleets/2123949>
+   - ARM64: <https://dashboard.balena-cloud.com/fleets/2123948>
 3. For x64 only: [Unwrap](https://github.com/balena-os/balena-image-flasher-unwrap) the image
-4. Copy unwrapped image to S3 playground bucket and make public:
-   ```
+4. Copy unwrapped image to S3 playground bucket and make public
+
+   ```shell
    aws s3 cp balena.img s3://{{bucket}}/ --acl public-read
    ```
+
 5. Activate Hetzner Rescue system
 6. Reboot or reset server
 
@@ -39,54 +41,42 @@ See the [github-runner-vm](https://github.com/product-os/github-runner-vm) READM
 > [!NOTE] This leaves the second block device unpaired and empty
 
 1. Download and uncompress unwrapped balenaOS image to `/tmp` using `wget`
-2. (Optional) Zero out target disk(s):
-   ```
+2. (Optional) Zero out target disk(s)
+
+   ```shell
    for device in nvme{0,1}n1; do
        blkdiscard /dev/${device} -f
    done
    ```
+
 3. Download image from S3 via wget (URL is in S3 dashboard)
-4. Write image to disk:
-   ```
+
+4. Write image to disk (check `lsblk` output for the target block device)
+
+   ```shell
    dd if=balena.img of=/dev/nvme1n1 bs=$(blockdev --getbsz /dev/nvme1n1)
    ```
-   (Check `lsblk` output for block device)
-5. Check resulting partitions with `fdisk -l /dev/nvme1n1`
-6. Reboot
-7. Manually power cycle again via the Robot dashboard to work around [this issue](https://balena.fibery.io/Inputs/Pattern/Generic-x86_64-GPT-with-sw-RAID1-does-not-come-up-after-initial-flash-without-additional-power-cycle-4510)
-8. The machine should provision into the corresponding fleet
+
+5. Reboot
+6. Manually power cycle again via the Robot dashboard to work around [this issue](https://balena.fibery.io/Inputs/Pattern/Generic-x86_64-GPT-with-sw-RAID1-does-not-come-up-after-initial-flash-without-additional-power-cycle-4510)
 
 #### Two drives via RAID1
 
 > [!NOTE] Use `generic-amd64` or `generic-aarch64` balenaOS device type
 
-1. Remove any existing RAID array:
-   ```
-   mdadm --stop /dev/md127
-   mdadm --remove /dev/md127
-   ```
-2. Create RAID array:
-   ```
-   mdadm --create --verbose /dev/md127 \
-     --level=1 \
-     --raid-devices=2 /dev/nvme{0,1}n1 \
-     --metadata=1.0
-   ```
-3. Increase (re)sync speed:
-   ```
-   sysctl -w dev.raid.speed_limit_min=500000
-   sysctl -w dev.raid.speed_limit_max=5000000
-   ```
-4. Download image from S3 via wget (URL is in S3 dashboard)
-5. Write image to RAID array:
-   ```
-   dd if=balena.img of=/dev/md127 bs=$(blockdev --getbsz /dev/md127)
-   ```
-6. Check resulting partitions with `fdisk -l /dev/md127`
-7. Monitor synchronization progress:
+1. Follow the RAID1 setup steps in the [meta-balena RAID documentation](https://github.com/balena-os/meta-balena/blob/master/docs/raid.md)
+2. Download image from S3 via wget (URL is in S3 dashboard)
+3. Write image to RAID array
+
+   ```shell
+   dd if=balena.img of=/dev/md/balena bs=4096
    ```
+
+4. Monitor synchronization progress
+
+   ```shell
    watch cat /proc/mdstat
    ```
-8. Reboot when 100% synchronized
-9. Manually power cycle again via the Robot dashboard to work around [this issue](https://balena.fibery.io/Inputs/Pattern/Generic-x86_64-GPT-with-sw-RAID1-does-not-come-up-after-initial-flash-without-additional-power-cycle-4510)
-10. The machine should provision into the corresponding fleet
+
+5. Reboot when 100% synchronized
+6. Manually power cycle again via the Robot dashboard to work around [this issue](https://balena.fibery.io/Inputs/Pattern/Generic-x86_64-GPT-with-sw-RAID1-does-not-come-up-after-initial-flash-without-additional-power-cycle-4510)