From 8533cc0bb7177276b878a7ca01a8c383ccaf4803 Mon Sep 17 00:00:00 2001
From: Kyle Harding
Date: Mon, 4 Nov 2024 14:49:57 -0500
Subject: [PATCH] Update RAID 1 instructions and link to meta-balena docs

Change-type: patch
Signed-off-by: Kyle Harding
---
 README.md | 74 ++++++++++++++++++++++++-------------------------------
 1 file changed, 32 insertions(+), 42 deletions(-)

diff --git a/README.md b/README.md
index 9b3f1b8..84d0a71 100644
--- a/README.md
+++ b/README.md
@@ -11,8 +11,8 @@ See [github-runner-vm](https://github.com/product-os/github-runner-vm) and [self
 Firecracker allows overprovisioning or oversubscribing of both CPU and memory resources for virtual machines (VMs) running on a host.
 This means that the total vCPUs and memory allocated to the VMs can exceed the actual physical CPU cores and memory available on the host machine.

-In order to make the most efficient use of host resources, we want to slightly overprovision the host hardware
-so if/when all allocated resources are consumed by jobs (e.g. yocto) there would be minimal overlap that could lead to performance degredation.
+In order to make the most efficient use of host resources, we want to slightly underprovision the host hardware
+so if/when all allocated resources are consumed by jobs (e.g. yocto) there should be no overlap that could lead to performance degredation.

 See the [github-runner-vm](https://github.com/product-os/github-runner-vm) README for more.

@@ -24,13 +24,15 @@ See the [github-runner-vm](https://github.com/product-os/github-runner-vm) READM

 1. [Order](https://robot.your-server.de/order) a suitable machine in an `ES rack` (remote power controls)
 2. Download balenaOS production image from the target balenaCloud fleet:
-   - x64: https://dashboard.balena-cloud.com/fleets/2123949
-   - ARM64: https://dashboard.balena-cloud.com/fleets/2123948
+   - x64:
+   - ARM64:
 3. For x64 only: [Unwrap](https://github.com/balena-os/balena-image-flasher-unwrap) the image
-4. Copy unwrapped image to S3 playground bucket and make public:
-   ```
+4. Copy unwrapped image to S3 playground bucket and make public
+
+   ```shell
    aws s3 cp balena.img s3://{{bucket}}/ --acl public-read
    ```
+
 5. Activate Hetzner Rescue system
 6. Reboot or reset server

@@ -39,54 +41,42 @@ See the [github-runner-vm](https://github.com/product-os/github-runner-vm) READM
 > [!NOTE] This leaves the second block device unpaired and empty

 1. Download and uncompress unwrapped balenaOS image to `/tmp` using `wget`
-2. (Optional) Zero out target disk(s):
-   ```
+2. (Optional) Zero out target disk(s)
+
+   ```shell
    for device in nvme{0,1}n1; do
        blkdiscard /dev/${device} -f
    done
    ```
+
 3. Download image from S3 via wget (URL is in S3 dashboard)
-4. Write image to disk:
-   ```
+
+4. Write image to disk (Check `lsblk` output for block device)
+
+   ```shell
    dd if=balena.img of=/dev/nvme1n1 bs=$(blockdev --getbsz /dev/nvme1n1)
    ```
-   (Check `lsblk` output for block device)
-5. Check resulting partitions with `fdisk -l /dev/nvme1n1`
-6. Reboot
-7. Manually power cycle again via the Robot dashboard to work around [this issue](https://balena.fibery.io/Inputs/Pattern/Generic-x86_64-GPT-with-sw-RAID1-does-not-come-up-after-initial-flash-without-additional-power-cycle-4510)
-8. The machine should provision into the corresponding fleet
+
+5. Reboot
+6. Manually power cycle again via the Robot dashboard to work around [this issue](https://balena.fibery.io/Inputs/Pattern/Generic-x86_64-GPT-with-sw-RAID1-does-not-come-up-after-initial-flash-without-additional-power-cycle-4510)

 #### Two drives via RAID1

 > [!NOTE] Use `generic-amd64` or `generic-aarch64` balenaOS device type

-1. Remove any existing RAID array:
-   ```
-   mdadm --stop /dev/md127
-   mdadm --remove /dev/md127
-   ```
-2. Create RAID array:
-   ```
-   mdadm --create --verbose /dev/md127 \
-       --level=1 \
-       --raid-devices=2 /dev/nvme{0,1}n1 \
-       --metadata=1.0
-   ```
-3. Increase (re)sync speed:
-   ```
-   sysctl -w dev.raid.speed_limit_min=500000
-   sysctl -w dev.raid.speed_limit_max=5000000
-   ```
-4. Download image from S3 via wget (URL is in S3 dashboard)
-5. Write image to RAID array:
-   ```
-   dd if=balena.img of=/dev/md127 bs=$(blockdev --getbsz /dev/md127)
-   ```
-6. Check resulting partitions with `fdisk -l /dev/md127`
-7. Monitor synchronization progress:
+1. Follow RAID1 setup steps [here](https://github.com/balena-os/meta-balena/blob/master/docs/raid.md)
+2. Download image from S3 via wget (URL is in S3 dashboard)
+3. Write image to RAID array
+
+   ```shell
+   dd if=balena.img of=/dev/md/balena bs=4096
    ```
+
+4. Monitor synchronization progress
+
+   ```shell
    watch cat /proc/mdstat
    ```
-8. Reboot when 100% synchronized
-9. Manually power cycle again via the Robot dashboard to work around [this issue](https://balena.fibery.io/Inputs/Pattern/Generic-x86_64-GPT-with-sw-RAID1-does-not-come-up-after-initial-flash-without-additional-power-cycle-4510)
-10. The machine should provision into the corresponding fleet
+
+5. Reboot when 100% synchronized
+6. Manually power cycle again via the Robot dashboard to work around [this issue](https://balena.fibery.io/Inputs/Pattern/Generic-x86_64-GPT-with-sw-RAID1-does-not-come-up-after-initial-flash-without-additional-power-cycle-4510)
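
Review note on the "Monitor synchronization progress" and "Reboot when 100% synchronized" steps: the README relies on watching `/proc/mdstat` by eye. A minimal sketch of a scripted check follows, assuming that an in-progress operation shows up in `/proc/mdstat` as a `resync`, `recovery`, `reshape`, or `check` entry followed by `=`. The function name `mdstat_synced` is hypothetical and is not part of this patch or of meta-balena.

```shell
#!/bin/sh
# Hypothetical helper (not part of the README or meta-balena).
# Returns 0 when the mdstat file shows no array operation in progress,
# 1 when a resync/recovery/reshape/check is running or queued,
# 2 when the file cannot be read.
mdstat_synced() {
    mdstat_file="${1:-/proc/mdstat}"

    # Treat an unreadable file as an error rather than as "synced".
    [ -r "$mdstat_file" ] || return 2

    # Assumption: progress lines look like "resync = 12.6% (...)" and
    # queued operations look like "resync=DELAYED" or "resync=PENDING".
    if grep -Eq '(resync|recovery|reshape|check) *=' "$mdstat_file"; then
        return 1
    fi
    return 0
}
```

In the rescue system this could gate the reboot, for example `until mdstat_synced; do sleep 60; done`, before continuing with the reboot step. It only inspects text, so it does not replace checking the array state with `mdadm --detail`.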