Merge pull request #283 from product-os/kyle/docs
Update RAID 1 instructions and link to meta-balena docs
flowzone-app[bot] authored Nov 4, 2024
2 parents b22bcce + 8533cc0 commit 95ba080
Showing 1 changed file with 32 additions and 42 deletions.
74 changes: 32 additions & 42 deletions README.md
@@ -11,8 +11,8 @@ See [github-runner-vm](https://github.com/product-os/github-runner-vm) and [self
Firecracker allows overprovisioning or oversubscribing of both CPU and memory resources for virtual machines (VMs) running on a host.
This means that the total vCPUs and memory allocated to the VMs can exceed the actual physical CPU cores and memory available on the host machine.

In order to make the most efficient use of host resources, we want to slightly overprovision the host hardware
so if/when all allocated resources are consumed by jobs (e.g. yocto) there would be minimal overlap that could lead to performance degradation.
In order to make the most efficient use of host resources, we want to slightly underprovision the host hardware
so if/when all allocated resources are consumed by jobs (e.g. yocto) there should be no overlap that could lead to performance degradation.

See the [github-runner-vm](https://github.com/product-os/github-runner-vm) README for more.
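
As a rough illustration of the provisioning check described above, the following sketch computes what percentage of the host's CPU threads is allocated to VMs. The helper name `cpu_allocation_pct` and the example figures (32 host threads, 7 VMs with 4 vCPUs each) are assumptions for illustration only and are not taken from this repository.

```shell
# Hypothetical helper: percentage of host CPU threads allocated to VMs.
# Usage: cpu_allocation_pct HOST_THREADS VM_COUNT VCPUS_PER_VM
cpu_allocation_pct() {
    host_threads=$1
    vm_count=$2
    vcpus_per_vm=$3
    # Integer arithmetic; the result is truncated toward zero.
    echo $(( vm_count * vcpus_per_vm * 100 / host_threads ))
}

# Assumed example figures: 32 host threads, 7 VMs with 4 vCPUs each.
pct=$(cpu_allocation_pct 32 7 4)
if [ "$pct" -gt 100 ]; then
    echo "overprovisioned: ${pct}% of host threads allocated"
else
    echo "within host capacity: ${pct}% of host threads allocated"
fi
```

A value at or below 100 means every VM can use all of its vCPUs at once without competing for host threads. The same calculation can be applied to memory.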

@@ -24,13 +24,15 @@ See the [github-runner-vm](https://github.com/product-os/github-runner-vm) READM
1. [Order](https://robot.your-server.de/order) a suitable machine in an `ES rack` (remote power controls)
2. Download balenaOS production image from the target balenaCloud fleet:
- x64: https://dashboard.balena-cloud.com/fleets/2123949
- ARM64: https://dashboard.balena-cloud.com/fleets/2123948
- x64: <https://dashboard.balena-cloud.com/fleets/2123949>
- ARM64: <https://dashboard.balena-cloud.com/fleets/2123948>
3. For x64 only: [Unwrap](https://github.com/balena-os/balena-image-flasher-unwrap) the image
4. Copy unwrapped image to S3 playground bucket and make public:
```
4. Copy unwrapped image to S3 playground bucket and make public

```shell
aws s3 cp balena.img s3://{{bucket}}/ --acl public-read
```

5. Activate Hetzner Rescue system
6. Reboot or reset server

@@ -39,54 +41,42 @@ See the [github-runner-vm](https://github.com/product-os/github-runner-vm) READM
> [!NOTE] This leaves the second block device unpaired and empty

1. Download and uncompress unwrapped balenaOS image to `/tmp` using `wget`
2. (Optional) Zero out target disk(s):
```
2. (Optional) Zero out target disk(s)

```shell
for device in nvme{0,1}n1; do
blkdiscard /dev/${device} -f
done
```

3. Download image from S3 via wget (URL is in S3 dashboard)
4. Write image to disk:
```

4. Write image to disk (Check `lsblk` output for block device)

```shell
dd if=balena.img of=/dev/nvme1n1 bs=$(blockdev --getbsz /dev/nvme1n1)
```
(Check `lsblk` output for block device)
5. Check resulting partitions with `fdisk -l /dev/nvme1n1`
6. Reboot
7. Manually power cycle again via the Robot dashboard to work around [this issue](https://balena.fibery.io/Inputs/Pattern/Generic-x86_64-GPT-with-sw-RAID1-does-not-come-up-after-initial-flash-without-additional-power-cycle-4510)
8. The machine should provision into the corresponding fleet

5. Reboot
6. Manually power cycle again via the Robot dashboard to work around [this issue](https://balena.fibery.io/Inputs/Pattern/Generic-x86_64-GPT-with-sw-RAID1-does-not-come-up-after-initial-flash-without-additional-power-cycle-4510)
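
Before rebooting, it can be worth confirming that the image landed on the disk intact. The sketch below is not part of the original instructions. It assumes GNU `stat` and `cmp` are available in the rescue system, and the helper name `verify_flash` is hypothetical. It compares the image file against the first bytes of the target device, up to the image's length.

```shell
# Hypothetical helper: compare an image file with the start of a block device.
# Returns 0 if the first <image size> bytes of the device match the image.
verify_flash() {
    image=$1
    device=$2
    # Image size in bytes (GNU stat).
    size=$(stat -c %s "$image") || return 1
    # Compare at most $size bytes (GNU cmp).
    cmp -n "$size" "$image" "$device"
}

# Usage (not run here; check `lsblk` for the correct device):
# verify_flash balena.img /dev/nvme1n1 && echo "image verified"
```

The comparison is limited to the image's length because the target disk is larger than the image and anything beyond that point is unrelated to the flash.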

#### Two drives via RAID1

> [!NOTE] Use `generic-amd64` or `generic-aarch64` balenaOS device type

1. Remove any existing RAID array:
```
mdadm --stop /dev/md127
mdadm --remove /dev/md127
```
2. Create RAID array:
```
mdadm --create --verbose /dev/md127 \
--level=1 \
--raid-devices=2 /dev/nvme{0,1}n1 \
--metadata=1.0
```
3. Increase (re)sync speed:
```
sysctl -w dev.raid.speed_limit_min=500000
sysctl -w dev.raid.speed_limit_max=5000000
```
4. Download image from S3 via wget (URL is in S3 dashboard)
5. Write image to RAID array:
```
dd if=balena.img of=/dev/md127 bs=$(blockdev --getbsz /dev/md127)
```
6. Check resulting partitions with `fdisk -l /dev/md127`
7. Monitor synchronization progress:
1. Follow RAID1 setup steps [here](https://github.com/balena-os/meta-balena/blob/master/docs/raid.md)
2. Download image from S3 via wget (URL is in S3 dashboard)
3. Write image to RAID array

```shell
dd if=balena.img of=/dev/md/balena bs=4096
```

4. Monitor synchronization progress

```shell
watch cat /proc/mdstat
```
8. Reboot when 100% synchronized
9. Manually power cycle again via the Robot dashboard to work around [this issue](https://balena.fibery.io/Inputs/Pattern/Generic-x86_64-GPT-with-sw-RAID1-does-not-come-up-after-initial-flash-without-additional-power-cycle-4510)
10. The machine should provision into the corresponding fleet

5. Reboot when 100% synchronized
6. Manually power cycle again via the Robot dashboard to work around [this issue](https://balena.fibery.io/Inputs/Pattern/Generic-x86_64-GPT-with-sw-RAID1-does-not-come-up-after-initial-flash-without-additional-power-cycle-4510)
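
As an alternative to watching `/proc/mdstat` by eye, the wait for synchronization can be scripted. The sketch below is an assumption-laden illustration rather than part of the original instructions. The helper name `raid_sync_in_progress` is hypothetical, and the check relies on the kernel printing a `resync =` or `recovery =` progress line in `/proc/mdstat` while an array is still synchronizing.

```shell
# Hypothetical helper: returns 0 while a resync or recovery is reported.
# Reads /proc/mdstat by default; a different file can be passed for testing.
raid_sync_in_progress() {
    mdstat=${1:-/proc/mdstat}
    grep -Eq '(resync|recovery) *=' "$mdstat"
}

# Usage (not run here): poll until synchronization has finished.
# while raid_sync_in_progress; do sleep 30; done
# echo "RAID synchronized, safe to reboot"
```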
