
Orchestrated RPi cluster

Objective

Build an HPC home-lab based on RPis managed by a K8S cluster running on a laptop. Basically, the laptop hosts the K8S cluster and all the services required by an HPC cluster. The RPis are used as compute nodes, with one acting as the storage node that provides K8S and backup storage spaces.

Warning: security is implemented in a minimal manner

  • HTTPS on UI: YES
  • SSO configuration to be done manually after Keycloak is deployed
  • RBAC: default model provided by Helm chart/deployment files
  • No service mesh / no HTTPS between pod services

Content

Services front-ends:

  • Forecastle as application dashboard solution
  • Keycloak as SSO solution
  • Harbor as registry & chart museum solution
  • Grafana / Prometheus / Karma / node_exporters as monitoring solution
  • Dashboard K8S as overall admin solution
  • Netbox as DCIM solution
  • JupyterHub as developers' dream
  • Discourse as FAQ solution
  • Velero as backup solution
  • ELK stack as logging solution (optional, since it requires plenty of CPU/RAM on the Windows machine)
  • Singularity Enterprise as job container solution (optional, since it requires a valid license - not available on this GitHub)

Additional end-user services:

  • Ansible pod as deployment solution
  • Slurm as Workload scheduler solution

Storage node services:

  • minio as S3 solution
  • NFS server as dynamic storage class for K8S services

Screenshots

  • RPi Cluster View (3D printed racks)
  • Apps dashboard
  • K8S Lens ... dashboard on steroids
  • minio dashboard - backup content on S3
  • Netbox: 'The source of truth' :-)
  • Grafana Cluster Status
  • Grafana RPi Cluster Status
  • Grafana SLURM monitoring
  • Kibana ElasticSearch Fluentd Metricbeat (work in progress)
  • Prometheus Targets
  • K8S dashboard nodes
  • K8S dashboard services
  • Harbor registry dashboard

High-level design

Orchestrated RPis cluster design

Compatibility

Branch / Kubernetes version    1.19.X    1.20.X
Master/Workers

Legend:
  • ✅ Compatible
  • ⚠️ Has issues
  • ❌ Incompatible

System description

  • RPis[01]:
    • Specs: RPi 3 or 4, with attached USB disk
    • role: storage node with NFS server
    • role: ansible deployment node (not yet implemented)
  • RPis[02-0x]: compute nodes
  • Laptop: Vagrant-based K8S server providing the management stack of this tiny HPC-RPi cluster.
    • Specs: CPU: VT-x capable CPU, RAM: min: 8GB memory (without the EFK stack), desired: 16GB, max: no limits
    • Windows 10 PRO - not tested with Linux, but should work
    • Vagrant
    • Virtualbox
  • 1 switch
  • Some RJ45 cables
  • 1 multi-USB power station

Network and Laptop configuration

The aim is to have the laptop connected to the outside world over Wi-Fi, while its Ethernet interface (eth0) connects to the RPi cluster. This needs some preparation:

  1. If your laptop does not have an Ethernet port (many now only have a Wi-Fi adapter), you can buy a USB-C adapter with an Ethernet port.
  2. Update the registry key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\SharedAccess\Parameters and set the values to 10.0.0.1 instead of 192.168.137.1.
  3. Configure Windows ICS (Internet Connection Sharing) on the Wi-Fi adapter to share the Wi-Fi connection and enable the Windows 10 built-in DHCP server.
  4. Connect the switch to the laptop Ethernet port and power up the RPis; after some time they should all be online with an IP on the 10.0.0.0/24 subnet.

Using this setup, the Vagrant VMs will be assigned the following IPs:

  • kv-master-0: 10.0.0.210
  • kv-worker-0: 10.0.0.220
  • kv-worker-1: 10.0.0.221
  • kv-worker-2: 10.0.0.222
  1. Update the file C:\Windows\System32\drivers\etc\hosts to add the node[01-05] and VM IPs - it will be more convenient (an example follows below).
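
For example (a sketch; the node hostnames beyond node01.home.lab are assumptions, adjust them to your own naming):

# RPi nodes
10.0.0.2    node01.home.lab node01
10.0.0.3    node02.home.lab node02
10.0.0.4    node03.home.lab node03
10.0.0.5    node04.home.lab node04
10.0.0.6    node05.home.lab node05
# Vagrant K8S VMs
10.0.0.210  kv-master-0
10.0.0.220  kv-worker-0
10.0.0.221  kv-worker-1
10.0.0.222  kv-worker-2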

  2. Install some Vagrant plugins

$ vagrant plugin install vagrant-scp
$ vagrant plugin install vagrant-winnfsd
$ vagrant plugin install vagrant-cachier
  3. Default Docker registry setup on the worker nodes (applied automatically, but it can be customized):
$ cat /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "insecure-registries": ["https://registry.home.lab,"https://harbor.home.lab"],
  "registry-mirrors": ["https://docker.io","https://quay.io","https://hub.docker.com"],
  "storage-driver": "overlay2"
}
$  systemctl daemon-reload && systemctl restart docker
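
To confirm the settings were picked up after the restart, a quick check (a sketch) is:

$ docker info | grep -A 3 "Insecure Registries"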

Note: once the cluster is up and running, you can easily populate the Harbor registry with all the Docker images currently running on your K8S cluster. From the workers, run:

docker login --username=admin --password=Harbor12345 harbor.home.lab
for i in $(docker images --format="{{.Repository}}:{{.Tag}}" | grep -v home.lab); do 
  docker tag $i harbor.home.lab/library/$i
  docker push harbor.home.lab/library/$i
done

Raspberry Pis configuration

  • build a stand or buy an RPi cluster case
  • flash all the RPi SD cards with the latest Raspbian version
  • connect all power/switch ports
  • power up
  • Those steps will be automated later on...
    • Update all /etc/dhcpcd.conf or /etc/network/interfaces files (depending on the RPi version) to fix the IPs of the nodes (a full example stanza is shown after the snippets below):

      • node01: 10.0.0.2
      • node02: 10.0.0.3
      • node03: 10.0.0.4
      • node04: 10.0.0.5
      • node05: 10.0.0.6

      Also fix in /etc/dhcpcd.conf the lines about:
      static routers=10.0.0.1
      static domain_name_servers=10.0.0.20 8.8.8.8
      

      or in /etc/network/interfaces:

      gateway 10.0.0.1
      dns-nameservers 10.0.0.20 8.8.8.8
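
      Putting it together, a complete /etc/dhcpcd.conf stanza for node01 could look like this (a sketch; the eth0 interface name is an assumption):

      interface eth0
      static ip_address=10.0.0.2/24
      static routers=10.0.0.1
      static domain_name_servers=10.0.0.20 8.8.8.8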
      
    • The node01.home.lab node acts as a special node:

      • apt-cacher-ng
      apt-get install apt-cacher-ng
      

      On the other nodes, just create the file /etc/apt/apt.conf.d/02proxy containing: Acquire::http::proxy "http://node01.home.lab:3142"; (see the one-liner below).
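
      For example, from a shell on each compute node (a sketch):

      echo 'Acquire::http::proxy "http://node01.home.lab:3142";' > /etc/apt/apt.conf.d/02proxy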

    • In order to provide NFS shares to the K8S cluster, it is good to split the disk into partitions so that each app gets dedicated storage. Note: the number of partitions needed will depend on how many persistent volumes your apps require (5 + 1 is a good number for the default setup). You can split the HDD on the node01 RPi using these commands:

      $ fdisk -l
      ** retrieve the /dev/sdX matching the HDD **
      

      If your HDD is not yet configured with 6 partitions, create them as follows:

      • /mnt/sda1..5: 10GB - ext4 - static K8S volumes.
      • /mnt/sda6: at least 50GB - ext4 - should be bigger since it acts as the NFS dynamic volume for K8S.
      $ parted /dev/sda
      ** here use the "mkpart primary ext4 xxG yyG" to create the different partitions **
      $ mkfs.ext4 /dev/sdaX
      ** update the fstab to identify the new /dev/sdaX partitions **
      $ mkdir /mnt/usbX
      $ chown nobody:nogroup -R /mnt/usbX
      $ chmod 777 -R /mnt/usbX
      $ vi /etc/exports
      ** add "/mnt/usbX        10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)"
      $ exportfs -ra
      

      example:

      Disk /dev/sda: 465.8 GiB, 500107862016 bytes, 976773168 sectors
      Disk model: 00G2B0C-00PX
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 4096 bytes
      I/O size (minimum/optimal): 4096 bytes / 4096 bytes
      Disklabel type: gpt
      Disk identifier: DE0AC0F0-D4B3-4ABB-A2AB-733DCF8E40C3
      
      Device         Start       End   Sectors  Size Type
      /dev/sda1     999424  19531775  18532352  8.9G Linux filesystem
      /dev/sda2   19531776  39061503  19529728  9.3G Linux filesystem
      /dev/sda3   39061504  58593279  19531776  9.3G Linux filesystem
      /dev/sda4   58593280  78125055  19531776  9.3G Linux filesystem
      /dev/sda5   78125056  97656831  19531776  9.3G Linux filesystem
      /dev/sda6   97656832 214843750 117186919 55.9G Linux filesystem
      /dev/sda7  214845440 351561727 136716288 65.2G Linux filesystem
      root@node01:/home/pi# cat <<EOF >>/etc/exports
      /mnt/usb1        10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)
      /mnt/usb2        10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)
      /mnt/usb3        10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)
      /mnt/usb4        10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)
      /mnt/usb5        10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)
      /mnt/usb6        10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)
      /mnt/usb7        10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)
      EOF
      root@node01:/home/pi# mkdir /mnt/usb1
      root@node01:/home/pi# mkdir /mnt/usb2
      root@node01:/home/pi# mkdir /mnt/usb3
      root@node01:/home/pi# mkdir /mnt/usb4
      root@node01:/home/pi# mkdir /mnt/usb5
      root@node01:/home/pi# mkdir /mnt/usb6
      root@node01:/home/pi# mkdir /mnt/usb7
      root@node01:/home/pi# chown nobody:nogroup -R /mnt/usb1
      root@node01:/home/pi# chown nobody:nogroup -R /mnt/usb2
      root@node01:/home/pi# chown nobody:nogroup -R /mnt/usb3
      root@node01:/home/pi# chown nobody:nogroup -R /mnt/usb4
      root@node01:/home/pi# chown nobody:nogroup -R /mnt/usb5
      root@node01:/home/pi# chown nobody:nogroup -R /mnt/usb6
      root@node01:/home/pi# chown nobody:nogroup -R /mnt/usb7
      root@node01:/home/pi# chmod 777 -R /mnt/usb1
      root@node01:/home/pi# chmod 777 -R /mnt/usb2
      root@node01:/home/pi# chmod 777 -R /mnt/usb3
      root@node01:/home/pi# chmod 777 -R /mnt/usb4
      root@node01:/home/pi# chmod 777 -R /mnt/usb5
      root@node01:/home/pi# chmod 777 -R /mnt/usb6
      root@node01:/home/pi# chmod 777 -R /mnt/usb7
      root@node01:/home/pi# exportfs -ra
      
    • NTP time on RPIs

      $ vi /etc/systemd/timesyncd.conf
      ... NTP=10.0.0.210
      $ timedatectl set-ntp true
      $ systemctl daemon-reload
      $ service systemd-timesyncd restart
      $ timedatectl set-timezone Europe/Paris
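      # optional check (a sketch): verify the clock reports it is synchronized
      $ timedatectl status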
      

Services IPs

Fixed Cluster Services endpoints for admins:

  • NFS Server: 10.0.0.2
  • DNS: 10.0.0.20
  • UI entry point: https://apps.home.lab (aka: 10.0.0.10)
  • SLURM controller for rack01: 10.0.0.9
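
A quick way to verify these endpoints from a node (a sketch; it assumes the DNS at 10.0.0.20 serves the home.lab zone and that nfs-common is installed for showmount):

$ nslookup apps.home.lab 10.0.0.20
$ showmount -e 10.0.0.2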

Workload scheduler with Slurm

Currently using SLURM 18.08.5 on Ubuntu 18.04 (otherwise the SLURM node_exporter for Ubuntu cannot be compiled). Note: the container names are misleading: the docker-ubuntu1604-xxx images are in fact Ubuntu 18.04 :-).

Open MPI is compiled and running on the RPis.

Example:

$ kubectl get pods -n slurm-ns
NAME                     READY   STATUS    RESTARTS   AGE
slurm-745f46bd9b-26nms   1/1     Running   0          4m28s

$ kubectl exec slurm-745f46bd9b-26nms -n slurm-ns -- sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE   NODELIST
allnodes*    up   infinite      1   unk*   node05
allnodes*    up   infinite      1  drain   node03
allnodes*    up   infinite      2  idle    node[02,04]
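
Once some nodes report idle, a minimal smoke test (a sketch, reusing the pod name from above) is to run a trivial job across two nodes:

$ kubectl exec slurm-745f46bd9b-26nms -n slurm-ns -- srun -p allnodes -N 2 hostname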

How to easily deploy all services on the cluster

Steps to deploy:

  • Power up the K8S cluster on your laptop (by default: 1 master and 3 workers). Update the Vagrantfile to match your subnets.
$ vagrant up

Then wait an hour or so, depending on your internet speed.

$ vagrant status
$ scp -P 2200 vagrant@kv-master-0:/vagrant/.kube/config ~/.kube/config

From your laptop, check the status of your K8S cluster:

$ kubectl get nodes
NAME          STATUS   ROLES    AGE    VERSION
kv-master-0   Ready    master   4d4h   v1.18.2
kv-worker-0   Ready    <none>   4d4h   v1.18.2
kv-worker-1   Ready    <none>   4d3h   v1.18.2
kv-worker-2   Ready    <none>   4d3h   v1.18.2
  • Easy way to start all services from the PC
$ wsl
$ create-ca.sh
$ create-certificates.sh <your domain>
$ ./pb-install-all.sh --deploy

Then just wait .... and enjoy. Launch your SLURM jobs :-)

  • Note: the main UI giving access to all the other UIs is https://apps.home.lab. This UI is protected by Keycloak, so you must add users in Keycloak first at https://keycloak.home.lab (user: admin, password: to be retrieved from the keycloak-http secret; see the command below).
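
To retrieve that admin password (a sketch; the namespace and the key inside the secret are assumptions that depend on how Keycloak was deployed):

# keycloak-ns and the "password" key are assumptions - adjust them to your deployment
$ kubectl get secret keycloak-http -n keycloak-ns -o jsonpath='{.data.password}' | base64 -d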

Debug tips

  • Usually, after the VMs are stopped and restarted, kubectl won't work anymore. To fix it:
  • log in on the master node: vagrant ssh kv-master-0
  • run the commands:
sudo -i
swapoff -a
exit
kubectl version
  • then enjoy!
  • when you have a worker VM which is stuck or needs a reboot, here is how to bring it back into the cluster.
From the worker node, after reboot:
1. Reset your node
$ sudo kubeadm reset
2. Turn off the swap
$ sudo swapoff -a
3. Start kubeadm join (info found in the kubeadm-init.out file generated during the Vagrant deployment)
$ sudo kubeadm join YourMasterNodeIPAddress --token xxxx --discovery-token-ca-cert-hash \
sha256...

Note: If the token expired, just create one from the master by running:
kubeadm token create --print-join-command

RBAC related topics

Currently, RBAC is used in some areas but not all ... Using https://github.com/alcideio/rbac-tool you can get more details about your running cluster (see the example below).
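
For example, once the tool is installed as a kubectl krew plugin (a sketch; adjust the lookup pattern to the subject you are interested in):

$ kubectl krew install rbac-tool
$ kubectl rbac-tool viz
$ kubectl rbac-tool lookup grafana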

  • RBAC for K8S dashboard services

RBAC view of K8S dashboard services

  • RBAC for Grafana

RBAC for Grafana

The Hard way if you want to customize something :-)

You can always deep-dive into every repository and change whatever you need :-). A README.md file is there to tell you how to deploy that part. However, all services have to be deployed in a given order, so follow the same steps as in the pb-install-all.sh script.