Note
First take a look at our Hands-On BiBiGrid Tutorial and our example `bibigrid.yml`. This documentation is no replacement for those, but provides greater detail - too much detail for first-time users.
The configuration file (often called `bibigrid.yml`) contains important information about cluster creation.
The cluster configuration holds a list of configurations where each configuration has a specific cloud (location) and infrastructure (e.g. OpenStack). For single-cloud use cases you just need a single configuration. However, you can use additional configurations to set up a multi-cloud.
The configuration file is best stored in `~/.config/bibigrid/`. BiBiGrid starts its relative search there.
If you have a single-cloud use case, you can skip ahead.
Only the first configuration holds a `master` key (it is therefore also called the master configuration).
Every following configuration must hold a `vpngtw` key.
Later, this `vpngtw` allows BiBiGrid to connect multiple clouds.
Here you can get a technical overview regarding BiBiGrid's multi-cloud setup.
Apart from the `master` key, the master configuration (first configuration) also holds all information that is - simply put - true over the entire cluster. We also call those keys global. Keys that belong only to a single cloud configuration are called local.
For example, whether the master works alongside the workers is a fact that concerns the entire cluster (global). Therefore, it is stored within the master configuration.
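As a rough sketch (cloud names and instance values are placeholders; the individual keys are explained below):

```yaml
- infrastructure: openstack # first configuration = master configuration
  cloud: openstack
  useMasterAsCompute: True # global key, applies to the entire cluster
  masterInstance: # local key, applies to this cloud only
    type: de.NBI tiny
    image: Ubuntu 22.04 LTS (2022-10-14)
- infrastructure: openstack # every following configuration needs a vpngtw
  cloud: openstack2
  vpngtw:
    type: de.NBI tiny
    image: Ubuntu 22.04 LTS (2022-10-14)
```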
`sshPublicKeyFiles` expects a list of public key files to be registered on every instance. After cluster creation, you or others can use the corresponding private key to log into the instances.
```yaml
sshPublicKeyFiles:
  - /home/user/.ssh/id_ecdsa_colleague.pub
```
Warning: If a volume has an obscure filesystem, this might overwrite your data!
If `True`, all masterMounts will be automatically mounted by BiBiGrid if possible. If a volume is not formatted or has an unknown filesystem, it will be formatted to `ext4`. Default is `False`.
`masterMounts` expects a list of volumes and snapshots. Those will be attached to the master. If any snapshots are given, volumes are first created from them. Volumes are not deleted after cluster termination.
What is mounting?
Mounting adds a new filesystem to the file tree, allowing access.
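A minimal sketch, assuming an existing volume and snapshot are referenced by name (the names are hypothetical; check your BiBiGrid version for the exact schema):

```yaml
masterMounts:
  - my-volume   # existing volume, attached to the master
  - my-snapshot # a volume is first created from this snapshot, then attached
```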
`nfsShares` expects a list of folder paths to share over the network using NFS.
In every case, `/vol/spool/` is always an nfsShare.
This key is only relevant if the `nfs` key is set to `True`.
If you would like to share a masterMount, take a look here.
What is NFS?
NFS (Network File System) is a stable and well-functioning network protocol for exchanging files over the local network.
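A minimal sketch (the shared folder path is hypothetical):

```yaml
nfs: True # nfsShares is only relevant if nfs is set
nfsShares:
  - /vol/data # shared over NFS in addition to /vol/spool/
```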
Yet to be explained and implemented.
```yaml
- file: SomeFile
  hosts: SomeHosts
  name: SomeName
  vars: SomeVars
  vars_file: SomeVarsFile
```
Yet to be explained and implemented.
```yaml
- hosts: SomeHost
  name: SomeName
  galaxy: SomeGalaxy
  git: SomeGit
  url: SomeURL
  vars: SomeVars
  vars_file: SomeVarsFile
```
In general, this key is ignored.
It expects `True` or `False` and helps some specific users to create a filesystem to their liking. Default is `False`.
If `True`, the master will store DNS information for its workers. Default is `False`.
More information.
If `False`, the cluster will start without the job scheduling system Slurm.
This is only relevant for very few users. Default is `True`.
If `True`, the monitoring solution Zabbix will be installed on the master. Default is `False`.
If `True`, NFS is set up. Default is `False`.
If `True`, the Theia Web IDE is installed.
After creation, connection information is printed.
If `False`, the master will no longer help the workers to process jobs. Default is `True`.
If `False`, the master will not be created with an attached floating IP. Default is `True`.
Expects a list of services to wait for.
This is required if your provider runs any post-launch services that interfere with the package manager. If not set, seemingly random errors can occur when such a service interrupts Ansible's execution. Services are listed on the de.NBI Wiki at Computer Center Specific (not yet).
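A minimal sketch (the service name is hypothetical; use the services listed for your compute center):

```yaml
waitForServices:
  - some-post-launch-updates.service # BiBiGrid waits for this service to finish before installing packages
```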
In order to save valuable floating IPs, BiBiGrid can also make use of a gateway to create the cluster. For more information on how to set up a gateway, how gateways work and why they save floating IPs, please continue reading here.
BiBiGrid needs the gateway IP and a function that maps the IPs of nodes behind the gateway (private nodes) to the port over which you can connect to said node through the gateway.
In the example below, the gateway IP is 123.123.123.42 (the IP of the gateway node) and the port function is 30000 + oct4. Here, oct4 stands for the fourth octet of the private node's IP (the last element). You can use your own custom port function using all octets if needed.
A private node with IP 123.123.123.12 is reachable over 123.123.123.42:30012 (because the fourth octet is 12).
```yaml
gateway:
  ip: 123.123.123.42 # IP of gateway to use
  portFunction: 30000 + oct4 # variables are called: oct1.oct2.oct3.oct4
```
Using a gateway also automatically sets `useMasterWithPublicIp` to `False`.
`infrastructure` sets the provider implementation used for this configuration. Currently, only `openstack` is available. Other infrastructures would be AWS and so on.
`cloud` decides which entry in the `clouds.yaml` is used. When using OpenStack, the entry is named `openstack`.
You can read more about the `clouds.yaml` here.
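Taken together, a configuration for a single OpenStack cloud might begin like this (assuming the `clouds.yaml` entry is named `openstack`):

```yaml
infrastructure: openstack # provider implementation
cloud: openstack # entry in clouds.yaml
```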
`workerInstances` expects a list of worker groups (instance definitions with a `count` key).
If `count` is omitted, `count: 1` is assumed.
```yaml
workerInstances:
  - type: de.NBI tiny
    image: Ubuntu 22.04 LTS (2022-10-14)
    count: 2
```
- `type` sets the instance's hardware configuration.
- `image` sets the bootable operating system to be installed on the instance.
- `count` sets how many workers with that `type` and `image` combination are in this worker group.
Find your active images:

```
openstack image list --os-cloud=openstack | grep active
```
Currently, images based on Ubuntu 20.04/22.04 (Focal/Jammy) and Debian 11 (Bullseye) are supported.
Instead of using a specific image you can also provide a regex.
For example, if your images are named following the pattern `Ubuntu 22.04 LTS ($DATE)` and only the most recent release is active, you can use `Ubuntu 22.04 LTS \(.*\)` so the right one is always picked.
This regex will also be used when starting worker instances on demand and is therefore mandatory to automatically resolve image updates of the described kind while a cluster is running.
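For example, a worker group could reference the image by that regex instead of a fixed name (instance values are the same placeholders used above):

```yaml
workerInstances:
  - type: de.NBI tiny
    image: Ubuntu 22.04 LTS \(.*\) # regex, matches whichever release is currently active
    count: 2
```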
There's also a Fallback Option.
`flavor` is just the OpenStack terminology for `type`.
Find your available flavors:

```
openstack flavor list --os-cloud=openstack
```
You can declare a list of features for a worker group. Those are then attached to each node in the worker group. For example:
```yaml
workerInstances:
  - type: de.NBI tiny
    image: Ubuntu 22.04 LTS (2022-10-14)
    count: 2
    features:
      - hasdatabase
      - holdsinformation
```
Features allow you to force Slurm to schedule a job only on nodes that meet a certain boolean constraint.
This can be helpful when only certain nodes can access a specific resource - like a database.
If you would like to know more about how exactly features work, take a look at Slurm's documentation.
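For example, a job that must run on a database node could request the feature via Slurm's `--constraint` option, e.g. `sbatch --constraint=hasdatabase job.sh` (the job script name is hypothetical).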
Only in the first configuration and only one:
```yaml
masterInstance:
  type: de.NBI tiny
  image: Ubuntu 22.04 LTS (2022-10-14)
```
You can create features for the master in the same way as for the workers:
```yaml
masterInstance:
  type: de.NBI tiny
  image: Ubuntu 22.04 LTS (2022-10-14) # regex allowed
  features:
    - hasdatabase
    - holdsinformation
```
Exactly one in every configuration but the first:
```yaml
vpngtw:
  type: de.NBI tiny
  image: Ubuntu 22.04 LTS (2022-10-14) # regex allowed
```
If set to `True` and the chosen image is not among the active images, BiBiGrid will try to pick a fallback image for you by finding the closest active image by name that has at least 60% name overlap.
This will not find a good fallback every time.
You can also set `fallbackOnOtherImage` to a regex like `Ubuntu 22.04 LTS \(.*\)`, in which case BiBiGrid will pick an active image matching that regex.
This can be combined with the regular regex option of the image key.
In that case, the fallback regex should be more permissive so that it is still useful when the original regex fails to find an active image.
This fallback will also be used when starting worker instances on demand and can be helpful when image updates occur while a cluster is running.
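A minimal sketch combining both, assuming `fallbackOnOtherImage` is set at the configuration level alongside the instance definitions (image names follow the pattern used above):

```yaml
masterInstance:
  type: de.NBI tiny
  image: Ubuntu 22.04 LTS \(2022-10-14\) # preferred image (regex)
fallbackOnOtherImage: Ubuntu 22.04 LTS \(.*\) # more permissive regex, used if no image matching the line above is active
```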
`sshUser` is the standard user of the installed images. For `Ubuntu 22.04` this would be `ubuntu`.
Every region has its own OpenStack deployment. Every availability zone belongs to a region.
Find your regions:

```
openstack region list --os-cloud=openstack
```
Availability zones allow you to logically group nodes.
Find your availabilityZones:

```
openstack availability zone list --os-cloud=openstack
```
`subnet` is a block of IP addresses.
Find available subnets:

```
openstack subnet list --os-cloud=openstack
```
If no full DNS service for started instances is available, set `localDNSLookup: True`.
Currently, this is the case in Berlin, DKFZ, Heidelberg and Tuebingen.
You can declare a list of features that are then attached to every node in the configuration. If both instance features (of a worker group or the master) and configuration features are defined, they are merged.
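A minimal sketch of configuration-level features (feature names are the same placeholders used above):

```yaml
features:
  - hasdatabase # attached to every node in this configuration
```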