BiBiGrid is a tool for an easy cluster setup inside a cloud environment.

BiBiServ/bibigrid

BiBiGrid

BiBiGrid is a cloud cluster creation and management framework for OpenStack (and more providers in the future).

BiBiGrid uses Ansible to configure standard Ubuntu 20.04/22.04 LTS as well as Debian 11 cloud images. Depending on your configuration, BiBiGrid can set up an HPC cluster for grid computing with the Slurm Workload Manager, a shared filesystem (on local disks and attached volumes), a cloud IDE for writing, running and debugging (Theia Web IDE), and more.

Note: The latest version is currently a work in progress, and future changes are likely. Not all features of the previous version are available yet, but they will come soon. The previous version is still available but is no longer maintained.

Getting Started

For most users the Hands-On BiBiGrid Tutorial is the best entry point.

However, if you are already experienced with OpenStack and the previous BiBiGrid, the following brief explanation might be just what you need.

Brief, technical BiBiGrid overview

How to configure a cluster?

Configuration File: bibigrid.yml

A template file is included in the repository (bibigrid.yml).

The cluster configuration file consists of a list of configurations. Each configuration holds the provider-specific settings. The first configuration additionally contains all the keys that apply to the entire cluster (roles, for example). Currently, only clusters with one provider are possible, so focus only on the first configuration in the list.

The configuration template bibigrid.yml contains many helpful comments that make it easier to complete.
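To make the list-of-configurations structure concrete, here is a minimal Python sketch that models a parsed bibigrid.yml as a list of dictionaries and checks the first entry for the minimum keys named under Preparation. Only sshUser appears as a key name in this README; masterInstance, workerInstances, region, availabilityZone and subnet are assumed spellings, and every value is a placeholder. The template bibigrid.yml in the repository remains the authority on the real key names.

```python
# Illustrative only: a parsed bibigrid.yml modeled as a list of configurations.
# Key names other than sshUser are assumptions; check the template bibigrid.yml
# for the real spelling. All values are placeholders.
example_configurations = [
    {
        "sshUser": "ubuntu",
        "region": "my-region",
        "availabilityZone": "my-zone",
        "subnet": "my-subnet",
        "masterInstance": {"type": "my-flavor", "image": "my-image"},
        "workerInstances": [
            {"type": "my-flavor", "image": "my-image", "count": 2},
        ],
    },
]

# Minimum keys for the first configuration (assumed names, see above).
REQUIRED_KEYS = ("masterInstance", "region", "availabilityZone", "sshUser", "subnet")


def missing_minimum_keys(configurations):
    """Return the required keys missing from the first configuration."""
    if not configurations:
        return list(REQUIRED_KEYS)
    first = configurations[0]
    missing = [key for key in REQUIRED_KEYS if key not in first]
    # The master instance needs at least a type and an image.
    master = first.get("masterInstance", {})
    missing += [f"masterInstance.{key}" for key in ("type", "image") if key not in master]
    return missing
```

This helper is not part of BiBiGrid. The check BiBiGrid itself performs (-ch) is the one that counts.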

You need more details?

Cloud Specification Data: clouds.yaml

To access the cloud, authentication information is required. You can download your clouds.yaml from OpenStack.

Place your clouds.yaml in ~/.config/bibigrid/. BiBiGrid loads it from there on execution.
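As a small illustration of the expected location, the following Python sketch copies a downloaded clouds.yaml into ~/.config/bibigrid/, creating the directory if needed. The function name install_clouds_yaml and its home parameter are hypothetical helpers for this example, not part of BiBiGrid.

```python
import shutil
from pathlib import Path


def install_clouds_yaml(downloaded_file, home=None):
    """Copy a downloaded clouds.yaml into ~/.config/bibigrid/ and return the target path.

    Hypothetical helper: 'home' exists only so the sketch can be tried
    against a scratch directory instead of the real home directory.
    """
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".config" / "bibigrid"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "clouds.yaml"
    shutil.copyfile(downloaded_file, target)
    return target
```

Calling install_clouds_yaml with the path of the file downloaded from OpenStack places it where BiBiGrid looks for it.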

You need more details?

Quick First Time Usage

If you haven't used BiBiGrid1 in the past or are unfamiliar with OpenStack, we strongly recommend following the tutorial instead.

Preparation

  1. Download (or create) the clouds.yaml (and optionally clouds-public.yaml) file as described above.
  2. Place the clouds.yaml into ~/.config/bibigrid
  3. Fill in the configuration file, bibigrid.yml, with your specifics. At a minimum you need a master instance with a valid type and image, a region, an availability zone, an sshUser (most likely ubuntu) and a subnet. You probably also want at least one worker with a valid type, image and count.
  4. If your cloud provider runs post-launch services, set the waitForServices key, which expects a list of services to wait for.
  5. Create a virtual environment from bibigrid/requirements.txt. See here for more detailed info.
  6. Take a look at First execution

First execution

Before you begin, follow the steps described in Preparation.

After cloning the repository, navigate to bibigrid. To execute BiBiGrid, source the virtual environment created during preparation. Take a look at BiBiGrid's Command Line Interface if you want to explore for yourself.

A first execution run through could be:

  1. ./bibigrid.sh -i [path-to-bibigrid.yml] -ch: checks the configuration
  2. ./bibigrid.sh -i [path-to-bibigrid.yml] -c: creates the cluster (execute only if the check was successful)
  3. Use BiBiGrid's create output to investigate the created cluster further. Connecting to the IDE might be especially helpful. Otherwise, connect using ssh.
  4. While connected via ssh, try sinfo to print node info
  5. Run srun -x $(hostname) hostname to power up a worker and get its hostname.
  6. Run sinfo again to see the node powering up. After a while it will be terminated again.
  7. Use the terminate command from BiBiGrid's create output to shut down the cluster again. All floating IPs used will be released.
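The first two steps above (check, then create only on success) can be sketched in Python as follows. The flags -i, -ch and -c are the ones named in this README. The function names are hypothetical, and the sketch assumes that ./bibigrid.sh signals a failed check through a non-zero exit code, which is not confirmed here. It is meant to be run from the bibigrid directory with the virtual environment active.

```python
import subprocess


def bibigrid_command(config_path, action):
    """Build the argument list for ./bibigrid.sh; action is "check" or "create"."""
    flags = {"check": "-ch", "create": "-c"}
    return ["./bibigrid.sh", "-i", str(config_path), flags[action]]


def check_then_create(config_path):
    """Run the check and create the cluster only if the check succeeded.

    Assumption: a failed check is reported via a non-zero exit code.
    """
    check = subprocess.run(bibigrid_command(config_path, "check"))
    if check.returncode != 0:
        return False
    subprocess.run(bibigrid_command(config_path, "create"), check=True)
    return True
```

Building the argument list separately from running it makes it easy to print the command first and confirm it matches what you would type by hand.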

Great! You've just started and terminated your first cluster using BiBiGrid!

Troubleshooting

If your cluster doesn't start up, first make sure your configuration file is valid (-ch). If it is not, modify the configuration file until it is. Use -v or -vv to get more verbose output, so you can find the issue faster. Also double-check that you have sufficient permissions to access the project. If you can't make your configuration file valid, please contact a developer. If the configuration file is valid but the cluster still doesn't start, please contact a developer and/or manually check whether your quotas are exceeded. Some quotas currently cannot be checked by BiBiGrid.

Whenever you contact a developer, please send your logfile along.

Documentation

If you would like to learn more about BiBiGrid please follow a fitting link:

Differences to old Java BiBiGrid
  • BiBiGrid no longer uses RC files but clouds.yaml files for cloud specification data. Environment variables are no longer used (or supported). See Cloud Specification Data.
  • BiBiGrid has a largely reworked configuration file. This step was necessary because the BiBiGrid core supports multiple providers. See Configuration.
  • BiBiGrid currently only implements the provider OpenStack.
  • BiBiGrid only starts the master and dynamically starts workers using Slurm when they are needed. Workers are powered down once they have been idle for a longer period.
  • BiBiGrid lays the foundation for clusters that are spread over multiple providers, but Hybrid Clouds aren't fully implemented yet.

Development

Development-Guidelines

https://github.com/BiBiServ/Development-Guidelines

On implementing concrete providers

New concrete providers are straightforward to implement. Copy the provider.py file, inherit from the provider class, and implement all methods for your cloud provider. After that, add your provider to the providerHandler lists, giving it an associated name for the configuration files. Your provider is then automatically included in BiBiGrid's tests and regular execution. Testing your provider first shows whether all provider methods are implemented as expected.
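The pattern described above can be sketched in Python roughly as follows. This is not BiBiGrid's actual code: the Provider class here is a stand-in for the base class in provider.py, the method names list_servers and create_server are hypothetical, and the dictionary is only a stand-in for the providerHandler lists. Consult provider.py for the real set of methods to implement.

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Stand-in for the base class in provider.py; method names are hypothetical."""

    @abstractmethod
    def list_servers(self):
        """Return the servers visible to this provider."""

    @abstractmethod
    def create_server(self, name):
        """Start a server and return an identifier for it."""


class MyCloudProvider(Provider):
    """Hypothetical concrete provider that keeps its servers in memory."""

    def __init__(self):
        self._servers = []

    def list_servers(self):
        return list(self._servers)

    def create_server(self, name):
        self._servers.append(name)
        return name


# Stand-in for the providerHandler lists: configuration name -> provider class.
PROVIDER_NAME_TO_CLASS = {"mycloud": MyCloudProvider}
```

With an abstract base class like this stand-in, Python refuses to instantiate a subclass that leaves a method unimplemented, which gives early feedback in the same spirit as running the provider tests first.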