This repository provides tools and instructions for running Nextflow pipelines on a Kubernetes cluster.

To get started, all you need is Nextflow, kubectl, and access to a Kubernetes cluster (in the form of a `~/.kube/config` file). If you want to test Docker images on your local machine, you will also need docker and nvidia-docker (for GPU-enabled Docker images).
There are a few administrative tasks that must be done before Nextflow can run properly on the Kubernetes cluster. These tasks only need to be done once, but they may require administrative access to the cluster, so you may need your system administrator to handle this part for you.
- Nextflow needs a service account with the `edit` and `view` cluster roles:

```shell
kubectl create rolebinding default-edit --clusterrole=edit --serviceaccount=<namespace>:default
kubectl create rolebinding default-view --clusterrole=view --serviceaccount=<namespace>:default
```
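For reference, the first `kubectl create rolebinding` command above corresponds to a manifest like the following (a sketch; `<namespace>` is a placeholder for your own namespace, and `default-view` is analogous):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default-edit
  namespace: <namespace>
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: default
  namespace: <namespace>
```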
- Nextflow needs access to shared storage in the form of a Persistent Volume Claim (PVC) with the `ReadWriteMany` access mode. The process for provisioning a PVC depends on what types of storage are available on your cluster. The `kube-create-pvc.sh` script provides an example of creating a PVC for CephFS storage, but it may not apply to your particular cluster. Consult your system administrator for assistance if necessary. There may already be a PVC available for you; you can check using the following command:

```shell
kubectl get pvc
```
NOTE: If you are a user of the NRP from the Feltus lab, there is already a PVC available for you called `deepgtex-prp`.
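If you do need to create a PVC yourself, the manifest generally looks like the following (a sketch; the name, requested size, and storage class are placeholders that depend on what your cluster offers):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <pvc-name>
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 100Gi              # adjust to your needs
  storageClassName: <storage-class>   # e.g. a CephFS class, if available
```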
Consult the `examples` folder for examples of running Nextflow pipelines on a Kubernetes cluster. Consult the Nextflow Kubernetes documentation for more general information on using Nextflow and Kubernetes together.
This repository provides two scripts, `kube-load.sh` and `kube-save.sh`, for transferring data between your local machine and your Kubernetes cluster. In general, to run a Nextflow pipeline with Kubernetes, you will need to transfer your input data beforehand using `kube-load.sh` and transfer your output data afterward using `kube-save.sh`:

```shell
./kube-load.sh <pvc-name> <input-dir>
nextflow [-C nextflow.config] kuberun <pipeline> -v <pvc-name> [options]
./kube-save.sh <pvc-name> <output-dir>
```
NOTE: If you use `kube-load.sh` to upload a directory when that directory already exists remotely, `kube-load.sh` will not overwrite the remote directory. Instead, it will copy the local directory into the remote directory. For example, if you upload a directory called `input` and that directory already exists remotely, the local `input` directory will be copied to `input/input`. Keep this in mind whenever you update an existing directory: you must delete or rename the remote directory before copying the new one.
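This nesting behavior is analogous to copying a directory onto an existing directory of the same name with `cp`. The following local sketch (using plain `cp`, not the real scripts) illustrates what happens:

```shell
# Local analogue of the kube-load.sh nesting behavior ("remote" stands in for the PVC).
mkdir -p local/input remote/input      # the remote copy of "input" already exists
echo hello > local/input/data.txt
cp -r local/input remote/input         # the copy nests instead of overwriting
ls remote/input                        # the data now lives at remote/input/input
```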
The `nextflow kuberun` command will automatically create a pod that runs your pipeline. Alternatively, you can provide your own pod spec. The `kube-run.sh` script can generate a pod spec and launch it using the same parameters as `nextflow kuberun`:

```shell
# transfer local nextflow.config if necessary
./kube-load.sh <pvc-name> nextflow.config

# run pipeline
./kube-run.sh <pvc-name> <pipeline> [options]
```
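If you would rather write a pod spec yourself, a minimal head pod might look like the following (a sketch only; the pod name, image, and mount path are assumptions, not necessarily what `kube-run.sh` generates):

```yaml
# Hypothetical head pod that runs a pipeline with the PVC mounted
apiVersion: v1
kind: Pod
metadata:
  name: nextflow-run           # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: nextflow
    image: nextflow/nextflow   # assumed image; use whatever your cluster provides
    workingDir: /workspace
    command: ["nextflow", "run", "<pipeline>"]
    volumeMounts:
    - name: data
      mountPath: /workspace
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: <pvc-name>
```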
As you run pipelines, Nextflow will create pods to perform the work. Some pods may not be cleaned up properly due to errors or other issues, so it is important to clean up your pods periodically. You can list all of the pods in your namespace using `kubectl`:

```shell
kubectl get pods
```

You can use the `kube-clean.sh` script in this repository to clean up dangling pods:

```shell
./kube-clean.sh
```
Lastly, there are a few additional scripts you can use to manage the pods in your namespace:

```shell
./kube-logs.sh
./kube-pods.sh
```
NOTE: Generally speaking, Docker requires admin privileges to run. On Linux, for example, you may need to run Docker commands with `sudo`. Alternatively, if you add your user to the `docker` group, you will be able to run `docker` without `sudo`.
Build a Docker image:

```shell
docker build -t <tag> <build-directory>
```
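Here `<build-directory>` is a directory containing a `Dockerfile`. A minimal example might look like this (hypothetical, for illustration only):

```dockerfile
# Minimal example Dockerfile (hypothetical)
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*
COPY . /app
WORKDIR /app
CMD ["python3", "--version"]
```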
Run a Docker container:

```shell
docker run [--runtime=nvidia] --rm -it <tag> <command>
```

List the Docker images on your machine:

```shell
docker images
```

Push a Docker image to Docker Hub:

```shell
docker push <tag>
```

Remove old Docker data:

```shell
docker system prune
```
Test your Kubernetes configuration:

```shell
kubectl config view
```

Switch to a particular cluster (context):

```shell
kubectl config use-context <context>
```

Switch to a particular namespace:

```shell
kubectl config set-context --current --namespace=<namespace>
```

View the physical nodes on your cluster:

```shell
kubectl get nodes --show-labels
```

Check the status of your pods:

```shell
kubectl get pods -o wide
```

Get information on a pod:

```shell
kubectl describe pod <pod-name>
```

Get an interactive shell into a pod:

```shell
kubectl exec -it <pod-name> -- bash
```

Delete a pod:

```shell
kubectl delete pod <pod-name>
```

Create a pod with an interactive terminal on a Kubernetes cluster:

```shell
nextflow kuberun login -v <pvc-name>
```

Run a Nextflow pipeline on a Kubernetes cluster:

```shell
nextflow [-C nextflow.config] kuberun <pipeline> -v <pvc-name>
```
NOTE: If you create your own `nextflow.config` in your current directory, Nextflow will use that config file instead of the default.
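For Kubernetes runs, the config file typically sets options in the `k8s` scope; for example (a sketch with placeholder values):

```groovy
// Example nextflow.config for Kubernetes (placeholder values)
k8s {
    namespace        = '<namespace>'
    serviceAccount   = 'default'
    storageClaimName = '<pvc-name>'
    storageMountPath = '/workspace'
}
```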