Due to changes in Docker policy, a subscription is necessary to build Docker images on DockerHub infrastructure. An individual personal subscription was taken see prices and this repository moved to https://github.com/mgalland/docker-for-teaching.
This repository is now archived 💾 Go to https://github.com/mgalland/docker-for-teaching for up-to-date Docker files and images.
These Docker files are used to build computational environments to teach students or train employees. For instance, these are used in:
- the Master Green Life Sciences. For the course entitled "Tools in Molecular Biology"
- the Carpentry-style courses "RNA-seq" and "shotgun-microbiome".
Docker images are automatically tested and built on Docker Hub.
All Docker images ready to use are available at the Dockerhub master-gls repository.
Docker containers can be run. For domain-specific instructions, see the instructions below.
To test it locally, you'll need to install Docker Desktop first: see instructions at docker desktop.
Based on the DockerHub rocker/tidyverse
image but with three added libraries (skimr
, plotly
and nycflights13
).
docker run --detach --name rstudio_instance -e PASSWORD=mypwd -p 8787:8787 scienceparkstudygroup/master-gls:openr-latest
Then navigate to http://localhost:8787 in your web browser. You should have an RStudio session running. Type rstudio
as the user name and your password.
Explanations:
The --detach
flag makes sure the container runs in the backgroupd and returns the interactive command-line prompt.
The --name
gives a name to the running container for easy retrieval.
The -p 8787:8787
follow the format -p host_port:container_port
. Therefore the port 8787 inside the container will be exposed to the outside port on the host machine. That way, the running instance of RStudio can be access through the :port format.
The phylogeny/
folder contains the Docker file used to build the image.
The Docker image contains:
- ncbi-blast version 2.9.0
- muscle version 3.8.31
- seqinr version 3.4_5
- iqtree=1.6.12
To use it locally on your machine:
- Open a Shell window (command-line interface).
- Navigate to your working directory where you have the files you want to work on for instance.
- Type
docker run --rm -it -v $PWD:/home/ scienceparkstudygroup/master-gls:phylogeny-latest
Explanations:
The --detach
flag makes sure the container runs in the backgroupd and returns the interactive command-line prompt.
The --it
starts an interactive session (so you enter the shell directly).
The -v
mounts your current working directory onto the /home/
folder inside your container. That way, you can access the files in your working directory from within the container.
A Dockerfile to follow the Carpentry-style microbiota data analysis lesson.
This Docker image contains:
- Three datasets called
data_loue_16S_nonnorm_grp.txt
,data_loue_16S_nonnorm_taxo.txt
anddata_loue_16S_nonnorm.txt
. vegan
version 2.5-6tidyverse
version 1.3.0pheatmap
version 1.0.12ade4
version 1.7-10multcomp
version 1.4-10patchwork
version 1.0.0agricolae
version 1.3-0FSA
version 0.8.27rcompanion
version 2.3.0phyloseq
Bioconductor version 3.10dada2
Bioconductor version 3.10
To use it locally on your machine:
To use it locally on your machine:
- Open a Shell window (command-line interface).
- Navigate to your working directory where you have the files you want to work on for instance.
- Type
docker run --detach --name rstudio -e PASSWORD=<choose a password> -p 8787:8787 scienceparkstudygroup/master-gls:microbiome-latest
. - In a web browser, open this link: http://localhost:8787.
- Finally enter
rstudio
as the user name and your select password.
A Dockerfile to perform the command-line parts of the Carpentry-style RNA-seq lesson: the "fastq NGS quality check" and the "fastq to counts" section.
In addition, it can also be used to teach the Carpentry Shell lesson.
The Docker image contains:
- The
data-shell/
dataset from the Shell novice lesson. - A genome consisting of a single chromosome.
- Four subsampled RNA-seq fastq files.
To use it locally on your machine:
- Open a Shell window (command-line interface).
- Navigate to your working directory where you have the files you want to work on for instance.
- Type
docker run -it scienceparkstudygroup/master-gls:fastq-2021
. - You will enter inside the container where you can execute bash commands.
2 vCPUs and 2GB of RAM are sufficient per machine to run the lesson.
A Dockerfile for the Carpentry-style RNA-seq lesson.
The image is based on a Docker Bioconductor image release 3.10.
The Docker image contains:
From CRAN
:
devtools
(from the MRAN microsoft CRAN mirror of 2020/01/01.pheatmap
version 1.0.12tidyverse
version 1.3.0pwr
version 1.2.0RColorBrewer
version 1.0.5BiocManager
version 1.30.1biomartr
version 0.9.0
From Bioconductor
release 3.10:
EnhancedVolcano
DESeq2
clusterProfiler
biomaRt
org.At.tair.db
Biostrings
vsn
Two datasets are included.
counts.tsv
experimental_design_modified.tsv
This dataset is included in the Docker image itself.
- NASA dataset: the raw and scaled counts from the NASA GeneLab GSL38 RNA-seq and proteomics experiment.
To use it locally on your machine:
- Open a Shell window (command-line interface).
- Navigate to your working directory where you have the files you want to work on for instance.
- Type
docker run --detach --name machine01 -e PASSWORD=<choose a password> -p 8787:8787 scienceparkstudygroup/master-gls:rnaseq-2021
. - In a web browser, open this link: http://localhost:8787.
- Finally enter
rstudio
as the user name and your select password.
Explanations:
The --detach
flag makes sure the container is still running in the background after the prompt returns.
The base image is built on a RStudio server that will ask you for two things: a user name that is always rstudio and a password which is one you have to create. You will be asked for a user name and a password.
A Dockerfile for the Carpentries incubator lesson on shotgun metagenomics.
The image is based on a Docker Bioconductor image release 3.10.
For now, I have relied on the Digital Ocean cloud computing platform to deploy Docker containers that in turn serve RStudio instances.
If you only want to run one RStudio virtual machine, then follow these steps:
- First, create a project to host a "Digital Ocean droplet" (virtual machine). This machine will serve to deploy N virtual machines (one VM per student).
- In the 'Marketplace' tab, choose the Docker apps which will starts a Virtual Machine with the Linux distribution Ubuntu 18.04 and Docker CE version VERSION 18.06.1 or higher. Find it here.
- Open a Shell terminal and connect through
ssh
to your machine e.g.ssh root@ip
and enter the Digital Ocean provided password. The IP address will be indicated in your "Droplets" sidebar. For example, usessh [email protected]
if your IP address is134.209.84.69
. - Start
screen
to make sure that your VM stays up and running when you log out / turn off your computer. See this help forum. - Run the appropriate Docker command e.g.
docker run --rm --name rstudio -e PASSWORD=mypwd -p 8787:8787 scienceparkstudygroup/master-gls:openr-latest
. (choose the appropriate Docker image). You can define your own username and password. - Your app should be running at its defined IP address.
If you want to run multiple containers (e.g. one per student), you need to perform the same steps as for a single machine (see above). The difference is that expose a different port on the host each time.
Here's an example for two students:
- Student 1 ("machine-01"):
docker run --detach --name machine-01 -v ~/machines/machine01/:/home/rstudio/ -e PASSWORD=student01 -p 8080:8787 scienceparkstudygroup/master-gls:openr-latest
- Student 2 ("machine-02"):
docker run --detach --name machine-02 -v ~/machines/machine02/:/home/rstudio/ -e PASSWORD=student02 -p 8081:8787 scienceparkstudygroup/master-gls:openr-latest
Notice that student 1 uses port 8080 while student 2 uses port 8081.
... etc ...
A small Python script called create_docker_commands_for_students.py
is available and can be used on a file called students.tsv
with tabulated-separated values with the following format:
student | machine | password | port |
---|---|---|---|
John Doe | machine-01 | john | 8787 |
Jane Doe | machine-02 | jane | 8788 |
In a Python virtual environment with pandas
available, run the following command:
python create_docker_commands_for_students.py students.tsv
This will create the Docker commands that can be copy-pasted in the shell of the cloud instance:
docker run --detach --name machine-01 -e PASSWORD=john -p 8787:8787 scienceparkstudygroup/master-gls:openr-latest
docker run --detach --name machine-02 -e PASSWORD=jane -p 8787:8788 scienceparkstudygroup/master-gls:openr-latest
In this way, you can have one machine per student, each one on its own port with its own password.
See this blog post: https://medium.com/@mccode/understanding-how-uid-and-gid-work-in-docker-containers-c37a01d01cf
- Stop all containers:
docker stop $(docker ps -a -q)
- To restart them, use this:
docker start <container_id or container_name>
- Remove stopped containers:
docker rm $(docker ps -a -q)
- Remove ALL containers: This will remove both stopped and running containers. Beware!
docker rm $(docker ps -a -q)
⚠️ Remove all images:docker rmi $(docker images -q) --force
Droplets are Linux-based Virtual Machines that can be created and deleted on demand.
Digital Ocean has a docl
command-line interface that can be used to perform actions programmatically.
This is useful when multiple droplets need to be created or modified for instance.
To use doctl
you will need to create an access token (DO > API).
Assign a bash environmental variable (that stays on your computer) to keep it secret when recording commands.
echo "export DO_TOKEN=you_secret_token" >> ~/.bash_profile # to create an env variable called $DO_TOKEN
source ~/.bash_profile # to activate your profile
This creates the $DO_TOKEN
environmental variable that you can use to authenticate with the doctl
command-line.
It is often useful to list the available droplets, their CPUs, etc. in order to create multiple ones with the same configuration for instance.
To list all available droplets, type:
doctl compute size list
Slug Memory VCPUs Disk Price Monthly Price Hourly
s-1vcpu-1gb 1024 1 25 5.00 0.007440
s-1vcpu-1gb-amd 1024 1 25 6.00 0.008930
s-1vcpu-1gb-intel 1024 1 25 6.00 0.008930
s-1vcpu-2gb 2048 1 50 10.00 0.014880
s-1vcpu-2gb-amd 2048 1 50 12.00 0.017860
s-1vcpu-2gb-intel 2048 1 50 12.00 0.017860
s-2vcpu-2gb 2048 2 60 15.00 0.022320
s-2vcpu-2gb-amd 2048 2 60 18.00 0.026790
s-2vcpu-2gb-intel 2048 2 60 18.00 0.026790
You can use a regular grep to filter this list. For instance, to get the memory-optimized droplets that start with an "m", grep them with:
doctl compute size list |grep "^m-"
The "slug" column contains the short description of the machine configuration. You will have to specify it when creating the machines.
To get a list of your running droplets, type:
doctl compute droplet list --format "ID,Name,PublicIPv4"
For instance, to create a Virtual Machine (droplet) with Docker 19.03.12
running on Ubuntu 20.04
, do:
doctl compute droplet create --image docker-20-04 \ # Ubuntu 20.04 with Docker installed
--enable-monitoring \ # Adds monitoring for CPU usage, RAM etc.
--region ams3 \ # region where the droplet is created (e.g. Amsterdam 3)
--tag-name rnaseq \ # tag for droplet management
--ssh-keys 27380279 \ # A list of SSH key fingerprints to embed upon creation
--size s-2vcpu-2gb \ # slug format to indicate compute size and resources
[my_droplet_name]
Important notes
- The
--tag-name
is useful to perform actions on multiple droplets at once. - The
--ssh-keys
will point to a SSH key ID to be used with the Droplet root account. This avoids the need for passwords. Here I give my custom public key ID27380279
. If this flag is not provided, you will get an email with the login credentials.
It is then rather easy to create a series of VMs called "machine-01", "machine-02" using a for loop.
Say you want to create 20 machines, then:
for i in {1..20}
do
echo "machine-${i}"
doctl compute droplet create --image docker-20-04 \ # Ubuntu 20.04 with Docker installed
--enable-monitoring \ # Adds monitoring for CPU usage, RAM etc.
--region ams3 \ # region where the droplet is created (e.g. Amsterdam 3)
--tag-name rnaseq \ # tag for droplet management
--ssh-keys 27380279 \ #
--size s-2vcpu-2gb \ # slug format to indicate compute size and resources
"machine-${i}" # creates a name with the machine number
done
Using the tag that was specified with --tag-name
it is possible to delete all machines at once:
doctl compute droplet delete --tag-name rnaseq
For the phylogeny course and the RNA-seq courses.
For the microbiome and RNA-seq courses.
- Docker containers for Bioconductor: https://www.bioconductor.org/help/docker/.
- The Rocker project.
- Tutorials on Docker images for R and RStudio.
- Docker run reference
- A clear overview of Docker best practices
- A collection of Docker images for bioinformatics: https://pegi3s.github.io/dockerfiles/
- Docker and conda interaction
- https://www.configserverfirewall.com/docker/start-container-docker-run-command/#port-mapping
- Best practices for Dockerfiles