As of version v0.6.0 doAzureParallel runs all workloads within a Docker container. This has several benefits including consistent immutable runtime, custom R version, environment and packages and improved testing before deploying to doAzureParallel
The documentation below builds on top of the standard Docker documentation. It is highly recommended you read up on Docker documentation, specifically their getting started guide.
Prerequisites
- Install Docker instructions
These are some of the common use cases for builing your own images in Docker.
If you have your own R runtime, or want to use something other than the default version of R that doAzureParallel uses, you can easily point to an existing Docker image or build one yourself. This allows for the flexibility to use any R version you need without being subjected to what defaults are used by this toolkit.
Installing packages is often complex and involved and takes a few tries to get right. Using docker you can make sure that your images are built correctly on your local machine without needing to try building and rebuilding doAzureParallel clusters trying to get it right. This also means that you can pull in your own custom packages and guarantee that the version of the package inside the container will never change and your runs will always produce the same results.
One issue with installing packages is that they can take time to load and install, and are subject to potential issues with repository access and network reliability. By pre-packaging everything into your container, you can guarantee that everything is already built and available and will be loaded correctly in the doAzureParallel cluster.
Building container images may seem a bit difficult to begin with, but they are really no harder than running commands in your command line. The following sections will go through how to build a container image that will install a few R packages and their operating system dependencies.
In the following example we will create an image that installs the popular web based packages jsonlite and httr. This example simply uses an image provided by the RStudio team 'r-ver' and installs a few packages into it. The benefit of using the r-ver package is that it has already done all the hard work of getting R installed, so all we need to do in add the packages we want to use and we should be good to go.
NOTE: Rocker has several great R container images available on Docker Hub. Take a quick look through them to see if any of them suit your needs.
Create a Dockerfile in a direcotry called 'demo'. Notice the Dockerfile has no extension.
mkdir demo
touch demo/Dockerfile
Open up the Dockerfile with your favorite editor and paste in the following code.
# Use rocker/r-ver as the base image
# This will inherit everyhing that was installed into base image already
# Documented at https://hub.docker.com/r/rocker/r-ver/~/dockerfile/
FROM rocker/r-ver
# Install any dependencies required for the R packages
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libxml2-dev \
libcurl4-openssl-dev \
libssl-dev
# Install the R Packages from CRAN
RUN Rscript -e 'install.packages(c("jsonlite", "httr"))'
Finally save the file and build the docker image.
# docker build takes the directory which contains the Dockerfile as the input
# -t is used to tag or name the image
docker build demo -t demo/custom-r-ver
Once the docker image is built locally, you can list it by running the below command.
docker images
And you should see the following
REPOSITORY TAG IMAGE ID CREATED SIZE
demo/custom-r-ver latest 55aefec47200 14 seconds ago 709MB
rocker/r-ver latest 503e3df4e322 21 hours ago 578MB
rocker/r-ver is the image that was downloaded to build the demo/custom-r-ver.
Once you have your images built, you can run it locally to test it out.
docker run --rm -it demo/custom-r-ver R
This will open up a conole version of R. To make sure the packages are insalled correctly, load them into the R session.
> library(httr)
> library(jsonlite)
> sessionInfo()
The output will show that these packages are now available to use
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] jsonlite_1.5 httr_1.3.1
loaded via a namespace (and not attached):
[1] compiler_3.4.2 R6_2.2.2
doAzureParallel will run your container and load in specific direcotories and environement varialbles.
We run the container as follows:
docker run --rm \
-v $AZ_BATCH_NODE_ROOT_DIR:$AZ_BATCH_NODE_ROOT_DIR \
-e AZ_BATCH_NODE_ROOT_DIR=$AZ_BATCH_NODE_ROOT_DIR \
-e AZ_BATCH_NODE_STARTUP_DIR=$AZ_BATCH_NODE_STARTUP_DIR \
-e AZ_BATCH_TASK_ID=$AZ_BATCH_TASK_ID \
-e AZ_BATCH_JOB_ID=$AZ_BATCH_JOB_ID \
-e AZ_BATCH_TASK_WORKING_DIR=$AZ_BATCH_TASK_WORKING_DIR \
-e AZ_BATCH_JOB_PREP_WORKING_DIR=$AZ_BATCH_JOB_PREP_WORKING_DIR
All files downloaded with resource files will be available at $AZ_BATCH_NODE_STARTUP_DIR/wd.
You can use these values to set up your local environment to look like it is running on a Batch node.
Once you are happy with your image, you can publish it to docker hub
docker login
...
docker push <username>/custom-r-ver
{
"name": "demo",
"vmSize": "Standard_F2",
"maxTasksPerNode": 2,
"poolSize": {
"dedicatedNodes": {
"min": 0,
"max": 0
},
"lowPriorityNodes": {
"min": 2,
"max": 2
},
"autoscaleFormula": "QUEUE"
},
"containerImage": "<username>/custom-r-ver",
"rPackages": {
"cran": [],
"github": [],
"bioconductor": []
},
"commandLine": []
}
To use a private docker registry simply add the docker registry information to the credentials object before creating your cluster.
Add the following section inside the credentials file
"sharedKey": {
"batchAccount": {
...
},
"storageAccount": {
...
}
},
"githubAuthenticationToken": "",
"dockerAuthentication": {
"username": "registry_username",
"password": "registry_password",
"registry": "registry_url"
}
Add the following list to your credentials object
credentials <- list(
"sharedKey" = list(
"batchAccount" = list(
...
),
"storageAccount" = list(
...
)
),
"githubAuthenticationToken" = "",
"dockerAuthentication" = list("username" = "registry_username",
"password" = "registry_password",
"registry" = "registry_url")
)