From 51f30105f73ffa19730bae9ae5637ca6199b5de6 Mon Sep 17 00:00:00 2001 From: Geraldine Van der Auwera Date: Tue, 21 Jan 2025 06:02:08 -0500 Subject: [PATCH] Hello Containers COMPLETE Required some adaptations to config --- docs/hello_nextflow/05_hello_containers.md | 366 +++++++++++++----- docs/hello_nextflow/06_hello_config.md | 60 ++- hello-nextflow/hello-containers.nf | 32 ++ hello-nextflow/nextflow.config | 1 + .../5-hello-containers/hello-containers-2.nf | 37 ++ .../5-hello-containers/modules/cowSay.nf | 21 + 6 files changed, 378 insertions(+), 139 deletions(-) create mode 100644 hello-nextflow/hello-containers.nf create mode 100644 hello-nextflow/solutions/5-hello-containers/hello-containers-2.nf create mode 100644 hello-nextflow/solutions/5-hello-containers/modules/cowSay.nf diff --git a/docs/hello_nextflow/05_hello_containers.md b/docs/hello_nextflow/05_hello_containers.md index 1dc3c807..a12ac081 100644 --- a/docs/hello_nextflow/05_hello_containers.md +++ b/docs/hello_nextflow/05_hello_containers.md @@ -1,6 +1,6 @@ # Part 5: Hello Containers -In Parts 1-4 of this training course, you learned how to use the basic building blocks of Nextflow to assemble a simple workflow capable of processing some text, parallelizing execution if there were multiple inputs and collecting the results for further processing. +In Parts 1-4 of this training course, you learned how to use the basic building blocks of Nextflow to assemble a simple workflow capable of processing some text, parallelizing execution if there were multiple inputs, and collecting the results for further processing. However, you were limited to basic UNIX tools available in your environment. Real-world tasks often require various tools and packages not included by default. @@ -40,20 +40,42 @@ executor > local (7) [7d/f7961c] collectGreetings [100%] 1 of 1 ✔ ``` -Our goal will be to add a step to this workflow that will use a container for execution. 
-However, we are first going to go over some basic concepts and operations to solidify your understanding of what containers are before we start using them in Nextflow. +As previously, you will find the output files in the `results` directory (specified by the `publishDir` directive). + +```console title="Directory contents" +results +├── Bonjour-output.txt +├── COLLECTED-output.txt +├── COLLECTED-test-batch-output.txt +├── COLLECTED-trio-output.txt +├── Hello-output.txt +├── Holà-output.txt +├── UPPER-Bonjour-output.txt +├── UPPER-Hello-output.txt +└── UPPER-Holà-output.txt +``` + +!!! note + + There may also be a file named `output.txt` left over if you worked through Part 2 in the same environment. + +If that worked for you, you're ready to learn how to use containers. --- ## 1. Use a container 'manually' +What we want to do is add a step to our workflow that will use a container for execution. + +However, we are first going to go over some basic concepts and operations to solidify your understanding of what containers are before we start using them in Nextflow. + ### 1.1. Pull the container image To use a container, you usually download or "pull" a container image from a container registry, and then run the container image to create a container instance. The general syntax is as follows: -```bash +```bash title="Syntax" docker pull '' ``` @@ -66,15 +88,33 @@ As an example, let's pull a container image that contains the [`cowsay` tool](ht There are various repositories where you can find published containers. We looked in the [Seqera Containers](https://seqera.io/containers/) repository and found this `cowsay` container: `'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65'`. 
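The pull step can also be scripted so that it only downloads the image when it is not already cached locally. This is a minimal sketch that assumes the Docker CLI is installed and can reach a running Docker daemon. `pull_if_missing` is a hypothetical helper name, not part of Docker or Nextflow. It relies on `docker image inspect`, which exits with a non-zero status when the image is not in the local cache.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull a container image only if it is not already cached locally.
# Assumes the Docker CLI is installed and can reach a running Docker daemon.
pull_if_missing() {
  local image="$1"
  # `docker image inspect` exits non-zero when the image is not in the local cache
  if docker image inspect "$image" > /dev/null 2>&1; then
    echo "Image already present: $image"
  else
    docker pull "$image"
  fi
}

# Example usage with the cowsay image used in this section:
# pull_if_missing 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65'
```

Running it a second time should skip the download, because the image is then already in the local cache.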
-The pull command becomes: +Run the complete pull command: ```bash docker pull 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65' ``` -Running this gives you the following console output as the system downloads the image: +This gives you the following console output as the system downloads the image: -TODO OUTPUT +```console title="Output" +131d6a1b707a8e65: Pulling from library/pip_cowsay +dafa2b0c44d2: Pull complete +dec6b097362e: Pull complete +f88da01cff0b: Pull complete +4f4fb700ef54: Pull complete +92dc97a3ef36: Pull complete +403f74b0f85e: Pull complete +10b8c00c10a5: Pull complete +17dc7ea432cc: Pull complete +bb36d6c3110d: Pull complete +0ea1a16bbe82: Pull complete +030a47592a0a: Pull complete +622dd7f15040: Pull complete +895fb5d0f4df: Pull complete +Digest: sha256:fa50498b32534d83e0a89bb21fec0c47cc03933ac95c6b6587df82aaa9d68db3 +Status: Downloaded newer image for community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65 +community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65 +``` Once the download is complete, you have a local copy of the container image. @@ -85,7 +125,7 @@ This is great for running one-off commands. The general syntax is as follows: -```bash +```bash title="Syntax" docker run --rm '' [tool command] ``` @@ -98,20 +138,22 @@ Here we will use `cowsay -t "Hello World"`. 
Fully assembled, the container execution command looks like this: ```bash -docker run --rm 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65' cowsay -t "Hello World" +docker run --rm 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65' cowsay -t "Hello Containers" ``` Run it to produce the following output: ```console title="Output" - _____________ -< Hello World > - ------------- - \ ^__^ - \ (oo)\_______ - (__)\ )\/\ - ||----w | - || || + ________________ +| Hello Containers | + ================ + \ + \ + ^__^ + (oo)\_______ + (__)\ )\/\ + ||----w | + || || ``` The system spun up the container, ran the `cowsay` command with the parameters we specified, sent the output to the console and finally, shut down the container instance. @@ -129,45 +171,51 @@ Optionally, we can specify the shell we want to use inside the container by appe docker run --rm -it 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65' /bin/bash ``` -Notice that the prompt has changed to `(base) root@b645838b3314:/tmp#`, which indicates that you are now inside the container. +Notice that your prompt changes to something like `(base) root@b645838b3314:/tmp#`, which indicates that you are now inside the container. You can verify this by running `ls` to list directory contents: ```bash -ls +ls / ``` -You can see that the filesystem inside the container is different from the filesystem on your host system: - ```console title="Output" -(base) root@b645838b3314:/tmp# ls / bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var ``` +You can see that the filesystem inside the container is different from the filesystem on your host system. + +!!! note + + When you run a container, it is isolated from the host system by default. + This means that the container can't access any files on the host system unless you explicitly allow it to do so. + +You will learn how to do that in a minute. + #### 1.3.2.
Run the desired tool command(s) Now that you are inside the container, you can run the `cowsay` command directly. ```bash -cowsay -t "Hello World" -c tux +cowsay -t "Hello Containers" -c tux ``` Now the output shows the Linux penguin, Tux, instead of the default cow, because we specified the `-c` parameter. ```console title="Output" - ___________ -| Hello World | - =========== - \ - \ - \ - .--. - |o_o | - |:_/ | - // \ \ - (| | ) - /'\_ _/`\ - \___)=(___/ + ________________ +| Hello Containers | + ================ + \ + \ + \ + .--. + |o_o | + |:_/ | + // \ \ + (| | ) + /'\_ _/`\ + \___)=(___/ ``` Because you're inside the container, you can run the cowsay command as many times as you like, varying the input parameters, without having to bother with docker commands. @@ -177,6 +225,11 @@ Because you're inside the container, you can run the cowsay command as many time Use the '-c' flag to pick a different character from this list: `beavis`, `cheese`, `cow`, `daemon`, `dragon`, `fox`, `ghostbusters`, `kitty`, `meow`, `miki`, `milk`, `octopus`, `pig`, `stegosaurus`, `stimpy`, `trex`, `turkey`, `turtle`, `tux` +This is neat. What would be even neater is if we could feed our `greetings.csv` as input into this. +But since the container doesn't have access to the host filesystem, we can't. + +Let's fix that. + #### 1.3.3. Exit the container To exit the container, you can type `exit` at the prompt or use the ++ctrl+d++ keyboard shortcut. @@ -187,19 +240,26 @@ exit Your prompt should now be back to what it was before you started the container. -#### 1.3.4. Mounting data into containers +#### 1.3.4. Mount data into the container When you run a container, it is isolated from the host system by default. -This means that the container can't access any files on the host system unless you explicitly tell it to. -One way to do this is to **mount** a **volume** from the host system into the container.
+This means that the container can't access any files on the host system unless you explicitly allow it to do so. -To mount a volume, we add `-v :` to the `docker run command as follows: +One way to do this is to **mount** a **volume** from the host system into the container using the following syntax: + +```bash title="Syntax" +-v : +``` + +In our case `` will be the current working directory, so we can just use a dot (`.`), and `` is just a name we make up; let's call it `/data`. + +To mount a volume, we replace the paths and add the volume mounting argument to the docker run command as follows: ```bash -docker run --rm -it -v data:/data 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65' /bin/bash +docker run --rm -it -v .:/data 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65' /bin/bash ``` -This mounts the `data` directory (in the current working directory) as a volume that will be found under `/data` inside the container. +This mounts the current working directory as a volume that will be accessible under `/data` inside the container. You can check that it works by listing the contents of `/data`: @@ -207,19 +267,20 @@ You can check that it works by listing the contents of `/data`: ls /data ``` -You should now be able to see the contents of the `data` directory from inside the container: - ```console title="Output" -greetings.csv +demo-params.json hello-channels.nf hello-workflow.nf modules results +greetings.csv hello-modules.nf hello-world.nf nextflow.config work ``` +You can now see the contents of the `data` directory from inside the container, including the `greetings.csv` file. + This effectively established a tunnel through the container wall that you can use to access that part of your filesystem. #### 1.3.5. Use the mounted data Now that we have mounted the `data` directory into the container, we can use the `cowsay` command to display the contents of the `greetings.csv` file. 
-To do this, we'll use the syntax `-t "$(cat data/greetings.csv)"` to output the contents of the file into the `cowsay` command. +To do this, we'll use `-t "$(cat /data/greetings.csv)"` to load the contents of the CSV file into the `cowsay` command. ```bash cowsay -t "$(cat /data/greetings.csv)" -c pig @@ -228,23 +289,27 @@ cowsay -t "$(cat /data/greetings.csv)" -c pig This produces the desired ASCII art of the pig rattling off our example greetings: ```console title="Output" - __________________ -| Hello,Bonjour,Holà | - ================== - \ - \ - \ - \ - ,. - (_|,. - ,' /, )_______ _ - __j o``-' `.'-)' - (") \' - `-j | - `-._( / - |_\ |--^. / - /_]'|_| /_)_/ - /_]' /_]' + _______ + / \ +| Hello | +| Bonjour | +| Holà | + \ / + ======= + \ + \ + \ + \ + ,. + (_|,. + ,' /, )_______ _ + __j o``-' `.'-)' + (") \' + `-j | + `-._( / + |_\ |--^. / + /_]'|_| /_)_/ + /_]' /_]' ``` Feel free to play around with this command. @@ -258,7 +323,7 @@ You will find yourself back in your normal shell. ### Takeaway -You know how to pull a container and run it either as a one-off or interactively. You also know how to make your data accessible from within your container, which lets you try any tool you're interested in without having to install any software on your system. +You know how to pull a container and run it either as a one-off or interactively. You also know how to make your data accessible from within your container, which lets you try any tool you're interested in on real data without having to install any software on your system. ### What's next? @@ -280,14 +345,14 @@ To demonstrate this, we are going to add a `cowsay` step to the pipeline we've b Create an empty file for the module called `cowSay.nf`. ```bash -touch modules/cowsay.nf +touch modules/cowSay.nf ``` This gives us a place to put the process code. #### 2.1.2. Copy the `cowSay` process code in the module file -We can model our `cowSay` process off of the processes we've written previously.
+We can model our `cowSay` process on the other processes we've written previously. ```groovy title="modules/cowSay.nf" linenums="1" #!/usr/bin/env nextflow @@ -314,13 +379,17 @@ process cowSay { The output will be a new text file containing the ASCII art generated by the `cowsay` tool. -### 2.2. Import the `cowSay` process into `hello-containers.nf` +### 2.2. Add cowSay to the workflow + +Now we need to import the module and call the process. + +#### 2.2.1. Import the `cowSay` process into `hello-containers.nf` Insert the import declaration above the workflow block and fill it out appropriately. _Before:_ -```groovy title="hello-containers.nf" linenums="73" +```groovy title="hello-containers.nf" linenums="9" // Include modules include { sayHello } from './modules/sayHello.nf' include { convertToUpper } from './modules/convertToUpper.nf' @@ -331,7 +400,7 @@ workflow { _After:_ -```groovy title="hello-containers.nf" linenums="73" +```groovy title="hello-containers.nf" linenums="9" // Include modules include { sayHello } from './modules/sayHello.nf' include { convertToUpper } from './modules/convertToUpper.nf' @@ -341,7 +410,7 @@ include { cowSay } from './modules/cowSay.nf' workflow { ``` -### 2.3 Add a call to the `cowSay` process in the workflow +#### 2.2.2. 
Add a call to the `cowSay` process in the workflow Let's connect the `cowSay()` process to the output of the `collectGreetings()` process, which as you may recall produces two outputs: @@ -352,7 +421,7 @@ In the workflow block, make the following code change: _Before:_ -```groovy title="hello-containers.nf" linenums="82" +```groovy title="hello-containers.nf" linenums="28" // collect all the greetings into one file collectGreetings(convertToUpper.out.collect(), params.batch) @@ -362,7 +431,7 @@ _Before:_ _After:_ -```groovy title="hello-containers.nf" linenums="82" +```groovy title="hello-containers.nf" linenums="28" // collect all the greetings into one file collectGreetings(convertToUpper.out.collect(), params.batch) @@ -375,7 +444,34 @@ _After:_ Notice that we include a new CLI parameter, `params.character`, in order to specify which character we want to have say the greetings. -### 2.4. Run the workflow to verify that it works +#### 2.2.3. Set a default value for `params.character` + +We like to be lazy and skip typing parameters in our command lines. + +_Before:_ + +```groovy title="hello-containers.nf" linenums="3" +/* + * Pipeline parameters + */ +params.greeting = 'greetings.csv' +params.batch = 'test-batch' +``` + +_After:_ + +```groovy title="hello-containers.nf" linenums="3" +/* + * Pipeline parameters + */ +params.greeting = 'greetings.csv' +params.batch = 'test-batch' +params.character = 'pig' +``` + +That should be all we need to make this work. + +#### 2.2.4. Run the workflow to verify that it works Run this with the `-resume` flag. @@ -386,18 +482,37 @@ nextflow run hello-containers.nf -resume Oh no, there's an error! 
```console title="Output" -TODO + N E X T F L O W ~ version 24.10.0 + +Launching `hello-containers.nf` [special_lovelace] DSL2 - revision: 028a841db1 + +executor > local (1) +[f6/cc0107] sayHello (1) | 3 of 3, cached: 3 ✔ +[2c/67a06b] convertToUpper (3) | 3 of 3, cached: 3 ✔ +[1a/bc5901] collectGreetings | 1 of 1, cached: 1 ✔ +[b2/488871] cowSay | 0 of 1 +There were 3 greetings in this batch +ERROR ~ Error executing process > 'cowSay' + +Caused by: + Process `cowSay` terminated with an error exit status (127) ``` -Of course, we're calling the `cowsay` tool but we haven't actually specified a container. +This error code, `error exit status (127)`, means that the executable we asked for was not found. + +Of course: we're calling the `cowsay` tool, but we haven't actually specified a container yet. + +### 2.3. Use a container to run it -### 2.5. Specify a container for the process to use +We need to specify a container and tell Nextflow to use it for the `cowSay()` process. -Edit the `cowSay.nf` module to add the `container` directive as follows: +#### 2.3.1. Specify a container for the `cowSay` process to use + +Edit the `cowSay.nf` module to add the `container` directive to the process definition as follows: _Before:_ -```groovy title="modules/cowSay.nf" +```groovy title="modules/cowSay.nf" linenums="4" process cowSay { publishDir 'containers/results', mode: 'copy' @@ -405,52 +520,97 @@ process cowSay { _After:_ -```groovy title="modules/cowSay.nf" +```groovy title="modules/cowSay.nf" linenums="4" process cowSay { publishDir 'containers/results', mode: 'copy' + container 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65' ``` -This tells Nextflow that if Docker is available, it should use the container image specified here to execute the process. +This tells Nextflow that if the use of Docker is enabled, it should use the container image specified here to execute the process. -### 2.6. Run the workflow again to verify that it works this time +#### 2.3.2.
Enable use of Docker via the `nextflow.config` file -Run this with the `-resume` flag and with `-with-docker`, which enables Docker execution on a per-command basis. -We will cover persistent configuration in the next section of this course. +Here we are going to slightly anticipate the topic of the next and last part of this course (Part 6), which covers configuration. -```bash -nextflow run hello-containers.nf -resume -with-docker +One of the main ways Nextflow offers for configuring workflow execution is to use a `nextflow.config` file. When such a file is present in the current directory, Nextflow will automatically load it and apply any configuration it contains. + +We provided a `nextflow.config` file with a single line of code that disables Docker: `docker.enabled = false`. + +Now, let's switch that to `true` to enable Docker: + +_Before:_ + +```console title="nextflow.config" linenums="1" +docker.enabled = false ``` -This time it does indeed work! +_After:_ -```console title="Output" -TODO +```console title="nextflow.config" linenums="1" +docker.enabled = true +``` + +!!! note + + It is possible to enable Docker execution from the command line, on a per-run basis, using the `-with-docker ` parameter. + However, that only allows us to specify one container for the entire workflow, whereas the approach we just showed you allows us to specify a different container per process, which is better for modularity, code maintenance and reproducibility. + +#### 2.3.3. Run the workflow with Docker enabled + +Run the workflow with the `-resume` flag: + +```bash +nextflow run hello-containers.nf -resume ``` -You should find the cowsay'ed output in the `results` directory. +This time it does indeed work.
+ +```console title="Output" linenums="1" + N E X T F L O W ~ version 24.10.0 + +Launching `hello-containers.nf` [elegant_brattain] DSL2 - revision: 028a841db1 + +executor > local (1) +[95/fa0bac] sayHello (3) | 3 of 3, cached: 3 ✔ +[92/32533f] convertToUpper (3) | 3 of 3, cached: 3 ✔ +[aa/e697a2] collectGreetings | 1 of 1, cached: 1 ✔ +[7f/caf718] cowSay | 1 of 1 ✔ +There were 3 greetings in this batch +``` -TODO UPDATE FILENAME AND CONTENT +You can find the cowsay'ed output in the `results` directory. -```console title="results/cowsay-output-Bonjour.txt" +```console title="results/cowsay-COLLECTED-test-batch-output.txt" _______ -| Bonjour | + / \ +| HELLO | +| HOLà | +| BONJOUR | + \ / ======= \ \ - ^__^ - (oo)\_______ - (__)\ )\/\ - ||----w | - || || + \ + \ + ,. + (_|,. + ,' /, )_______ _ + __j o``-' `.'-)' + (") \' + `-j | + `-._( / + |_\ |--^. / + /_]'|_| /_)_/ + /_]' /_]' ``` -You see that the character is saying all the greetings. +You see that the character is saying all the greetings, just as it did when we ran the `cowsay` command on the `greetings.csv` file from inside the container. - + -### 2.7. Inspect how Nextflow launched the containerized task +#### 2.3.4. Inspect how Nextflow launched the containerized task Let's take a look at the work subdirectory for one of the `cowSay` process calls to get a bit more insight on how Nextflow works with containers under the hood. 
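To pull the relevant part out of a task's `.command.run` from the terminal, you could use a small helper like the one below. This is a sketch: `show_launch` is a hypothetical helper name, not part of Nextflow. It assumes that `nxf_launch()` and its closing brace both start at the beginning of a line in `.command.run`. Pass it the task's work subdirectory, which you can find from the hash printed next to `cowSay` in the run output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the nxf_launch() function from a Nextflow task's .command.run.
# Takes the task work directory as its only argument.
show_launch() {
  local task_dir="$1"
  local run_file="$task_dir/.command.run"
  if [ ! -f "$run_file" ]; then
    echo "No .command.run found in $task_dir" >&2
    return 1
  fi
  # Print from the line starting with nxf_launch() up to the first line starting with }
  sed -n '/^nxf_launch()/,/^}/p' "$run_file"
}

# Example usage (the hash prefix comes from your own run output, e.g. [7f/caf718] cowSay):
# show_launch work/7f/caf718*
```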
@@ -462,13 +622,14 @@ Open the `.command.run` file and search for `nxf_launch`; you should see somethi ```bash nxf_launch() { - docker run -i --cpu-shares 1024 -e "NXF_TASK_WORKDIR" -v /workspace/gitpod/nf-training/hello-nextflow/work:/workspace/gitpod/nf-training/hello-nextflow/work -w "$NXF_TASK_WORKDIR" --name $NXF_BOXID community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65 /bin/bash -ue /workspace/gitpod/nf-training/hello-nextflow/work/8c/738ac55b80e7b6170aa84a68412454/.command.sh + docker run -i --cpu-shares 1024 -e "NXF_TASK_WORKDIR" -v /workspace/gitpod/hello-nextflow/work:/workspace/gitpod/hello-nextflow/work -w "$NXF_TASK_WORKDIR" --name $NXF_BOXID community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65 /bin/bash -ue /workspace/gitpod/hello-nextflow/work/7f/caf7189fca6c56ba627b75749edcb3/.command.sh } ``` As you can see, Nextflow is using the `docker run` command to launch the process call. It also mounts the corresponding work subdirectory into the container, sets the working directory inside the container accordingly, and runs our templated bash script in the `.command.sh` file. -All the hard work we learned about in the previous sections is done for us by Nextflow! + +All the hard work we had to do manually in the previous section is done for us by Nextflow! ### Takeaway @@ -478,3 +639,4 @@ You know how to use containers in Nextflow to run processes. Take a break! When you're ready, move on to Part 6 to learn how to configure the execution of your pipeline to fit your infrastructure as well as manage configuration of inputs and parameters. +It's the very last part and then you're done! 
diff --git a/docs/hello_nextflow/06_hello_config.md b/docs/hello_nextflow/06_hello_config.md index cf07d0cb..f801c9ef 100644 --- a/docs/hello_nextflow/06_hello_config.md +++ b/docs/hello_nextflow/06_hello_config.md @@ -10,22 +10,28 @@ Whenever there is a file named `nextflow.config` in the current directory, Nextf Anything you put into the `nextflow.config` can be overridden at runtime by providing the relevant process directives or parameters and values on the command line, or by importing another configuration file, according to the order of precedence described [here](https://www.nextflow.io/docs/latest/config.html). In this part of the training, we're going to use the `nextflow.config` file to demonstrate essential components of Nextflow configuration such as process directives, executors, profiles, and parameter files. + By learning to utilize these configuration options effectively, you can enhance the flexibility, scalability, and performance of your pipelines. --- -## 0. Warmup: Run the Hello Config workflow +## 0. Warmup: Check that Docker is enabled and run the Hello Config workflow -[TODO] [usual run workflow to check everything works] +First, a quick check. There is a `nextflow.config` file in the current directory that contains the line `docker.enabled = `, where `` is either `true` or `false` depending on whether or not you've worked through Part 5 of this course in the same environment. -Verify that the initial workflow runs properly: +If it is set to `true`, you don't need to do anything. +If it is set to `false`, switch it to `true` now. 
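If you prefer to make that change from the terminal, the sketch below flips the setting with `sed`. It assumes that the line starts with `docker.enabled` followed by `= false`, with any spacing around `=`, as in the `nextflow.config` file provided with the course. `enable_docker` is a hypothetical helper name. The `-i.bak` form edits the file in place, keeps a backup copy, and is accepted by both GNU and BSD `sed`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set docker.enabled = true in a Nextflow config file.
# Defaults to ./nextflow.config and keeps a backup of the original as <file>.bak.
enable_docker() {
  local config="${1:-nextflow.config}"
  # Rewrite "docker.enabled = false" (any spacing around '=') to "docker.enabled = true"
  sed -i.bak 's/^docker\.enabled[[:space:]]*=[[:space:]]*false/docker.enabled = true/' "$config"
  # Print the resulting setting so you can confirm the change
  grep '^docker\.enabled' "$config"
}

# Example usage, from the directory containing nextflow.config:
# enable_docker
```

If the setting is already `true`, the substitution leaves the file unchanged, so the helper is safe to run more than once.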
+ +```console title="nextflow.config" linenums="1" +docker.enabled = true +``` + +Once you've done that, verify that the initial workflow runs properly: ```bash nextflow run hello-config.nf ``` -This should run successfully: - ```console title="Output" Nextflow 24.09.2-edge is available - Please consider updating your version to it @@ -46,39 +52,17 @@ The first step toward adapting your workflow configuration to your compute envir Are they already installed in the local compute environment? Do we need to retrieve images and run them via a container system? Or do we need to retrieve Conda packages and build a local Conda environment? In the very first part of this training course (Parts 1-4) we just used locally installed software in our workflow. -Then in Part 5, we introduced Docker containers, using the `-with-docker` command-line argument. - -Now let's look at how we can configure Nextflow to use Docker or other container systems without having to specify that every time, using a `nextflow.config` file. - -### 1.1. Enable Docker in the config file - -There is a `nextflow.config` file in the current directory but it's a stub; there's nothing in it. - -Let's add the line `docker.enabled = true` to the file. - -```console title="nextflow.config" linenums="1" -docker.enabled = true -``` - -This instruction specifies that Nextflow should use Docker to run process calls that specify a Docker container image. - -### 1.2. Run the workflow without the Docker CLI argument - -```bash -nextflow run hello-config.nf -``` +Then in Part 5, we introduced Docker containers and the `nextflow.config` file, which we used to enable the use of Docker containers. -This should produce the following output: +In the warmup to this section, you checked that Docker was enabled in `nextflow.config` file and ran the workflow, which used a Docker container to execute the `cowSay()` process. 
-```console title="Output" -TODO add updated output -``` +_If that doesn't sound familiar, you should probably go back and work through Part 5 before continuing._ -This shows how you can get Nextflow to use Docker for any processes that specify a container with stating so everytime on the command line. +Now let's see what other software packaging options we can configure via the `nextflow.config` file. -### 1.3. Disable Docker and enable Conda in the config file +### 1.1. Disable Docker and enable Conda in the config file -Now, let's pretend we're working on an HPC cluster and the admin doesn't allow the use of Docker for security reasons. +Let's pretend we're working on an HPC cluster and the admin doesn't allow the use of Docker for security reasons. Fortunately for us, Nextflow supports multiple other container technologies such as including Singularity (which is more widely used on HPC), and software package managers such as Conda. @@ -100,7 +84,7 @@ conda.enabled = true This should allow Nextflow to create and utilize Conda environments for processes that have Conda packages specified. Which means we now need to add one to our `cowSay` process! -### 1.4. Specify a Conda package in the process definition +### 1.2. Specify a Conda package in the process definition We've already retrieved the URI for a Conda package containing the `cowsay` tool: @@ -136,7 +120,7 @@ process cowSay { To be clear, we're not _replacing_ the `docker` directive, we're _adding_ an alternative option. -### 1.5. Run the workflow to verify that it can use Conda +### 1.3. Run the workflow to verify that it can use Conda Let's try it out. @@ -378,10 +362,11 @@ You know how to manage parameter defaults and override them at runtime using a p Learn how to use profiles to conveniently switch between alternative configurations. --- + ## 3. Determine what executor(s) should be used to do the work Until now, we have been running our pipeline with the local executor. 
-This executes each task on the machine that Nextflow is running on. When Nextflow begins, it looks at the available CPUs and memory. If the resources of the tasks ready to run exceed the avialable resources, Nextflow will hold the last tasks back from execution until one or more of the earlier tasks have finished, freeing up the necessary resources. +This executes each task on the machine that Nextflow is running on. When Nextflow begins, it looks at the available CPUs and memory. If the resources of the tasks ready to run exceed the available resources, Nextflow will hold the last tasks back from execution until one or more of the earlier tasks have finished, freeing up the necessary resources. For very large workloads, you may discover that your local machine is a bottleneck, either because you have a single task that requires more resources than you have available, or because you have so many tasks that waiting for a single machine to run them would take too long. The local executor is convenient and efficient, but is limited to that single machine. Nextflow support [many different execution backends](https://www.nextflow.io/docs/latest/executor.html), including HPC schedulers (Slurm, LSF, SGE, PBS, Moab, OAR, Bridge, HTCondor and others) as well as cloud execution backends such (AWS Batch, Google Cloud Batch, Azure Batch, Kubernetes and more).
@@ -396,6 +381,7 @@ Each of these systems use different technologies, synaxes and configurations for ``` If I wanted to make the workflow available to a colleague running on PBS, I'd need to remember to use a different submission program `qsub` and I'd need to change the my scripts to use a new sytax for resouces: + ```bash #PBS -o /path/to/my/task/directory/my-task-1.log #PBS -j oe @@ -405,6 +391,7 @@ If I wanted to make the workflow available to a colleague running on PBS, I'd ne ``` If I wanted to use SGE, the configuration would be slightly different again + ```bash #$ -o /path/to/my/task/directory/my-task-1.log #$ -j y @@ -449,7 +436,6 @@ Learn how to control the resources allocated for executing processes. --- - ## 4. Use profiles to select preset configurations You may want to switch between alternative settings depending on what computing infrastructure you're using. For example, you might want to develop and run small-scale tests locally on your laptop, then run full-scale workloads on HPC or cloud. 
diff --git a/hello-nextflow/hello-containers.nf b/hello-nextflow/hello-containers.nf new file mode 100644 index 00000000..f9f31995 --- /dev/null +++ b/hello-nextflow/hello-containers.nf @@ -0,0 +1,32 @@ +#!/usr/bin/env nextflow + +/* + * Pipeline parameters + */ +params.greeting = 'greetings.csv' +params.batch = 'test-batch' + +// Include modules +include { sayHello } from './modules/sayHello.nf' +include { convertToUpper } from './modules/convertToUpper.nf' +include { collectGreetings } from './modules/collectGreetings.nf' + +workflow { + + // create a channel for inputs from a CSV file + greeting_ch = Channel.fromPath(params.greeting) + .splitCsv() + .map { line -> line[0] } + + // emit a greeting + sayHello(greeting_ch) + + // convert the greeting to uppercase + convertToUpper(sayHello.out) + + // collect all the greetings into one file + collectGreetings(convertToUpper.out.collect(), params.batch) + + // emit a message about the size of the batch + collectGreetings.out.count.view { "There were $it greetings in this batch" } +} diff --git a/hello-nextflow/nextflow.config b/hello-nextflow/nextflow.config index e69de29b..0a5fd46b 100644 --- a/hello-nextflow/nextflow.config +++ b/hello-nextflow/nextflow.config @@ -0,0 +1 @@ +docker.enabled = false diff --git a/hello-nextflow/solutions/5-hello-containers/hello-containers-2.nf b/hello-nextflow/solutions/5-hello-containers/hello-containers-2.nf new file mode 100644 index 00000000..39f28adf --- /dev/null +++ b/hello-nextflow/solutions/5-hello-containers/hello-containers-2.nf @@ -0,0 +1,37 @@ +#!/usr/bin/env nextflow + +/* + * Pipeline parameters + */ +params.greeting = 'greetings.csv' +params.batch = 'test-batch' +params.character = 'pig' + +// Include modules +include { sayHello } from './modules/sayHello.nf' +include { convertToUpper } from './modules/convertToUpper.nf' +include { collectGreetings } from './modules/collectGreetings.nf' +include { cowSay } from './modules/cowSay.nf' + +workflow { + + // create a 
channel for inputs from a CSV file + greeting_ch = Channel.fromPath(params.greeting) + .splitCsv() + .map { line -> line[0] } + + // emit a greeting + sayHello(greeting_ch) + + // convert the greeting to uppercase + convertToUpper(sayHello.out) + + // collect all the greetings into one file + collectGreetings(convertToUpper.out.collect(), params.batch) + + // emit a message about the size of the batch + collectGreetings.out.count.view { "There were $it greetings in this batch" } + + // generate ASCII art of the greetings with cowSay + cowSay(collectGreetings.out.outfile, params.character) +} diff --git a/hello-nextflow/solutions/5-hello-containers/modules/cowSay.nf b/hello-nextflow/solutions/5-hello-containers/modules/cowSay.nf new file mode 100644 index 00000000..532b0741 --- /dev/null +++ b/hello-nextflow/solutions/5-hello-containers/modules/cowSay.nf @@ -0,0 +1,21 @@ +#!/usr/bin/env nextflow + +// Generate ASCII art with cowsay +process cowSay { + + publishDir 'results', mode: 'copy' + + container 'community.wave.seqera.io/library/pip_cowsay:131d6a1b707a8e65' + + input: + path input_file + val character + + output: + path "cowsay-${input_file}" + + script: + """ + cowsay -c "$character" -t "\$(cat $input_file)" > cowsay-${input_file} + """ +}