Merge branch 'roy'
royfrancis committed Oct 29, 2024
2 parents df0f67e + 1d34ca3 commit c0cc4fb
Showing 62 changed files with 653 additions and 383 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -8,3 +8,5 @@ to-do.md
/.quarto/
.DS_Store
[0-9]*/
_freeze
site_libs
21 changes: 14 additions & 7 deletions _quarto.yml
@@ -13,7 +13,7 @@ project:
- "*.qmd"

website:
image: "assets/images/featured.jpg"
image: "assets/images/featured.webp"
favicon: "assets/favicon.png"
navbar:
logo: "assets/logos/nbis-scilifelab.png"
@@ -31,6 +31,8 @@ website:
href: "home_syllabus.html"
- text: "Info"
href: "home_info.html"
- icon: "github"
href: "https://github.com/NBISweden/workshop-ngsintro/"
page-footer:
border: false
left: "{{< meta current_year >}} [NBIS](https://nbis.se) | [GPL-3 License](https://choosealicense.com/licenses/gpl-3.0/)"
@@ -55,11 +57,12 @@ format:
df-print: paged
standalone: false
fig-align: left
title-block-banner: "assets/images/banner.jpg"
title-block-banner: "assets/images/banner.webp"
callout-icon: true
date: last-modified
date-format: "DD-MMM-YYYY"
image: "assets/images/featured.png"
image: "assets/images/featured.webp"
lightbox: auto
revealjs:
quarto-required: ">=1.4.0"
include-in-header: "assets/include_head.html"
@@ -80,10 +83,10 @@ format:
fig-align: left
chalkboard: true
callout-icon: true
image: "/assets/images/featured.jpg"
hero: "/assets/images/slide-hero.png"
image: "/assets/images/featured.webp"
hero: "/assets/images/slide-hero.webp"
title-slide-attributes:
data-background-image: "/assets/images/cover.jpg"
data-background-image: "/assets/images/cover.webp"
data-background-size: "cover"
data-background-opacity: "1"
header-logo-left: /assets/logos/nbis.png
@@ -107,7 +110,7 @@ execute:
echo: true
warning: false
message: false
freeze: false
freeze: true

filters:
- assets/custom.lua
@@ -122,6 +125,10 @@ id_project_backup: "naiss2024-22-1375"
path_workspace: "~/ngsintro"
# path to course resources on cluster
path_resources: "/sw/courses/ngsintro"
# url to active cluster (dardel)
url_cluster: "dardel.pdc.kth.se"
# url to backup cluster (rackham)
url_cluster_backup: "rackham.uppmax.uu.se"

# location options are linkoping, lund, umea, uppsala or online. For rendering the info page.
# one or more separated by commas or semicolon. online doesn't display any location info.
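
As a rough illustration of how the new `url_cluster` and `url_cluster_backup` keys might be consumed outside R, the sketch below pulls a double-quoted top-level value out of a YAML file with `grep` and `cut`. This is a plain-text shortcut rather than a YAML parser; the `yaml_value` helper and the `username` login are hypothetical, and the course pages themselves read `_quarto.yml` with `yaml::read_yaml()`.

```bash
#!/usr/bin/env bash
# Read a top-level, double-quoted string value from a YAML file.
# Plain-text shortcut (hypothetical helper), not a real YAML parser.
yaml_value() {
  local key="$1" file="$2"
  # "^key:" matches url_cluster: but not url_cluster_backup:
  grep "^${key}:" "$file" | head -n 1 | cut -d'"' -f2
}

# Demonstrate on a scratch copy of the two keys added in this hunk
config="$(mktemp)"
cat > "$config" <<'EOF'
# url to active cluster (dardel)
url_cluster: "dardel.pdc.kth.se"
# url to backup cluster (rackham)
url_cluster_backup: "rackham.uppmax.uu.se"
EOF

# "username" is a placeholder login name
echo "ssh username@$(yaml_value url_cluster "$config")"
echo "ssh username@$(yaml_value url_cluster_backup "$config")"
rm -f "$config"
```

The helper only handles simple `key: "value"` lines; nested keys such as `website.site-url` would need a proper parser.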
2 changes: 1 addition & 1 deletion assets/css/home.css
@@ -1,5 +1,5 @@
body {
background-image: url("../images/home.jpg");
background-image: url("../images/home.webp");
-webkit-background-size: cover;
-moz-background-size: cover;
background-size: cover;
Binary file removed assets/images/IGVlogo.png
Binary file removed assets/images/banner.jpg
Binary file added assets/images/banner.webp
Binary file removed assets/images/cover.jpg
Binary file added assets/images/cover.webp
Binary file removed assets/images/cover_43_bg.png
Binary file added assets/images/cover_43_bg.webp
Binary file removed assets/images/end.png
Binary file added assets/images/end.webp
Binary file removed assets/images/featured.jpg
Binary file added assets/images/featured.webp
Binary file removed assets/images/filezilla.png
Binary file added assets/images/filezilla.webp
Binary file removed assets/images/hero.png
Binary file added assets/images/hero.webp
Binary file removed assets/images/home.jpg
Binary file removed assets/images/home.png
Binary file added assets/images/home.webp
Binary file added assets/images/igv.webp
Binary file removed assets/images/mobaxterm.png
Binary file added assets/images/mobaxterm.webp
Binary file removed assets/images/r.png
Binary file added assets/images/r.webp
Binary file removed assets/images/slide-hero.png
Binary file added assets/images/slide-hero.webp
Binary file removed assets/images/supr-login.jpg
Binary file added assets/images/supr-login.webp
Binary file removed assets/images/supr-projects.jpg
Binary file added assets/images/supr-projects.webp
Binary file removed assets/images/supr-request.jpg
Binary file added assets/images/supr-request.webp
Binary file removed assets/images/supr-tetralith.jpg
Binary file added assets/images/supr-tetralith.webp
Binary file removed assets/images/thinlinc.png
Binary file added assets/images/thinlinc.webp
Binary file removed assets/images/versioning.png
Binary file removed assets/images/xquartz.png
Binary file added assets/images/xquartz.webp
Binary file removed assets/images/zoom.png
Binary file added assets/images/zoom.webp
17 changes: 8 additions & 9 deletions home_precourse.qmd
@@ -2,7 +2,6 @@
title: "Precourse"
subtitle: "These are steps to be completed before the workshop"
date: ""
toc: false
sidebar: false
format: html
---
@@ -11,7 +10,7 @@ format: html
#| include: false
library(yaml)
library(here)
id_project <- yaml::read_yaml(here("_quarto.yml"))$id_project
id_project <- yaml::read_yaml(here("_quarto.yml"))$id_project
id_project_backup <- yaml::read_yaml(here("_quarto.yml"))$id_project_backup
```

@@ -23,7 +22,7 @@ A SUPR/NAISS account is needed to create the accounts for the computers we will

If you do not already have one, create an account at [SUPR/NAISS](https://supr.naiss.se). Then log in to [SUPR/NAISS](https://supr.naiss.se/), preferably using SWAMID.

![](assets/images/supr-login.jpg){width="70%"}
![](assets/images/supr-login.webp){width="70%"}

Before applying for project membership and user accounts, you have to accept the NAISS User Agreement. Do this by clicking the [Personal Information](https://supr.naiss.se/person/) link in the left sidebar menu. Then scroll down until you reach the section **User Agreements**. If you have already accepted it, the State will be a green box with the text Accepted in it. If it is anything else, click it to start the acceptance process.

@@ -36,11 +35,11 @@ This is where you might run into trouble if you don't have a SWAMID connected ac

The remote computing cluster UPPMAX will be used as a fallback cluster if there are any problems at PDC. After making sure you have an accepted user agreement, go to the [**SUPR/NAISS Projects**](https://supr.naiss.se/project/) page and request membership to the project ID: [**`r id_project_backup`**]{.badge}

![](assets/images/supr-request.jpg){width="70%"}
![](assets/images/supr-request.webp){width="70%"}

Once you are accepted to a project, you should see that project listed under your active projects.

![](assets/images/supr-projects.jpg){width="70%"}
![](assets/images/supr-projects.webp){width="70%"}


Finally, you need to request a login account to UPPMAX. This will be the account you use to log in to the actual computers, so it is not the same as your SUPR account. Log in to SUPR and go to the [Accounts page](https://supr.naiss.se/account/). Under the **Possible Resource Account Requests** heading, click on the **Request Account on Rackham @ UPPMAX** button and confirm it on the next page. If it is missing from this page, it could be because you already have a login account created (only 1 account per person allowed), or because your project memberships have not yet been approved.
@@ -72,25 +71,25 @@ Please make sure you have a working [Eduroam](https://eduroam.org/) wifi connect

### ThinLinc

[![](assets/images/thinlinc.png){height="50px"}]((https://www.cendio.com/thinlinc/download))
[![](assets/images/thinlinc.webp){height="50px"}]((https://www.cendio.com/thinlinc/download))

ThinLinc allows graphical connection to UPPMAX. Download and install from [https://www.cendio.com/thinlinc/download](https://www.cendio.com/thinlinc/download). It can be used directly from the browser but it is recommended to download and install the client for better copy/paste operation.

### XQuartz

[![](assets/images/xquartz.png){height="60px"}](https://www.xquartz.org/)
[![](assets/images/xquartz.webp){height="60px"}](https://www.xquartz.org/)

Mac users will need to download and install [XQuartz](https://www.xquartz.org/) for X11 forwarding, *i.e.*, to forward remotely opened windows to the local machine.

### MobaXterm (Optional)

[![](assets/images/mobaxterm.png){height="60px"}](http://mobaxterm.mobatek.net)
[![](assets/images/mobaxterm.webp){height="60px"}](http://mobaxterm.mobatek.net)

If you are on a Windows system, and you want to open graphical applications from the terminal, we recommend [MobaXterm](http://mobaxterm.mobatek.net). It is recommended that you INSTALL the program and not use the portable version. MobaXterm also has an integrated SFTP file browser.

### Filezilla (Optional)

[![](assets/images/filezilla.png){height="60px"}](https://filezilla-project.org/)
[![](assets/images/filezilla.webp){height="60px"}](https://filezilla-project.org/)

When you need to transfer data between the remote cluster and your computer, you can use the tools SCP or SFTP through the terminal. Windows users can use the SFTP browser available with MobaXterm. If you prefer a GUI to upload and download files from the remote cluster, we recommend installing [FileZilla](https://filezilla-project.org/).
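
For terminal transfers, the commands generally take the shape sketched below. This is only an illustration: `username`, the file names and the helper functions are placeholders, the host is the backup cluster address from `_quarto.yml`, and the remote directory is assumed to be the course workspace `~/ngsintro`. The commands are printed rather than run, so nothing is transferred by accident.

```bash
#!/usr/bin/env bash
# Placeholder values: replace with your own login name and paths
remote_user="username"
remote_host="rackham.uppmax.uu.se"
remote_dir="~/ngsintro"

# Print the scp command that downloads one remote file to a local directory
download_cmd() {
  local remote_file="$1" local_dir="$2"
  printf 'scp %s@%s:%s/%s %s\n' \
    "$remote_user" "$remote_host" "$remote_dir" "$remote_file" "$local_dir"
}

# Print the scp command that uploads one local file to the remote directory
upload_cmd() {
  local local_file="$1"
  printf 'scp %s %s@%s:%s/\n' \
    "$local_file" "$remote_user" "$remote_host" "$remote_dir"
}

download_cmd results.txt .
upload_cmd notes.txt
```

Copy a printed line into the terminal on your own computer (not on the cluster) to perform the actual transfer.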

3 changes: 2 additions & 1 deletion index.qmd
@@ -11,6 +11,7 @@ format:
number-sections: false
title-block-banner: false
page-layout: full
anchor-sections: false
execute:
freeze: false
---
@@ -38,7 +39,7 @@ Updated: {{< meta current_date >}} at {{< meta current_time >}}.
:::
:::{.home-grid-child-right}

![](assets/images/hero.png){.nolightbox}
![](assets/images/hero.webp){.nolightbox}

:::
:::
37 changes: 18 additions & 19 deletions topics/linux/lab_linux_advanced.qmd
@@ -8,13 +8,12 @@ format: html
```{r,eval=TRUE,include=FALSE}
library(yaml)
library(here)
upid <- yaml::read_yaml(here("_quarto.yml"))$uppmax_project
id_project <- yaml::read_yaml(here("_quarto.yml"))$id_project
path_resources <- file.path(yaml::read_yaml(here("_quarto.yml"))$path_resources, "linux")
path_linux <- file.path(yaml::read_yaml(here("_quarto.yml"))$path_workspace, "linux")
path_linux_adv <- file.path(path_linux, "linux_advanced")
```

::: {.callout-note}
In code blocks, the dollar sign (`$`) is not to be printed. The dollar sign is usually an indicator that the text following it should be typed in a terminal window.
:::

## Connect to UPPMAX

The first step of this lab is to open a ssh connection to UPPMAX. Please refer to [**Connecting to UPPMAX**](../other/lab_connect.html) for instructions. Once connected to UPPMAX, return here and continue reading the instructions below.
@@ -48,7 +47,7 @@ ssh -Y r292
If the list is empty you can run the allocation command again and it should be in the list:

```{r,echo=FALSE,comment="",class.output="bash"}
cat(paste0("salloc -A ", upid, " -t 03:30:00 -p core -n 1 --no-shell &"))
cat(paste0("salloc -A ", id_project, " -t 03:30:00 -p shared -n 1 --no-shell &"))
```
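
The allocation command printed by the chunk above has the shape sketched here. The project ID is a placeholder and the `alloc_cmd` helper is hypothetical; the flags are standard Slurm options (`-A` account, `-t` time limit, `-p` partition, `-n` number of tasks, `--no-shell` to allocate without starting a shell). The command is printed instead of executed so the sketch can be tried off-cluster.

```bash
#!/usr/bin/env bash
# Build the allocation command for a project, partition and time limit.
# Printed rather than executed, since salloc only exists on the cluster.
alloc_cmd() {
  local project="$1" partition="${2:-shared}" walltime="${3:-03:30:00}"
  printf 'salloc -A %s -t %s -p %s -n 1 --no-shell &\n' \
    "$project" "$walltime" "$partition"
}

# "naissXXXX-YY-ZZZZ" is a placeholder, not a real project ID
alloc_cmd "naissXXXX-YY-ZZZZ"
```

The partition defaults to `shared`, which is the value this commit switches to from `core`.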

{{< fa lightbulb >}} There is a UPPMAX specific tool called `jobinfo` that supplies the same kind of information as `squeue` that you can use as well (`$ jobinfo -u username`).
@@ -89,15 +88,15 @@ Next, copy the lab files from this folder. `-r` means recursively, which means a
#| echo: false
#| class-output: bash
cat("cp -r <source> <destination>\n")
cat(paste0("cp -r /sw/courses/ngsintro/linux/linux_advanced /proj/", upid, "/nobackup/username"))
cat(paste0("cp -r ", path_resources, "/linux_advanced ", path_linux_adv))
```

Have a look in ``r paste0("/proj/",upid,"/nobackup/username/linux_advanced")``.
Have a look in ``r path_linux_adv``.

```{r}
#| echo: false
#| class-output: bash
cat(paste0("cd /proj/", upid, "/nobackup/username/linux_advanced"))
cat(paste0("cd ", path_linux_adv))
```

```bash
@@ -160,7 +159,7 @@ echo The volume of the rectangular cuboid with the sides $x,$y,$z is $(($x*$y*$z

First off, let's open another terminal to UPPMAX so that you have 2 of them open. Scripting is a lot easier if you have one terminal on the command line ready to run commands and test things, and another one with a text editor where you write the actual code. That way you will never have to close down the text editor when you want to run the script you are writing on, and then open it up again when you want to continue editing the code.

So open a new terminal window, connect it to UPPMAX and then connect it to the node you have booked. Make sure both terminals are in the ``r paste0("/proj/",upid,"/nobackup/username/linux_advanced")`` directory, and start editing a new file with gedit or nano where you write your script. Name the file whatever you want, but in the examples I will refer to it as `loop_01.sh`. Write your loops to this file (or create a new file for each new example) and test run it in the other terminal.
So open a new terminal window, connect it to UPPMAX and then connect it to the node you have booked. Make sure both terminals are in the ``r path_linux_adv`` directory, and start editing a new file with gedit or nano where you write your script. Name the file whatever you want, but in the examples I will refer to it as `loop_01.sh`. Write your loops to this file (or create a new file for each new example) and test run it in the other terminal.

**NOTE:** If you get error messages like `(gedit:27463): dconf-WARNING **: 10:59:00.575: failed to commit changes to dconf: Failed to execute child process “dbus-launch” (No such file or directory)`, and if you can't change any preferences, you can try starting gedit through the graphical menu in ThinLinc instead. If you are using the Xfce desktop environment you should have a start-menu-like button at the top-left of the screen named `Applications`, or if you right-click somewhere on the desktop you should find it in the context menu that pops up. In the `Applications` menu, look in the category `Accessories` and you should find a program called `Text editor` which will start gedit (hopefully without the errors).

@@ -229,7 +228,7 @@ echo Happy New Year everyone!!"))
Let's try to do something similar to the example in the lecture slides, to run the same commands on multiple files. In the Introduction to UPPMAX, we learned how to use samtools to convert BAM files to SAM files so that humans can read them.
In real life you will never do this; instead you will most likely always do it the other way around. SAM files take up ~4x more space on the hard drive compared to the same file in BAM format, so as soon as you see a SAM file you should convert it to a BAM file instead to conserve hard drive space. If you have many SAM files that need converting, you don't want to sit there and type all the commands by hand like a pleb.

{{< fa clipboard-list >}} Write a script that converts all the SAM files in a specified directory to BAM files. Incidentally, you can find 50 SAM files in need of conversion in the folder called `sam` in the folder you copied to your folder earlier in this lab (``r paste0("/proj/",upid,"/nobackup/username/linux_advanced/sam")``). Bonus points if you make the program take the specified directory as an argument, and another bonus point if you get the program to name the resulting BAM file to the same name as the SAM file but with a .bam ending instead.
{{< fa clipboard-list >}} Write a script that converts all the SAM files in a specified directory to BAM files. Incidentally, you can find 50 SAM files in need of conversion in the folder called `sam` in the folder you copied to your folder earlier in this lab (``r file.path(path_linux_adv, "sam")``). Bonus points if you make the program take the specified directory as an argument, and another bonus point if you get the program to name the resulting BAM file to the same name as the SAM file but with a .bam ending instead.
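
One possible shape for such a script, offered as a sketch rather than the reference solution. It assumes the samtools module is already loaded and uses `samtools view -b -o` for the conversion; the function names and the `DRY_RUN` switch are hypothetical additions that let you preview the commands before running them.

```bash
#!/usr/bin/env bash
# Derive the BAM file name for a SAM file: sample.sam -> sample.bam
bam_name() {
  printf '%s.bam\n' "${1%.sam}"
}

# Convert every SAM file in a directory.
# With DRY_RUN=1 the samtools commands are printed instead of executed.
convert_dir() {
  local dir="$1" sam bam
  for sam in "$dir"/*.sam; do
    [ -e "$sam" ] || continue   # no SAM files: the glob stayed literal
    bam="$(bam_name "$sam")"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "samtools view -b -o $bam $sam"
    else
      samtools view -b -o "$bam" "$sam"
    fi
  done
}

# Run only when a directory is given as the first argument
if [ "$#" -ge 1 ]; then
  convert_dir "$1"
fi
```

Saved under a name of your choice, for example `sam_to_bam.sh`, it could be previewed with `DRY_RUN=1 bash sam_to_bam.sh sam` and then run for real without the `DRY_RUN=1` prefix.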

::: {.callout-tip}
Remember that you have to load the samtools module to be able to run it. The way you get samtools to convert a SAM file to a BAM file is by typing the following command:
@@ -452,7 +451,7 @@ When the analysis is done, only fastq files and sorted and indexed BAM files sho

{{< fa lightbulb >}} Read more about the `$SNIC_TMP` variable in the [disk storage guide](http://www.uppmax.uu.se/support/user-guides/disk-storage-guide/) on the UPPMAX homepage.

There is a bunch of fastq files in the directory ``r paste0("/proj/",upid,"/nobackup/username/linux_advanced/fastq/")`` that is to be used for this exercise.
There is a bunch of fastq files in the directory ``r paste0(path_linux_adv, "/fastq/")`` that is to be used for this exercise.

Basic solution:

@@ -464,7 +463,7 @@ cat(paste0("# make the dummy pipeline available
export PATH=$PATH:/sw/courses/ngsintro/linux/uppmax_pipeline_exercise/dummy_scripts
# index the reference genome
reference_indexer -r /proj/", upid, "/nobackup/username/filetypes/0_ref/ad2.fa
reference_indexer -r ", path_linux_adv, "/filetypes/0_ref/ad2.fa
# go to the input files
cd $1
@@ -474,7 +473,7 @@ for file in *.fastq;
do
# align the reads
align_reads -r /proj/", upid, "/nobackup/username/filetypes/0_ref/ad2.fa -i $file -o $file.sam
align_reads -r ", path_linux_adv, "/filetypes/0_ref/ad2.fa -i $file -o $file.sam
# convert the sam file to a bam file
sambam_tool -f bam -i $file.sam -o $file.bam
@@ -498,9 +497,9 @@ cat(paste0("# make the dummy pipeline available in this script
export PATH=$PATH:/sw/courses/ngsintro/linux/uppmax_pipeline_exercise/dummy_scripts
# index the reference genome once, only if needed
if [ ! -f /proj/", upid, "/nobackup/username/filetypes/0_ref/ad2.fa.idx ];
if [ ! -f ", path_linux, "/filetypes/0_ref/ad2.fa.idx ];
then
reference_indexer -r /proj/", upid, '/nobackup/username/filetypes/0_ref/ad2.fa
reference_indexer -r ", path_linux, '/filetypes/0_ref/ad2.fa
fi
@@ -529,8 +528,8 @@ do
# print a temporary script file that will be submitted to slurm
echo "#!/bin/bash -l
#SBATCH -A ', upid, '
#SBATCH -p core
#SBATCH -A ', id_project, '
#SBATCH -p shared
#SBATCH -n 1
#SBATCH -t 00:05:00
#SBATCH -J $file_basename
@@ -543,7 +542,7 @@ do
# You have to escape the dollar sign in SNIC_TMP to keep bash from resolving
# it to its value in the submitter script already.
echo "Copying data to node local hard drive"
cp /proj/', upid, '/nobackup/username/filetypes/0_ref/ad2.fa* $file $SNIC_TMP/
cp ', path_linux, '/filetypes/0_ref/ad2.fa* $file $SNIC_TMP/
# go to the node local hard drive
echo "Changing directory to node local hard drive"
18 changes: 8 additions & 10 deletions topics/linux/lab_linux_filetypes.qmd
@@ -8,13 +8,11 @@ format: html
```{r,eval=TRUE,include=FALSE}
library(yaml)
library(here)
upid <- yaml::read_yaml(here("_quarto.yml"))$uppmax_project
id_project <- yaml::read_yaml(here("_quarto.yml"))$id_project
path_resources <- file.path(yaml::read_yaml(here("_quarto.yml"))$path_resources, "linux")
path_linux <- file.path(yaml::read_yaml(here("_quarto.yml"))$path_workspace, "linux")
```

::: {.callout-note}
In code blocks, the dollar sign (`$`) is not to be printed. The dollar sign is usually an indicator that the text following it should be typed in a terminal window.
:::

## Connect to UPPMAX

The first step of this lab is to open a ssh connection to UPPMAX. Please refer to [**Connecting to UPPMAX**](../other/lab_connect.html) for instructions. Once connected to UPPMAX, return here and continue reading the instructions below.
@@ -24,7 +22,7 @@ The first step of this lab is to open a ssh connection to UPPMAX. Please refer t
Usually you would do most of the work in this lab directly on one of the login nodes at UPPMAX, but we have arranged for you to have one core each for better performance. This was covered briefly in the lecture notes.

```{r,echo=FALSE,comment="",class.output="bash"}
cat(paste0("salloc -A ", upid, " -t 07:00:00 -p core -n 1 --no-shell &"))
cat(paste0("salloc -A ", id_project, " -t 07:00:00 -p shared -n 1 --no-shell &"))
```

check which node you got (replace **username** with your UPPMAX username)
@@ -61,13 +59,13 @@ Next, copy the lab files from this folder. `-r` means recursively, which means a

```{r,echo=FALSE,comment="",class.output="bash"}
cat("cp -r <source> <destination>\n")
cat(paste0("cp -r /sw/courses/ngsintro/linux/filetypes /proj/", upid, "/nobackup/username/"))
cat(paste0("cp -r ",path_resources,"/filetypes ", path_linux))
```

Have a look in **`r paste0("/proj/",upid,"/nobackup/username/")`**.
Have a look in **`r path_linux`**.

```{r,echo=FALSE,comment="",class.output="bash"}
cat(paste0("cd /proj/", upid, "/nobackup/username/filetypes\n"))
cat(paste0("cd ", path_linux, "/filetypes\n"))
cat("tree")
```

@@ -318,7 +316,7 @@ If you notice that IGV over Xforwarding is excruciatingly slow, you can try to u

There are 3 files we have to load in IGV.

The first is the reference genome. Press the menu button located at **"Genomes - Load Genome from File..."** and find your reference genome in **0_ref/ad2.fa**. If you are having trouble finding your files, note that IGV always starts in your home directory. Use the dropdown menu at the top to navigate to **`r paste0("/proj/",upid,"/nobackup/...")`**.
The first is the reference genome. Press the menu button located at **"Genomes - Load Genome from File..."** and find your reference genome in **0_ref/ad2.fa**. If you are having trouble finding your files, note that IGV always starts in your home directory. Use the dropdown menu at the top to navigate to **`r paste0(path_linux,"...")`**.

The second file you have to load is the reads. Press the menu button **"File - Load from File..."** and select your **3_sorted/ad2.sorted.bam**.

4 changes: 0 additions & 4 deletions topics/linux/lab_linux_intro.qmd
@@ -15,10 +15,6 @@ site_url <- yaml::read_yaml(here("_quarto.yml"))$website$`site-url`
output_dir <- yaml::read_yaml(here("_quarto.yml"))$project$`output-dir`
```

::: {.callout-note}
In code blocks, the dollar sign (`$`) is not to be printed. The dollar sign is usually an indicator that the text following it should be typed in a terminal window.
:::

## Connect to PDC

The first step of this lab is to open a ssh connection to PDC. Please refer to [**Connecting to PDC**](../other/lab_connect_pdc.html) for instructions. Once connected to PDC, return here and continue reading the instructions below.