updates
SamueleSoraggi committed Jun 25, 2024
1 parent 6d81466 commit df863db
Showing 2 changed files with 55 additions and 43 deletions.
98 changes: 55 additions & 43 deletions access/genomedk.qmd
@@ -11,6 +11,16 @@ hide:

If you are using GenomeDK, you have two options. The first is to use a pre-packaged Docker container, which contains JupyterLab and all the packages you need to run the notebooks. GenomeDK comes with `singularity`, which can import and execute Docker containers and ensures full reproducibility of the analysis (it has some quirks, such as not showing system folders in the container, but we take care of that by running a simple script). The second option is to download the GitHub repository of the course and create your own conda environment: this solution also works on any computing cluster where you can have `conda` installed, and is shown [on the page dedicated to access with any computing cluster](./otherHPC.qmd).

:::{.callout-warning title="Technical prerequisites"}

- If you do not yet have an account on GenomeDK, please request one [through the account request page](https://console.genome.au.dk/user-requests/create/) and follow the instructions for the 2-factor authentication.

- You need to have (or be part of) an active project on GenomeDK. This ensures you can get some computing resources to run the course material. [Follow these instructions to request a project](https://genome.au.dk/docs/projects-and-accounting/#requesting-a-project).

- In the Windows PowerShell command line, commands might need `.exe` at the end, such as `ssh.exe` instead of `ssh`. Newer versions of Windows do not require that, though.

:::

## Singularity container

**1.** Log into the cluster from the command line, substituting `USERNAME` with your actual user name:
@@ -19,15 +29,14 @@ If you are using GenomeDK, you have two options. One is to use a pre-packaged Do
ssh USERNAME@login.genome.au.dk
```

:::{.callout-warning title="Technical prerequisites"}

- if you do not yet have an account on GenomeDK, please get one [Click on this link to get to the account request.](https://console.genome.au.dk/user-requests/create/) and follow the instructions for the 2-factor authentication.
and be sure to run the following two commands to remove cached data, which would otherwise fill up your home folder

- you need to have (or be part of) an active project on GenomeDK. This ensures you can get some computing resources to run the course material. [Follow these instructions to request a project.](https://genome.au.dk/docs/projects-and-accounting/#requesting-a-project).
```{.bash}
- In Windows and the Powershell command line, commands might need `.exe` at the end, such as `ssh.exe` instead of `ssh`. Newer versions of Windows do not require that, though.
rm -rf ~/.apptainer/cache/*
rm -rf ~/.singularity/cache/*
:::
```
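
Before deleting anything, it can help to see how much space the caches actually take. The following is a minimal sketch, not part of the course scripts: the function name `report_cache_size` is hypothetical, and it only assumes the two cache folders used in the commands above.

```bash
# Print the size of each cache folder given as argument, if it exists.
# report_cache_size is a hypothetical helper, not part of the course material.
report_cache_size() {
    for cache_dir in "$@"; do
        if [ -d "$cache_dir" ]; then
            du -sh "$cache_dir"
        else
            echo "$cache_dir: not found, nothing to clean"
        fi
    done
}

report_cache_size "$HOME/.apptainer/cache" "$HOME/.singularity/cache"
```

If a folder is reported as not found, the corresponding `rm` command has nothing to remove and is harmless.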

**2.** Get into a folder inside your project, for example

@@ -39,16 +48,15 @@ cd MYPROJECT/ngsSummerSchool

:::{.callout-warning title="NGS summer school 2024"}

Get instead into the folder for the course:
If you are at the NGS summer school 2024, run the following commands to create your own work folder inside the course project folder. Always go back to that folder in your future sessions:

```{.bash}
cd NGS_summer_school/USERNAME
mkdir -p ngssummer2024/`whoami`
cd ngssummer2024/`whoami`
```

where you substitute `USERNAME` with your own user id.

:::

**3.** Use `singularity` to download the container of the course. This will take some time and show a lot of text, and at the end a file called `course.sif` is created in the folder.
@@ -59,70 +67,83 @@ singularity pull course.sif docker://hdssandbox/ngssummerschool:2024.07
```

**4.** Now it's time to get a few resources to run all the material. We suggest one CPU and 32GB of RAM for the first three modules, and 2 CPUs and 64GB of RAM for the single-cell analysis. For the first configuration suggested, you get resources using
:::{.callout-warning}

You need to do this step only once!

:::
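
Since the download is needed only once, one way to make step 3 safe to repeat is to check for the file first. This is a sketch under the assumption that `course.sif` sits in the current folder; the function name `pull_course_image` is hypothetical and not part of the course scripts.

```bash
# Download the container image only if it is not already in the current folder.
# pull_course_image is a hypothetical helper, not part of the course material.
pull_course_image() {
    if [ -f course.sif ]; then
        echo "course.sif already exists, skipping the download"
    else
        singularity pull course.sif docker://hdssandbox/ngssummerschool:2024.07
    fi
}

# Usage, from inside your work folder:
#   pull_course_image
```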

**4.** Now it's time to get a few resources to run all the material. We suggest one CPU and 32GB of RAM for the first three modules, and 2 CPUs and 64GB of RAM for the single-cell analysis. For the first configuration suggested, for example, you get resources using

```{.bash}
srun --mem=32g --cores=1 --time=8:0:0 --account=MYPROJECT --pty /bin/bash
srun --mem=32g --cores=1 --time=4:0:0 --account=MYPROJECT --pty /bin/bash
```

and very similarly for the second configuration, when you want instead to work on the single cell analysis.
:::{.callout-warning title="NGS summer school 2024"}

:::{.callout-warning}
If you are at the NGS summer school 2024, use `ngssummer2024` instead of `MYPROJECT`

```{.bash}
srun --mem=32g --cores=1 --time=4:0:0 --account=ngssummer2024 --pty /bin/bash
Note you need your project name, and you can also choose for how long you want the resources to be available to you. **Asking for resources means waiting for some time in a queue before they are assigned.**
```

:::

**5.** Once resources are assigned, note down the node name. This is on the left side of the command line: for example, in the figure below, the node is `s21n33`
:::{.callout-note}

![](../images/genomedkNode.png){fig-align="center" width="400px"}
Note you always need your project name, and you can also choose for how long you want the resources to be available to you. **Asking for resources means waiting for some time in a queue before they are assigned.**

In the example above `time` is 4 hours. After this time, whatever you are doing will be closed, so be sure to save your work in progress.

**6.** execute the container with
:::
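
For the second suggested configuration (2 CPUs and 64GB of RAM, for the single-cell analysis), only the memory and core values change. The sketch below prints the command instead of running it, so you can check it before pasting it in the terminal; the helper name `srun_command` is hypothetical, and `MYPROJECT` is the same placeholder used above.

```bash
# Print (not run) the srun command for a given configuration.
# Arguments: memory, number of cores, project name.
# srun_command is a hypothetical helper, not part of the course material.
srun_command() {
    echo "srun --mem=$1 --cores=$2 --time=4:0:0 --account=$3 --pty /bin/bash"
}

# Second configuration suggested above: 64GB of RAM and 2 CPUs
srun_command 64g 2 MYPROJECT
```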

**5.** execute the container with

```{.bash}
singularity exec course.sif /bin/bash
```

Note that the command line now shows `Apptainer>` on its left. We are *inside* the container, and the tools we need are now available in it.

**7.** Now we need to run a configuration script, which will set up JupyterLab so that the packages are detected correctly. The script is downloaded from the internet and runs immediately, also downloading the necessary data. If a folder called `Data` exists, it will not download the data again (which also means that you can use our container with your own data folder for your own analysis in the future)
**6.** Now we need to run a configuration script, which will set up JupyterLab so that the packages are detected correctly. The script is downloaded from the internet and runs immediately, also downloading the necessary data. If a folder called `Data` exists, it will not download the data again (which also means that you can use our container with your own data folder for your own analysis in the future)

```{.bash}
git config --global http.sslVerify false
wget -qO- https://raw.githubusercontent.com/hds-sandbox/NGS_summer_course_Aarhus/docker/scripts/courseMaterial.sh | bash
```

:::{.callout-warning}

You need to create the file `course.sif` only once. Next time, you only need the configuration script.

:::

**8.** We are ready to go. Activate the environment and start jupyterLab with the following:
**6.** We are ready to go. Activate the environment and start jupyterLab with the following:

```{.bash}
unset XDG_RUNTIME_DIR
conda activate /opt/conda/envs/NGS_aarhus_py
jupyter-lab --no-browser --port=$UID --ip=0.0.0.0
jupyter-lab --no-browser --port=$UID --ip=0.0.0.0 --NotebookApp.token='' --NotebookApp.password=''
```

you will see a lot of messages, which is normal. You need also to create a tunnel between your computer and genomeDK to be able to see jupyterlab in your browser. Now you need to use the node name you wrote down before! **Open a new terminal window** and write
You will see a lot of messages, which is normal. At the end of the messages, you are given two links that look like those in the image below. Write down the node name and the user id highlighted in the circles. You need them in the following steps.

![](../images/nodeAndUsername.png){width=600px}
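
If the links are hard to read, the same two values can probably be obtained from the command line before starting JupyterLab. This is a sketch under two assumptions: that the port is your numeric user id (as implied by `--port=$UID` above), and that `uname -n` on the compute node reports the node name shown in the links.

```bash
# Port used by jupyter-lab in this guide: $UID is the numeric id of the current user
port=$(id -u)

# Name of the machine this shell runs on, without any domain part
node=$(uname -n)
node=${node%%.*}

echo "node: $node, port: $port"
```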


**7.** You also need to create a tunnel between your computer and GenomeDK to be able to see JupyterLab in your browser. Now you need the node name and the user id you wrote down before! **Open a new terminal window** and write

```{.bash}
ssh -L6835:NODENAME:6835 USERNAME@login.genome.au.dk
ssh -L USERID:NODENAME:USERID USERNAME@login.genome.au.dk
```

where you substitute `NODENAME` with the correct depiction,and USERNAME with your own user id.
where you substitute `USERID` and `NODENAME` with the values you wrote down before, and `USERNAME` with your account name on GenomeDK. For example, `ssh -L 6835:s21n81:6835 samuele@login.genome.au.dk` according to the figure above for a user named `samuele`.

**9.** Open your browser and go to the address [http://127.0.0.1:6835/lab](http://127.0.0.1:6835/lab). Jupyterlab opens
**8.** Open your browser and go to the address http://127.0.0.1:USERID/lab, replacing USERID with your user id. For example, `http://127.0.0.1:6835/lab` from the figure above. JupyterLab opens in your browser.
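
The tunnel command and the browser address use the same three values, so they can be assembled together. The sketch below, meant for your own computer, only prints them. The values are the example ones from the figure (node `s21n81`, user id `6835`, user `samuele`), and the login host `login.genome.au.dk` is an assumption that you should check against the GenomeDK documentation.

```bash
# Example values from the figure above; replace them with your own
NODENAME=s21n81
USERID=6835
USERNAME=samuele
# Assumed GenomeDK login host; verify it before use
LOGIN_HOST=login.genome.au.dk

# Forward local port USERID to the same port on the compute node
tunnel_cmd="ssh -L ${USERID}:${NODENAME}:${USERID} ${USERNAME}@${LOGIN_HOST}"
# Address to open in the browser once the tunnel is running
lab_url="http://127.0.0.1:${USERID}/lab"

echo "$tunnel_cmd"
echo "$lab_url"
```

Using the same number on both sides of the tunnel keeps the browser address identical to the port JupyterLab listens on, which makes mistakes easier to spot.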


**10.** Now you are ready to use JupyterLab for coding. Use the file browser (on the left-side) to find the folder `Notebooks`. Select one of the four tutorials of the course. You will see that the notebook opens on the right-side pane. Read the text of the tutorial and execute each code cell starting from the first. You will see results showing up directly on the notebook!
**9.** Now you are ready to use JupyterLab for coding. Use the file browser (on the left-side) to find the folder `Notebooks`. Select one of the four tutorials of the course. You will see that the notebook opens on the right-side pane. Read the text of the tutorial and execute each code cell starting from the first. You will see results showing up directly on the notebook!

![](../images/startNotebook.gif)

@@ -132,18 +153,9 @@ Right click on a notebook or a saved results file, and use the download option t

:::

**11.** At the end of your session, it is a good idea to empty the cache of `singularity`. This will fill up your home folder very quickly (size limit is 100GB). Simply run these two commands:

```{.bash}
rm -rf /home/samuele/.apptainer/cache/*
rm -rf ~/.singularity/cache/*
```

### Recovering the material from your previous session

Everything is saved in the folder you are working in. Next time, follow the whole procedure again - the download script will only link the packages to jupyterlab and avoid downloading new data, notebooks and scripts, because the folders will be detected as existing!
Everything is saved in the folder you are working in. Next time, follow the whole procedure again (without step number **3.**) - the download script will only link the packages to jupyterlab and avoid downloading new data, notebooks and scripts, because the folders will be detected as existing!



Binary file added images/nodeAndUsername.png