Merge branch 'main' into rewrite-linux-tutorial
lbarraga authored Aug 19, 2024
2 parents a540e97 + 0b26872 commit 177b2f3
Showing 28 changed files with 281 additions and 340 deletions.
@@ -12,6 +12,6 @@ cd $PBS_O_WORKDIR
# load the environment

module purge
module load intel
module load foss

mpirun ./mpihello
@@ -2,7 +2,7 @@
#PBS -l walltime=5:0:0
#PBS -l nodes=1:ppn=quarter:gpus=1

module load TensorFlow/2.6.0-foss-2021a-CUDA-11.3.1
module load TensorFlow/2.11.0-foss-2022a-CUDA-11.7.0

cd $PBS_O_WORKDIR
python example.py
2 changes: 1 addition & 1 deletion intro-HPC/examples/Job-script-examples/multi_core.sh
@@ -2,7 +2,7 @@
#PBS -N mpi_hello ## job name
#PBS -l nodes=2:ppn=all ## 2 nodes, all cores per node
#PBS -l walltime=2:00:00 ## max. 2h of wall time
module load intel/2017b
module load foss/2023a
module load vsc-mympirun ## We don't use a version here, this is on purpose
# go to working directory, compile and run MPI hello world
cd $PBS_O_WORKDIR
2 changes: 1 addition & 1 deletion intro-HPC/examples/Job-script-examples/single_core.sh
@@ -2,7 +2,7 @@
#PBS -N count_example ## job name
#PBS -l nodes=1:ppn=1 ## single-node job, single core
#PBS -l walltime=2:00:00 ## max. 2h of wall time
module load Python/3.6.4-intel-2018a
module load Python/3.11.3-GCCcore-12.3.0
# copy input data from location where job was submitted from
cp $PBS_O_WORKDIR/input.txt $TMPDIR
# go to temporary working directory (on local disk) & run
2 changes: 1 addition & 1 deletion intro-HPC/examples/MATLAB/jobscript.sh
@@ -7,7 +7,7 @@
#

# make sure the MATLAB version matches with the one used to compile the MATLAB program!
module load MATLAB/2018a
module load MATLAB/2022b-r5

# use temporary directory (not $HOME) for (mostly useless) MATLAB log files
# subdir in $TMPDIR (if defined, or /tmp otherwise)
@@ -11,6 +11,6 @@ cd $PBS_O_WORKDIR

# load the environment

module load intel
module load foss

mpirun ./mpi_hello
4 changes: 2 additions & 2 deletions intro-HPC/examples/OpenFOAM/OpenFOAM_damBreak.sh
@@ -2,7 +2,7 @@
#PBS -l walltime=1:0:0
#PBS -l nodes=1:ppn=4
# check for more recent OpenFOAM modules with 'module avail OpenFOAM'
module load OpenFOAM/6-intel-2018a
module load OpenFOAM/11-foss-2023a
source $FOAM_BASH
# purposely not specifying a particular version to use most recent mympirun
module load vsc-mympirun
@@ -15,7 +15,7 @@ export MYMPIRUN_VARIABLESPREFIX=WM_PROJECT,FOAM,MPI
export WORKDIR=$VSC_SCRATCH_NODE/$PBS_JOBID # for single-node jobs
mkdir -p $WORKDIR
# damBreak tutorial, see also https://cfd.direct/openfoam/user-guide/dambreak
cp -r $FOAM_TUTORIALS/multiphase/interFoam/laminar/damBreak/damBreak $WORKDIR
cp -r $FOAM_TUTORIALS/incompressibleVoF/damBreakLaminar/damBreak $WORKDIR
cd $WORKDIR/damBreak
echo "working directory: $PWD"
# pre-processing: generate mesh
2 changes: 1 addition & 1 deletion intro-HPC/examples/Program-examples/04_MPI_C/mpihello.pbs
@@ -13,6 +13,6 @@ cd $PBS_O_WORKDIR
# load the environment

module purge
module load intel
module load foss

mpirun ./mpihello
51 changes: 50 additions & 1 deletion mkdocs/docs/HPC/FAQ.md
@@ -74,7 +74,7 @@ It is possible to use the modules without specifying a version or toolchain. However,
this will probably cause incompatible modules to be loaded. Don't do it if you use multiple modules.
Even if it works now, as more modules get installed on the HPC, your job can suddenly break.
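
For example, a minimal sketch (the module name and version are only illustrative; check `module avail` to see what is actually installed on the cluster you are using):

```shell
# explicit version + toolchain: reproducible, and keeps working when newer modules are installed
module load Python/3.11.3-GCCcore-12.3.0

# no version: loads whatever the current default is, which may change over time
# module load Python
```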

## Troubleshooting jobs
## Troubleshooting

### My modules don't work together

@@ -226,6 +226,29 @@ information, see .

{% endif %}


### Why do I get a "No space left on device" error, while I still have storage space left?

When trying to create files, errors like this can occur:

```shell
No space left on device
```

The error "`No space left on device`" can mean two different things:

- all available *storage quota* on the file system in question has been used;
- the *inode limit* has been reached on that file system.

An *inode* can be seen as a "file slot": once the limit is reached, no additional files can be created.
There is a standard inode limit in place that will be increased if needed.
The number of inodes used per file system can be checked on [the VSC account page](https://account.vscentrum.be).

Possible solutions to this problem include cleaning up unused files and directories or
[compressing directories with a lot of files into zip- or tar-files](linux-tutorial/manipulating_files_and_directories.md#zipping-gzipgunzip-zipunzip).
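
As a minimal sketch (the directory name `many_small_files` is purely illustrative), packing a directory that contains many small files into a single archive frees up inodes:

```shell
# pack the directory into one compressed archive (a single inode)
tar -czf many_small_files.tar.gz many_small_files/
# remove the original directory only after verifying the archive is complete
rm -r many_small_files/
```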

If the problem persists, feel free to [contact support](FAQ.md#i-have-another-questionproblem).

## Other

### Can I share my account with someone else?
@@ -350,6 +373,32 @@ See also: [Your UGent home drive and shares](running_jobs_with_input_output_data
{% endif %}


### My home directory is (almost) full, and I don't know why

Your home directory might be full even when it does not look like it, because of hidden files.
Hidden files and subdirectories have a name starting with a dot and do not show up when running `ls`.
If you want to check where the storage in your home directory is used, you can make use of the [`du` command](running_jobs_with_input_output_data.md#check-your-quota) to find out what the largest files and subdirectories are:

```shell
du -h --max-depth 1 $VSC_HOME | egrep '[0-9]{3}M|[0-9]G'
```

The `du` command returns the size of $VSC_HOME and of each of its immediate subdirectories (hidden ones included). This output is then piped into [`egrep`](linux-tutorial/beyond_the_basics.md#searching-file-contents-grep) to keep only the lines that matter most.

The `egrep` command only passes entries that match the regular expression `[0-9]{3}M|[0-9]G`, i.e. entries that consume 100 MB or more.
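
As an alternative sketch (assuming GNU `sort`, which supports `-h` for human-readable sizes), the ten largest entries directly under `$VSC_HOME`, hidden ones included, can be listed with:

```shell
du -h --max-depth 1 $VSC_HOME | sort -h | tail -n 10
```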


### How can I get more storage space?


[By default](running_jobs_with_input_output_data.md#quota) you get 3 GB of storage space for your home directory and 25 GB in your personal directories on both the data (`$VSC_DATA`) and scratch (`$VSC_SCRATCH`) filesystems.
It is not possible to expand the storage quota for these personal directories.

You can get more storage space through a [Virtual Organisation (VO)](running_jobs_with_input_output_data.md#virtual-organisations),
which will give you access to the [additional directories](running_jobs_with_input_output_data.md#vo-directories) in a subdirectory specific to that VO (`$VSC_DATA_VO` and `$VSC_SCRATCH_VO`).
The moderators of a VO can [request more storage](running_jobs_with_input_output_data.md#requesting-more-storage-space) for their VO.


### Why can't I use the `sudo` command?

When you attempt to use sudo, you will be prompted for a password.
8 changes: 4 additions & 4 deletions mkdocs/docs/HPC/MATLAB.md
@@ -31,7 +31,7 @@ license, licenses would quickly run out.

Compiling MATLAB code can only be done from the login nodes, because
only login nodes can access the MATLAB license server, workernodes on
clusters can not.
clusters cannot.

To access the MATLAB compiler, the `MATLAB` module should be loaded
first. Make sure you are using the same `MATLAB` version to compile and
@@ -93,7 +93,7 @@ with:
<pre><code>$ <b>export _JAVA_OPTIONS="-Xmx64M"</b>
</code></pre>

The MATLAB compiler spawns multiple Java processes, and because of the
The MATLAB compiler spawns multiple Java processes. Because of the
default memory limits that are in effect on the login nodes, this might
lead to a crash of the compiler if it's trying to create too many Java
processes. If we lower the heap size, more Java processes will be able
@@ -122,7 +122,7 @@ controlled via the `parpool` function: `parpool(16)` will use 16
workers. It's best to specify the amount of workers, because otherwise
you might not harness the full compute power available (if you have too
few workers), or you might negatively impact performance (if you have
too much workers). By default, MATLAB uses a fixed number of workers
too many workers). By default, MATLAB uses a fixed number of workers
(12).

You should use a number of workers that is equal to the number of cores
@@ -163,7 +163,7 @@ You should remove the directory at the end of your job script:
## Cache location

When running, MATLAB will use a cache for performance reasons. This
location and size of this cache can be changed trough the
location and size of this cache can be changed through the
`MCR_CACHE_ROOT` and `MCR_CACHE_SIZE` environment variables.

The snippet below would set the maximum cache size to 1024MB and the
8 changes: 4 additions & 4 deletions mkdocs/docs/HPC/alphafold.md
@@ -96,9 +96,9 @@ export ALPHAFOLD_DATA_DIR={{directory}}/{{version}}

### Running AlphaFold

AlphaFold provides run script called [run_alphafold.py](https://raw.githubusercontent.com/deepmind/alphafold/main/run_alphafold.py)
AlphaFold provides a script called [run_alphafold.py](https://raw.githubusercontent.com/deepmind/alphafold/main/run_alphafold.py)

A symbolic link named *alphafold* that points to the this script is included,
A symbolic link named *alphafold* that points to this script is included,
so you can just use `alphafold` instead of `run_alphafold.py` or `python run_alphafold.py` after loading the AlphaFold module.

The `run_alphafold.py` script has also been slightly modified such that defining the `$ALPHAFOLD_DATA_DIR` (see [above](./#setting-up-the-environment)) is sufficient to pick up all the data provided in that location,
@@ -158,7 +158,7 @@ This highlights the difference between CPU and GPU performance even more.
The following example comes from the official [Examples section]({{readme}}#examples) in the Alphafold [README]({{readme}}).
The run command is slightly different (see above: [Running AlphaFold](./running-alphafold)).

Do not forget to setup the environment (see above: [Setting up the environment](./setting-up-the-environment)).
Do not forget to set up the environment (see above: [Setting up the environment](./setting-up-the-environment)).

### Folding a monomer

@@ -193,7 +193,7 @@ The main difference between using a GPU or CPU in a job script is what module to
For running AlphaFold on GPU, use an AlphaFold module that mentions `CUDA` (or `cuda`),
for example `AlphaFold/2.3.1-foss-2022a-CUDA-11.7.0`.

To run the jobs cripts you need to create a file named `T1050.fasta` with the following content:
To run the job scripts you need to create a file named `T1050.fasta` with the following content:

```fasta
>T1050 A7LXT1, Bacteroides Ovatus, 779 residues|
3 changes: 1 addition & 2 deletions mkdocs/docs/HPC/best_practices.md
@@ -49,8 +49,7 @@
9. Submit small jobs by grouping them together. See chapter [Multi-job submission](multi_job_submission.md) for
how this is done.

10. The runtime is limited by the maximum walltime of the queues. For
longer walltimes, use checkpointing.
10. The runtime is limited by the maximum walltime of the queues.

11. Requesting many processors could imply long queue times. It's
advised to only request the resources you'll be able to use.