Merge main
lbarraga committed Sep 5, 2024
1 parent 8fb0f37 commit c72b4db
Showing 120 changed files with 4,584 additions and 2,116 deletions.
3 changes: 3 additions & 0 deletions config/templates/hpc.template
@@ -23,6 +23,7 @@ nav:
- Troubleshooting: troubleshooting.md
- HPC Policies: sites/hpc_policies.md
- Advanced topics:
- Torque frontend via jobcli: torque_frontend_via_jobcli.md
- Fine-tuning Job Specifications: fine_tuning_job_specifications.md
- Multi-job submission: multi_job_submission.md
- Compiling and testing your software on the HPC: compiling_your_software.md
@@ -49,9 +50,11 @@ nav:
- AlphaFold: alphafold.md
- Apptainer/Singularity: apptainer.md
- EasyBuild: easybuild.md
- Jupyter notebook: jupyter.md
- MATLAB: MATLAB.md
- mympirun: mympirun.md
- OpenFOAM: openFOAM.md
- Python virtual environments: setting_up_python_virtual_environments.md
- FAQ:
- Frequently Asked Questions: FAQ.md
- Appendices:
1 change: 0 additions & 1 deletion intro-HPC/examples/MATLAB/jobscript.sh
@@ -3,7 +3,6 @@
#PBS -l walltime=1:0:0
#
# Example (single-core) MATLAB job script
# see http://hpcugent.github.io/vsc_user_docs/
#

# make sure the MATLAB version matches with the one used to compile the MATLAB program!
28 changes: 18 additions & 10 deletions mkdocs/docs/HPC/FAQ.md
@@ -16,7 +16,7 @@ Overview of HPC-UGent Tier-2 [infrastructure]({{ hpc_infrastructure_url }})

### How many cores/nodes should I request?

An important factor in this question is how well your task is being parallellized:
An important factor in this question is how well your task is being parallelized:
does it actually run faster with more resources? You can test this yourself:
start with 4 cores, then 8, then 16... The execution time should each time be reduced to
around half of what it was before. You can also try this with full nodes: 1 node, 2 nodes.
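
For example, a quick scaling test could submit the same job script with an increasing number of cores and compare the measured walltimes afterwards (a sketch; the job script name and the `qsub` resource syntax below are only examples):

```
# submit the same job with 4, 8 and 16 cores and compare walltimes afterwards
for n in 4 8 16; do
    qsub -l nodes=1:ppn=$n -N "scaling_${n}cores" jobscript.sh
done
```
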
@@ -171,7 +171,7 @@ Not all file locations perform the same. In particular, the `$VSC_HOME` and `$VSC_DATA`
directories are, relatively, very slow to access. Your jobs should rather use the
`$VSC_SCRATCH` directory, or other fast locations (depending on your needs), described
in [Where to store your data on the HPC](../running_jobs_with_input_output_data/#where-to-store-your-data-on-the-hpc).
As an example how do this: The job can copy the input to the scratch directory, then execute
As an example how to do this: The job can copy the input to the scratch directory, then execute
the computations, and lastly copy the output back to the data directory.
Using the home and data directories is especially a problem when UGent isn't your home institution:
your files may be stored, for example, in Leuven while you're running a job in Ghent.
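
A minimal sketch of such a job script (file and program names are placeholders):

```
# copy input to fast scratch storage, run there, and copy the results back
cp $VSC_DATA/input.dat $VSC_SCRATCH/
cd $VSC_SCRATCH
./my_program input.dat > output.dat   # hypothetical program
cp output.dat $VSC_DATA/
```
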
@@ -217,12 +217,13 @@ See the explanation about how jobs get prioritized in [When will my job start](.

{% else %}

In practice it's
In practice, it's
impossible to predict when your job(s) will start, since most currently
running jobs will finish before their requested walltime expires, and
new jobs by may be submitted by other users that are assigned a higher
priority than your job(s). You can use the `showstart` command. For more
information, see .
running jobs will finish before their requested walltime expires.
New jobs may be submitted by other users that are assigned a higher
priority than your job(s).
You can use the `squeue --start` command to get an estimated start time for your jobs in the queue.
Keep in mind that this is just an estimate.
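
For example, to only list your own pending jobs together with their estimated start times:

```
squeue --start -u $USER
```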

{% endif %}

@@ -282,10 +283,11 @@ of files so other users can access the data. For example, the following
command will enable a user named "otheruser" to read the file named
`dataset.txt`. See

<pre><code>$ <b>setfacl -m u:otheruser:r dataset.txt</b>
$ <b>ls -l dataset.txt</b>
```
$ setfacl -m u:otheruser:r dataset.txt
$ ls -l dataset.txt
-rwxr-x---+ 2 {{userid}} mygroup 40 Apr 12 15:00 dataset.txt
</code></pre>
```
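
To revoke that access again later, the ACL entry can be removed with `setfacl -x` (a sketch):

```
# remove the ACL entry for "otheruser" again
setfacl -x u:otheruser dataset.txt
# inspect the resulting permissions
getfacl dataset.txt
```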

For more information about `chmod` or `setfacl`, see
[Linux tutorial](linux-tutorial/manipulating_files_and_directories.md#changing-permissions-chmod).
@@ -317,6 +319,12 @@ Please send an e-mail to {{hpcinfo}} that includes:

{% endif %}

If the software is a Python package, you can manually install it in a virtual environment.
More information can be found [here](./setting_up_python_virtual_environments.md).
Note that it is still preferred to submit a software installation request,
as the software installed by the HPC team will be optimized for the HPC environment.
This can lead to dramatic performance improvements.
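
As a rough sketch of that manual route (the module version and package name below are only examples; see the linked page for the full, recommended workflow):

```
# load a Python module, create a virtual environment and install a package into it
module load Python/3.10.8-GCCcore-12.2.0   # example version, pick one that is actually available
python -m venv $VSC_DATA/venvs/myproject
source $VSC_DATA/venvs/myproject/bin/activate
pip install mypackage                       # hypothetical package name
```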

### Is my connection compromised? Remote host identification has changed

On Monday 25 April 2022, the login nodes received an update to RHEL8.
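
If you have verified that the new host key fingerprints match the ones published in this documentation, the stale entry can be removed from your `known_hosts` file (the hostname below is an example):

```
# remove the outdated host key for the login node
ssh-keygen -R login.hpc.ugent.be
```
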
34 changes: 19 additions & 15 deletions mkdocs/docs/HPC/HOD.md
@@ -16,8 +16,9 @@ Before using HOD, you first need to load the `hod` module. We don't
specify a version here (this is an exception, for most other modules you
should, see [Using explicit version numbers](../running_batch_jobs/#using-explicit-version-numbers)) because newer versions might include important bug fixes.

<pre><code>$ <b>module load hod</b>
</code></pre>
```
module load hod
```

### Compatibility with login nodes

@@ -31,16 +32,17 @@ cluster module before loading the `hod` module and subsequently running

For example, this will work as expected:

<pre><code>$ <b>module swap cluster/{{othercluster}}</b>
$ <b>module load hod</b>
$ <b>hod</b>
```
$ module swap cluster/{{othercluster}}
$ module load hod
$ hod
hanythingondemand - Run services within an HPC cluster
usage: hod <subcommand> [subcommand options]
Available subcommands (one of these must be specified!):
batch Submit a job to spawn a cluster on a PBS job controller, run a job script, and tear down the cluster when it's done
clean Remove stale cluster info.
...
</code></pre>
```

Note that modules named `hanythingondemand/*` are also available. These
should, however, not be used directly, since they may not be compatible
@@ -52,13 +54,14 @@ for).
The `hod` module will also put a basic configuration in place for HOD,
by defining a couple of `$HOD_*` environment variables:

<pre><code>$ <b>module load hod</b>
$ <b>env | grep HOD | sort</b>
```
$ module load hod
$ env | grep HOD | sort
HOD_BATCH_HOD_MODULE=hanythingondemand/3.2.2-intel-2016b-Python-2.7.12
HOD_BATCH_WORKDIR=$VSC_SCRATCH/hod
HOD_CREATE_HOD_MODULE=hanythingondemand/3.2.2-intel-2016b-Python-2.7.12
HOD_CREATE_WORKDIR=$VSC_SCRATCH/hod
</code></pre>
```

By defining these environment variables, you do not need to
specify `--hod-module` and `--workdir` when using `hod batch` or
Expand All @@ -85,26 +88,27 @@ will be marked as `<job-not-found>`.

You should occasionally clean this up using `hod clean`:

<pre><code>$ <b>module list</b>
```
$ module list
Currently Loaded Modulefiles:
1) cluster/{{defaultcluster}}(default) 2) pbs_python/4.6.0 3) vsc-base/2.4.2 4) hod/3.0.0-cli
$ <b>hod list</b>
$ hod list
Cluster label Job ID State Hosts
example1 {{jobid}} <job-not-found> <none>
$ <b>hod clean</b>
$ hod clean
Removed cluster localworkdir directory /user/scratch/gent/vsc400/vsc40000/hod/hod/{{jobid}} for cluster labeled example1
Removed cluster info directory /user/home/gent/vsc400/vsc40000/.config/hod.d/wordcount for cluster labeled example1
$ <b>module swap cluster/{{othercluster}}</b>
$ module swap cluster/{{othercluster}}
Cluster label Job ID State Hosts
example2 98765.master19.{{othercluster}}.gent.vsc <job-not-found> <none>
$ <b>hod clean</b>
$ hod clean
Removed cluster localworkdir directory /user/scratch/gent/vsc400/vsc40000/hod/hod/98765.master19.{{othercluster}}.gent.vsc for cluster labeled example2
Removed cluster info directory /user/home/gent/vsc400/vsc40000/.config/hod.d/wordcount for cluster labeled example2
</code></pre>
```
Note that **only HOD clusters that were submitted to the currently loaded `cluster` module will be cleaned up**.

## Getting help
62 changes: 35 additions & 27 deletions mkdocs/docs/HPC/MATLAB.md
@@ -37,11 +37,12 @@ To access the MATLAB compiler, the `MATLAB` module should be loaded
first. Make sure you are using the same `MATLAB` version to compile and
to run the compiled MATLAB program.

<pre><code>$ <b>module avail MATLAB/</b>
```
$ module avail MATLAB/
----------------------/apps/gent/RHEL8/zen2-ib/modules/all----------------------
MATLAB/2021b MATLAB/2022b-r5 (D)
$ <b>module load MATLAB/2021b</b>
</code></pre>
$ module load MATLAB/2021b
```

After loading the `MATLAB` module, the `mcc` command can be used. To get
help on `mcc`, you can run `mcc -?`.
@@ -53,12 +54,14 @@ flag means verbose output). To show how `mcc` can be used, we use the
First, we copy the `magicsquare.m` example that comes with MATLAB to
`example.m`:

<pre><code>$ <b>cp $EBROOTMATLAB/extern/examples/compiler/magicsquare.m example.m</b>
</code></pre>
```
cp $EBROOTMATLAB/extern/examples/compiler/magicsquare.m example.m
```

To compile a MATLAB program, use `mcc -mv`:

<pre><code><b>mcc -mv example.m</b>
```
mcc -mv example.m
Opening log file: {{homedir}}/java.log.34090
Compiler version: 8.3 (R2021b)
Dependency analysis by REQUIREMENTS.
@@ -67,7 +70,7 @@ Parsing file "{{homedir}}/example.m"
Deleting 0 temporary MEX authorization files.
Generating file "{{homedir}}/readme.txt".
Generating file "run\_example.sh".
</code></pre>
```
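
The compiled program can then be started through the generated wrapper script, which expects the MATLAB installation directory as its first argument, followed by the program's own arguments (a sketch; `4` is just an example input for `magicsquare`):

```
# run the compiled example; $EBROOTMATLAB points to the MATLAB installation of the loaded module
./run_example.sh $EBROOTMATLAB 4
```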

### Libraries

@@ -90,8 +93,9 @@ MATLAB program on the login nodes, consider tweaking the default maximum
heap size (128M) of Java using the `_JAVA_OPTIONS` environment variable
with:

<pre><code>$ <b>export _JAVA_OPTIONS="-Xmx64M"</b>
</code></pre>
```
export _JAVA_OPTIONS="-Xmx64M"
```

The MATLAB compiler spawns multiple Java processes. Because of the
default memory limits that are in effect on the login nodes, this might
@@ -102,14 +106,16 @@ to fit in memory.
Another possible issue is that the heap size is too small. This could
result in errors like:

<pre><code>Error: Out of memory
</code></pre>
```
Error: Out of memory
```

A possible solution to this is by setting the maximum heap size to be
bigger:

<pre><code>$ <b>export _JAVA_OPTIONS="-Xmx512M"</b>
</code></pre>
```
export _JAVA_OPTIONS="-Xmx512M"
```

## Multithreading

@@ -130,8 +136,7 @@ you requested when submitting your job script (the `ppn` value, see [Generic res
You can determine the right number of workers to use via the following
code snippet in your MATLAB program:

<div style="text-align: center;">-- parpool.m --</div>
```matlab
```matlab title="parpool.m"
{% include "./examples/MATLAB/parpool.m" %}
```

@@ -143,22 +148,25 @@ documentation](https://nl.mathworks.com/help/distcomp/parpool.html).
Each time MATLAB is executed, it generates a Java log file in the user's
home directory. The output log directory can be changed using:

<pre><code>$ <b>MATLAB_LOG_DIR=<i>&lt;OUTPUT_DIR&gt;</i></b>
</code></pre>
```
MATLAB_LOG_DIR=<OUTPUT_DIR>
```

where `<OUTPUT_DIR>` is the name of the desired output directory. To
create and use a temporary directory for these logs:

<pre><code># create unique temporary directory in $TMPDIR (or /tmp/$USER if $TMPDIR is not defined)
# instruct MATLAB to use this directory for log files by setting $MATLAB_LOG_DIR
$ <b>export MATLAB_LOG_DIR=$(mktemp -d -p ${TMPDIR:-/tmp/$USER})</b>
</code></pre>
```
# create unique temporary directory in $TMPDIR (or /tmp/$USER if $TMPDIR is not defined)
# instruct MATLAB to use this directory for log files by setting $MATLAB_LOG_DIR
export MATLAB_LOG_DIR=$(mktemp -d -p ${TMPDIR:-/tmp/$USER})
```

You should remove the directory at the end of your job script:

<pre><code>$ <b> rm -rf $MATLAB_LOG_DIR</b>
</code></pre>
```
rm -rf $MATLAB_LOG_DIR
```
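
Alternatively, a shell trap makes the cleanup happen even when the job script exits early (a sketch):

```
# remove the log directory automatically when the job script exits, for whatever reason
trap 'rm -rf "$MATLAB_LOG_DIR"' EXIT
```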

## Cache location

@@ -169,9 +177,10 @@ location and size of this cache can be changed through the
The snippet below would set the maximum cache size to 1024MB and the
location to `/tmp/testdirectory`.

<pre><code>$ <b>export MATLAB_CACHE_ROOT=/tmp/testdirectory </b>
$ <b>export MATLAB_CACHE_SIZE=1024M </b>
</code></pre>
```
export MATLAB_CACHE_ROOT=/tmp/testdirectory
export MATLAB_CACHE_SIZE=1024M
```

So when MATLAB is running, it can fill up to 1024MB of cache in
`/tmp/testdirectory`.
@@ -182,7 +191,6 @@ All of the tweaks needed to get MATLAB working have been implemented in
an example job script. This job script is also available on the HPC.
<!-- %TODO: where? -->

<div style="text-align: center;">-- jobscript.sh --</div>
```bash
```bash title="jobscript.sh"
{% include "./examples/MATLAB/jobscript.sh" %}
```