Update Alps, remove Todi (#233)
* Update Alps and vClusters

* GitHub Action: Apply external link format

* Update to current status, remove under construction

* Add news post

* Update docs/alps/vclusters.md

Co-authored-by: Annika Lauber <[email protected]>

* Update docs/alps/vclusters.md

Co-authored-by: Annika Lauber <[email protected]>

* Update docs/alps/vclusters.md

Co-authored-by: Annika Lauber <[email protected]>

* Update docs/alps/vclusters.md

Co-authored-by: Annika Lauber <[email protected]>

* Update docs/posts/2024-12-17_alps_update.md

Co-authored-by: Annika Lauber <[email protected]>

* Update docs/alps/vclusters.md

Co-authored-by: Annika Lauber <[email protected]>

* Update vclusters

* Unify vCluster names

* Remove Tödi from install instructions

* Use todi for spack sysconfig

* Rename todi to alps

* Add instructions to create missing folders

* Swap santis with daint

* mkdir -p

---------

Co-authored-by: mjaehn <[email protected]>
Co-authored-by: Annika Lauber <[email protected]>
3 people authored Dec 18, 2024
1 parent fe28b86 commit ef4360a
Showing 5 changed files with 133 additions and 129 deletions.
27 changes: 5 additions & 22 deletions docs/alps/index.md
@@ -1,43 +1,26 @@
- !!! construction "Page under construction - last update: 2024-09-06"
-
- Information in this page is not yet complete nor final. It will be updated following the progress of
-
- - the Alps system deployment at CSCS
- - C2SM's adaptation to this new system

# The Alps System

Alps is a distributed HPC infrastructure managed by CSCS. Unlike a traditional HPC system, it is composed of several logical units called vClusters (versatile clusters). From the user's perspective, each vCluster plays the role of a traditional HPC machine, tailored to the needs of a specific community. This setup also enables the geographical distribution of vClusters, which facilitates geo-redundancy. The main physical piece of Alps is hosted at CSCS in Lugano; a detailed description can be found on [their website :material-open-in-new:](https://www.cscs.ch/computers/alps){:target="_blank"}.

## vClusters

- The following table shows the current plan for the final vClusters distribution on Alps at CSCS (not the current situation).
+ The following table shows the current vClusters distribution on Alps at CSCS (only C2SM-relevant vClusters are shown).

| vCluster | Activity | Share |
|----------|-------------------|------------------|
- | Daint | User Lab | ~ 800 GH nodes |
- | Santis | Weather & Climate | ~ 500 GH nodes |
+ | Daint | User Lab | ~ 600 GH nodes |
| Clariden | Machine Learning | ~ 800 GH nodes |
+ | Santis | Weather & Climate | ~ 400 GH nodes |
- | Tödi | Testing | few GH nodes |
| Eiger | | multi-core nodes |

*GH = Grace Hopper*

- ## Early Access
-
- For getting access to the vCluster dedicated to testing ([Tödi](vclusters.md/#todi){:target="_blank"}), CSCS offers [Preparatory Projects :material-open-in-new:](https://www.cscs.ch/user-lab/allocation-schemes/preparatory-projects){:target="_blank"}.

## Support by CSCS

- To contact CSCS staff directly, users can join their dedicated [Slack channel :material-open-in-new:](https://cscs-users.slack.com){:target="_blank"}.

## File Systems
+ General information about access, file systems, vClusters, user environments and much more can be found at the [CSCS Knowledge Base :material-open-in-new:](https://confluence.cscs.ch/display/KB){:target="_blank"}.

- !!! note "TODO"
+ To contact CSCS staff directly, users can join their dedicated [Slack workspace :material-open-in-new:](https://cscs-users.slack.com){:target="_blank"}, with dedicated channels for each vCluster.

- - [ ] `/users`, `/store` and `/scratch`
- - [ ] reserved space per vClsuter vs shared space
- - [ ] ...

## Introductory Workshop Material

9 changes: 1 addition & 8 deletions docs/alps/uenvs.md
@@ -1,10 +1,3 @@
- !!! construction "Page under construction - last update: 2024-09-06"
-
- Information in this page is not yet complete nor final. It will be updated following the progress of
-
- - the Alps system deployment at CSCS
- - C2SM's adaptation to this new system

# User Environments

Software stacks at CSCS are now accessible through so-called User Environments (uenv). They replace the previous monolithic software stack, which contained everything and from which one could load any module, with all the potential conflicts that entails. User environments contain the minimal software stack required for a certain activity, say, building and running ICON. They are generated by `spack`, packed into a single `squashfs` file and then mounted by the user. In a way, they can be considered poor man's containers.
@@ -29,7 +22,7 @@ The user environments provided by CSCS are registered in a central database. In

!!! warning

- Old software stack images didn't have a mount point in the metadata which is now required for the new versions of the `uenv` tool and its `--uenv` slurm plugin counterpart. If you have images in your local repository that are older than roughly September 5th, please pull them again. It will only update the metadata
+ Old software stack images didn't have a mount point in the metadata, which is now required for the new versions of the `uenv` tool and its `--uenv` slurm plugin counterpart. If you have images in your local repository that are older than roughly September 5th, please pull them again - it will only update the metadata.
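For reference, re-pulling an image could look like the following sketch (the image name is illustrative, and the `uenv image` subcommands are assumed to behave as described below):

```shell
# List the images in your local repository, then re-pull one to refresh its metadata
uenv image ls
uenv image pull icon-wcp/v1:rc4
```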

## The `uenv` command line tool

85 changes: 49 additions & 36 deletions docs/alps/vclusters.md
@@ -1,10 +1,3 @@
- !!! construction "Page under construction - last update: 2024-09-06"
-
- Information in this page is not yet complete nor final. It will be updated following the progress of
-
- - the Alps system deployment at CSCS
- - C2SM's adaptation to this new system

# Supported vClusters

This page hosts information about the vClusters supported by C2SM (not all CSCS vClusters).
@@ -27,56 +20,76 @@ Host balfrin* daint* santis* todi*
ProxyJump ela
```

- This would allow standard connections like `ssh santis` but also specifying the login node like `ssh santis-ln002` if needed. Replace `cscsusername` with your actual user name.
+ This allows standard connections like `ssh santis`, but you can also specify a login node if needed, e.g., `ssh santis-ln002`. Replace `cscsusername` with your actual username.
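For context, the fragment above belongs in `~/.ssh/config`; a complete sketch could look as follows (the `ela` jump-host entry and the `User` lines are assumptions based on the surrounding instructions):

```
Host ela
    HostName ela.cscs.ch
    User cscsusername

Host balfrin* daint* santis* todi*
    User cscsusername
    ProxyJump ela
```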

- ## Daint
+ ## Santis

- Daint (Alps) is the vCluster dedicated to the User Lab. It is currently accessible at `daint.alps.cscs.ch` (until the current Piz Daint gets decommissioned), so connect with `ssh daint.alps` with the `ssh` settings above.
- Even though Weather and Climate also has the dedicated vCluster Santis (see [below](#santis)), traditional projects might also land on Daint.
+ The vCluster `santis` is dedicated to **Climate and Weather** and may initially host only [EXCLAIM :material-open-in-new:](https://c2sm.ethz.ch/research/exclaim.html){:target="_blank"} and related projects.

- ### Uenvs
+ ### Deployment Status

- List of currently supported Uenvs on Daint:
+ Currently, the deployment is approximately 95% complete.

- | uenv | activity | Remark |
- |--------------------------|--------------------------------|---------------------|
- | icon-vx:rcy | build and run ICON | Not deployed (yet?) |
- | netcdf-tools/2024:v1-rc1 | pre- and post-processing tools | |
+ ### Differences to the Environment on `todi`

- ### Storage
+ - `$HOME` is now on a new NFS file system
+     - Your folder `/users/$USER` will initially be mostly empty
+     - The NFS system still requires fine-tuning, and file system performance may be low
+     - We recommend running tasks, especially heavy ones, on `$SCRATCH`
+ - `todi`'s `$HOME` is mounted as `/users.OLD/$USER`
+     - ⚠️ The mount is read-only!
+     - You are responsible for copying your data from `/users.OLD/$USER` to `/users/$USER/...` (see the sketch below)
+     - The mount is temporary and will be removed by the end of January 2025
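A minimal sketch of that one-time copy (assuming plain `rsync`; adjust the paths and options to your needs):

```shell
# /users.OLD is mounted read-only, so copy rather than move
rsync -av /users.OLD/$USER/ /users/$USER/
```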

- !!! note "TODO"
+ !!! info

- - [ ] Storage
+     Although deployment work will continue over the upcoming days, users are invited to access the system already, start familiarising themselves with it, and begin migrating the data from their old home.

- ## Santis
+     The activities on the CSCS side should not require any reboot; however, some services, e.g., SLURM, might need to be restarted. This could lead to short interruptions or even failing jobs. CSCS will provide more information in the upcoming days and will try to minimise the risk of interference by consolidating changes.

- !!! warning "Santis has not been deployed yet."
+ ### Uenvs

- Santis is dedicated to Weather and Climate. It might, at the beginning, only host [EXCLAIM :material-open-in-new:](https://c2sm.ethz.ch/research/exclaim.html){:target="_blank"} and related projects.
+ To find and use already existing uenvs from `todi`, you need to modify the `CLUSTER_NAME` environment variable.

- ### Uenvs
+ ```shell
+ export CLUSTER_NAME=todi
+ uenv image find
+ ```

- | uenv | activity |
- |--------------------------|--------------------------------|
- | icon-vx:rcy | build and run ICON |
- | netcdf-tools/tag:version | pre- and post-processing tools |
+ | uenv | activity |
+ |----------------------------|--------------------------------|
+ | `icon-wcp/v1:rc4` | build and run ICON |
+ | `netcdf-tools/2024:v1-rc1` | pre- and post-processing tools |
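Once a suitable image shows up, the usual workflow would be to pull it and start a shell inside it (a sketch; the image name is taken from the table above, and the subcommands are assumed to match the `uenv` tool described on the [uenvs page](uenvs.md)):

```shell
uenv image pull icon-wcp/v1:rc4
uenv start icon-wcp/v1:rc4
```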

### Storage

- !!! note "TODO"

- - [ ] Storage
+ The migration of the previous storage is not yet finished. Once there is an update from CSCS, we will inform you here. Also note that the environment variables `$STORE` and `$PROJECT` are not yet set.

+ ## Daint

- ## Tödi
+ Daint (Alps) is the vCluster dedicated to the **User Lab**. It is currently accessible at `daint.alps.cscs.ch` (until the current Piz Daint gets decommissioned), so connect with `ssh daint.alps` using the `ssh` settings above.

- Tödi is the testing vCluster and is currently deployed on the most of the Alps system.
+ Even though Climate and Weather also has the dedicated vCluster `santis` (see [above](#santis)), traditional projects might also land on Daint.

### Uenvs

- | uenv | activity |
- |--------------------------|--------------------------------|
- | icon-wcp/v1:rc4 | build and run ICON |
- | netcdf-tools/2024:v1-rc1 | pre- and post-processing tools |
+ As on `santis`, you can access the uenvs from `todi`:

+ ```shell
+ export CLUSTER_NAME=todi
+ uenv image find
+ ```

+ | uenv | activity |
+ |----------------------------|--------------------------------|
+ | `icon-wcp/v1:rc4` | build and run ICON |
+ | `netcdf-tools/2024:v1-rc1` | pre- and post-processing tools |

### Storage

!!! note "TODO"

+     The migration of the previous storage is not yet finished. Once there is an update from CSCS, we will inform you here.

127 changes: 64 additions & 63 deletions docs/models/icon/usage.md
@@ -13,77 +13,31 @@ Once you have access, clone the repository from GitHub using the SSH protocol:

## Configure and compile

- ### Piz Daint
- Spack is used to build ICON. Please follow the steps below to set up Spack and build ICON.

- **1. Set up a Spack instance**

- To [set up a Spack instance :material-open-in-new:](https://c2sm.github.io/spack-c2sm/latest/QuickStart.html#at-cscs-daint-tsa-balfrin){:target="_blank"}, ensure that you clone the repository using the Spack tag provided in the ICON repository at [config/cscs/SPACK_TAG_C2SM :material-open-in-new:](https://github.com/C2SM/icon/blob/main/config/cscs/SPACK_TAG_C2SM){:target="_blank"} and load it into your command line.

- **2. Build ICON**

- Refer to the official spack-c2sm documentation for [installing ICON using Spack :material-open-in-new:](https://c2sm.github.io/spack-c2sm/latest/QuickStart.html#icon){:target="_blank"}.

- After the first compilation, you need to create a `setting` file (the following example is for Piz Daint, please adapt the lines according to the machine you are using):

- === "daint_gpu_nvhpc"
-     ```shell
-     # Get SPACK_TAG used on machine
-     SPACK_TAG=$(cat "config/cscs/SPACK_TAG_C2SM")
-     # Set the name of the environment, which should be equal to the builder
-     ENV_NAME=daint_gpu_nvhpc
-     # Load probtest environment (only needed if you want to run check files)
-     source /project/g110/icon/probtest/conda/miniconda/bin/activate probtest
-     # Ensure CDO is loaded on your machine
-     module load daint-gpu CDO
-     # Remove and create setting file with the following two commands
-     rm -f setting
-     ./config/cscs/create_sh_env $SPACK_TAG $ENV_NAME
-     ```

- ### Euler
- Spack is used to build ICON. Please follow the steps below to set up Spack and build ICON.

- **1. Set up a Spack instance**

- To [set up a Spack instance :material-open-in-new:](https://c2sm.github.io/spack-c2sm/latest/QuickStart.html#at-cscs-daint-tsa-balfrin){:target="_blank"}, ensure that you clone the repository using the Spack tag provided in the ICON repository at [config/ethz/SPACK_TAG_EULER :material-open-in-new:](https://github.com/C2SM/icon/blob/main/config/ethz/SPACK_TAG_EULER){:target="_blank"} and load it into your command line.
+ ### Säntis

+ !!! construction "Under construction - last update: 2024-12-18"

- **2. Build ICON**

- Activate the Spack environment for Euler:
- ```bash
- SPACK_TAG=$(cat "config/ethz/SPACK_TAG_EULER")
- spack env activate -d config/ethz/spack/$SPACK_TAG/euler_cpu_gcc
- ```
+     Information on this section is not yet complete nor final. It will be updated following the progress of the Alps system deployment at CSCS and C2SM's adaptation to this new system. Please use the [C2SM support forum :material-open-in-new:](https://github.com/C2SM/Tasks-Support/discussions){:target="_blank"} in case of questions regarding building ICON on Alps.

- Euler Support recommends to compile code on compute-nodes. Unfortunately [internet-access on Euler compute-nodes is restricted :material-open-in-new:](https://scicomp.ethz.ch/wiki/Accessing_the_clusters#Internet_Security){:target="_blank"}.
- Therefore a two-step install needs to be performed:
+ Currently, the same ICON user environment as on `todi` is used. Since the environment is still linked to `todi`, you need to export `CLUSTER_NAME` as `todi` for now:

```bash
- # fetch and install cosmo-eccodes-definitions on login-node
- spack install cosmo-eccodes-definitions
-
- # compile ICON on compute-nodes
- srun -N 1 -c 12 --mem-per-cpu=20G spack install -v -j 12
+ export CLUSTER_NAME=todi
```
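With the variable set, the `todi` images become visible and the ICON uenv can be started before building (a sketch; the image name follows the tables on the [vClusters page](../../alps/vclusters.md)):

```bash
uenv image find
uenv start icon-wcp/v1:rc4
```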


- ### Todi

- !!! construction "Under construction - last update: 2024-09-20"

- Information on this section is not yet complete nor final. It will be updated following the progress of the Alps system deployment at CSCS and C2SM's adaptation to this new system. Please use the [C2SM support forum :material-open-in-new:](https://github.com/C2SM/Tasks-Support/discussions){:target="_blank"} in case of questions regarding building ICON on Alps.

- On Todi, Spack is also used to build ICON. However, there is no suitable `spack.yaml` file present for the Spack environment. Therefore, create a `spack.yaml` file and use the software stack upstream provided by the user environment.
+ Next, follow the instructions below to build ICON using Spack.

**1. Create a `spack.yaml` file**

- Create the following files from the ICON build folder (different to the ICON root folder in case of a out-of-source build).
+ Create the following files from the ICON build folder (different from the ICON root folder in the case of an out-of-source build). For that, you first have to create the missing folders:
+ ```bash
+ mkdir -p config/cscs/spack/v0.21.1.3/alps_cpu_nvhpc
+ mkdir -p config/cscs/spack/v0.21.1.3/alps_gpu_nvhpc
+ ```

For CPU compilation:

=== "config/cscs/spack/v0.21.1.3/todi_cpu_nvhpc/spack.yaml"
=== "config/cscs/spack/v0.21.1.3/alps_cpu_nvhpc/spack.yaml"

```yaml
spack:
@@ -101,7 +55,7 @@ For CPU compilation:
For GPU compilation:
=== "config/cscs/spack/v0.21.1.3/todi_gpu_nvhpc/spack.yaml"
=== "config/cscs/spack/v0.21.1.3/alps_gpu_nvhpc/spack.yaml"
```yaml
spack:
@@ -131,18 +85,65 @@ git clone --depth 1 --recurse-submodules --shallow-submodules -b ${SPACK_TAG} ht

# Build ICON
cd /path/to/icon-build-folder
- spack env activate -d config/cscs/spack/${SPACK_TAG}/todi_gpu_nvhpc
+ spack env activate -d config/cscs/spack/${SPACK_TAG}/alps_gpu_nvhpc
spack install
```

+ ### Euler

+ Spack is used to build ICON. Please follow the steps below to set up Spack and build ICON.

+ **1. Set up a Spack instance**

+ To [set up a Spack instance :material-open-in-new:](https://c2sm.github.io/spack-c2sm/latest/QuickStart.html#at-cscs-daint-tsa-balfrin){:target="_blank"}, ensure that you clone the repository using the Spack tag provided in the ICON repository at [config/ethz/SPACK_TAG_EULER :material-open-in-new:](https://github.com/C2SM/icon/blob/main/config/ethz/SPACK_TAG_EULER){:target="_blank"} and load it into your shell.

- ### Santis
- Please follow the instructions for Todi, but run the following before loading the ICON user-environment:
+ **2. Build ICON**

+ Activate the Spack environment for Euler:
```bash
- export CLUSTER_NAME=todi
+ SPACK_TAG=$(cat "config/ethz/SPACK_TAG_EULER")
+ spack env activate -d config/ethz/spack/$SPACK_TAG/euler_cpu_gcc
```

+ Euler Support recommends compiling code on compute nodes. Unfortunately, [internet-access on Euler compute-nodes is restricted :material-open-in-new:](https://scicomp.ethz.ch/wiki/Accessing_the_clusters#Internet_Security){:target="_blank"}.
+ Therefore, a two-step install needs to be performed:

+ ```bash
+ # fetch and install cosmo-eccodes-definitions on the login node
+ spack install cosmo-eccodes-definitions
+
+ # compile ICON on compute nodes
+ srun -N 1 -c 12 --mem-per-cpu=20G spack install -v -j 12
+ ```

+ ### Piz Daint

+ Spack is used to build ICON. Please follow the steps below to set up Spack and build ICON.

+ **1. Set up a Spack instance**

+ To [set up a Spack instance :material-open-in-new:](https://c2sm.github.io/spack-c2sm/latest/QuickStart.html#at-cscs-daint-tsa-balfrin){:target="_blank"}, ensure that you clone the repository using the Spack tag provided in the ICON repository at [config/cscs/SPACK_TAG_C2SM :material-open-in-new:](https://github.com/C2SM/icon/blob/main/config/cscs/SPACK_TAG_C2SM){:target="_blank"} and load it into your shell.
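A minimal sketch of this setup step (the clone location and the `setup-env.sh` entry point are assumptions; follow the linked QuickStart for the authoritative steps):

```bash
# Clone spack-c2sm at the tag pinned by ICON and load it into the shell
SPACK_TAG=$(cat config/cscs/SPACK_TAG_C2SM)
git clone -b $SPACK_TAG https://github.com/C2SM/spack-c2sm.git
source spack-c2sm/setup-env.sh
```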

+ **2. Build ICON**

+ Refer to the official spack-c2sm documentation for [installing ICON using Spack :material-open-in-new:](https://c2sm.github.io/spack-c2sm/latest/QuickStart.html#icon){:target="_blank"}.

+ After the first compilation, you need to create a `setting` file (the following example is for Piz Daint; adapt the lines to the machine you are using):

+ === "daint_gpu_nvhpc"
+     ```shell
+     # Get SPACK_TAG used on machine
+     SPACK_TAG=$(cat "config/cscs/SPACK_TAG_C2SM")
+     # Set the name of the environment, which should be equal to the builder
+     ENV_NAME=daint_gpu_nvhpc
+     # Load probtest environment (only needed if you want to run check files)
+     source /project/g110/icon/probtest/conda/miniconda/bin/activate probtest
+     # Ensure CDO is loaded on your machine
+     module load daint-gpu CDO
+     # Remove and create setting file with the following two commands
+     rm -f setting
+     ./config/cscs/create_sh_env $SPACK_TAG $ENV_NAME
+     ```

## Run test case
In the *run* folder, you will find many prepared test cases, which you can convert into run scripts. To generate the run script for one of the experiment files, e.g. *mch_ch_lowres*, you can use the `make_runscripts` function, as sketched below.
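A minimal usage sketch (the exact invocation may differ between ICON versions; `mch_ch_lowres` is the experiment named above):

```bash
# From the ICON root folder, generate the run script for one experiment
./make_runscripts mch_ch_lowres
```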

14 changes: 14 additions & 0 deletions docs/posts/2024-12-17_alps_update.md
@@ -0,0 +1,14 @@
+ ---
+ date:
+   created: 2024-12-17
+ categories:
+   - Alps
+ ---

+ # Update on the current status of Alps

+ Last week, CSCS deployed the Climate and Weather vCluster `santis`. As some fine-tuning is still ongoing, the [Santis section](../alps/vclusters.md#santis) provides an overview of how to transition from `todi` to `santis`.

+ <!-- more -->

+ Additionally, all information about the new vClusters as well as how to use User Environments (uenvs) has been updated.
