The Spack Infrastructure Project makes use of spack package manager to install spack software stack on NERSC systems. This project contains spack configuration (spack.yaml
) required to build the spack stacks. The spack stack is based on Extreme-Scale Scientific Software Stack (E4S) where we install spack packages provided by E4S and use the recommended spack branch. We leverage Gitlab CI to automate deployment to ensure reproducible and automated builds. For more details about this project you can see the documentation at https://nersc-spack-infrastructure.rtfd.io
The software deployment consist of the following steps
- Acquire Spack Configuration from E4S project https://github.com/E4S-Project/e4s
- Create one or more spack configuration files (spack.yaml) with list of E4S packages and integrate spack configuration for NERSC system
- Create a Gitlab Job to trigger the pipeline for TDS and Deployment system
- Create a Modulefile as entry point to stack
- Write User Documentation
- Share spack configuration with open-source community
- Send announcement to all NERSC users
At NERSC, we plan our software deployment with E4S releases which is typically every 3 months however we perform deployment every 6 months. Once E4S has released the spack configuration we acquire the spack configuration which is typically found in https://github.com/E4S-Project/e4s/tree/master/environments. We also acquire the spack branch used by E4S team as our baseline, this would be documented in the release notes. The name of branch map to the E4S version so version 23.05 will have a branch e4s-23.05.
Next, we copy the packages into our project and create the spack configuration
In this step we create the spack configuration. First we create a sub-directory in spack-configs with the naming convention to distinguish E4S version. This typically includes the
name of the system such as cori
or perlmutter
followed by name of e4s version such as e4s-23.05
.
$ tree -L 1 spack-configs
spack-configs
├── cori-e4s-20.10
├── cori-e4s-21.02
├── cori-e4s-21.05
├── cori-e4s-22.02
├── perlmutter-e4s-21.11
├── perlmutter-e4s-22.05
├── perlmutter-e4s-22.11
├── perlmutter-e4s-23.05
├── perlmutter-spack-develop
└── perlmutter-user-spack
10 directories, 0 files
Inside one of the stacks, you will see several sub-directories that are used for defining a sub-stack. These sub-stacks correspond to spack environments. The prod
directory is used for production deployment to install from the buildcache.
$ tree -L 3 spack-configs/perlmutter-e4s-22.11
spack-configs/perlmutter-e4s-22.11
├── cce
│ └── spack.yaml
├── cuda
│ └── spack.yaml
├── definitions.yaml
├── gcc
│ └── spack.yaml
├── nvhpc
│ └── spack.yaml
└── prod
├── cce
│ └── spack.yaml
├── cuda
│ └── spack.yaml
├── gcc
│ └── spack.yaml
└── nvhpc
└── spack.yaml
9 directories, 9 files
We create a special file named definitions.yaml
that is used for declaring definitions that is referenced in spack.yaml
. This file is appended to all spack configuration. We do this
to ensure all specs are defined in one place.
During this step, we will create the spack configuration and specify our preferred compilers and package preference. We install software in buildcache so it can be relocated to production path. In order to accomplish this task, we use spack pipelines that uses spack ci generate
and spack ci rebuild
to perform parallel pipeline execution. During this step, we determine which packages to install from E4S and add our own packages to comply with our site preference.
Once spack configuration is written, we create a gitlab job to trigger the pipeline. This can be done by specifying a job in .gitlab-ci.yml.
The gitlab job can be triggered through scheduled pipelines, web-interface, or merge request to the project. A typical gitlab job will look something like this. Shown below is for E4S 23.05 generate job. We make use of gitlab feature named extends which allows us to reuse configuration. The spack ci generate
command will be the same for each substack. There is two jobs, first is the generate step performed by spack ci generate
and this triggers the downstream job created by spack.
.perlmutter-e4s-23.05-generate:
stage: generate
needs: ["perlmutter:check_spack_dependencies"]
tags: [perlmutter-e4s]
interruptible: true
allow_failure: true
rules:
- if: ($CI_PIPELINE_SOURCE == "schedule" || $CI_PIPELINE_SOURCE == "web") && ($PIPELINE_NAME == "PERLMUTTER_E4S_23.05")
- if: ($CI_PIPELINE_SOURCE == "merge_request_event")
changes:
- spack-configs/perlmutter-e4s-23.05/$STACK_NAME/spack.yaml
- spack-configs/perlmutter-e4s-23.05/definitions.yaml
before_script:
- *copy_perlmutter_settings
- *startup_modules
script:
- *e4s_23_05_setup
- cd $CI_PROJECT_DIR/spack-configs/perlmutter-e4s-23.05/$STACK_NAME
- cat $CI_PROJECT_DIR/spack-configs/perlmutter-e4s-23.05/definitions.yaml >> spack.yaml
- spack env activate --without-view .
- spack env st
#- spack -d concretize -f | tee $CI_PROJECT_DIR/concretize.log
- spack -d ci generate --check-index-only --artifacts-root "$CI_PROJECT_DIR/jobs_scratch_dir" --output-file "${CI_PROJECT_DIR}/jobs_scratch_dir/pipeline.yml"
artifacts:
paths:
- ${CI_PROJECT_DIR}/jobs_scratch_dir
perlmutter-e4s-23.05-cce-generate:
extends: .perlmutter-e4s-23.05-generate
variables:
STACK_NAME: cce
perlmutter-e4s-23.05-cce-build:
stage: build
needs: ["perlmutter:check_spack_dependencies", "perlmutter-e4s-23.05-cce-generate"]
allow_failure: true
rules:
- if: ($CI_PIPELINE_SOURCE == "schedule" || $CI_PIPELINE_SOURCE == "web") && ($PIPELINE_NAME == "PERLMUTTER_E4S_23.05")
- if: ($CI_PIPELINE_SOURCE == "merge_request_event")
changes:
- spack-configs/perlmutter-e4s-23.05/cce/spack.yaml
- spack-configs/perlmutter-e4s-23.05/definitions.yaml
trigger:
include:
- artifact: jobs_scratch_dir/pipeline.yml
job: perlmutter-e4s-23.05-cce-generate
strategy: depend
In this step, we create a modulefile as entry point to software stack and setup spack
. We do not create spack generated modules for spack packages, instead one is expected to use spack load
. Shown below are the modulefiles available on NERSC system, they are typically called e4s/<version>
with a symbolic link to module spack/e4s-<version>
siddiq90@login37> ml -t av e4s
/global/common/software/nersc/pm-2022.12.0/extra_modulefiles:
e4s/22.05
e4s/22.11
spack/e4s-22.05
spack/e4s-22.11
Shown below is the content of our modulefile, the setup is subject to change
siddiq90@login37> ml --raw show e4s
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/global/common/software/nersc/pm-2022.12.0/extra_modulefiles/e4s/22.11.lua:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
whatis([[
The Extreme-scale Scientific Software Stack (E4S) is a collection of open source software packages for running scientific applications on high-performance computing (HPC) platforms.
]])
help([[ The Extreme-scale Scientific Software Stack (E4S) is a community effort to provide open source software packages for developing, deploying and running scientific applications on high-performance computing (HPC) platforms. E4S provides from-source builds and containers of a broad collection of HPC software packages.
References:
- E4S User Docs: https://e4s.readthedocs.io/en/latest/index.html
- E4S 22.11 Docs: https://docs.nersc.gov/applications/e4s/perlmutter/22.11/
- E4S Homepage: https://e4s-project.github.io/
- E4S GitHub: https://github.com/E4S-Project/e4s
]])
local root = "/global/common/software/spackecp/perlmutter/e4s-22.11/default/spack"
setenv("SPACK_GNUPGHOME", pathJoin(os.getenv("HOME"), ".gnupg"))
setenv("SPACK_SYSTEM_CONFIG_PATH", "/global/common/software/spackecp/perlmutter/spack_settings")
-- setup spack shell functionality
local shell = myShellType()
if (mode() == "load") then
local spack_setup = ''
if (shell == "sh" or shell == "bash" or shell == "zsh") then
spack_setup = pathJoin(root, "share/spack/setup-env.sh")
elseif (shell == "csh") then
spack_setup = pathJoin(root, "share/spack/setup-env.csh")
elseif (shell == "fish") then
spack_setup = pathJoin(root, "share/spack/setup-env.fish")
end
-- If we are unable to find spack setup script let's terminate now.
if not isFile(spack_setup) then
LmodError("Unable to find spack setup script " .. spack_setup .. "\n")
end
execute{cmd="source " .. spack_setup, modeA={"load"}}
LmodMessage([[
_______________________________________________________________________________________________________
The Extreme-Scale Scientific Software Stack (E4S) is accessible via the Spack package manager.
In order to access the production stack, you will need to load a spack environment. Here are some tips to get started:
'spack env list' - List all Spack environments
'spack env activate gcc' - Activate the "gcc" Spack environment
'spack env status' - Display the active Spack environment
'spack load amrex' - Load the "amrex" Spack package into your user environment
For additional support, please refer to the following references:
NERSC E4S Documentation: https://docs.nersc.gov/applications/e4s/
E4S Documentation: https://e4s.readthedocs.io
Spack Documentation: https://spack.readthedocs.io/en/latest/
Spack Slack: https://spackpm.slack.com
______________________________________________________________________________________________________
]])
-- To remove spack from shell we need to remove a few environment variables, alias and remove $SPACK_ROOT/bin from $PATH
elseif (mode() == "unload" or mode() == "purge") then
if (shell == "sh" or shell == "bash" or shell == "zsh") then
execute{cmd="unset SPACK_ENV",modeA={"unload"}}
execute{cmd="unset SPACK_ROOT",modeA={"unload"}}
execute{cmd="unset -f spack",modeA={"unload"}}
elseif (shell == "csh") then
execute{cmd="unsetenv SPACK_ENV",modeA={"unload"}}
execute{cmd="unsetenv SPACK_ROOT",modeA={"unload"}}
execute{cmd="unalias spack",modeA={"unload"}}
end
-- Need to remove $SPACK_ROOT/bin from $PATH which removes the 'spack' command
remove_path("PATH", pathJoin(root, "bin"))
-- Remove alias spacktivate. Need to pipe to /dev/null as invalid alias can report error to stderr
execute{cmd="unalias spacktivate > /dev/null",modeA={"unload"}}
end
User documentation is fundamental to help assist users with using E4S at NERSC. We document every E4S release with its Release Date and End of Support date along with a documentation page outlining the software stack. Our E4S documentation is available at https://docs.nersc.gov/applications/e4s/. The release date is when documentation is live. We perform this action in conjunction with release of modulefile so that user gain access to software stack.
Upon completion of this task, we are ready to make announcement to our NERSC users
In this step, we share our spack configuration with open-source community that may benefit the wider community. We share our spack configuration at https://github.com/spack/spack-configs. In addition, we update the E4S Facility Dashboard that shows all the E4S deployments across all the facilities.
This is the final step of the deployment process, where we make a public announcement in NERSC weekly email, along with various slack channels such as Nersc User Group (NUG), Spack, ECP and E4S slack.
There are several challenges with building spack stack at NERSC which can be summarized as follows
-
System OS + Cray Programming Environment (CPE) changes: A system upgrade such as change to
glibc
or upgrades in CPE can lead to full software stack rebuild, especially if you have external packages set for packages likecray-mpich
,cray-libsci
which generally change between versions -
Incompatibile compilers: Some packages can't be built with certain compilers (
nvhpc
,aocc
) which could be due to several factors.- An application doesn't have support though it was be added in newer version but you don't have it in your spack release used for deployment
- Lack of support in spack package recipe or spack-core base including spack-cray detection. This may require getting fix and cherry-pick commit or waiting for new version
- Spack Cray detection is an important part in build errors including how one specifies externals via
modules
vsprefix
both could be provided and it requires experimentation. An example of this is trying to getcray-mpich
external one could set something like this with modules or prefix
cray-mpich: buildable: false externals: - spec: [email protected] %[email protected] prefix: /opt/cray/pe/mpich/8.1.11/ofi/gnu/9.1 modules: - cray-mpich/8.1.11 - cudatoolkit/21.9_11.4
- Spack concretizer prevent one from chosing a build configration for a spec. This requires a few troubleshooting step but usually boils down to:
- Read the spack package file
spack edit <package>
for conflicts and tryspack spec
to see concretized spec. - Try different version, different compiler, different dependency. Some packages have conflicting variant for instance one can't enable
+openmp
and+pthread
it is mutually exclusive.
- Read the spack package file
There is a document Spack E4S Issues on Permlutter outlining current issues with spack. If you need access to document please contact Shahzeb Siddiqui.
If you need elevated privledge or assistance with this project please contact one of the maintainers:
- Shahzeb Siddiqui - [email protected]
- Erik Palmer - [email protected]
- Justin Cook - [email protected]
- E4S Team: Sameer Shende ([email protected]), Christopher Peyralans ([email protected]), Wyatt Spear ([email protected]), Nicholas Chaimov ([email protected])