NVHPC hackathon

Seth R. Johnson edited this page Aug 16, 2022 · 6 revisions

Systems

Environment

TODO (done): add this environment to scripts/env/saturn.sh and add CMake presets.

module load cmake nvhpc 
export PATH=/storage/users/s3j/spack/var/spack/environments/tools/.spack-env/view/bin:$PATH
source /home/users/s3j/spack/share/spack/setup-env.sh
spack env activate celeritas-nvhpc

Setup

Spack installation

Config:

config:
  install_tree:
    root: $spack/opt/spack
    projections:
      all: "{architecture}/{name}/{version}/{hash:7}"
  deprecated: true
  build_jobs: 26

Find externals:

$ module load cmake cuda nvhpc
$ spack external find --scope=site
==> The following specs have been detected on this system and added to /storage/users/s3j/spack/etc/spack/packages.yaml
[email protected]    [email protected]   [email protected]    [email protected]   [email protected]   [email protected]      [email protected]
[email protected]  [email protected]  [email protected]  [email protected]   [email protected]  [email protected]     [email protected]
[email protected]    [email protected]  [email protected]       [email protected]  [email protected]      [email protected]  [email protected]
$ spack compiler find --scope=site
==> Added 3 new compilers to /storage/users/s3j/spack/etc/spack/compilers.yaml
    [email protected]  [email protected]  [email protected]
==> Compilers are defined in the following files:
    /storage/users/s3j/spack/etc/spack/compilers.yaml

Also run module load llvm and spack external find llvm python, then run ./scripts/dev/install-commit-hooks.sh.

Add the spack environment and trim it down:

$ cat spack.yaml
spack:
  specs:
  - geant4@11 %nvhpc
  - googletest %nvhpc
  - hepmc3 %nvhpc
  - nlohmann-json %nvhpc
  view: true
  concretizer:
    unify: true
  packages:
    xerces-c:
      variants: netaccessor=none
    all:
      compiler: [nvhpc]
      providers:
        blas: [openblas]
        lapack: [openblas]
      variants: cxxstd=17

Manual build of geant4

Append /storage/users/s3j/spack/share/spack/modules/linux-ubuntu20.04-cascadelake to MODULEPATH, then run spack module tcl refresh -y to regenerate the module files.

Check out Geant4 11.0.2, then load the dependencies and configure:

$ module load xerces-c-3.2.3-nvhpc-22.5-q6gsmka expat-2.4.8-nvhpc-22.5-3fw46xw geant4-data-11.0.0-nvhpc-22.5-thqh346
$ mkdir -p /home/users/s3j/geant4/share/Geant4-11.0.2/
$ ln -s /storage/users/s3j/spack/opt/spack/linux-ubuntu20.04-cascadelake/geant4-data/11.0.0/thqh346/share/geant4-data-11.0.0 /home/users/s3j/geant4/share/Geant4-11.0.2/data
$ initial-cmake.sh -f -b /storage/warpspeed/scratch/s3j/build-geant4 -p /home/users/s3j/geant4 . -DGEANT4_USE_GDML:BOOL=ON
+ exec cmake -G Ninja -DCMAKE_INSTALL_PREFIX:PATH=/home/users/s3j/geant4 -DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=ON -DGEANT4_USE_GDML:BOOL=ON /home/users/s3j/.local/src/geant4/.
...

Errors

FIXED: Geant4 build

Bad thread-local attributes:

"/tmp/s3j/spack-stage/spack-stage-geant4-11.0.2-7dcvmghd6n6rnnglmgvigabdhifmk2cs/spack-src/source/processes/electromagnetic/dna/management/include/G4Octree.icc", line 35: error: thread-local declaration follows non-thread-local declaration (declared at line 163 of "/tmp/s3j/spack-stage/spack-stage-geant4-11.0.2-7dcvmghd6n6rnnglmgvigabdhifmk2cs/spack-src/source/processes/electromagnetic/dna/management/include/G4Octree.hh")
  G4ThreadLocal G4Allocator<OCTREE>* OCTREE::fgAllocator = nullptr;
                                             ^

"/tmp/s3j/spack-stage/spack-stage-geant4-11.0.2-7dcvmghd6n6rnnglmgvigabdhifmk2cs/spack-src/source/processes/electromagnetic/dna/management/include/G4Octree.icc", line 35: error: thread-local storage class is not valid here
  G4ThreadLocal G4Allocator<OCTREE>* OCTREE::fgAllocator = nullptr;

Worked around by disabling multithreading in the Geant4 build.
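For reference, the compiler is complaining that the out-of-class definition is marked thread-local while the in-class declaration it saw is not. Below is a minimal sketch of the consistent pattern. It uses hypothetical names (`Pool`, `slot`, `bind`) rather than the Geant4 sources, and assumes that `G4ThreadLocal` expands to `thread_local` in multithreaded builds:

```cpp
#include <cassert>

// Hypothetical stand-in for a class template with a per-thread static member.
template<class T>
class Pool
{
  public:
    // The in-class declaration carries thread_local...
    static thread_local int* slot;

    // Point this thread's slot at caller-owned storage.
    static void bind(int* p)
    {
        assert(p != nullptr);
        slot = p;
    }
};

// ...so the out-of-class definition must repeat it. Marking only one of the
// two as thread_local is the kind of mismatch reported in the log above.
template<class T>
thread_local int* Pool<T>::slot = nullptr;
```

Each thread gets its own `slot` for every instantiation of `Pool`, initialized to `nullptr` until `bind` is called on that thread.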

Excessive recursion from a questionable hack to add "extensible enums":

"/tmp/s3j/spack-stage/spack-stage-geant4-11.0.2-7dcvmghd6n6rnnglmgvigabdhifmk2cs/spack-src/source/processes/electromagnetic/dna/management/include/G4CTCounter.hh", line 79: error: excessive recursion at instantiation of class "G4Number<191>"
  struct G4Number: public G4Number<N-1>{
                          ^
          detected during:
            instantiation of class "G4Number<N> [with N=192]" at line 79
            instantiation of class "G4Number<N> [with N=193]" at line 79
            instantiation of class "G4Number<N> [with N=194]" at line 79
            instantiation of class "G4Number<N> [with N=195]" at line 79
            instantiation of class "G4Number<N> [with N=196]" at line 79
            [ 54 instantiation contexts not shown ]
            instantiation of class "G4Number<N> [with N=251]" at line 79
            instantiation of class "G4Number<N> [with N=252]" at line 79
            instantiation of class "G4Number<N> [with N=253]" at line 79
            instantiation of class "G4Number<N> [with N=254]" at line 79
            instantiation of class "G4Number<N> [with N=255]" at line 79 of "/tmp/s3j/spack-stage/spack-stage-geant4-11.0.2-7dcvmghd6n6rnnglmgvigabdhifmk2cs/spack-src/source/processes/electromagnetic/dna/molecules/management/include/G4VMolecularDissociationDisplacer.hh"

Fixed by adding -Wc,--pending_instantiations to the compiler flags.

See https://github.com/spack/spack/pull/32185
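The recursion comes from the inheritance chain visible in the log: each `G4Number<N>` derives from `G4Number<N-1>`, so naming `G4Number<255>` makes the compiler hold every lower level in flight at once. Below is a minimal sketch of that pattern. The `Number` template and `instantiation_depth` helper are hypothetical, and the base case is assumed since the log does not show it:

```cpp
// Hypothetical stand-in: each level derives from the one below it.
template<int N>
struct Number : public Number<N - 1>
{
    static constexpr int value = N;
};

// Base case that terminates the recursion (assumed, not shown in the log).
template<>
struct Number<0>
{
    static constexpr int value = 0;
};

// Number of nested class instantiations needed to name Number<N>:
// levels N, N-1, ..., 0.
template<int N>
constexpr int instantiation_depth()
{
    return Number<N>::value + 1;
}

static_assert(instantiation_depth<64>() == 65, "Number<64> needs 65 levels");
```

A compiler whose limit on pending instantiations is below the required depth rejects the code even though it is well-formed, which is why raising the limit is enough.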

FIXED: Runtime PTX error

[  FAILED  ] Device failed to initialize: /home/users/s3j/.local/src/celeritas/src/corecel/sys/Device.cc:237:
celeritas: cuda error: the provided PTX was compiled with an unsupported toolchain.
    cudaFree(nullptr)

(This happened because I had the wrong arch set while CUDA was enabled.)

FIXED: uniform grid test failure

The tests were just a bit too picky.

    In that test UniformGridData::from_bounds(log(1e1.0), log(1e5), 6);
    the delta ends up approximately log(10), but on one compiler (pgi) it is
    2.3025850929940455 while on another compiler (gcc) we get
    2.3025850929940459 (the final bit differs by one).

See commit 2e04478ea9831b5222d6ac53374f333d1cfa7677.
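The two values quoted above are adjacent doubles, so an exact comparison fails even though both are reasonable roundings of log(10). Below is a minimal sketch of a tolerance-based check. The helpers `soft_equal` and `one_ulp_apart` are hypothetical illustrations and are not necessarily what the referenced commit does:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true if a and b agree to within a relative tolerance.
inline bool soft_equal(double a, double b, double rel_tol = 1e-12)
{
    assert(rel_tol > 0);
    double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}

// Hypothetical helper: true if b is the representable double adjacent to a.
inline bool one_ulp_apart(double a, double b)
{
    return a != b && std::nextafter(a, b) == b;
}
```

With the values from the commit message, `soft_equal(2.3025850929940455, 2.3025850929940459)` holds while `==` does not, so a test written against the tolerance passes on both compilers.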

Offloading launchers

Changing all the OpenMP loops to use std::for_each(std::execution::par_unseq, ...) produces a compiler error in InitTracks.cc:

[26/375] Building CXX object src/CMakeFiles/celeritas.dir/celeritas/track/generated/InitTracks.cc.o
FAILED: src/CMakeFiles/celeritas.dir/celeritas/track/generated/InitTracks.cc.o 
/packages/nvhpc/22.5_cuda11.7/Linux_x86_64/22.5/compilers/bin/nvc++ -Dceleritas_EXPORTS -I/home/users/romano/celeritas/src -I/storage/warpspeed/scratch/romano/build-base/src -isystem /storage/packages/nvhpc/22.5_cuda11.7/Linux_x86_64/22.5/cuda/11.7/include -isystem /home/users/s3j/spack/var/spack/environments/celeritas-nvhpc/.spack-env/view/include -stdpar -Minfo -g -O0 -fPIC --c++14 -MD -MT src/CMakeFiles/celeritas.dir/celeritas/track/generated/InitTracks.cc.o -MF src/CMakeFiles/celeritas.dir/celeritas/track/generated/InitTracks.cc.o.d -o src/CMakeFiles/celeritas.dir/celeritas/track/generated/InitTracks.cc.o -c /home/users/romano/celeritas/src/celeritas/track/generated/InitTracks.cc
celeritas::generated::init_tracks(const celeritas::CoreRef<(celeritas::MemSpace)0> &, const celeritas::TrackInitStateData<(celeritas::Ownership)1, (celeritas::MemSpace)0> &, unsigned int):
     31, stdpar: Generating NVIDIA GPU code
         31, std::for_each with std::execution::par_unseq policy parallelized on GPU
NVC++-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unsupported procedure (/home/users/romano/celeritas/src/celeritas/track/generated/InitTracks.cc: 1)
NVC++/x86-64 Linux 22.5-0: compilation aborted
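For context, the change described above replaces an OpenMP loop over thread IDs with a parallel standard algorithm. Below is a minimal sketch of that transformation. The `ScaleEnergy` functor and `launch_scale` function are hypothetical stand-ins for the generated launchers, and this is not the code that triggered the error:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// Hypothetical per-track operation: it holds a raw pointer by value so the
// algorithm can copy the functor freely.
struct ScaleEnergy
{
    double* energy;
    double factor;

    void operator()(std::size_t tid) const { energy[tid] *= factor; }
};

// Hypothetical launcher. The OpenMP form of the same loop would be:
//   #pragma omp parallel for
//   for (std::size_t i = 0; i < count; ++i) { op(i); }
inline void launch_scale(double* energy, std::size_t count, double factor)
{
    assert(energy != nullptr || count == 0);

    // Materialize the thread IDs so there is a range to iterate over.
    std::vector<std::size_t> ids(count);
    std::iota(ids.begin(), ids.end(), std::size_t{0});

    std::for_each(std::execution::par_unseq,
                  ids.begin(),
                  ids.end(),
                  ScaleEnergy{energy, factor});
}
```

The execution policies need C++17. With GCC's libstdc++ the parallel policies may additionally require linking against TBB.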