Merge branch 'develop' into adds-comment

QMCPACK · Aug 25, 2023 · 42c5498
2 parents 6910bd7 + e018e82

Showing 43 changed files with 1,885 additions and 182 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,9 +2,73 @@
 
 Notable changes to QMCPACK are documented in this file.
 
-## [Unreleased]
+## [3.17.1] - 2023-08-25
 
-The legacy CUDA implementation, the version built with QMC_CUDA=1, has been removed from the codebase.
+This minor release is recommended for all users and includes a couple of build fixes and a NEXUS improvement.
+
+* Improved HDF5 detection. Fixes cases where HDF5 was not identified by CMake, including on FreeBSD (thanks @yurivict for the report). [#4708](https://github.com/QMCPACK/qmcpack/pull/4708)
+* Fix for building with BUILD_UNIT_TESTS=OFF. [#4709](https://github.com/QMCPACK/qmcpack/pull/4709)
+* Add timer for orbital rotations. [#4706](https://github.com/QMCPACK/qmcpack/pull/4706)
+
+### NEXUS
+
+* NEXUS: Support for spinor inputs. [#4707](https://github.com/QMCPACK/qmcpack/pull/4707)
+## [3.17.0] - 2023-08-18
+
+This is a recommended release for all users. Thanks to everyone who contributed directly, reported an issue, or suggested an
+improvement. There are many quality of life improvements, bug fixes throughout the application, and updates to the associated
+testing. As previously announced, the legacy CUDA support (QMC_CUDA=1) is removed in this version. For GPU support, users should
+transition to the offload code, which is more capable and fully usable in production on NVIDIA GPUs.
+
+This version is intended for long-term support of v3 of QMCPACK. Development effort is now focused on v4. Contributions of
+tests, fixes, and features from users and developers are still welcome in v3, for a potential future release. However, these will not
+be ported to v4 by the core QMCPACK developers without prior arrangement. Please discuss options with QMCPACK developers.
+
+* Simplified checkpointing and enabled it in the batched drivers. Users now only need to specify checkpoint={-1,0,N} to checkpoint
+  between blocks. [#4646](https://github.com/QMCPACK/qmcpack/pull/4646)
+* NERSC Perlmutter build recipe. [#4698](https://github.com/QMCPACK/qmcpack/pull/4698)
+* qmc-fit: Now supports parameter fitting with jackknife, e.g. for DFT+U and EXX scans
+  [#4475](https://github.com/QMCPACK/qmcpack/pull/4475), and for equation-of-state and Morse fits
+  [#4518](https://github.com/QMCPACK/qmcpack/pull/4518)
+* Improved error checking including NaN checks to protect against potentially unreliable compilers and libraries,
+  [#4697](https://github.com/QMCPACK/qmcpack/pull/4697), and checks on GPU matrix inversion
+  [#4693](https://github.com/QMCPACK/qmcpack/pull/4693)
+* Significant advances in orbital optimization capability, focusing on LCAO wavefunctions. Development is ongoing for
+  multideterminant support and for spline wavefunctions. See e.g. the Be atom orbital optimization test
+  [#4626](https://github.com/QMCPACK/qmcpack/pull/4626), [#4619](https://github.com/QMCPACK/qmcpack/pull/4619), reading and writing
+  of orbital rotation parameters [#4580](https://github.com/QMCPACK/qmcpack/pull/4580), support for disabled/frozen parameters
+  [#4581](https://github.com/QMCPACK/qmcpack/pull/4581). 
+* Magnetization Density Estimator for non-collinear wavefunctions [#4531](https://github.com/QMCPACK/qmcpack/pull/4531)
+* Pathak-Wagner regularizer for forces [#4477](https://github.com/QMCPACK/qmcpack/pull/4477)
+* The legacy CUDA implementation, the version built with QMC_CUDA=1, has been removed from the codebase,
+  [#4431](https://github.com/QMCPACK/qmcpack/pull/4431),
+  [#4632](https://github.com/QMCPACK/qmcpack/pull/4632), [#4499](https://github.com/QMCPACK/qmcpack/pull/4499),
+  [#4442](https://github.com/QMCPACK/qmcpack/pull/4442).
+* For increased performance with current AMD GPU support, a new QMC_DISABLE_HIP_HOST_REGISTER option is enabled by default for
+  ROCm/HIP builds. [#4674](https://github.com/QMCPACK/qmcpack/pull/4674)
+* Bugfix: J1Spin indexing was wrong [#4612](https://github.com/QMCPACK/qmcpack/pull/4612)
+* Bugfix: 1RDM estimator data written to stat.h5 was incorrect [#4568](https://github.com/QMCPACK/qmcpack/pull/4568)
+* Introduced ENABLE_PPCONVERT option and skip ppconvert compilation when cross compiling. [#4601](https://github.com/QMCPACK/qmcpack/pull/4601)
+* Faster builds compared to v3.16.0 due to code refactoring [#4682](https://github.com/QMCPACK/qmcpack/pull/4682)
+* Many refinements throughout the codebase, cleanup, improved testing.
+
+### NEXUS
+
+* Nexus: Equilibration detection algorithm is now deterministic [#4557](https://github.com/QMCPACK/qmcpack/pull/4557)
+* Nexus: Support for Kagayaki cluster at JAIST [#4598](https://github.com/QMCPACK/qmcpack/pull/4598)
+* Nexus: GPU support fix for NERSC/Perlmutter [#4699](https://github.com/QMCPACK/qmcpack/pull/4699)
+* Nexus: Use simplices in convex_hull to support newer scipy versions [#4671](https://github.com/QMCPACK/qmcpack/pull/4671)
+* Nexus: Add pdos flag for Projwfc [#4655](https://github.com/QMCPACK/qmcpack/pull/4655)
+* Nexus: Adding crowds_serialize_walkers tag to dmc input list [#4651](https://github.com/QMCPACK/qmcpack/pull/4651)
+* Nexus: Qdens handles batched driver input/output [#4645](https://github.com/QMCPACK/qmcpack/pull/4645)
+* Nexus: Fix namelist read for Projwfc input [#4644](https://github.com/QMCPACK/qmcpack/pull/4644)
+
+### Known problems
+
+* When offload builds are compiled with CUDA toolkit versions above 11.2 using LLVM, multideterminant tests and functionality will
+  fail, seemingly due to an issue with the toolkit. This is discussed in https://github.com/llvm/llvm-project/issues/54633 . All
+  other functionality appears to work as expected. As a workaround, the CUDA toolkit 11.2 can be used. The actual NVIDIA drivers can
+  be more recent.
 
 ## [3.16.0] - 2023-01-31
 

diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -15,7 +15,7 @@ endif()
 ######################################################################
 project(
   qmcpack
-  VERSION 3.16.9
+  VERSION 3.17.9
   LANGUAGES C CXX)
 
 # add the automatically determined parts of the RPATH
@@ -658,9 +658,15 @@ else()
   set(HDF5_USE_STATIC_LIBRARIES off)
 endif()
 
-find_package(HDF5 1.10 COMPONENTS C)
+find_package(HDF5 COMPONENTS C) # Note: minimum version check is done below to bypass find_package
+                                # and HDF5 version compatibility subtleties
 
 if(HDF5_FOUND)
+  if(HDF5_VERSION)
+    if (HDF5_VERSION VERSION_LESS 1.10.0)
+      message(FATAL_ERROR "QMCPACK requires HDF5 version >= 1.10.0")
+    endif()
+  endif(HDF5_VERSION)
   if(HDF5_IS_PARALLEL)
     if(HAVE_MPI)
       message(STATUS "Parallel HDF5 library found")
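
The hunk above drops the version argument from `find_package` and instead tests `HDF5_VERSION` with `VERSION_LESS`. CMake documents `VERSION_LESS` as a component-wise integer comparison in which omitted components count as zero, which is why a plain string comparison would not work: as strings, `1.8.21` sorts after `1.10.0`. A minimal C++ sketch of that rule follows, assuming dot-separated, purely numeric components; `parseVersion` and `versionLess` are hypothetical helpers, not part of QMCPACK or CMake.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split "1.12.2" into {1, 12, 2}. Assumes every component is a non-negative integer.
std::vector<long> parseVersion(const std::string& v)
{
  std::vector<long> parts;
  std::stringstream ss(v);
  std::string item;
  while (std::getline(ss, item, '.'))
    parts.push_back(std::stol(item));
  return parts;
}

// Component-wise numeric "a < b"; a missing component is treated as zero,
// following the documented semantics of CMake's VERSION_LESS.
bool versionLess(const std::string& a, const std::string& b)
{
  const std::vector<long> va = parseVersion(a);
  const std::vector<long> vb = parseVersion(b);
  const std::size_t n = std::max(va.size(), vb.size());
  for (std::size_t i = 0; i < n; ++i)
  {
    const long x = i < va.size() ? va[i] : 0;
    const long y = i < vb.size() ? vb[i] : 0;
    if (x != y)
      return x < y; // first differing component decides
  }
  return false; // equal versions are not "less"
}
```

For example, `versionLess("1.8.21", "1.10.0")` is true, so an HDF5 1.8 installation would reach the `FATAL_ERROR` branch, while `versionLess("1.12.2", "1.10.0")` and `versionLess("1.10", "1.10.0")` are both false.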

diff --git a/config/build_alcf_polaris_Clang.sh b/config/build_alcf_polaris_Clang.sh
@@ -57,12 +57,16 @@ if [[ $name == *"_MP"* ]]; then
   CMAKE_FLAGS="$CMAKE_FLAGS -DQMC_MIXED_PRECISION=ON"
 fi
 
+if [[ $name == *"offload"* || $name == *"cuda"* ]]; then
+  CMAKE_FLAGS="$CMAKE_FLAGS -DQMC_GPU_ARCHS=sm_80"
+fi
+
 if [[ $name == *"offload"* ]]; then
-  CMAKE_FLAGS="$CMAKE_FLAGS -DENABLE_OFFLOAD=ON -DUSE_OBJECT_TARGET=ON -DOFFLOAD_ARCH=sm_80"
+  CMAKE_FLAGS="$CMAKE_FLAGS -DENABLE_OFFLOAD=ON"
 fi
 
 if [[ $name == *"cuda"* ]]; then
-  CMAKE_FLAGS="$CMAKE_FLAGS -DENABLE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=80"
+  CMAKE_FLAGS="$CMAKE_FLAGS -DENABLE_CUDA=ON"
 fi
 
 folder=build_${Machine}_${Compiler}_${name}

diff --git a/config/build_nersc_perlmutter_Clang.sh b/config/build_nersc_perlmutter_Clang.sh
@@ -0,0 +1,99 @@
+#!/bin/bash
+# This recipe is intended for NERSC Perlmutter https://docs.nersc.gov/systems/perlmutter
+# It builds all the variants of QMCPACK in the current directory
+# last revision: Aug 12th 2023
+#
+# How to invoke this script?
+# build_nersc_perlmutter_Clang.sh # build all the variants assuming the current directory is the source directory.
+# build_nersc_perlmutter_Clang.sh <source_dir> # build all the variants with a given source directory <source_dir>
+# build_nersc_perlmutter_Clang.sh <source_dir> <install_dir> # build all the variants with a given source directory <source_dir> and install to <install_dir>
+
+module load PrgEnv-gnu
+module load cray-libsci
+CRAY_LIBSCI_LIB=$CRAY_LIBSCI_PREFIX_DIR/lib/libsci_gnu_mp.so
+
+module load PrgEnv-llvm/0.1 llvm/16
+module load cray-fftw/3.3.10.3
+module load cray-hdf5-parallel/1.12.2.3
+module load cmake/3.24.3
+
+
+echo "**********************************"
+echo '$ clang -v'
+clang -v
+echo "**********************************"
+
+TYPE=Release
+Machine=perlmutter
+Compiler=Clang16
+
+if [[ $# -eq 0 ]]; then
+  source_folder=`pwd`
+elif [[ $# -eq 1 ]]; then
+  source_folder=$1
+else
+  source_folder=$1
+  install_folder=$2
+fi
+
+if [[ -f $source_folder/CMakeLists.txt ]]; then
+  echo Using QMCPACK source directory $source_folder
+else
+  echo "Source directory $source_folder doesn't contain CMakeLists.txt. Pass QMCPACK source directory as the first argument."
+  exit
+fi
+
+for name in offload_cuda_real_MP offload_cuda_real offload_cuda_cplx_MP offload_cuda_cplx \
+            cpu_real_MP cpu_real cpu_cplx_MP cpu_cplx
+do
+
+CMAKE_FLAGS="-DCMAKE_BUILD_TYPE=$TYPE -DBLAS_LIBRARIES=$CRAY_LIBSCI_LIB"
+
+if [[ $name == *"cplx"* ]]; then
+  CMAKE_FLAGS="$CMAKE_FLAGS -DQMC_COMPLEX=ON"
+fi
+
+if [[ $name == *"_MP"* ]]; then
+  CMAKE_FLAGS="$CMAKE_FLAGS -DQMC_MIXED_PRECISION=ON"
+fi
+
+if [[ $name == *"offload"* || $name == *"cuda"* ]]; then
+  CMAKE_FLAGS="$CMAKE_FLAGS -DQMC_GPU_ARCHS=sm_80"
+fi
+
+if [[ $name == *"offload"* ]]; then
+  CMAKE_FLAGS="$CMAKE_FLAGS -DENABLE_OFFLOAD=ON"
+fi
+
+if [[ $name == *"cuda"* ]]; then
+  CMAKE_FLAGS="$CMAKE_FLAGS -DENABLE_CUDA=ON"
+fi
+
+folder=build_${Machine}_${Compiler}_${name}
+
+if [[ -v install_folder ]]; then
+  CMAKE_FLAGS="$CMAKE_FLAGS -DCMAKE_INSTALL_PREFIX=$install_folder/$folder"
+fi
+
+echo "**********************************"
+echo "$folder"
+echo "$CMAKE_FLAGS"
+echo "**********************************"
+
+mkdir $folder
+cd $folder
+
+if [ ! -f CMakeCache.txt ] ; then
+cmake $CMAKE_FLAGS -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx $source_folder
+fi
+
+if [[ -v install_folder ]]; then
+  make -j16 install && chmod -R -w $install_folder/$folder
+else
+  make -j16
+fi
+
+cd ..
+
+echo
+done
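
The new Perlmutter recipe, like the revised Polaris one, derives its CMake options from substrings of the variant name. That mapping can be sketched in C++ as below. `cmakeFlagsForVariant` is a hypothetical helper written for illustration, and it deliberately omits the Perlmutter-specific `-DBLAS_LIBRARIES`, compiler, and install-prefix settings.

```cpp
#include <string>

// Mirrors the substring tests in the build script: each marker found in the
// variant name switches on one CMake option, in the same order as the script.
std::string cmakeFlagsForVariant(const std::string& name, const std::string& build_type = "Release")
{
  auto has = [&name](const char* marker) { return name.find(marker) != std::string::npos; };

  std::string flags = "-DCMAKE_BUILD_TYPE=" + build_type;
  if (has("cplx"))
    flags += " -DQMC_COMPLEX=ON";
  if (has("_MP"))
    flags += " -DQMC_MIXED_PRECISION=ON";
  if (has("offload") || has("cuda"))
    flags += " -DQMC_GPU_ARCHS=sm_80"; // sm_80 = NVIDIA A100
  if (has("offload"))
    flags += " -DENABLE_OFFLOAD=ON";
  if (has("cuda"))
    flags += " -DENABLE_CUDA=ON";
  return flags;
}
```

With this mapping, `offload_cuda_real_MP` gets mixed precision, `-DQMC_GPU_ARCHS=sm_80`, and both `ENABLE_OFFLOAD` and `ENABLE_CUDA`, whereas `cpu_cplx` only adds `-DQMC_COMPLEX=ON`. Setting the architecture once through `QMC_GPU_ARCHS` appears to be what lets both recipes drop the separate `OFFLOAD_ARCH` and `CMAKE_CUDA_ARCHITECTURES` flags.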
diff --git a/config/docker/dependencies/ubuntu22/openmpi/Dockerfile b/config/docker/dependencies/ubuntu22/openmpi/Dockerfile
@@ -11,11 +11,11 @@ RUN wget https://apt.kitware.com/kitware-archive.sh &&\
     sh kitware-archive.sh
 
 RUN export DEBIAN_FRONTEND=noninteractive &&\
-    apt-get install gcc g++ \ 
-    clang \
-    clang-format \
-    clang-tidy \
-    libomp-dev \
+    apt-get install gcc-9 g++-9 \ 
+    clang-14 \
+    clang-format-14 \
+    clang-tidy-14 \
+    libomp-14-dev \
     gcovr \
     python3 \
     cmake \
@@ -49,6 +49,18 @@ RUN export DEBIAN_FRONTEND=noninteractive &&\
 RUN export DEBIAN_FRONTEND=noninteractive &&\
     pip3 install cif2cell
 
+RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 100 && \
+    update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 100
+
+# add clang-14 as clang
+RUN update-alternatives --install /usr/bin/clang clang /usr/bin/clang-14 100 && \
+    update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-14 100
+
+# add clang-format and clang-tidy as well as libomp
+RUN update-alternatives --install /usr/bin/clang-format clang-format /usr/bin/clang-format-14 100 && \
+    update-alternatives --install /usr/bin/clang-tidy clang-tidy /usr/bin/clang-tidy-14 100 && \
+    update-alternatives --install /usr/bin/clang-tidy-diff.py clang-tidy-diff.py /usr/bin/clang-tidy-diff-14.py 100
+
 # must add a user different from root 
 # to run MPI executables
 RUN useradd -ms /bin/bash user

diff --git a/nexus/lib/machines.py b/nexus/lib/machines.py
@@ -2314,7 +2314,7 @@ def write_job_header(self,job):
 echo $SLURM_SUBMIT_DIR
 cd $SLURM_SUBMIT_DIR
 '''
-        if job.threads>1:
+        if (job.threads>1) and ('cpu' in job.constraint):
             c+='''
 export OMP_PROC_BIND=true
 export OMP_PLACES=threads

diff --git a/nexus/lib/qmcpack_input.py b/nexus/lib/qmcpack_input.py
@@ -1810,10 +1810,10 @@ class simulationcell(QIxml):
 #end class simulationcell
 
 class particleset(QIxml):
-    attributes = ['name','size','random','random_source','randomsrc','charge','source']
+    attributes = ['name','size','random','random_source','randomsrc','charge','source','spinor']
     elements   = ['group','simulationcell']
     attribs    = ['ionid','position']
-    write_types= obj(random=yesno)
+    write_types= obj(random=yesno,spinor=yesno)
     identifier = 'name'
 #end class particleset
 
@@ -2319,15 +2319,15 @@ class dm1b(QIxml): # legacy
     tag         = 'estimator'
     identifier  = 'type'
     attributes  = ['type','name','reuse']#reuse is a temporary dummy keyword
-    parameters  = ['energy_matrix','basis_size','integrator','points','scale','basis','evaluator','center','check_overlap','check_derivatives','acceptance_ratio','rstats','normalized','volume_normed']
+    parameters  = ['energy_matrix','basis_size','integrator','points','scale','basis','evaluator','center','check_overlap','check_derivatives','acceptance_ratio','rstats','normalized','volume_normed','samples']
     write_types = obj(energy_matrix=yesno,check_overlap=yesno,check_derivatives=yesno,acceptance_ratio=yesno,rstats=yesno,normalized=yesno,volume_normed=yesno)
 #end class dm1b
 
 class onebodydensitymatrices(QIxml): # batched
     tag         = 'estimator'
     identifier  = 'type'
     attributes  = ['type','name','reuse']#reuse is a temporary dummy keyword
-    parameters  = ['energy_matrix','basis_size','integrator','points','scale','basis','evaluator','center','check_overlap','check_derivatives','acceptance_ratio','rstats','normalized','volume_normed']
+    parameters  = ['energy_matrix','basis_size','integrator','points','scale','basis','evaluator','center','check_overlap','check_derivatives','acceptance_ratio','rstats','normalized','volume_normed','samples']
     write_types = obj(energy_matrix=yesno,check_overlap=yesno,check_derivatives=yesno,acceptance_ratio=yesno,rstats=yesno,normalized=yesno,volume_normed=yesno)
 #end class onebodydensitymatrices
 
@@ -2536,6 +2536,7 @@ class vmc(QIxml):
                   'blocks','steps','substeps','timestep','maxcpusecs','rewind',
                   'storeconfigs','checkproperties','recordconfigs','current',
                   'stepsbetweensamples','samplesperthread','samples','usedrift',
+                  'spinmass',
                   'walkers','nonlocalpp','tau','walkersperthread','reconfiguration', # legacy - batched
                   'dmcwalkersperthread','current','ratio','firststep',
                   'minimumtargetwalkers','max_seconds']
@@ -2558,6 +2559,7 @@ class dmc(QIxml):
                   'stepsbetweensamples','samplesperthread','samples','reconfiguration',
                   'nonlocalmoves','maxage','alpha','gamma','reserve','use_nonblocking',
                   'branching_cutoff_scheme','feedback','sigmabound',
+                  'spinmass',
                   'walkers','nonlocalmove','pop_control','targetwalkers',               # legacy - batched
                   'minimumtargetwalkers','energybound','feedback','recordwalkers',
                   'fastgrad','popcontrol','branchinterval','usedrift','storeconfigs',
@@ -2812,6 +2814,7 @@ class gen(QIxml):
     l2_diffusion     = 'L2_diffusion',
     maxage           = 'MaxAge',
     sigmabound       = 'sigmaBound',
+    spinmass         = 'spinMass',
     )
 # afqmc names
 Names.set_afqmc_expanded_names(

diff --git a/src/Estimators/CMakeLists.txt b/src/Estimators/CMakeLists.txt
@@ -47,4 +47,6 @@ endif()
 target_include_directories(qmcestimators PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}")
 target_link_libraries(qmcestimators PUBLIC containers qmcham qmcparticle qmcutil)
 
-add_subdirectory(tests)
+if(BUILD_UNIT_TESTS)
+  add_subdirectory(tests)
+endif()
diff --git a/src/Platforms/CUDA/CMakeLists.txt b/src/Platforms/CUDA/CMakeLists.txt
@@ -9,12 +9,14 @@
 #// File created by: Ye Luo, [email protected], Argonne National Laboratory
 #//////////////////////////////////////////////////////////////////////////////////////
 
-set(CUDA_RT_SRCS CUDAfill.cpp CUDAallocator.cpp CUDAruntime.cpp)
+set(CUDA_RT_SRCS CUDAfill.cpp CUDAallocator.cpp CUDAruntime.cpp CUDADeviceManager.cpp)
 set(CUDA_LA_SRCS cuBLAS_missing_functions.cu)
 
 add_library(platform_cuda_runtime ${CUDA_RT_SRCS})
 add_library(platform_cuda_LA ${CUDA_LA_SRCS})
 
+target_link_libraries(platform_cuda_runtime PRIVATE platform_host_runtime)
+
 if(NOT QMC_CUDA2HIP)
   target_link_libraries(platform_cuda_runtime PUBLIC CUDA::cudart)
   target_link_libraries(platform_cuda_LA PUBLIC CUDA::cublas CUDA::cusolver)

diff --git a/src/Platforms/CUDA/CUDADeviceManager.cpp b/src/Platforms/CUDA/CUDADeviceManager.cpp
@@ -0,0 +1,50 @@
+//////////////////////////////////////////////////////////////////////////////////////
+// This file is distributed under the University of Illinois/NCSA Open Source License.
+// See LICENSE file in top directory for details.
+//
+// Copyright (c) 2023 QMCPACK developers.
+//
+// File developed by: Ye Luo, [email protected], Argonne National Laboratory
+//
+// File created by: Ye Luo, [email protected], Argonne National Laboratory
+//
+//////////////////////////////////////////////////////////////////////////////////////
+
+
+#include "CUDADeviceManager.h"
+#include <stdexcept>
+#include "CUDAruntime.hpp"
+#include "OutputManager.h"
+#include "determineDefaultDeviceNum.h"
+
+namespace qmcplusplus
+{
+CUDADeviceManager::CUDADeviceManager(int& default_device_num, int& num_devices, int local_rank, int local_size)
+    : cuda_default_device_num(-1), cuda_device_count(0)
+{
+  cudaErrorCheck(cudaGetDeviceCount(&cuda_device_count), "cudaGetDeviceCount failed!");
+  if (num_devices == 0)
+    num_devices = cuda_device_count;
+  else if (num_devices != cuda_device_count)
+    throw std::runtime_error("Inconsistent number of CUDA devices with the previous record!");
+  if (cuda_device_count > local_size)
+    app_warning() << "More CUDA devices than the number of MPI ranks. "
+                  << "Some devices will be left idle.\n"
+                  << "There is a potential performance issue with the GPU affinity. "
+                  << "Use CUDA_VISIBLE_DEVICES or the MPI launcher to expose the desired devices.\n";
+  if (num_devices > 0)
+  {
+    cuda_default_device_num = determineDefaultDeviceNum(cuda_device_count, local_rank, local_size);
+    if (default_device_num < 0)
+      default_device_num = cuda_default_device_num;
+    else if (default_device_num != cuda_default_device_num)
+      throw std::runtime_error("Inconsistent assigned CUDA devices with the previous record!");
+
+#pragma omp parallel
+    {
+      cudaErrorCheck(cudaSetDevice(cuda_default_device_num), "cudaSetDevice failed!");
+      cudaErrorCheck(cudaFree(0), "cudaFree failed!");
+    }
+  }
+}
+} // namespace qmcplusplus
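
The constructor delegates the rank-to-GPU mapping to `determineDefaultDeviceNum`, whose body is not part of this diff. One plausible policy is to hand out devices to node-local ranks in contiguous, as-balanced-as-possible blocks; the C++ sketch below illustrates that policy under this assumption. `assignDevice` is a hypothetical name and should not be read as QMCPACK's implementation.

```cpp
#include <stdexcept>

// Hypothetical helper: distribute local_size node-local ranks over num_devices
// GPUs in contiguous blocks, so that neighbouring ranks share a device.
int assignDevice(int num_devices, int local_rank, int local_size)
{
  if (num_devices <= 0 || local_size <= 0 || local_rank < 0 || local_rank >= local_size)
    throw std::invalid_argument("invalid device count, rank, or rank count");
  // With more devices than ranks, only the first local_size devices are used;
  // this is the situation the constructor warns about.
  const int used   = num_devices < local_size ? num_devices : local_size;
  const int base   = local_size / used;  // minimum number of ranks per device
  const int extra  = local_size % used;  // the first `extra` devices take one more rank
  const int cutoff = extra * (base + 1); // ranks below this go to the larger blocks
  if (local_rank < cutoff)
    return local_rank / (base + 1);
  return extra + (local_rank - cutoff) / base;
}
```

For instance, 6 ranks on a 4-GPU node map to devices 0, 0, 1, 1, 2, 3, and with 8 GPUs but only 2 ranks, devices 2 to 7 stay idle, which is the case reported by the `app_warning()` above.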