Merge branch 'develop' into adds-comment
prckent authored Aug 25, 2023
2 parents 6910bd7 + e018e82 commit 42c5498
Showing 43 changed files with 1,885 additions and 182 deletions.
68 changes: 66 additions & 2 deletions CHANGELOG.md
@@ -2,9 +2,73 @@

Notable changes to QMCPACK are documented in this file.

## [Unreleased]
## [3.17.1] - 2023-08-25

The legacy CUDA implementation, the version built with QMC_CUDA=1, has been removed from the codebase.
This minor release is recommended for all users and includes a couple of build fixes and a NEXUS improvement.

* Improved HDF5 detection. Fixes cases where HDF5 was not identified by CMake, including on FreeBSD (thanks @yurivict for the report). [#4708](https://github.com/QMCPACK/qmcpack/pull/4708)
* Fix for building with BUILD_UNIT_TESTS=OFF. [#4709](https://github.com/QMCPACK/qmcpack/pull/4709)
* Add timer for orbital rotations. [#4706](https://github.com/QMCPACK/qmcpack/pull/4706)

### NEXUS

* NEXUS: Support for spinor inputs. [#4707](https://github.com/QMCPACK/qmcpack/pull/4707)

## [3.17.0] - 2023-08-18

This is a recommended release for all users. Thanks to everyone who contributed directly, reported an issue, or suggested an
improvement. There are many quality of life improvements, bug fixes throughout the application, and updates to the associated
testing. As previously announced, the legacy CUDA support (QMC_CUDA=1) is removed in this version. For GPU support, users should
transition to the offload code which is more capable and fully usable in production on NVIDIA GPUs.

This version is intended for long-term support of v3 of QMCPACK. Development effort is now focused towards v4. Contributions of
tests, fixes, and features from users and developers are still welcome to v3 for a potential future release. However, these will not
be ported towards v4 by the core QMCPACK developers without prior arrangement. Please discuss options with QMCPACK developers.

* Simplified checkpointing and enabled it in the batched drivers. Users now only need to specify checkpoint={-1,0,N} to checkpoint
between blocks. [#4646](https://github.com/QMCPACK/qmcpack/pull/4646)
* NERSC Perlmutter build recipe. [#4698](https://github.com/QMCPACK/qmcpack/pull/4698)
* qmc-fit: Now supports parameter fitting with jackknife for e.g. DFT+U and EXX scans
[#4475](https://github.com/QMCPACK/qmcpack/pull/4475) and for equation of state and Morse fits
[#4518](https://github.com/QMCPACK/qmcpack/pull/4518)
* Improved error checking including NaN checks to protect against potentially unreliable compilers and libraries,
[#4697](https://github.com/QMCPACK/qmcpack/pull/4697), and checks on GPU matrix inversion
[#4693](https://github.com/QMCPACK/qmcpack/pull/4693)
* Significant advances in orbital optimization capability, focusing on LCAO wavefunctions. Development is ongoing for
multideterminant support and for spline wavefunctions. See e.g. the Be atom orbital optimization test
[#4626](https://github.com/QMCPACK/qmcpack/pull/4626), [#4619](https://github.com/QMCPACK/qmcpack/pull/4619), reading and writing
of orbital rotation parameters [#4580](https://github.com/QMCPACK/qmcpack/pull/4580), support for disabled/frozen parameters
[#4581](https://github.com/QMCPACK/qmcpack/pull/4581).
* Magnetization Density Estimator for non-collinear wavefunctions [#4531](https://github.com/QMCPACK/qmcpack/pull/4531)
* Pathak-Wagner regularizer for forces [#4477](https://github.com/QMCPACK/qmcpack/pull/4477)
* The legacy CUDA implementation, the version built with QMC_CUDA=1, has been removed from the codebase,
[#4431](https://github.com/QMCPACK/qmcpack/pull/4431),
[#4632](https://github.com/QMCPACK/qmcpack/pull/4632), [#4499](https://github.com/QMCPACK/qmcpack/pull/4499),
[#4442](https://github.com/QMCPACK/qmcpack/pull/4442).
* For increased performance with current AMD GPU support, new QMC_DISABLE_HIP_HOST_REGISTER option is enabled by default for
ROCm/HIP builds. [#4674](https://github.com/QMCPACK/qmcpack/pull/4674)
* Bugfix: J1Spin indexing was wrong [#4612](https://github.com/QMCPACK/qmcpack/pull/4612)
* Bugfix: 1RDM estimator data written to stat.h5 was incorrect [#4568](https://github.com/QMCPACK/qmcpack/pull/4568)
* Introduced ENABLE_PPCONVERT option and skip ppconvert compilation when cross compiling. [#4601](https://github.com/QMCPACK/qmcpack/pull/4601)
* Faster builds compared to v3.16.0 due to code refactoring [#4682](https://github.com/QMCPACK/qmcpack/pull/4682)
* Many refinements throughout the codebase, cleanup, improved testing.

### NEXUS

* Nexus: Equilibration detection algorithm is now deterministic [#4557](https://github.com/QMCPACK/qmcpack/pull/4557)
* Nexus: Support for Kagayaki cluster at JAIST [#4598](https://github.com/QMCPACK/qmcpack/pull/4598)
* Nexus: GPU support fix for NERSC/Perlmutter [#4699](https://github.com/QMCPACK/qmcpack/pull/4699)
* Nexus: Use simplices in convex_hull to support newer scipy versions [#4671](https://github.com/QMCPACK/qmcpack/pull/4671)
* Nexus: Add pdos flag for Projwfc [#4655](https://github.com/QMCPACK/qmcpack/pull/4655)
* Nexus: Adding crowds_serialize_walkers tag to dmc input list [#4651](https://github.com/QMCPACK/qmcpack/pull/4651)
* Nexus: Qdens handles batched driver input/output [#4645](https://github.com/QMCPACK/qmcpack/pull/4645)
* Nexus: Fix namelist read for Projwfc input [#4644](https://github.com/QMCPACK/qmcpack/pull/4644)

### Known problems

* When offload builds are compiled with CUDA toolkit versions above 11.2 using LLVM, multideterminant tests and functionality will
fail, seemingly due to an issue with the toolkit. This is discussed in https://github.com/llvm/llvm-project/issues/54633 . All
other functionality appears to work as expected. As a workaround, the CUDA toolkit 11.2 can be used. The actual NVIDIA drivers can
be more recent.

## [3.16.0] - 2023-01-31

10 changes: 8 additions & 2 deletions CMakeLists.txt
@@ -15,7 +15,7 @@ endif()
######################################################################
project(
qmcpack
VERSION 3.16.9
VERSION 3.17.9
LANGUAGES C CXX)

# add the automatically determined parts of the RPATH
@@ -658,9 +658,15 @@ else()
set(HDF5_USE_STATIC_LIBRARIES off)
endif()

find_package(HDF5 1.10 COMPONENTS C)
find_package(HDF5 COMPONENTS C) # Note: minimum version check is done below to bypass find_package
# and HDF5 version compatibility subtleties

if(HDF5_FOUND)
if(HDF5_VERSION)
if (HDF5_VERSION VERSION_LESS 1.10.0)
message(FATAL_ERROR "QMCPACK requires HDF5 version >= 1.10.0")
endif()
endif(HDF5_VERSION)
if(HDF5_IS_PARALLEL)
if(HAVE_MPI)
message(STATUS "Parallel HDF5 library found")
8 changes: 6 additions & 2 deletions config/build_alcf_polaris_Clang.sh
@@ -57,12 +57,16 @@ if [[ $name == *"_MP"* ]]; then
CMAKE_FLAGS="$CMAKE_FLAGS -DQMC_MIXED_PRECISION=ON"
fi

if [[ $name == *"offload"* || $name == *"cuda"* ]]; then
CMAKE_FLAGS="$CMAKE_FLAGS -DQMC_GPU_ARCHS=sm_80"
fi

if [[ $name == *"offload"* ]]; then
CMAKE_FLAGS="$CMAKE_FLAGS -DENABLE_OFFLOAD=ON -DUSE_OBJECT_TARGET=ON -DOFFLOAD_ARCH=sm_80"
CMAKE_FLAGS="$CMAKE_FLAGS -DENABLE_OFFLOAD=ON"
fi

if [[ $name == *"cuda"* ]]; then
CMAKE_FLAGS="$CMAKE_FLAGS -DENABLE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=80"
CMAKE_FLAGS="$CMAKE_FLAGS -DENABLE_CUDA=ON"
fi

folder=build_${Machine}_${Compiler}_${name}
99 changes: 99 additions & 0 deletions config/build_nersc_perlmutter_Clang.sh
@@ -0,0 +1,99 @@
#!/bin/bash
# This recipe is intended for NERSC Perlmutter https://docs.nersc.gov/systems/perlmutter
# It builds all the variants of QMCPACK in the current directory
# last revision: Aug 12th 2023
#
# How to invoke this script?
# build_nersc_perlmutter_Clang.sh # build all the variants assuming the current directory is the source directory.
# build_nersc_perlmutter_Clang.sh <source_dir> # build all the variants with a given source directory <source_dir>
# build_nersc_perlmutter_Clang.sh <source_dir> <install_dir> # build all the variants with a given source directory <source_dir> and install to <install_dir>

module load PrgEnv-gnu
module load cray-libsci
CRAY_LIBSCI_LIB=$CRAY_LIBSCI_PREFIX_DIR/lib/libsci_gnu_mp.so

module load PrgEnv-llvm/0.1 llvm/16
module load cray-fftw/3.3.10.3
module load cray-hdf5-parallel/1.12.2.3
module load cmake/3.24.3


echo "**********************************"
echo '$ clang -v'
clang -v
echo "**********************************"

TYPE=Release
Machine=perlmutter
Compiler=Clang16

if [[ $# -eq 0 ]]; then
source_folder=`pwd`
elif [[ $# -eq 1 ]]; then
source_folder=$1
else
source_folder=$1
install_folder=$2
fi

if [[ -f $source_folder/CMakeLists.txt ]]; then
echo Using QMCPACK source directory $source_folder
else
echo "Source directory $source_folder doesn't contain CMakeLists.txt. Pass QMCPACK source directory as the first argument."
exit 1
fi

for name in offload_cuda_real_MP offload_cuda_real offload_cuda_cplx_MP offload_cuda_cplx \
cpu_real_MP cpu_real cpu_cplx_MP cpu_cplx
do

CMAKE_FLAGS="-DCMAKE_BUILD_TYPE=$TYPE -DBLAS_LIBRARIES=$CRAY_LIBSCI_LIB"

if [[ $name == *"cplx"* ]]; then
CMAKE_FLAGS="$CMAKE_FLAGS -DQMC_COMPLEX=ON"
fi

if [[ $name == *"_MP"* ]]; then
CMAKE_FLAGS="$CMAKE_FLAGS -DQMC_MIXED_PRECISION=ON"
fi

if [[ $name == *"offload"* || $name == *"cuda"* ]]; then
CMAKE_FLAGS="$CMAKE_FLAGS -DQMC_GPU_ARCHS=sm_80"
fi

if [[ $name == *"offload"* ]]; then
CMAKE_FLAGS="$CMAKE_FLAGS -DENABLE_OFFLOAD=ON"
fi

if [[ $name == *"cuda"* ]]; then
CMAKE_FLAGS="$CMAKE_FLAGS -DENABLE_CUDA=ON"
fi

folder=build_${Machine}_${Compiler}_${name}

if [[ -v install_folder ]]; then
CMAKE_FLAGS="$CMAKE_FLAGS -DCMAKE_INSTALL_PREFIX=$install_folder/$folder"
fi

echo "**********************************"
echo "$folder"
echo "$CMAKE_FLAGS"
echo "**********************************"

mkdir $folder
cd $folder

if [ ! -f CMakeCache.txt ] ; then
cmake $CMAKE_FLAGS -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx $source_folder
fi

if [[ -v install_folder ]]; then
make -j16 install && chmod -R -w $install_folder/$folder
else
make -j16
fi

cd ..

echo
done
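The variant names in the loop above double as feature switches: each substring match appends one CMake option. A minimal Python sketch of that mapping follows. The function name `cmake_flags_for` and its `build_type` and `blas_lib` arguments are illustrative assumptions, not part of the recipe; the flag strings themselves are copied from the script.

```python
def cmake_flags_for(name, build_type="Release", blas_lib=None):
    """Return the CMake flags the recipe assembles for one variant name.

    Hypothetical helper: it mirrors the substring tests in the shell loop,
    in the same order, but is not code from the QMCPACK repository.
    """
    flags = [f"-DCMAKE_BUILD_TYPE={build_type}"]
    # The shell recipe always passes a BLAS library; here it is optional
    # so the sketch can run without a Cray environment (assumption).
    if blas_lib is not None:
        flags.append(f"-DBLAS_LIBRARIES={blas_lib}")
    if "cplx" in name:
        flags.append("-DQMC_COMPLEX=ON")
    if "_MP" in name:
        flags.append("-DQMC_MIXED_PRECISION=ON")
    # The GPU architecture is set once for both offload and CUDA variants.
    if "offload" in name or "cuda" in name:
        flags.append("-DQMC_GPU_ARCHS=sm_80")
    if "offload" in name:
        flags.append("-DENABLE_OFFLOAD=ON")
    if "cuda" in name:
        flags.append("-DENABLE_CUDA=ON")
    return flags


if __name__ == "__main__":
    for variant in ("offload_cuda_real_MP", "cpu_cplx"):
        print(variant, " ".join(cmake_flags_for(variant)))
```

Because the tests are plain substring matches, a name such as `cpu_real` triggers none of the GPU options, while `offload_cuda_cplx_MP` triggers all five optional flags.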
22 changes: 17 additions & 5 deletions config/docker/dependencies/ubuntu22/openmpi/Dockerfile
@@ -11,11 +11,11 @@ RUN wget https://apt.kitware.com/kitware-archive.sh &&\
sh kitware-archive.sh

RUN export DEBIAN_FRONTEND=noninteractive &&\
apt-get install gcc g++ \
clang \
clang-format \
clang-tidy \
libomp-dev \
apt-get install gcc-9 g++-9 \
clang-14 \
clang-format-14 \
clang-tidy-14 \
libomp-14-dev \
gcovr \
python3 \
cmake \
@@ -49,6 +49,18 @@ RUN export DEBIAN_FRONTEND=noninteractive &&\
RUN export DEBIAN_FRONTEND=noninteractive &&\
pip3 install cif2cell

RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 100 && \
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 100

# add clang-14 as clang
RUN update-alternatives --install /usr/bin/clang clang /usr/bin/clang-14 100 && \
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-14 100

# add clang-format-14 and clang-tidy-14 (including clang-tidy-diff) under their unversioned names
RUN update-alternatives --install /usr/bin/clang-format clang-format /usr/bin/clang-format-14 100 && \
update-alternatives --install /usr/bin/clang-tidy clang-tidy /usr/bin/clang-tidy-14 100 && \
update-alternatives --install /usr/bin/clang-tidy-diff.py clang-tidy-diff.py /usr/bin/clang-tidy-diff-14.py 100

# must add a user different from root
# to run MPI executables
RUN useradd -ms /bin/bash user
2 changes: 1 addition & 1 deletion nexus/lib/machines.py
@@ -2314,7 +2314,7 @@ def write_job_header(self,job):
echo $SLURM_SUBMIT_DIR
cd $SLURM_SUBMIT_DIR
'''
if job.threads>1:
if (job.threads>1) and ('cpu' in job.constraint):
c+='''
export OMP_PROC_BIND=true
export OMP_PLACES=threads
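The changed condition limits the OpenMP binding exports to multi-threaded jobs whose constraint mentions `cpu`. A minimal standalone sketch of that decision is given below. The function `omp_binding_exports` is hypothetical and not part of Nexus, and treating a missing constraint as "no match" is an assumption, since the diff does not show how `job.constraint` is initialized.

```python
def omp_binding_exports(threads, constraint):
    """Return the shell lines that pin OpenMP threads, or an empty list.

    Hypothetical helper mirroring the condition
    (job.threads>1) and ('cpu' in job.constraint).
    """
    # Assumption: a constraint of None is treated as an empty string so that
    # the membership test cannot raise a TypeError.
    constraint = constraint or ""
    if threads > 1 and "cpu" in constraint:
        return ["export OMP_PROC_BIND=true", "export OMP_PLACES=threads"]
    return []


if __name__ == "__main__":
    print(omp_binding_exports(8, "cpu"))  # binding lines are emitted
    print(omp_binding_exports(8, "gpu"))  # no binding lines
```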
11 changes: 7 additions & 4 deletions nexus/lib/qmcpack_input.py
@@ -1810,10 +1810,10 @@ class simulationcell(QIxml):
#end class simulationcell

class particleset(QIxml):
attributes = ['name','size','random','random_source','randomsrc','charge','source']
attributes = ['name','size','random','random_source','randomsrc','charge','source','spinor']
elements = ['group','simulationcell']
attribs = ['ionid','position']
write_types= obj(random=yesno)
write_types= obj(random=yesno,spinor=yesno)
identifier = 'name'
#end class particleset

@@ -2319,15 +2319,15 @@ class dm1b(QIxml): # legacy
tag = 'estimator'
identifier = 'type'
attributes = ['type','name','reuse']#reuse is a temporary dummy keyword
parameters = ['energy_matrix','basis_size','integrator','points','scale','basis','evaluator','center','check_overlap','check_derivatives','acceptance_ratio','rstats','normalized','volume_normed']
parameters = ['energy_matrix','basis_size','integrator','points','scale','basis','evaluator','center','check_overlap','check_derivatives','acceptance_ratio','rstats','normalized','volume_normed','samples']
write_types = obj(energy_matrix=yesno,check_overlap=yesno,check_derivatives=yesno,acceptance_ratio=yesno,rstats=yesno,normalized=yesno,volume_normed=yesno)
#end class dm1b

class onebodydensitymatrices(QIxml): # batched
tag = 'estimator'
identifier = 'type'
attributes = ['type','name','reuse']#reuse is a temporary dummy keyword
parameters = ['energy_matrix','basis_size','integrator','points','scale','basis','evaluator','center','check_overlap','check_derivatives','acceptance_ratio','rstats','normalized','volume_normed']
parameters = ['energy_matrix','basis_size','integrator','points','scale','basis','evaluator','center','check_overlap','check_derivatives','acceptance_ratio','rstats','normalized','volume_normed','samples']
write_types = obj(energy_matrix=yesno,check_overlap=yesno,check_derivatives=yesno,acceptance_ratio=yesno,rstats=yesno,normalized=yesno,volume_normed=yesno)
#end class onebodydensitymatrices

@@ -2536,6 +2536,7 @@ class vmc(QIxml):
'blocks','steps','substeps','timestep','maxcpusecs','rewind',
'storeconfigs','checkproperties','recordconfigs','current',
'stepsbetweensamples','samplesperthread','samples','usedrift',
'spinmass',
'walkers','nonlocalpp','tau','walkersperthread','reconfiguration', # legacy - batched
'dmcwalkersperthread','current','ratio','firststep',
'minimumtargetwalkers','max_seconds']
@@ -2558,6 +2559,7 @@ class dmc(QIxml):
'stepsbetweensamples','samplesperthread','samples','reconfiguration',
'nonlocalmoves','maxage','alpha','gamma','reserve','use_nonblocking',
'branching_cutoff_scheme','feedback','sigmabound',
'spinmass',
'walkers','nonlocalmove','pop_control','targetwalkers', # legacy - batched
'minimumtargetwalkers','energybound','feedback','recordwalkers',
'fastgrad','popcontrol','branchinterval','usedrift','storeconfigs',
@@ -2812,6 +2814,7 @@ class gen(QIxml):
l2_diffusion = 'L2_diffusion',
maxage = 'MaxAge',
sigmabound = 'sigmaBound',
spinmass = 'spinMass',
)
# afqmc names
Names.set_afqmc_expanded_names(
4 changes: 3 additions & 1 deletion src/Estimators/CMakeLists.txt
@@ -47,4 +47,6 @@ endif()
target_include_directories(qmcestimators PUBLIC "${CMAKE_CURRENT_SOURCE_DIR}")
target_link_libraries(qmcestimators PUBLIC containers qmcham qmcparticle qmcutil)

add_subdirectory(tests)
if(BUILD_UNIT_TESTS)
add_subdirectory(tests)
endif()
4 changes: 3 additions & 1 deletion src/Platforms/CUDA/CMakeLists.txt
@@ -9,12 +9,14 @@
#// File created by: Ye Luo, [email protected], Argonne National Laboratory
#//////////////////////////////////////////////////////////////////////////////////////

set(CUDA_RT_SRCS CUDAfill.cpp CUDAallocator.cpp CUDAruntime.cpp)
set(CUDA_RT_SRCS CUDAfill.cpp CUDAallocator.cpp CUDAruntime.cpp CUDADeviceManager.cpp)
set(CUDA_LA_SRCS cuBLAS_missing_functions.cu)

add_library(platform_cuda_runtime ${CUDA_RT_SRCS})
add_library(platform_cuda_LA ${CUDA_LA_SRCS})

target_link_libraries(platform_cuda_runtime PRIVATE platform_host_runtime)

if(NOT QMC_CUDA2HIP)
target_link_libraries(platform_cuda_runtime PUBLIC CUDA::cudart)
target_link_libraries(platform_cuda_LA PUBLIC CUDA::cublas CUDA::cusolver)
50 changes: 50 additions & 0 deletions src/Platforms/CUDA/CUDADeviceManager.cpp
@@ -0,0 +1,50 @@
//////////////////////////////////////////////////////////////////////////////////////
// This file is distributed under the University of Illinois/NCSA Open Source License.
// See LICENSE file in top directory for details.
//
// Copyright (c) 2023 QMCPACK developers.
//
// File developed by: Ye Luo, [email protected], Argonne National Laboratory
//
// File created by: Ye Luo, [email protected], Argonne National Laboratory
//
//////////////////////////////////////////////////////////////////////////////////////


#include "CUDADeviceManager.h"
#include <stdexcept>
#include "CUDAruntime.hpp"
#include "OutputManager.h"
#include "determineDefaultDeviceNum.h"

namespace qmcplusplus
{
CUDADeviceManager::CUDADeviceManager(int& default_device_num, int& num_devices, int local_rank, int local_size)
: cuda_default_device_num(-1), cuda_device_count(0)
{
cudaErrorCheck(cudaGetDeviceCount(&cuda_device_count), "cudaGetDeviceCount failed!");
if (num_devices == 0)
num_devices = cuda_device_count;
else if (num_devices != cuda_device_count)
throw std::runtime_error("Inconsistent number of CUDA devices with the previous record!");
if (cuda_device_count > local_size)
app_warning() << "More CUDA devices than the number of MPI ranks. "
<< "Some devices will be left idle.\n"
<< "There is a potential performance issue with the GPU affinity. "
<< "Use CUDA_VISIBLE_DEVICES or the MPI launcher to expose the desired devices.\n";
if (num_devices > 0)
{
cuda_default_device_num = determineDefaultDeviceNum(cuda_device_count, local_rank, local_size);
if (default_device_num < 0)
default_device_num = cuda_default_device_num;
else if (default_device_num != cuda_default_device_num)
throw std::runtime_error("Inconsistent assigned CUDA devices with the previous record!");

#pragma omp parallel
{
cudaErrorCheck(cudaSetDevice(cuda_default_device_num), "cudaSetDevice failed!");
cudaErrorCheck(cudaFree(0), "cudaFree failed!");
}
}
}
} // namespace qmcplusplus
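The constructor above delegates the choice of device to `determineDefaultDeviceNum`, whose body is not part of this diff. One plausible policy, sketched here in Python purely for illustration, is a block distribution of node-local ranks over devices. The function name `pick_device` and the policy itself are assumptions, not the QMCPACK implementation.

```python
def pick_device(num_devices, local_rank, local_size):
    """Map a node-local MPI rank to a device index (assumed block distribution).

    Hypothetical stand-in for determineDefaultDeviceNum; the real function's
    behavior is not shown in the diff.
    """
    if num_devices <= 0:
        raise ValueError("at least one device is required")
    if not 0 <= local_rank < local_size:
        raise ValueError("local_rank must lie in [0, local_size)")
    # With fewer ranks than devices, only the first local_size devices are
    # used; the rest stay idle, which is the case the warning describes.
    usable = min(num_devices, local_size)
    # Consecutive ranks share a device: ranks 0..k-1 -> device 0, and so on.
    return usable * local_rank // local_size


if __name__ == "__main__":
    # 8 ranks on a node with 4 devices: two consecutive ranks per device.
    print([pick_device(4, rank, 8) for rank in range(8)])
```

Under this assumed policy, every rank on a node derives its device from its local rank alone, so no communication is needed. The consistency checks in the constructor then only have to confirm that a previously recorded device count and device number agree with the recomputed values.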