
Comparing performance with QMCPACK

Mark Dewing edited this page Sep 27, 2018 · 6 revisions

The key kernels from QMCPACK are present in miniQMC, but the relative importance of each kernel may not be retained.

The inverse determinant update is an N^3 operation, and increasingly dominates the run time as the system size increases.
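One way to see where cubic scaling can come from: with a rank-1 (Sherman-Morrison) update, replacing one row of an N x N matrix costs O(N^2) work on the inverse, and a sweep that moves all N particles repeats this up to N times. The sketch below illustrates that standard technique in plain Python. It is an assumption for illustration only, not code taken from miniQMC or QMCPACK, and the function names are hypothetical.

```python
def det_ratio(ainv, v, k):
    """Ratio det(A') / det(A) when row k of A is replaced by v.

    Only column k of the current inverse is needed, so this costs O(N).
    """
    return sum(v[i] * ainv[i][k] for i in range(len(v)))


def sherman_morrison_row_update(ainv, v, k):
    """Update ainv in place after row k of A is replaced by v.

    Costs O(N^2) per call. Returns the determinant ratio.
    Assumes the ratio is nonzero (the new matrix is invertible).
    """
    n = len(v)
    ratio = det_ratio(ainv, v, k)
    # Column k of the old inverse, saved before it is overwritten.
    col_k = [ainv[i][k] for i in range(n)]
    # w = v^T * Ainv, then subtract the unit vector e_k.
    w = [sum(v[i] * ainv[i][j] for i in range(n)) for j in range(n)]
    w[k] -= 1.0
    for i in range(n):
        scale = col_k[i] / ratio
        for j in range(n):
            ainv[i][j] -= scale * w[j]
    return ratio
```

For example, starting from the inverse of `[[2, 0], [0, 4]]` and replacing row 0 with `[1, 1]` yields the inverse of `[[1, 1], [0, 4]]` without a full re-inversion.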

The acceptance ratio option (-r) affects the number of inverse updates performed. The default setting is 0.5, which corresponds to the acceptance ratio of a VMC run. To more accurately reproduce a DMC run, use an acceptance ratio of 0.99 (-r 0.99).

To more closely reproduce the number of calls from the DMC NiO test cases, use "-n 25 -r 0.999".
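The effect of the acceptance ratio on the update count can be sketched with a toy model. It assumes each proposed single-particle move is accepted with probability equal to the acceptance ratio and that each accepted move triggers exactly one inverse update. The names `num_particles` and `num_steps` are hypothetical parameters of this sketch, not miniQMC options.

```python
import random


def count_inverse_updates(num_particles, num_steps, accept_ratio, seed=0):
    """Count accepted moves (assumed to equal inverse updates) in a toy run."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(num_steps):
        for _ in range(num_particles):
            # Accept the proposed move with probability accept_ratio.
            if rng.random() < accept_ratio:
                accepted += 1
    return accepted
```

Under these assumptions the expected count is `accept_ratio * num_particles * num_steps`, so moving from 0.5 to 0.99 roughly doubles the number of inverse updates for the same number of proposed moves.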

The non-local pseudopotential evaluation in QMCPACK is also part of the miniapp, but is not part of the core particle movement loop.

| Kernel | QMCPACK timer name | MiniQMC timer name |
| --- | --- | --- |
| Total (core particle loop) | DMCUpdatePbyP::movePbyP | Diffusion |
| Inverse matrix update | DiracDeterminantBase::update | Update/Determinant |
| Single Particle Orbitals | DiracDeterminantBase::spovgl | New Gradient/Single-Particle Orbitals |
| Two body Jastrow | WaveFunction::J2_bspline_accept_reject | Update/TwoBodyJastrow |
| ... | WaveFunction::J2_bspline_VGL | New Gradient/TwoBodyJastrow |
| Distance tables | ParticleSet::makeMove | Diffusion/Make move |
| ... | ParticleSet::setActive | Diffusion/Set active |
| ... | ParticleSet::donePbyP | Diffusion/Accept move |
| Non-local pseudopotential (not part of core loop) | Hamiltonian::NonLocalECP | Pseudopotential |
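When comparing profiles from the two codes, the timer names in the table above can be paired programmatically. The following is a minimal sketch that assumes the timings have already been collected into plain dictionaries keyed by timer name. The dictionary layout and the function name `time_fractions` are hypothetical, and no parsing of actual timer output is attempted.

```python
# QMCPACK timer name -> miniQMC timer name, taken from the table above.
TIMER_MAP = {
    "DMCUpdatePbyP::movePbyP": "Diffusion",
    "DiracDeterminantBase::update": "Update/Determinant",
    "DiracDeterminantBase::spovgl": "New Gradient/Single-Particle Orbitals",
    "WaveFunction::J2_bspline_accept_reject": "Update/TwoBodyJastrow",
    "WaveFunction::J2_bspline_VGL": "New Gradient/TwoBodyJastrow",
    "ParticleSet::makeMove": "Diffusion/Make move",
    "ParticleSet::setActive": "Diffusion/Set active",
    "ParticleSet::donePbyP": "Diffusion/Accept move",
    # Not part of the core particle loop, so its fraction is only a
    # size relative to the loop total, not a share of it.
    "Hamiltonian::NonLocalECP": "Pseudopotential",
}


def time_fractions(qmcpack_times, miniqmc_times):
    """Return {qmcpack_name: (qmcpack_fraction, miniqmc_fraction)}.

    Fractions are relative to the core particle loop total of each code.
    Timers missing from either input are skipped.
    """
    q_total = qmcpack_times["DMCUpdatePbyP::movePbyP"]
    m_total = miniqmc_times["Diffusion"]
    result = {}
    for q_name, m_name in TIMER_MAP.items():
        if q_name in qmcpack_times and m_name in miniqmc_times:
            result[q_name] = (
                qmcpack_times[q_name] / q_total,
                miniqmc_times[m_name] / m_total,
            )
    return result
```

Normalizing each code by its own core-loop total makes the relative weight of a kernel comparable even when absolute run times differ.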

Comparing times

Time fraction for MiniQMC vs. QMCPACK, run on a dual-socket Skylake 8180. The first grouping of bars (lighter colors) is miniQMC. The second grouping of bars (darker colors) is QMCPACK. Each bar in a grouping is a different thread count (1, 14, 28, and 56).

[Figure: Time fraction for MiniQMC vs. QMCPACK]

Absolute times using "-n 25 -r 0.999". These runs use 14 threads.

[Figure: Time fraction for MiniQMC vs. QMCPACK]
