
Version 1.31: Sparse CTF + CUDA support + performance model improvements

@solomonik solomonik released this 09 Nov 13:23
· 1203 commits to master since this release

New features/updates:

  1. Renewed support for CUDA-based offloading
  2. New macro -DTUNE activates model tuning, which adjusts the performance model parameters. A benchmark that executes a characteristic training set is included in bench/model_trainer and can be used to tune the models for any architecture. The output printed at the end of the benchmark should be pasted into src/shared/init_models.cxx. Running with -DTUNE always on is not advisable. The current parameters are based on 16-node Edison runs with 4 processes per node and should be reasonable in most settings. A cubic polynomial model is also available, but is not used because its observed modelling quality was inferior.
  3. Added a mapping search that exhaustively tries all possible mappings. Because no effective benefit was observed from this expanded search space, and the search itself slowed down low-cost contractions (e.g. in CCSDT), the exhaustive search is performed only when the time estimated by the old scheme exceeds 1 second.
  4. Cleaned up the configure script and the generated config.mk file so that both are more effective and better documented.
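
The gating rule in item 3 can be illustrated with a minimal sketch. This is not CTF's implementation: the `Mapping` struct, `best_of`, `select_mapping`, and the constant `kExhaustiveThresholdSec` are hypothetical names introduced here, and both candidate sets are passed in precomputed for simplicity, whereas a real search would presumably generate the exhaustive set only when needed. Only the 1 second threshold comes from the note above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a candidate mapping; not a CTF type.
struct Mapping {
  int id;           // illustrative identifier
  double est_time;  // time predicted by the performance model, in seconds
};

// Threshold taken from the release note: widen the search only above 1 second.
constexpr double kExhaustiveThresholdSec = 1.0;

// Return the candidate with the smallest modeled time.
inline Mapping best_of(const std::vector<Mapping>& candidates) {
  assert(!candidates.empty());
  Mapping best = candidates[0];
  for (const Mapping& m : candidates) {
    if (m.est_time < best.est_time) best = m;
  }
  return best;
}

// Choose from the old (restricted) candidate set first. Consult the
// exhaustive set only if the old scheme's best estimate exceeds the threshold.
inline Mapping select_mapping(const std::vector<Mapping>& old_scheme,
                              const std::vector<Mapping>& exhaustive) {
  Mapping best = best_of(old_scheme);
  if (best.est_time > kExhaustiveThresholdSec && !exhaustive.empty()) {
    Mapping wide = best_of(exhaustive);
    if (wide.est_time < best.est_time) best = wide;
  }
  return best;
}
```

With this structure, a cheap contraction whose old-scheme estimate is, say, 0.2 seconds never pays for the wider search, while one estimated at 2 seconds does.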