Releases: cyclops-community/ctf
Improved intermediates for expression evaluation
Lower-order intermediates for tensor contraction chains.
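
As a rough illustration of why the order of an intermediate matters, consider the chain y_i = sum_{j,k} A_ij * B_jk * x_k. The sketch below is not CTF code and does not reflect how CTF chooses intermediates internally. It is a minimal standalone example, with hypothetical names (`Mat`, `Vec`, `matvec`, `chain_with_vector_intermediate`), that evaluates the chain through an order-1 intermediate (a vector) rather than the order-2 intermediate A*B.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense types for illustration only; these are not CTF types.
using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // row-major: M[i][j]

// y_i = sum_j M_ij * x_j
Vec matvec(const Mat& M, const Vec& x) {
    Vec y(M.size(), 0.0);
    for (std::size_t i = 0; i < M.size(); ++i) {
        assert(M[i].size() == x.size());  // each row must match the vector length
        for (std::size_t j = 0; j < x.size(); ++j) {
            y[i] += M[i][j] * x[j];
        }
    }
    return y;
}

// Evaluates y_i = sum_{j,k} A_ij * B_jk * x_k using the order-1
// intermediate t_j = sum_k B_jk * x_k. Forming (A*B)_ik first would
// need an order-2 intermediate and a matrix-matrix product.
Vec chain_with_vector_intermediate(const Mat& A, const Mat& B, const Vec& x) {
    Vec t = matvec(B, x);  // order-1 intermediate
    return matvec(A, t);   // final result
}
```

For n-by-n matrices, both steps here are matrix-vector products costing O(n^2) operations each, whereas forming A*B first costs O(n^3).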
Version 1.33: minor fixes, some improving sequential performance
Minor changes relative to the previous version: improved sequential performance by avoiding an extra copy, and corrections to the GPU support code.
Version 1.32: fixes and improvements to user-specified functions/transforms
Adjusted the functionality and corrected the behavior of Functions and Transforms. Arbitrary-type support and multi-type functions are now more mature. For an explanation of usage, see the paper at
http://arxiv.org/abs/1512.00066
the shortest-paths codes in examples/sssp.cxx and examples/apsp.cxx, and the more complicated betweenness centrality code in examples/btwn_central.cxx.
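
To give a flavor of what user-specified functions make possible for shortest-path computations, the sketch below runs Bellman-Ford relaxations as a generalized matrix-vector product in which "addition" is min and "multiplication" is +. It is a standalone illustration and does not use the CTF API or reproduce examples/sssp.cxx; the names `Edge`, `relax`, and `shortest_paths` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical edge type for illustration; not a CTF type.
struct Edge {
    std::size_t src;
    std::size_t dst;
    double weight;
};

// One generalized matrix-vector product over the edge list: for each edge,
// combine the weight with the source distance via `mul`, then fold the
// result into the destination entry via `add`.
std::vector<double> relax(const std::vector<Edge>& edges,
                          const std::vector<double>& dist,
                          const std::function<double(double, double)>& add,
                          const std::function<double(double, double)>& mul) {
    std::vector<double> next = dist;
    for (const Edge& e : edges) {
        assert(e.src < dist.size() && e.dst < dist.size());
        next[e.dst] = add(next[e.dst], mul(e.weight, dist[e.src]));
    }
    return next;
}

// Single-source shortest paths by n-1 rounds of (min, +) relaxation.
std::vector<double> shortest_paths(const std::vector<Edge>& edges,
                                   std::size_t n, std::size_t source) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(n, inf);
    dist[source] = 0.0;
    auto add = [](double a, double b) { return std::min(a, b); };  // "sum" is min
    auto mul = [](double w, double d) { return w + d; };           // "product" is +
    for (std::size_t round = 0; round + 1 < n; ++round) {
        dist = relax(edges, dist, add, mul);
    }
    return dist;
}
```

Swapping the two lambdas for ordinary + and * would turn `relax` back into a conventional sparse matrix-vector update, which is the sense in which the operators are user-specified here.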
Sparse CTF
Added support for sparse tensors and extended the elementwise function interface to support C++11 lambdas.
Also made some incremental changes to performance models.
It is now possible to sum sparse tensors and contract a sparse tensor with a dense tensor.
Contractions between two sparse tensors are not yet supported.
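
The three operations mentioned above (summing sparse tensors, contracting a sparse tensor with a dense one, and applying a lambda elementwise) can be sketched on a toy coordinate-format matrix. This is a standalone illustration, not CTF code; `SparseMat`, `sparse_add`, `sparse_times_dense`, and `apply_to_nonzeros` are hypothetical names, and CTF's own data layout and interface differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical coordinate-format sparse matrix: (row, col) -> value.
using SparseMat = std::map<std::pair<std::size_t, std::size_t>, double>;

// Sum of two sparse matrices: only stored entries are visited.
SparseMat sparse_add(const SparseMat& A, const SparseMat& B) {
    SparseMat C = A;
    for (const auto& kv : B) {
        C[kv.first] += kv.second;  // a missing entry starts at 0.0
    }
    return C;
}

// Sparse-dense contraction y_i = sum_j A_ij * x_j.
std::vector<double> sparse_times_dense(const SparseMat& A,
                                       const std::vector<double>& x,
                                       std::size_t nrows) {
    std::vector<double> y(nrows, 0.0);
    for (const auto& kv : A) {
        assert(kv.first.first < nrows && kv.first.second < x.size());
        y[kv.first.first] += kv.second * x[kv.first.second];
    }
    return y;
}

// Applies a user-supplied callable (for example a C++11 lambda) to every
// stored nonzero in place.
template <typename F>
void apply_to_nonzeros(SparseMat& A, F f) {
    for (auto& kv : A) {
        kv.second = f(kv.second);
    }
}
```

A call such as `apply_to_nonzeros(C, [](double v) { return 2.0 * v; });` doubles every stored value without touching the implicit zeros.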
Version 1.31: Sparse CTF + CUDA support + performance model improvements
New features/updates:
- Renewed support for CUDA-based offloading
- New macro -DTUNE activates model tuning, which adjusts performance model parameters. A benchmark that executes a characteristic training set is included in bench/model_trainer and can be used to tune models for any architecture. Output at the end of the benchmark should be pasted into src/shared/init_models.cxx. Running with -DTUNE always on is not advisable. Current parameters are based on 16-node Edison runs with 4 processes per node and should be reasonable in most settings. A cubic polynomial model is also available, but is not used because its observed modelling quality was inferior.
- Added a mapping search that exhaustively tries all possible mappings. Because no effective benefit was observed from this expanded search space and the search itself slowed down low-cost contractions (e.g. in CCSDT), this mapping search is only performed when the time estimated by the old scheme exceeds 1 second.
- Cleaned up the configure file and the generated config.mk file to be more effective and better documented.
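
The threshold rule for the exhaustive mapping search in the list above can be sketched as follows. This is a minimal standalone illustration, not the code in CTF: `Mapping`, `select_mapping`, and the way candidates are represented are all assumptions made for the example. Only the 1-second cutoff comes from the release note.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a candidate mapping and its modelled cost.
struct Mapping {
    int id;
    double estimated_seconds;
};

// Returns the mapping to use. If the pick from the old (cheaper) scheme is
// estimated at or below the threshold, it is returned directly and the
// exhaustive candidates are never examined. Otherwise the candidate with
// the lowest estimate wins.
Mapping select_mapping(const Mapping& old_scheme_pick,
                       const std::vector<Mapping>& all_candidates,
                       double threshold_seconds = 1.0) {
    assert(threshold_seconds >= 0.0);
    if (old_scheme_pick.estimated_seconds <= threshold_seconds) {
        return old_scheme_pick;  // low-cost contraction: skip the search
    }
    Mapping best = old_scheme_pick;
    for (const Mapping& m : all_candidates) {
        if (m.estimated_seconds < best.estimated_seconds) {
            best = m;
        }
    }
    return best;
}
```

In a real implementation the candidate list would be generated lazily, so the cost of enumerating mappings is only paid on the expensive path; the eager vector here just keeps the sketch short.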