Releases: intel/yask
Version 2.22.00
Most significant bits:
- Added feature to control ordering of domain dimensions without having to modify the DSL stencil code. This affects memory layout, looping order, vector-folding, and rank layout. See slide 60 of tutorial. Controls available via YASK compiler command-line and API.
- Changed the heuristic to determine vector-folding sizes when some sizes are specified. This did not affect the default folding sizes.
- Updated the tutorial.
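To illustrate why the order of domain dimensions matters for memory layout, the sketch below computes the linear offset of a point in a dense array under two different orderings. It is a generic Python illustration, not YASK code: the function name is hypothetical, and it assumes the last-listed dimension is the unit-stride one.

```python
def linear_offset(point, sizes, order):
    """Offset of `point` in a dense array whose dimensions are laid out
    in `order`, with the last dimension in `order` varying fastest.
    (Hypothetical helper; not part of any YASK API.)"""
    offset = 0
    for dim in order:
        offset = offset * sizes[dim] + point[dim]
    return offset


sizes = {"x": 4, "y": 8, "z": 16}
step_in_x = {"x": 1, "y": 0, "z": 0}
step_in_z = {"x": 0, "y": 0, "z": 1}

# With order x, y, z, a step in z moves one element; a step in x moves 8 * 16.
xyz = ["x", "y", "z"]
print(linear_offset(step_in_z, sizes, xyz), linear_offset(step_in_x, sizes, xyz))

# With order z, y, x, the roles are reversed: x becomes the unit-stride dimension.
zyx = ["z", "y", "x"]
print(linear_offset(step_in_x, sizes, zyx), linear_offset(step_in_z, sizes, zyx))
```

The same stencil therefore touches memory with very different strides depending on the ordering, which is why the ordering also influences looping order and vector folding.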
Version 2.21.00
Most significant bits:
- Added APIs for creating finite-difference coefficients. Center, forward, backward and arbitrary forms are supported. These can be accessed from the YASK compiler or kernel. See the documentation under Functions in the Common Utilities section of the API guide. See example usage in src/stencils/Iso3dfdStencil.cpp. Thanks to Jeremy Tillay for contributing the coefficient-calculation code when he was an MS student at Rice University.
- Improved the performance of permute instruction sequences in generated AVX2 code.
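As background for the finite-difference coefficient APIs mentioned in this release, the sketch below shows one standard way to derive such coefficients: solve the Taylor-series matching conditions for a set of sample offsets. It is a standalone Python illustration and does not call or reproduce the YASK functions; the name `fd_coefficients` and its signature are assumptions made for this example.

```python
from fractions import Fraction
from math import factorial


def fd_coefficients(offsets, deriv_order=1):
    """Weights c[j] such that sum(c[j] * f(x + offsets[j] * h)) / h**deriv_order
    approximates the deriv_order-th derivative of f at x.
    (Hypothetical helper for illustration; not the YASK API.)"""
    n = len(offsets)
    if len(set(offsets)) != n:
        raise ValueError("offsets must be distinct")
    if deriv_order >= n:
        raise ValueError("need more sample points than the derivative order")

    # Row k of the augmented system enforces the k-th Taylor condition:
    #   sum_j c[j] * offsets[j]**k == k!  if k == deriv_order, else 0.
    rows = [
        [Fraction(s) ** k for s in offsets]
        + [Fraction(factorial(k) if k == deriv_order else 0)]
        for k in range(n)
    ]

    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [v / pv for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return [rows[k][n] for k in range(n)]


# Center form, second derivative: weights 1, -2, 1.
center = fd_coefficients([-1, 0, 1], deriv_order=2)
# Forward form, first derivative: weights -3/2, 2, -1/2.
forward = fd_coefficients([0, 1, 2], deriv_order=1)
```

Passing any list of distinct offsets covers the "arbitrary" form; backward forms use non-positive offsets such as `[-2, -1, 0]`.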
Version 2.20.00
Changes to the behavior of several APIs:
- Added checking of the step-dimension index value in the yk_grid::get_element() and similar APIs. Previously, invalid values silently "wrapped" around to valid values. Now, by default, the step index must be valid when reading, and the valid step indices are updated when writing. The old behavior of silent index wrapping may be restored via set_step_wrap(true).
- The default for all strict_indices API parameters is now true to catch more programming errors and increase consistency of behavior between "set" and "get" APIs.
- The advanced share_storage() APIs have been replaced with fuse_grids().
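The difference between the old wrapping behavior and the new step-index checking can be pictured with a toy model. The sketch below is a conceptual Python illustration only, not YASK's implementation: the class `StepRing`, its fixed number of slots, and the assumption that steps 0 to num_slots - 1 are initially valid are all hypothetical.

```python
class StepRing:
    """Toy model of storage that keeps only a few step indices alive.
    (Hypothetical; this is not how YASK is implemented.)"""

    def __init__(self, num_slots, step_wrap=False):
        self.num_slots = num_slots
        self.step_wrap = step_wrap
        self.values = [0.0] * num_slots
        # Step index currently held by each slot (assumed initial state).
        self.steps = list(range(num_slots))

    def write(self, t, value):
        # Writing makes step t valid and evicts the step that shared its slot.
        slot = t % self.num_slots
        self.values[slot] = value
        self.steps[slot] = t

    def read(self, t):
        slot = t % self.num_slots
        if not self.step_wrap and self.steps[slot] != t:
            raise IndexError(f"step index {t} is not currently valid")
        # With step_wrap=True, any index silently maps onto a stored slot.
        return self.values[slot]


checked = StepRing(num_slots=2)
checked.write(2, 5.0)         # step 2 replaces step 0
value = checked.read(2)       # fine: step 2 is valid
# checked.read(0) would now raise IndexError instead of returning 5.0.

legacy = StepRing(num_slots=2, step_wrap=True)
legacy.write(2, 5.0)
stale = legacy.read(0)        # silently returns the value written for step 2
```

The checked mode turns a stale-step read into an immediate error, which is the class of programming mistake the new default is meant to catch.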
Version 2.19.03
Most-significant changes since 2.18.00:
- Allow custom stencil-name in binary to be specified during build via YK_STENCIL make var. For example, make -j stencil=iso3dfd radius=4 YK_STENCIL=iso_r4 will create a binary that can be run via bin/yask.sh -stencil iso_r4 ...
- Report 50th-percentile trial result in addition to best result. When running an odd number of trials, this is equivalent to the median result.
- Improve parsing of auto-tuned block sizes by utils/bin/yask_log_to_csv.pl utility.
- Add -trace option to control debug output from a binary compiled with trace=1 and/or trace_mem=1.
- Add -no-print_suffixes option to simplify machine-parsing of output. For example, 1234000 would be printed instead of 1.234M.
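The note above about the 50th-percentile result matching the median only for an odd number of trials can be checked with a small sketch. The release notes do not say how YASK computes the percentile; the nearest-rank definition used here is an assumption, and the function name and trial values are made up for illustration.

```python
import math
import statistics


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of the samples are less than or equal to it.
    (Assumed definition; YASK's exact method is not stated.)"""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Odd number of trials: the 50th percentile is the middle sample, i.e. the median.
odd_trials = [9.1, 10.4, 9.8, 10.0, 9.5]
odd_p50 = percentile_nearest_rank(odd_trials, 50)
odd_median = statistics.median(odd_trials)

# Even number of trials: nearest-rank picks an actual sample, while the
# conventional median averages the two middle samples.
even_trials = [1.0, 2.0, 3.0, 4.0]
even_p50 = percentile_nearest_rank(even_trials, 50)
even_median = statistics.median(even_trials)
```

Under this definition the reported value is always one of the measured trials, which is convenient when the result is later matched against a specific run in a log.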
Version 2.18.00
Major changes:
- Strong-scaling mode now supported: use the -g* options to control the global problem size. Local-domain sizes will be determined automatically. There is a corresponding set_overall_domain_size() API.
- Number of MPI ranks in each dimension is also determined automatically if not supplied.
- Tutorial updated; look in the docs directory or the link in the README file.
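In strong-scaling mode the global problem size is fixed and each rank receives a share of it. The sketch below shows the basic arithmetic in Python under simplifying assumptions: the function name is hypothetical, the rank counts per dimension are given, and the split uses plain ceiling division. YASK's actual decomposition may round local sizes differently (for example, to vector or cluster multiples), which these notes do not describe.

```python
def local_domain_sizes(global_sizes, ranks_per_dim):
    """Per-rank domain size in each dimension for a given global size,
    using ceiling division so the ranks together cover the whole domain.
    (Hypothetical helper; not the YASK algorithm.)"""
    local = {}
    for dim, global_size in global_sizes.items():
        ranks = ranks_per_dim[dim]
        if ranks < 1:
            raise ValueError(f"need at least one rank in dimension {dim}")
        local[dim] = -(-global_size // ranks)  # ceiling division
    return local


global_sizes = {"x": 1024, "y": 512, "z": 100}
ranks_per_dim = {"x": 4, "y": 2, "z": 3}
local = local_domain_sizes(global_sizes, ranks_per_dim)
# x: 1024 / 4 = 256, y: 512 / 2 = 256, z: ceil(100 / 3) = 34
```

When the global size is not evenly divisible, as in z here, the last rank in that dimension would own fewer points than the others in this simple scheme.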
Version 2.17.00
Most important changes:
- Architecture in make and bin/yask.sh and number of MPI ranks in bin/yask.sh are now determined automatically from the current host. This changed the old behavior of make defaulting to snb architecture and bin/yask.sh requiring -arch and -ranks. Those options are still available to override the host-based defaults.
- The -auto_tune_mini_blocks option directs the auto-tuner to search mini-block sizes instead of block sizes. Useful for temporal blocking.
Version 2.16.03
Most significant changes:
- Expand capability of "step condition" to allow testing of YASK vars, not just the step index.
- Fixed bugs in get_{first,last}_rank_domain_index() APIs.
- Moved the position of the log-file name to the last column in the CSV output of utils/bin/yask_log_to_csv.pl.
Version 2.16.00
Most significant new features:
- Addition of math functions (sqrt, exp, sin, cos, etc.) into the DSL. (Not yet available in the YASK compiler API.)
- Revamping of the thread-binding performance feature useful with scratch vars and temporal tiling.
Version 2.15.11
A bug-fix release:
- Support more combinations of NUMA and mmap include files.
- Fix reporting of number of OpenMP threads.
Version 2.15.10
Most-significant changes:
- Fix corner-case bug with overlapping MPI comms of asymmetrical stencils.
- Add script to convert log files to csv format: run utils/bin/yask_log_to_csv.pl.
- Split up some of the large source files for faster parallel builds.