v0.17

Released by @tprimak on 19 Nov at 20:09

Performance optimizations

  • Improved int8 convolution performance on processors with Intel® AVX512-DL Boost instruction set support.
  • Improved performance of fp32 convolutions with a number of input and output channels not divisible by the SIMD width for processors with Intel® AVX2 instruction set support.
  • Improved performance of recurrent neural network (RNN) functionality.
  • Improved performance of int8 deconvolution.
  • Added optimizations for fp32 inference and training for processors with Intel® AVX instruction set support.
  • Added optimizations for convolutions and auxiliary primitives with 3D spatial data for processors with Intel® AVX2 instruction set support.
  • Improved int8 Winograd convolution performance for real-time inference use cases.

New functionality

  • Introduced int8 data type support for the inner product primitive.
  • Introduced support for int8 convolutions with signed input and signed weights.
  • Introduced 1D spatial data support in convolution and auxiliary primitives. This functionality is optimized for processors with Intel® AVX512 instruction set support.
  • Introduced the Shuffle primitive.
  • Introduced general-purpose matrix-matrix multiplication functions for int8 data (gemm_s8u8s32 and gemm_s8s8s32); a reference sketch of the arithmetic follows this list.
  • Feature preview: Threading Building Blocks (TBB) support.
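
For readers new to these entry points, below is a minimal reference sketch of the arithmetic an int8 GEMM performs: int8/uint8 inputs, int32 accumulation, and float alpha/beta scaling. The function name and parameter list are illustrative only, not the library signature; the transpose and zero-point offset arguments of the real functions are omitted.

    // Illustrative reference only; not the gemm_s8u8s32 API.
    // Computes C (int32, MxN) = alpha * A (int8, MxK) * B (uint8, KxN) + beta * C.
    #include <cstdint>

    void ref_gemm_s8u8s32(int M, int N, int K, float alpha,
                          const int8_t *A, int lda,
                          const uint8_t *B, int ldb,
                          float beta, int32_t *C, int ldc) {
        for (int m = 0; m < M; ++m)
            for (int n = 0; n < N; ++n) {
                int32_t acc = 0; // accumulate products in int32 to avoid int8 overflow
                for (int k = 0; k < K; ++k)
                    acc += static_cast<int32_t>(A[m * lda + k])
                         * static_cast<int32_t>(B[k * ldb + n]);
                // scale the accumulator; truncation is used here for simplicity
                C[m * ldc + n] = static_cast<int32_t>(alpha * acc + beta * C[m * ldc + n]);
            }
    }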

API deprecations and breaking changes

  • The order of the gates for LSTM cells was changed to input, forget, candidate, output. Models that rely on the previous gate order may produce incorrect results (see the sketch after this list).
  • Creating a backward RNN primitive without a hint (the corresponding forward primitive descriptor) via the C++ API is deprecated.
  • Int8 Winograd convolution behavior with respect to scales is aligned with the direct convolution algorithm.
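
To make the new gate order concrete, the snippet below packs per-gate LSTM weight blocks in that order. The enum and helper are an illustrative sketch only and are not part of the library API.

    // Illustrative sketch only: the v0.17 LSTM gate order expressed as indices.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    enum lstm_gate : std::size_t {
        input_gate = 0, forget_gate = 1, candidate_gate = 2, output_gate = 3
    };

    // Concatenate four per-gate weight blocks (each `gate_size` elements) into
    // one buffer laid out in the order: input, forget, candidate, output.
    std::vector<float> pack_lstm_gates(const std::vector<float> &w_i,
                                       const std::vector<float> &w_f,
                                       const std::vector<float> &w_c,
                                       const std::vector<float> &w_o,
                                       std::size_t gate_size) {
        const std::vector<float> *gates[4];
        gates[input_gate] = &w_i;
        gates[forget_gate] = &w_f;
        gates[candidate_gate] = &w_c;
        gates[output_gate] = &w_o;
        std::vector<float> packed(4 * gate_size);
        for (std::size_t g = 0; g < 4; ++g)
            std::copy(gates[g]->begin(), gates[g]->begin() + gate_size,
                      packed.begin() + g * gate_size);
        return packed;
    }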

Usability improvements

  • Primitives now accept tensors with a zero dimension and perform no computation in that case.
  • Added support for clang sanitizers.
  • Build system extended with the following capabilities (a combined example follows this list):
    • Allow building with static Intel MKL by passing -DMKLDNN_USE_MKL=FULL:STATIC to cmake
    • Allow specifying which Intel MKL flavor to use by passing -DMKLDNN_USE_MKL={DEF,NONE,ML,FULL} to cmake
    • Allow using the compiler's OpenMP runtime by passing -DMKLDNN_THREADING=OMP:COMP to cmake
    • Allow building a static library by passing -DMKLDNN_LIBRARY_TYPE=STATIC to cmake
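
For example, these options can be combined in a single cmake invocation. The command below is only a sketch, assuming an out-of-source build directory one level below the source tree; pick only the options you need:

    cmake -DMKLDNN_USE_MKL=FULL \
          -DMKLDNN_THREADING=OMP:COMP \
          -DMKLDNN_LIBRARY_TYPE=STATIC ..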

Thanks to the contributors

This release contains contributions from many Intel Performance Libraries developers as well as Dmitry Baksheev @dbakshee, Yuta Okamoto @okapies, and Eduardo Gonzalez @wmeddie. We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.