v0.17
Performance optimizations
- Improved int8 convolution performance on processors with Intel® AVX512-DL Boost instruction set support.
- Improved performance of fp32 convolutions with number of input and output channels not divisible by the SIMD width for processors with Intel® AVX2 instruction set support.
- Improved performance of Recurrent Neural Networks (RNNs) functionality.
- Improved performance of int8 deconvolution.
- Added optimizations for fp32 inference and training for processors with Intel® AVX instruction set support.
- Added optimizations for convolutions and auxiliary primitives with 3D spatial data for processors with Intel® AVX2 instruction set support.
- Improved int8 Winograd convolution performance for real-time inference use cases.
New functionality
- Introduced int8 data-type support for inner-product primitive.
- Introduced support for int8 convolutions with signed input and signed weights.
- Introduced 1D spatial data support in convolution and auxiliary primitives. This functionality is optimized for processors with Intel® AVX512 instruction set support.
- Introduced the Shuffle primitive.
- Introduced a general-purpose matrix-matrix multiplication function for int8 data (gemm_s8u8s32 and gemm_s8s8s32).
- Feature preview: Threading Building Blocks (TBB) support.
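For readers unfamiliar with int8 matrix multiplication, the idea behind the new GEMM functions can be sketched as a naive reference routine. This is an illustration only, not the library's API: the name ref_gemm_s8u8s32 is hypothetical, and the sketch assumes (from the function name gemm_s8u8s32) a signed 8-bit matrix A, an unsigned 8-bit matrix B, and a signed 32-bit result C, all stored row-major. Any extra parameters the real functions take (for example transposition, scaling, or offsets) are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference routine, NOT the MKL-DNN API.
// Computes C = A * B where
//   A is M x K, signed 8-bit, row-major   (assumption)
//   B is K x N, unsigned 8-bit, row-major (assumption)
//   C is M x N, signed 32-bit, row-major
// Products are widened to 32 bits before accumulation, so a single
// s8 * u8 product (at most 127 * 255 in magnitude) cannot overflow.
std::vector<std::int32_t> ref_gemm_s8u8s32(const std::vector<std::int8_t>& a,
                                           const std::vector<std::uint8_t>& b,
                                           std::size_t m, std::size_t n,
                                           std::size_t k) {
    assert(a.size() == m * k);
    assert(b.size() == k * n);
    std::vector<std::int32_t> c(m * n, 0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const std::int32_t a_ip = static_cast<std::int32_t>(a[i * k + p]);
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += a_ip * static_cast<std::int32_t>(b[p * n + j]);
            }
        }
    }
    return c;
}
```

A routine like this is mainly useful as a correctness baseline when validating results from an optimized int8 GEMM on small inputs.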
API deprecations and breaking changes
- The order of the gates for LSTM cells was changed to input, forget, candidate, output. Existing code that relies on the previous gate order might produce incorrect results.
- Backward RNN primitive creation without the hint in C++ is deprecated.
- Int8 Winograd convolution behavior with respect to scales is aligned with the direct convolution algorithm.
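Code that packed LSTM weights in a different gate order would need to repack them to match the new order (input, forget, candidate, output). The following is a minimal sketch of such a repacking step, under loudly stated assumptions: reorder_gates and the Gate enum are hypothetical helpers, the buffer is assumed to hold four contiguous gate blocks of equal size (the actual MKL-DNN weights layout may differ), and the previous order is passed in as a parameter because these notes do not state it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical gate labels, for illustration only.
enum Gate { Input = 0, Forget = 1, Candidate = 2, Output = 3 };

// Gate order stated in these release notes.
const std::array<Gate, 4> kNewOrder = {{Input, Forget, Candidate, Output}};

// Hypothetical helper, NOT part of MKL-DNN.
// Assumes `src` stores 4 contiguous blocks of `gate_size` values each,
// one block per gate, arranged according to `old_order`.
// Returns a copy with the blocks rearranged to kNewOrder.
std::vector<float> reorder_gates(const std::vector<float>& src,
                                 const std::array<Gate, 4>& old_order,
                                 std::size_t gate_size) {
    assert(src.size() == 4 * gate_size);
    std::vector<float> dst(src.size());
    for (std::size_t new_pos = 0; new_pos < 4; ++new_pos) {
        // Find where this gate's block lived in the old layout.
        std::size_t old_pos = 0;
        while (old_pos < 4 && old_order[old_pos] != kNewOrder[new_pos]) {
            ++old_pos;
        }
        assert(old_pos < 4);  // old_order must contain every gate
        for (std::size_t j = 0; j < gate_size; ++j) {
            dst[new_pos * gate_size + j] = src[old_pos * gate_size + j];
        }
    }
    return dst;
}
```

Taking the old order as an argument keeps the sketch independent of any particular previous layout; callers supply whatever order their existing weights actually use.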
Usability improvements
- Primitives now accept tensors with a dimension of size 0 and do nothing in that case.
- Added support for clang sanitizers.
- Build system extended with the following capabilities:
  - Allow building with static Intel MKL by passing -DMKLDNN_USE_MKL=FULL:STATIC to cmake
  - Allow specifying the Intel MKL to use by passing -DMKLDNN_USE_MKL={DEF,NONE,ML,FULL} to cmake
  - Allow using the compiler's OpenMP RT by passing -DMKLDNN_THREADING=OMP:COMP to cmake
  - Allow building a static library by passing -DMKLDNN_LIBRARY_TYPE=STATIC to cmake
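The zero-sized tensor behavior listed in this section means callers no longer have to special-case empty inputs such as an empty batch. A rough sketch of what "do nothing" amounts to is shown below. It is not library code: num_elements and scale_inplace are hypothetical names, and the operation (scaling a buffer) is just a stand-in for a primitive.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of elements described by `dims`.
// Any dimension equal to 0 makes the whole tensor empty.
// An empty `dims` list yields 1 (treated as a scalar here by convention).
std::size_t num_elements(const std::vector<std::size_t>& dims) {
    std::size_t count = 1;
    for (std::size_t d : dims) {
        count *= d;
    }
    return count;
}

// Hypothetical stand-in for a primitive: multiplies every element by `factor`.
// Returns false without touching `data` when the tensor has no elements,
// and true after doing the work otherwise.
bool scale_inplace(std::vector<float>& data,
                   const std::vector<std::size_t>& dims, float factor) {
    if (num_elements(dims) == 0) {
        return false;  // zero-sized tensor: nothing to compute
    }
    assert(data.size() == num_elements(dims));
    for (float& v : data) {
        v *= factor;
    }
    return true;
}
```

With a check like this inside the operation, a caller can pass dimensions such as {0, 3, 4} directly instead of guarding every call site.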
Thanks to the contributors
This release contains contributions from many Intel Performance Libraries developers as well as Dmitry Baksheev @dbakshee, Yuta Okamoto @okapies, and Eduardo Gonzalez @wmeddie. We would also like to thank everyone who asked questions and reported issues.
*Other names and brands may be claimed as the property of others.