v2.6

@harrymao2022 released this 29 Mar 21:57
· 48 commits to rls-v2.6 since this release

Performance Optimizations

  • Intel Architecture Processors
    • Improved performance for future Intel Xeon® Scalable processors (code name Sapphire Rapids). The functionality requires Linux kernel 5.16 or later.
    • Improved performance of matmul primitive for processors with Intel AVX-512 support.
  • Intel Graphics Products
    • Improved performance for future Xe Architecture graphics (code name Ponte Vecchio).
    • Improved performance for future Intel Arc graphics (code names Alchemist and DG2).
  • AArch64-based Processors
    • Improved binary primitive performance with Arm Compute Library (ACL).
    • Improved shuffle primitive performance for processors with SVE 512 support.

Functionality

  • Introduced bfloat16 destination support for int8 convolution, matmul and inner product primitives for processors with Intel AVX-512 support or future Intel Xeon® Scalable processors (code name Sapphire Rapids).
  • Extended RNN primitive with support for AUGRU cell.
  • Added support for non-zero negative slope in ReLU post-op for batch normalization primitive.
  • Introduced support for mixed source and destination data types in softmax primitive.
  • Introduced persistent cache API. This functionality allows serializing and reusing JIT kernels.
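
As an illustration of the ReLU post-op with a non-zero negative slope mentioned in the list above, the following is a minimal reference sketch in plain C++. It does not use the oneDNN API: the helper names `relu_with_slope` and `batch_norm_relu`, the single-channel layout, and the inference-style normalization with a given mean and variance are all assumptions made for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helper, not part of the oneDNN API.
// ReLU with a negative slope: returns x for x > 0, alpha * x otherwise.
// alpha == 0 gives the classic ReLU; a non-zero alpha gives "leaky" behavior.
inline float relu_with_slope(float x, float alpha) {
    return x > 0.0f ? x : alpha * x;
}

// Hypothetical reference helper, not part of the oneDNN API.
// Normalizes one channel with precomputed statistics, then applies the
// ReLU post-op to each normalized value.
std::vector<float> batch_norm_relu(const std::vector<float> &src, float mean,
        float variance, float epsilon, float alpha) {
    std::vector<float> dst(src.size());
    const float inv_std = 1.0f / std::sqrt(variance + epsilon);
    for (std::size_t i = 0; i < src.size(); ++i) {
        const float normalized = (src[i] - mean) * inv_std;
        dst[i] = relu_with_slope(normalized, alpha);
    }
    return dst;
}
```

With `alpha` set to zero, negative normalized values are clamped to zero; with a non-zero `alpha` they are scaled instead, which is the case this release adds for the batch normalization primitive.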

Usability

  • Added build time options to manage the set of supported instruction set architectures on Intel Graphics Products. See ONEDNN_ENABLE_PRIMITIVE_GPU_ISA for more details. This feature further reduces the binary footprint.
  • Extended build time options ONEDNN_ENABLE_PRIMITIVE and ONEDNN_ENABLE_WORKLOAD to GPU implementations. This feature further reduces the binary footprint.
  • Reduced stack consumption in GEMM implementation.
  • Added command line help to benchdnn.

Deprecated Functionality

  • Support for SYCL 1.2.1 (aka SYCL 2017 standard) is deprecated and will be removed in future releases.

Breaking Changes

  • Removed performance optimizations for Intel Xeon Phi processors. oneDNN will continue to be functional on these processors using the Intel AVX2 code path.

Thanks to the Contributors

This release contains contributions from the project core team as well as Arthur Mitrano @aaraujom, Aslan @aslanxie, Attila T. Áfra @atafra, Damian Szwichtenberg @dszwicht, Diana Bite @diaena, Joel Dippold @jedippold, Jonathan Deakin @jondea, Jonathan Louis Kaplan @JLouisKaplan-Arm, Kentaro Kawakami @kawakami-k, Luke Ireland @LukeIreland1, Mesut Meterelliyoz @mmeterel, Nathan John Sircombe @nSircombe, Peter Caday @petercad, Tengfei Han @Tengfei09, and Thiago Macieira @thiagomacieira. We would also like to thank everyone who asked questions and reported issues.