tensorflow-directml-plugin 0.1.0

Pre-release
PatriceVignola released this 29 Sep 17:08
· 28 commits to main since this release
536ad9a

The Python packages are available as a PyPI release. To install the latest Python package, run pip install tensorflow-directml-plugin.
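After installing, it can be useful to confirm which TensorFlow packages are actually present, since this release depends on tensorflow-cpu>=2.10.0 and advises against using tensorflow or tensorflow-gpu alongside the plugin. The following is a minimal sketch using only the Python standard library; the helper names (version_tuple, installed_version, check_environment) are hypothetical and not part of the plugin.

```python
from importlib import metadata

# Minimum tensorflow-cpu version stated in these release notes.
MIN_TENSORFLOW_CPU = (2, 10, 0)


def version_tuple(version):
    """Turn '2.10.0' into (2, 10, 0), keeping the leading digits of each part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Report installed versions and list anything that conflicts with the notes."""
    names = (
        "tensorflow-directml-plugin",
        "tensorflow-cpu",
        "tensorflow",
        "tensorflow-gpu",
    )
    report = {name: installed_version(name) for name in names}
    problems = []

    if report["tensorflow-directml-plugin"] is None:
        problems.append("tensorflow-directml-plugin is not installed")

    cpu_version = report["tensorflow-cpu"]
    if cpu_version is None:
        problems.append("tensorflow-cpu is not installed")
    elif version_tuple(cpu_version) < MIN_TENSORFLOW_CPU:
        problems.append("tensorflow-cpu %s is older than 2.10.0" % cpu_version)

    for other in ("tensorflow", "tensorflow-gpu"):
        if report[other] is not None:
            problems.append("%s is installed; the notes recommend tensorflow-cpu" % other)

    return report, problems


if __name__ == "__main__":
    report, problems = check_environment()
    for name, version in report.items():
        print(name, "->", version or "not installed")
    for problem in problems:
        print("warning:", problem)
```

Versions are compared as integer tuples rather than strings because a plain string comparison would rank "2.9.1" above "2.10.0".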

Changes in 0.1.0

  • Upgrade the DirectML version to 1.9.1, which includes minor bug fixes and performance improvements.
  • Add DirectML kernels for the RngSkip and RngReadAndSkip operators.
  • Add DirectML kernels for the StatelessRandomGetKeyCounterAlg, StatelessRandomGetKeyCounter and StatelessRandomGetAlg operators.
  • Add a DirectML kernel for SparseApplyAdagrad.
  • Add a DirectML kernel for StatelessRandomUniformV2.
  • Add a DirectML kernel for InTopKV2.
  • Add DirectML kernels for MatrixDiagV3 and MatrixDiagPartV3.
  • Add emulated support for int64.
  • Add a dependency on tensorflow-cpu>=2.10.0. Users should install the tensorflow-cpu package instead of tensorflow or tensorflow-gpu when using tensorflow-directml-plugin.
  • Add int32 support for StridedSlice.
  • Add CPU emulated versions of UnsortedSegmentSum, UnsortedSegmentMax, UnsortedSegmentMin and UnsortedSegmentProd to get rid of device placement errors in transformer models.
  • Add a C API for Linux. The C API can be downloaded from the releases page in the tensorflow-directml-plugin GitHub repository.
  • Add support for multiple devices.
  • Add integer support for Relu.
  • Add int32 support for Pack.
  • Fix the incomplete adapter description on Linux.
  • Fix a crash in ArgMin and ArgMax when the output type was int16 or uint16.
  • Fix undefined behavior when retrieving a list of strings from an attribute.
  • Fix a memory leak in the BFC allocator.
  • Fix a memory leak in the graph optimizer.
  • Fix a memory leak in SegmentReduction.
  • Fix a memory leak in StridedSlice.
  • Fix a memory leak in the emulated random kernels.
  • Fix the validation of Range to allow values near INT_MAX.
  • Remove warnings related to the unsupported DataFormatDimMap and DataFormatVecPermute operators.
  • Prevent unbounded growth of command allocator memory.
  • Optimize output allocation for inputs that can be executed in-place and directly forwarded to the output.
  • Increase the available memory by allowing devices to allocate shared (nonlocal) memory.
  • Improve the performance of the unsorted segment operators by batching GPU->CPU copies together.
  • Increase the performance of emulated operators by reducing the number of eager contexts and eager ops created.
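To try out a couple of the changes above (multiple device support and the unsorted segment operators), a short smoke test along these lines may help. This is a sketch, not an official test: the function name run_directml_smoke_test is hypothetical, and it assumes that DirectML adapters are listed under the "GPU" device type, which these notes do not state. The TensorFlow calls used (tf.config.list_physical_devices and tf.math.unsorted_segment_sum) are standard TensorFlow 2 APIs, so the computation also runs on CPU if no DirectML device is found.

```python
import importlib.util


def run_directml_smoke_test():
    """List visible devices and run one unsorted segment op.

    Returns None when TensorFlow is not installed.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return None

    import tensorflow as tf

    # Assumption: the plugin exposes DirectML adapters as "GPU" devices.
    devices = tf.config.list_physical_devices("GPU")

    # Sum values by segment id: segment 0 gets 1 + 3, segment 1 gets 2 + 4.
    data = tf.constant([1, 2, 3, 4], dtype=tf.int32)
    segment_ids = tf.constant([0, 1, 0, 1], dtype=tf.int32)
    result = tf.math.unsorted_segment_sum(data, segment_ids, num_segments=2)

    return {
        "devices": [device.name for device in devices],
        "segment_sum": result.numpy().tolist(),
    }


if __name__ == "__main__":
    outcome = run_directml_smoke_test()
    if outcome is None:
        print("TensorFlow is not installed")
    else:
        print("Devices:", outcome["devices"])
        print("Segment sums:", outcome["segment_sum"])
```

If more than one adapter is present, each one should appear as a separate entry in the device list, which is a quick way to check that multiple devices are being picked up.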