The Python packages are available on PyPI. To install the latest release, run pip install tensorflow-directml-plugin.

Changes in 0.1.0

Upgrade the DirectML version to 1.9.1, which includes minor bug fixes and performance improvements.
Add DirectML kernels for the RngSkip and RngReadAndSkip operators.
Add DirectML kernels for the StatelessRandomGetKeyCounterAlg, StatelessRandomGetKeyCounter and StatelessRandomGetAlg operators.
Add a DirectML kernel for SparseApplyAdagrad.
Add a DirectML kernel for StatelessRandomUniformV2.
Add a DirectML kernel for InTopKV2.
Add DirectML kernels for MatrixDiagV3 and MatrixDiagPartV3.
Add emulated support for int64.
Add a dependency on tensorflow-cpu>=2.10.0. Users should install the tensorflow-cpu package instead of tensorflow or tensorflow-gpu when using tensorflow-directml-plugin.
Add int32 support for StridedSlice.
Add CPU-emulated versions of UnsortedSegmentSum, UnsortedSegmentMax, UnsortedSegmentMin and UnsortedSegmentProd to eliminate device placement errors in transformer models.
Add a C API for Linux. The C API can be downloaded from the releases page in the tensorflow-directml-plugin GitHub repository.
Add support for multiple devices.
Add integer support for Relu.
Add int32 support for Pack.
Fix the incomplete adapter description on Linux.
Fix a crash in ArgMin and ArgMax when the output type was int16 or uint16.
Fix undefined behavior when retrieving a list of strings from an attribute.
Fix a memory leak in the BFC allocator.
Fix a memory leak in the graph optimizer.
Fix a memory leak in SegmentReduction.
Fix a memory leak in StridedSlice.
Fix a memory leak in the emulated random kernels.
Fix the validation of Range to allow values near INT_MAX.
Get rid of warnings related to unsupported DataFormatDimMap and DataFormatVecPermute operators.
Prevent unbounded growth of command allocator memory.
Optimize output allocation for inputs that can be executed in-place and directly forwarded to the output.
Increase the available memory by allowing devices to allocate shared (nonlocal) memory.
Improve the performance of the unsorted segment operators by batching GPU->CPU copies together.
Improve the performance of emulated operators by reducing the number of eager contexts and eager ops that are created.
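
The tensorflow-cpu dependency listed above can be checked before troubleshooting device issues. The following is a minimal sketch, not part of the plugin: the helper names (parse_version, check_environment, installed_versions) are hypothetical, and it assumes that the distribution names reported by importlib.metadata match the package names in these notes. It only inspects installed package metadata and does not import TensorFlow.

```python
from importlib import metadata

# Package names and minimum version taken from the release notes above.
REQUIRED = "tensorflow-cpu"
MINIMUM = (2, 10, 0)
CONFLICTING = ("tensorflow", "tensorflow-gpu")


def parse_version(text):
    """Turn '2.10.0' into (2, 10, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment(installed):
    """Return a list of problems for a {package name: version} mapping."""
    problems = []
    version = installed.get(REQUIRED)
    if version is None:
        problems.append(f"{REQUIRED} is not installed")
    elif parse_version(version) < MINIMUM:
        problems.append(f"{REQUIRED} {version} is older than 2.10.0")
    for name in CONFLICTING:
        if name in installed:
            problems.append(f"{name} is installed; use {REQUIRED} instead")
    return problems


def installed_versions(names):
    """Look up the installed version of each named package, if present."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # Package is absent from this environment.
    return found


if __name__ == "__main__":
    names = (REQUIRED, *CONFLICTING)
    problems = check_environment(installed_versions(names))
    for line in problems or ["environment looks consistent"]:
        print(line)
```

Keeping check_environment separate from the metadata lookup means the rule can be exercised with a plain dictionary, for example check_environment({"tensorflow-gpu": "2.10.0"}), without changing the environment.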
