tensorflow-directml-plugin 0.1.0
Pre-release
PatriceVignola released this 29 Sep 17:08 · 28 commits to main since this release
The Python packages are available as a PyPI release. To download the latest Python package automatically, simply `pip install tensorflow-directml-plugin`.
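Since this release adds a dependency on `tensorflow-cpu` (and that package must be installed instead of `tensorflow` or `tensorflow-gpu`), a typical installation looks like the following sketch, assuming a Python 3 environment with pip available:

```shell
# Install the CPU build of TensorFlow plus the DirectML plugin.
# The plugin declares tensorflow-cpu>=2.10.0 as a dependency, so pip
# will also pull in a compatible tensorflow-cpu if it is missing.
pip install "tensorflow-cpu>=2.10.0" tensorflow-directml-plugin
```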
Changes in 0.1.0
- Upgrade the DirectML version to 1.9.1, which includes minor bug fixes and performance improvements.
- Add DirectML kernels for the `RngSkip` and `RngReadAndSkip` operators.
- Add DirectML kernels for the `StatelessRandomGetKeyCounterAlg`, `StatelessRandomGetKeyCounter` and `StatelessRandomGetAlg` operators.
- Add a DirectML kernel for `SparseApplyAdagrad`.
- Add a DirectML kernel for `StatelessRandomUniformV2`.
- Add a DirectML kernel for `InTopKV2`.
- Add DirectML kernels for `MatrixDiagV3` and `MatrixDiagPartV3`.
- Add emulated support for `int64`.
- Add a dependency on `tensorflow-cpu>=2.10.0`. Users should install the `tensorflow-cpu` package instead of `tensorflow` or `tensorflow-gpu` when using `tensorflow-directml-plugin`.
- Add `int32` support for `StridedSlice`.
- Add CPU emulated versions of `UnsortedSegmentSum`, `UnsortedSegmentMax`, `UnsortedSegmentMin` and `UnsortedSegmentProd` to get rid of device placement errors in transformer models.
- Add a C API for Linux. The C API can be downloaded from the releases page in the `tensorflow-directml-plugin` GitHub repository.
- Add support for multiple devices.
- Add integer support for `Relu`.
- Add `int32` support for `Pack`.
- Fix the incomplete adapter description on Linux.
- Fix a crash in `ArgMin` and `ArgMax` when the output type was `int16` or `uint16`.
- Fix an undefined behavior when retrieving a list of strings from an attribute.
- Fix a memory leak in the BFC allocator.
- Fix a memory leak in the graph optimizer.
- Fix a memory leak in `SegmentReduction`.
- Fix a memory leak in `StridedSlice`.
- Fix a memory leak in the emulated random kernels.
- Fix the validation of `Range` to allow values near `INT_MAX`.
- Get rid of warnings related to unsupported `DataFormatDimMap` and `DataFormatVecPermute` operators.
- Prevent unbounded growth of command allocator memory.
- Optimize output allocation for inputs that can be executed in-place and directly forwarded to the output.
- Increase the available memory by allowing devices to allocate shared (non-local) memory.
- Improve the performance of the unsorted segment operators by batching GPU->CPU copies together.
- Increase the performance of emulated operators by reducing how often eager contexts and eager ops are created.
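With multiple-device support added in this release, a quick way to confirm that the plugin registered its devices is to enumerate them through TensorFlow's standard device API. This is a minimal sketch, assuming `tensorflow-cpu` and `tensorflow-directml-plugin` are installed (DirectML devices are exposed through the pluggable-device mechanism and appear as "GPU" physical devices); `visible_gpu_count` is a hypothetical helper name, not part of the plugin:

```python
def visible_gpu_count():
    """Return the number of GPU devices TensorFlow can see, or None
    when TensorFlow is not installed in this environment."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # DirectML adapters registered by the plugin show up here as
    # physical devices of type "GPU".
    return len(tf.config.list_physical_devices("GPU"))

count = visible_gpu_count()
if count is None:
    print("TensorFlow is not installed.")
else:
    print(f"DirectML devices visible to TensorFlow: {count}")
```

On a machine with more than one DirectML-capable adapter, each device can then be targeted with the usual `tf.device` placement syntax.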