TensorFlow-DirectML 1.15.3.dev200911
Pre-release
jstoecker released this 12 Sep 00:58 · 520 commits to directml since this release
Preview build of tensorflow-directml built on September 11, 2020.
The Python packages are available as a PyPI release. To download the appropriate Python package automatically, run pip install tensorflow-directml.
Changes in dev200911:
- 64 new kernels registered for the DML device (Block-RNN/LSTM/GRU ops, matrix diag ops, roll, and others).
- New BFC-based allocator for DML resources that greatly improves utilization of available memory.
- TF will only attempt to use DirectX devices with support for 16- and 8-bit datatypes, such as FLOAT16, since there is no way to disable certain kernel registrations at runtime.
- Add support for DML_VISIBLE_DEVICES environment variable. This behaves identically to CUDA_VISIBLE_DEVICES. When this environment variable is set, it filters (or re-orders) adapter indices in a process-wide fashion. When adapters are filtered in this way, they don't appear to TF at all and don't show up during device enumeration.
- Add support for TF_DIRECTML_KERNEL_CACHE_SIZE environment variable, which sets the size of the kernel cache and can be raised to potentially reuse kernel instances more frequently (defaults to 1024 kernels).
- Deliberately leak per-process DML/D3D12 resources and state (see DmlDeviceCache::Instance) to avoid order-of-destruction issues during process exit (matches CUDA device behavior).
- Various bug fixes in kernels and out-of-memory handling.
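
The two new environment variables can be set from Python as well as from the shell. The following is a minimal sketch, not an official recipe. It assumes that both variables are read when TensorFlow initializes its devices (so they are set before the import), and that DML_VISIBLE_DEVICES takes a comma-separated list of adapter indices, as CUDA_VISIBLE_DEVICES does. The helper name configure_directml_env and the example values are hypothetical.

```python
import os


def configure_directml_env(visible_devices=None, kernel_cache_size=None):
    """Set the DirectML environment variables for the current process.

    Hypothetical helper for illustration. Call it before importing
    TensorFlow (assumption: the variables are read at device initialization).
    """
    if visible_devices is not None:
        # Assumed format: comma-separated adapter indices, in the order
        # TensorFlow should see them (mirrors CUDA_VISIBLE_DEVICES).
        os.environ["DML_VISIBLE_DEVICES"] = ",".join(str(i) for i in visible_devices)
    if kernel_cache_size is not None:
        # The release notes give the default as 1024 kernels.
        os.environ["TF_DIRECTML_KERNEL_CACHE_SIZE"] = str(kernel_cache_size)
    names = ("DML_VISIBLE_DEVICES", "TF_DIRECTML_KERNEL_CACHE_SIZE")
    return {name: os.environ[name] for name in names if name in os.environ}


if __name__ == "__main__":
    # Example values only: expose adapters 1 and 0 (re-ordered) and
    # double the kernel cache size.
    configure_directml_env(visible_devices=[1, 0], kernel_cache_size=2048)
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not installed; only the environment was set.")
    else:
        # Filtered adapters should not appear in this enumeration.
        for device in device_lib.list_local_devices():
            print(device.name, device.device_type)
```

Setting the variables in the shell before launching Python has the same effect and avoids any dependence on import order.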