TensorFlow-DirectML 1.15.3.dev200911

Pre-release
@jstoecker jstoecker released this 12 Sep 00:58
· 520 commits to directml since this release
ad352d2

Preview build of tensorflow-directml built on September 11, 2020.

The Python packages are available as a PyPI release. To download the appropriate Python package automatically, run pip install tensorflow-directml.

Changes in dev200911:

  • 64 new kernels registered for the DML device (Block-RNN/LSTM/GRU ops, matrix diag ops, roll, and others).
  • New BFC-based allocator for DML resources that greatly improves utilization of available memory.
  • TF now only attempts to use DirectX devices that support 16- and 8-bit datatypes (such as FLOAT16), since there is no way to disable specific kernel registrations at runtime.
  • Add support for the DML_VISIBLE_DEVICES environment variable, which behaves identically to CUDA_VISIBLE_DEVICES. When set, it filters (or re-orders) adapter indices process-wide. Adapters filtered out this way are invisible to TF and do not show up during device enumeration.
  • Add support for the TF_DIRECTML_KERNEL_CACHE_SIZE environment variable, which can be used to potentially reuse kernel instances more frequently (defaults to 1024 kernels).
  • Deliberately leak per-process DML/D3D12 resources and state (see DmlDeviceCache::Instance) to avoid order-of-destruction issues during process exit (matches CUDA device behavior).
  • Various bug fixes in kernels and out-of-memory handling.
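
The two environment variables above can be set from Python before TensorFlow is imported. The following is a minimal sketch, not part of tensorflow-directml: the helper name configure_directml is hypothetical, the comma-separated index format is assumed to follow CUDA_VISIBLE_DEVICES (which the notes say this variable mirrors), and it assumes the variables are read when TensorFlow initializes its devices.

```python
import os


def configure_directml(visible_devices=None, kernel_cache_size=None):
    """Set DirectML-related environment variables for the current process.

    Hypothetical helper for illustration; it only writes to os.environ.
    """
    if visible_devices is not None:
        # Assumed format: comma-separated adapter indices, in the order
        # they should be exposed (as with CUDA_VISIBLE_DEVICES).
        os.environ["DML_VISIBLE_DEVICES"] = ",".join(
            str(index) for index in visible_devices
        )
    if kernel_cache_size is not None:
        # The release notes give 1024 kernels as the default cache size.
        os.environ["TF_DIRECTML_KERNEL_CACHE_SIZE"] = str(int(kernel_cache_size))

    names = ("DML_VISIBLE_DEVICES", "TF_DIRECTML_KERNEL_CACHE_SIZE")
    return {name: os.environ[name] for name in names if name in os.environ}


# Expose adapter 1 first and adapter 0 second, and double the kernel cache.
settings = configure_directml(visible_devices=[1, 0], kernel_cache_size=2048)
print(settings)

# Import TensorFlow only after the variables are set (assumption: they are
# read during device initialization):
# import tensorflow as tf
```

Setting the variables in the shell before launching Python has the same effect and avoids any dependence on import order.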