v0.0.4
Released on October 8, 2019.
- Reduced GPU memory fragmentation by caching the CUDA streams used for copies.
- Fixed a potential GPU memory violation with tuples of multiple tensors.
- Fixed a potential GPU memory violation with shifted view tensors (see issue #27366 and pull request #27371 on PyTorch).