We have picked up the pace and released 2.7 shortly after 2.6.1. The README and CHANGELOG list all of the new additions in 2.7. Below are a few more details:
A smoking fast strided dgrad kernel that smartly cuts down unnecessary computations. This kernel is used when any stride is larger than 1. To use it, set the StrideSupport template argument to StrideSupport::kStrided (see the sketch below). Now that we have optimized implementations for all convolution kernels, we have changed the default convolution algorithm from kAnalytic to kOptimized. The profiler only generates the optimized convolution kernels by default.
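A minimal sketch of opting in to the strided dgrad kernel, loosely following the pattern in CUTLASS's conv unit tests. The tile shapes, element types, and the threadblock swizzle below are illustrative assumptions, and the exact template parameter list can differ between CUTLASS versions, so check the headers in your tree:

```cpp
#include "cutlass/conv/kernel/default_conv2d_dgrad.h"
#include "cutlass/conv/device/implicit_gemm_convolution.h"

// Illustrative fp16 NHWC dgrad on SM80; the last two template arguments select
// the optimized iterator algorithm and the new strided-dgrad support.
using DgradKernel = typename cutlass::conv::kernel::DefaultConv2dDgrad<
    cutlass::half_t, cutlass::layout::TensorNHWC,   // A: output gradient
    cutlass::half_t, cutlass::layout::TensorNHWC,   // B: filter
    cutlass::half_t, cutlass::layout::TensorNHWC,   // C: input gradient
    float,                                          // accumulator
    cutlass::arch::OpClassTensorOp,                 // tensor cores
    cutlass::arch::Sm80,
    cutlass::gemm::GemmShape<128, 128, 32>,         // threadblock tile
    cutlass::gemm::GemmShape<64, 64, 32>,           // warp tile
    cutlass::gemm::GemmShape<16, 8, 16>,            // MMA instruction shape
    cutlass::epilogue::thread::LinearCombination<
        cutlass::half_t, 8, float, float>,
    cutlass::conv::threadblock::StridedDgradIdentityThreadblockSwizzle<1>,
    3,                                              // pipeline stages
    cutlass::arch::OpMultiplyAdd,
    cutlass::conv::IteratorAlgorithm::kOptimized,   // the new default
    cutlass::conv::StrideSupport::kStrided          // enable strided dgrad
>::Kernel;

using Dgrad = cutlass::conv::device::ImplicitGemmConvolution<DgradKernel>;
```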
In the convolution kernels, we no longer require the channel count to be 128-bit aligned to use tensor cores, so you no longer have to pad your tensors to meet this requirement. That said, 128-bit alignment still delivers the best performance. This was implemented by @mengchihe from the community. Thank you very much!
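For example, a problem whose fp16 channel count is not a multiple of 8 elements (128 bits) can now run on tensor cores; the concrete sizes below are made up for illustration:

```cpp
#include "cutlass/conv/conv2d_problem_size.h"

// fp16 with C = 2 is only 32-bit aligned; before 2.7 a tensor-core kernel
// would have required padding C up to a multiple of 8 elements.
cutlass::conv::Conv2dProblemSize problem(
    {16, 224, 224, 2},   // input NHWC, C = 2
    {64, 7, 7, 2},       // filter KRSC
    {3, 3, 3, 3},        // padding
    {2, 2},              // conv stride
    {1, 1});             // dilation
```

The kernel itself still has to be instantiated with a matching (smaller) operand alignment; what was removed is the hard 128-bit floor.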
We added a mainloop fusion kernel in example 23. This kernel can reduce one of the GEMM operands along the GEMM-k dimension while doing the GEMM. It produces an additional 1xM or 1xN vector output, depending on which operand is reduced. The additional reduction operation adds almost no runtime overhead. This can be used in Megatron.
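To make the extra output concrete, here is a plain C++ reference of what the fused kernel computes when operand A is the one reduced (so the extra output is a length-M vector). This is reference semantics only, not the CUTLASS kernel, and the function name is made up:

```cpp
#include <vector>

// D = A * B as usual, plus v[i] = sum over k of A[i][k], produced in one pass.
void gemm_with_operand_a_reduction(
    int M, int N, int K,
    const std::vector<float> &A,   // M x K, row-major
    const std::vector<float> &B,   // K x N, row-major
    std::vector<float> &D,         // M x N, row-major
    std::vector<float> &v) {       // length M: reduction of A along k
  for (int i = 0; i < M; ++i) {
    float row_sum = 0.f;
    for (int k = 0; k < K; ++k) {
      row_sum += A[i * K + k];
    }
    v[i] = row_sum;                // the extra 1xM vector output
    for (int j = 0; j < N; ++j) {
      float acc = 0.f;
      for (int k = 0; k < K; ++k) {
        acc += A[i * K + k] * B[k * N + j];
      }
      D[i * N + j] = acc;
    }
  }
}
```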
We added fp16 acceleration for gelu_taylor. The same idea can be applied to other activation functions.
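The gist of the fp16 path is to keep the whole activation in packed half2 arithmetic so each instruction processes two elements at once. A hedged CUDA sketch of that idea, using the tanh-based GELU approximation; tanh is built from h2exp here, while CUTLASS's actual implementation may use a different fast-tanh path:

```cpp
#include <cuda_fp16.h>

// tanh(x) = (exp(2x) - 1) / (exp(2x) + 1), computed on two halves at once.
__device__ __half2 tanh_half2(__half2 x) {
  __half2 one = __float2half2_rn(1.0f);
  __half2 e = h2exp(__hadd2(x, x));   // exp(2x)
  return __h2div(__hsub2(e, one), __hadd2(e, one));
}

// GELU(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * x * (1 + 0.044715 * x^2))),
// evaluated entirely in fp16 on a packed pair of values.
__device__ __half2 gelu_taylor_half2(__half2 x) {
  const __half2 k0    = __float2half2_rn(0.7978845608f);  // sqrt(2/pi)
  const __half2 k1    = __float2half2_rn(0.044715f);
  const __half2 khalf = __float2half2_rn(0.5f);
  const __half2 one   = __float2half2_rn(1.0f);
  __half2 x2    = __hmul2(x, x);
  __half2 inner = __hmul2(k0, __hmul2(x, __hadd2(one, __hmul2(k1, x2))));
  __half2 t     = tanh_half2(inner);
  return __hmul2(khalf, __hmul2(x, __hadd2(one, t)));
}
```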
We sped up the convolution unit tests by 40x without losing coverage.