CUDA 11.4 is out. #284

hwu36 · 2021-07-01T03:29:42Z

Maintainer

The best compiler so far for Ampere.

CUTLASS wgrad kernels improve by 14% (geometric mean) across the ResNet-50 layers, with a maximum improvement of 37% and no regression in any layer.

CUDA 11.4 adds many new features to the `ld` and `cp.async` PTX instructions (see PTX ISA 7.4: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#ptx-isa-version-7-4). The upcoming CUTLASS 2.6 will support `prefetch_size`, which can slightly improve the performance of many kernels. If you cannot wait, you can add these variants yourself to https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/arch/memory.h and https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/arch/memory_sm80.h, for example `cp.async.ca.shared.global.L2::128B`. Also, please let us know whether the new cache eviction policy feature helps your applications; we may consider supporting it in future releases.
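A minimal sketch of what such a variant could look like, written as a standalone CUDA file rather than as a patch to memory_sm80.h. The helper names (`cp_async_ca_16B_l2_128B`, `cp_async_wait_all_groups`, `staged_copy`, `run_staged_copy_demo`) are hypothetical and are not CUTLASS's API; only the PTX instruction strings come from the PTX ISA. It assumes an sm_80 GPU and CUDA 11.4 or newer (compile with `-arch=sm_80`).

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper (illustrative name, not CUTLASS's API), loosely modeled on
// the cp_async wrappers in cutlass/arch/memory_sm80.h. Issues a 16-byte
// asynchronous global->shared copy cached at all levels (.ca) with the
// L2::128B prefetch-size hint. Requires sm_80+ and PTX ISA 7.4 (CUDA 11.4+).
__device__ __forceinline__ void cp_async_ca_16B_l2_128B(void* smem_dst,
                                                        const void* gmem_src) {
  // cp.async takes a shared-state-space address, not a generic pointer.
  const unsigned smem_addr =
      static_cast<unsigned>(__cvta_generic_to_shared(smem_dst));
  asm volatile("cp.async.ca.shared.global.L2::128B [%0], [%1], 16;\n"
               :: "r"(smem_addr), "l"(gmem_src)
               : "memory");
}

// Commit outstanding cp.async operations as one group, then wait for all groups.
__device__ __forceinline__ void cp_async_wait_all_groups() {
  asm volatile("cp.async.commit_group;\n" ::: "memory");
  asm volatile("cp.async.wait_group 0;\n" ::: "memory");
}

// Demo kernel: each thread stages one float4 (16 bytes) through shared memory.
__global__ void staged_copy(const float4* in, float4* out, int n) {
  extern __shared__ float4 tile[];
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) cp_async_ca_16B_l2_128B(&tile[threadIdx.x], &in[i]);
  cp_async_wait_all_groups();
  __syncthreads();
  if (i < n) out[i] = tile[threadIdx.x];
}

// Host-side check: returns true if the staged copy reproduces the input.
bool run_staged_copy_demo(int n) {
  const int threads = 128;
  const int blocks = (n + threads - 1) / threads;
  const size_t bytes = static_cast<size_t>(n) * sizeof(float4);
  std::vector<float4> h_in(n), h_out(n);
  for (int i = 0; i < n; ++i) {
    const float f = static_cast<float>(i);
    h_in[i] = make_float4(f, f + 0.25f, f + 0.5f, f + 0.75f);
  }
  float4* d_in = nullptr;
  float4* d_out = nullptr;
  if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
    cudaFree(d_in);
    return false;
  }
  cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
  staged_copy<<<blocks, threads, threads * sizeof(float4)>>>(d_in, d_out, n);
  const bool launched = cudaGetLastError() == cudaSuccess;
  const bool copied =
      cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
  cudaFree(d_in);
  cudaFree(d_out);
  if (!launched || !copied) return false;
  for (int i = 0; i < n; ++i) {
    if (h_out[i].x != h_in[i].x || h_out[i].y != h_in[i].y ||
        h_out[i].z != h_in[i].z || h_out[i].w != h_in[i].w)
      return false;
  }
  return true;
}
```

The `.L2::128B` suffix is only a hint, so the copied data is identical with or without it; any difference shows up in performance only.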

FindDefinition · 2021-07-01T09:18:41Z


How does `prefetch_size` work? Assume we have a 128x32 tile shape for A, 128-bit accesses, and half-precision data. Do the threads in a warp prefetch overlapping global memory into L2?

3 replies

hwu36 Jul 1, 2021
Maintainer Author

Kind of, but it is not specific to thread 0. L2 is shared among all SMs, not owned by a single thread.

FindDefinition Jul 2, 2021

How should we understand `prefetch_size` = 64B/128B/256B? The docs say it is just a performance hint. How does the prefetch size value affect the SASS? (Maybe as a cache-control modifier in SASS?)

hwu36 Jul 2, 2021
Maintainer Author

Just use 128B. We tested all three, and 128B performed best.
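To make the 128x32 half-precision tile with 128-bit accesses from the question concrete, here is a back-of-the-envelope model of one warp-wide load. It is an assumption, not documented hardware behaviour: it treats an `L2::<N>B` hint as fetching the N-byte aligned block that contains each access, and assumes a row-major A tile (32 halves = 64 bytes per row) whose rows are `pitch_bytes` apart, with a block-aligned tile base. The function names are hypothetical.

```cuda
// Hypothetical back-of-the-envelope model (not documented hardware behaviour):
// an L2::<N>B hint is assumed to pull in the N-byte aligned block containing
// each access.
constexpr int kAccessBytes = 16;  // one 128-bit access per thread
constexpr int kWarpSize = 32;

// Number of distinct granule-aligned blocks touched by one warp-wide 128-bit
// load over a row-major tile: rows are row_bytes long and pitch_bytes apart,
// the tile base is granule-aligned, and threads fill a row before the next.
constexpr int blocks_touched_per_warp(int row_bytes, long long pitch_bytes,
                                      int granule_bytes) {
  const int threads_per_row = row_bytes / kAccessBytes;
  int count = 0;
  long long last_block = -1;
  for (int t = 0; t < kWarpSize; ++t) {
    const long long row = t / threads_per_row;
    const long long col = (t % threads_per_row) * kAccessBytes;
    const long long block = (row * pitch_bytes + col) / granule_bytes;
    // Addresses grow with t when pitch_bytes >= row_bytes, so a new block
    // appears exactly when the block index changes.
    if (block != last_block) {
      ++count;
      last_block = block;
    }
  }
  return count;
}

// Bytes covered by the hint for one warp-wide load under the same model.
constexpr long long hinted_bytes_per_warp(int row_bytes, long long pitch_bytes,
                                          int granule_bytes) {
  return static_cast<long long>(
             blocks_touched_per_warp(row_bytes, pitch_bytes, granule_bytes)) *
         granule_bytes;
}

// 128x32 half tile: 32 halves * 2 bytes = 64 bytes per row; K = 4096 halves.
static_assert(blocks_touched_per_warp(64, 8192, 128) == 8, "8 rows per warp");
static_assert(hinted_bytes_per_warp(64, 8192, 128) == 1024, "2x useful bytes");
```

With a large pitch (for example K = 4096 halves, so rows are 8192 bytes apart), each of the 8 rows a warp touches lands in its own block for every hint size, so the hinted footprint grows from 512 bytes (64B) to 1024 bytes (128B) to 2048 bytes (256B) against 512 useful bytes. Under this model, 128B covers each row's 64-byte slice plus the other 64 bytes of its aligned 128-byte block; whether that is why 128B measured best is not established here.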
