[CUDA] Fix SkipLayerNorm strict mode when skip has broadcast #17896

tianleiwu · 2023-10-12T00:53:17Z

Description

In SkipLayerNorm (SLN) strict mode, the current code (#16510) does not handle skip broadcasting correctly. There are two issues:
(1) Skip-related parameters are not passed to the CUDA kernel in strict mode.
(2) The strict mode kernel also has a bug in handling skip broadcasting (for example, cuWelfordMuSigma2 does not handle skip broadcasting).

Here we remove support for skip broadcasting in strict mode; the operator will return an error message stating that strict mode only supports input and skip of the same shape.
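
The shape check described here could look roughly like the sketch below. It is not the code from this PR: `CheckSkipShape` is a hypothetical helper, the error strings are invented, and the set of broadcastable skip shapes accepted in non-strict mode ((1, sequence_length, hidden_size) or (sequence_length, hidden_size)) is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not the ONNX Runtime API.
// Returns an empty string when the shapes are acceptable, otherwise an error message.
std::string CheckSkipShape(const std::vector<int64_t>& input_dims,
                           const std::vector<int64_t>& skip_dims,
                           bool strict) {
  if (input_dims == skip_dims) return "";
  if (strict) {
    // Strict mode rejects any skip broadcasting up front instead of running silently.
    return "strict mode only supports skip with the same shape as input";
  }
  // Non-strict mode (assumed rules): input is (batch, seq, hidden) and skip may be
  // (1, seq, hidden) or (seq, hidden).
  if (input_dims.size() == 3) {
    const std::vector<int64_t> seq_hidden = {input_dims[1], input_dims[2]};
    const std::vector<int64_t> one_seq_hidden = {1, input_dims[1], input_dims[2]};
    if (skip_dims == seq_hidden || skip_dims == one_seq_hidden) return "";
  }
  return "skip shape is not broadcastable to input";
}
```

Rejecting the shape before the kernel launch means the strict mode kernel never has to reason about broadcasting at all.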

Other changes:

* skip_size is misleading when there is no broadcasting. Change it to the correct value.
* Refactor the code to be more efficient: (1) no need to check for broadcasting in the kernel; (2) remove one local buffer (load input into sum_v directly to save a local buffer copy).
* Compute input + bias + skip instead of input + skip + bias. This order follows the common pattern in transformer models (this assumes graph fusion distinguishes input and skip correctly; the fusion code needs a double check later).
* Update the unit tests so that strict mode is triggered in each test case (except those with skip broadcasting) for higher test coverage.
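
The addition order and the role of skip_size can be sketched on the CPU as follows. This is an illustration under assumptions, not the kernel from this PR: `AddBiasSkip` is a hypothetical name, and treating skip_size as the element count of skip (equal to the input element count when there is no broadcasting) is one reading of the first bullet.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU sketch, not the ONNX Runtime kernel.
// Computes sum = (input + bias) + skip element by element.
// bias is empty or has hidden_size elements. skip must be non-empty; its element
// count plays the role of skip_size, and skip_size == input.size() means no
// broadcasting, so the modulo below is then a no-op.
std::vector<float> AddBiasSkip(const std::vector<float>& input,
                               const std::vector<float>& bias,
                               const std::vector<float>& skip,
                               std::size_t hidden_size) {
  const std::size_t skip_size = skip.size();
  std::vector<float> sum(input);  // load input straight into the accumulator
  for (std::size_t idx = 0; idx < sum.size(); ++idx) {
    if (!bias.empty()) sum[idx] += bias[idx % hidden_size];  // bias first
    sum[idx] += skip[idx % skip_size];                       // then skip
  }
  return sum;
}
```

Floating-point addition is not associative, so input + bias + skip and input + skip + bias can round differently; fixing one order keeps the fused operator consistent with the unfused pattern it replaces.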

Motivation and Context

SLN strict mode does not support skip broadcasting, but the current code runs silently (the kernel might fail).

onnxruntime/contrib_ops/cuda/bert/skip_layer_norm.cc


fix SLN strict mode with skip broadcast

97c43dd

tianleiwu requested review from wangyems and zhanghuanrong October 12, 2023 00:53

tianleiwu changed the title ~~Fix SkipLayerNorm strict mode when skip has broadcast~~ [CUDA] Fix SkipLayerNorm strict mode when skip has broadcast Oct 12, 2023

Merge branch 'main' into tlwu/sln_update

d9b5883

tianleiwu marked this pull request as draft October 12, 2023 16:19

tianleiwu marked this pull request as ready for review October 12, 2023 16:21

use narrow cast

ebd6779

wangyems reviewed Oct 13, 2023


onnxruntime/contrib_ops/cuda/bert/skip_layer_norm.cc

wangyems approved these changes Oct 13, 2023


tianleiwu merged commit 67d7eb3 into main Oct 13, 2023
89 of 91 checks passed

tianleiwu deleted the tlwu/sln_update branch October 13, 2023 14:51

tianleiwu added the release:1.16.2 label Oct 24, 2023

faxu added triage:approved Approved for cherrypicks for release sdxl_llama labels Oct 25, 2023

tianleiwu removed triage:approved Approved for cherrypicks for release release:1.16.2 labels Nov 1, 2023
