Commit
Cuda: Decoder Masked Multihead Attention Q values get corrupted when using cross attention (#16721)

### Description
Some code was accidentally moved into the `if(!params.is_cross_attention)` block; it must stay outside that block to work in both cases.

### Motivation and Context
This causes invalid results. We detected this as a performance bug: the EOS early exit never happened, so runs always took max_length to complete, which was slow.
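
The shape of this bug can be illustrated with a small host-side C++ sketch. It is not the kernel code: `Params`, `stage_q_buggy`, and `stage_q_fixed` are hypothetical names, and "copying Q into a staging buffer" is only an assumed stand-in for whichever step was misplaced. The only identifier taken from the description is `is_cross_attention`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the kernel parameters. Only the field name
// is_cross_attention comes from the commit description.
struct Params {
    bool is_cross_attention;
};

// Buggy shape: a step that both paths need sits inside the
// self-attention-only branch, so cross attention skips it.
std::vector<float> stage_q_buggy(const Params& params, const std::vector<float>& q) {
    std::vector<float> staged(q.size(), 0.0f);  // placeholder for stale staging memory
    if (!params.is_cross_attention) {
        // ... self-attention-only work would go here ...
        staged = q;  // misplaced: cross attention needs this as well
    }
    return staged;
}

// Fixed shape: the shared step runs unconditionally, outside the branch.
std::vector<float> stage_q_fixed(const Params& params, const std::vector<float>& q) {
    std::vector<float> staged(q.size(), 0.0f);
    if (!params.is_cross_attention) {
        // ... self-attention-only work would go here ...
    }
    staged = q;  // runs for both self attention and cross attention
    return staged;
}
```

In this sketch the buggy variant returns the placeholder values whenever `is_cross_attention` is true. That mirrors the reported symptom: Q is wrong only on the cross-attention path, so the output is invalid and generation never reaches EOS.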