TP linear: the `async_chunk=4` output does not match the `async_chunk=1` output when the sequence length is longer than 16K, but the two match when it is 8K or shorter.
Environment Information
- GCC version: 7.5.0
- Torch version: 1.13.1
- Linux system version: Ubuntu 18.04.6 LTS
- CUDA version: 11.6
- Torch's CUDA version (as per `torch.version.cuda`): 11.6
To Reproduce
Setting `CUDA_LAUNCH_BLOCKING=1` makes the mismatch go away.
Expected Behavior
The `async_chunk=4` output should match the `async_chunk=1` output at every sequence length.
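For reference, the equivalence this expectation relies on can be sketched in plain Python. Each output row of a linear layer depends only on the matching input row, so splitting the sequence into chunks, running the layer on each chunk, and concatenating the results should reproduce the unchunked output. `linear`, `chunked_linear`, `outputs_match`, `num_chunks`, and `atol` are hypothetical names chosen for illustration. This is not the TP linear implementation and does not model tensor parallelism or asynchronous communication. On GPU, chunks of different shapes may be computed by different kernels, so comparisons there typically use a small tolerance rather than exact equality.

```python
import random
from typing import List

Matrix = List[List[float]]


def linear(x: Matrix, w: Matrix) -> Matrix:
    """y = x @ w^T, with x shaped (seq_len, in_features) and w (out_features, in_features)."""
    return [[sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in w] for row in x]


def chunked_linear(x: Matrix, w: Matrix, num_chunks: int) -> Matrix:
    """Hypothetical stand-in for async_chunk: split x along the sequence
    dimension, apply the linear layer per chunk, then concatenate."""
    seq_len = len(x)
    chunk_size = max(1, -(-seq_len // num_chunks))  # ceiling division, at least 1
    out: Matrix = []
    for start in range(0, seq_len, chunk_size):
        out.extend(linear(x[start:start + chunk_size], w))
    return out


def max_abs_diff(a: Matrix, b: Matrix) -> float:
    """Largest element-wise absolute difference between two equal-shaped matrices."""
    return max((abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)), default=0.0)


def outputs_match(x: Matrix, w: Matrix, num_chunks: int, atol: float = 1e-6) -> bool:
    """Compare the chunked result against the single-chunk reference."""
    reference = chunked_linear(x, w, num_chunks=1)
    candidate = chunked_linear(x, w, num_chunks=num_chunks)
    return len(reference) == len(candidate) and max_abs_diff(reference, candidate) <= atol


if __name__ == "__main__":
    random.seed(0)
    in_features, out_features = 4, 2
    w = [[random.uniform(-1, 1) for _ in range(in_features)] for _ in range(out_features)]
    # Sweep the sequence lengths mentioned in the report.
    for seq_len in (8 * 1024, 16 * 1024):
        x = [[random.uniform(-1, 1) for _ in range(in_features)] for _ in range(seq_len)]
        print(seq_len, outputs_match(x, w, num_chunks=4))
```

In this single-threaded sketch the per-row arithmetic is identical regardless of chunking, so the outputs agree at both lengths. A difference that grows with sequence length on GPU is therefore not explained by chunking alone.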
Screenshots
No response
Additional Information
No response
Confirmation
I have reviewed and verified all the information provided in this report.