[ROCM] adjust test_flash_attn_rocm test tolerance #21379
Merged
Description
The test_flash_attn_rocm.py test from #21032 fails frequently. For example, I saw two failed jobs today:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1433040&view=logs&j=7b1aee87-96d6-5250-4d5a-ebd31f36823c&t=0a027a84-13d7-51f7-8b84-dc245dd18394&l=1868
E Max absolute difference: 0.002167
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1433428&view=logs&j=7b1aee87-96d6-5250-4d5a-ebd31f36823c&t=0a027a84-13d7-51f7-8b84-dc245dd18394&l=1869
E Max absolute difference: 0.002686
Raise the absolute tolerance (atol) from 0.002 to 0.005, and use the default relative tolerance rtol=0.001.
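As a rough illustration of why the two observed differences fail at 0.002 but pass at 0.005, here is a minimal sketch of the usual NumPy-style closeness check, `|actual - desired| <= atol + rtol * |desired|`. The helper `within_tolerance`, the reference value 0.1, and the use of the same rtol for both thresholds are assumptions made for this example; they are not taken from the test itself.

```python
import numpy as np


def within_tolerance(actual, desired, atol, rtol=1e-3):
    """Hypothetical helper: True if |actual - desired| <= atol + rtol * |desired| elementwise."""
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    return bool(np.all(np.abs(actual - desired) <= atol + rtol * np.abs(desired)))


# Assumed reference outputs; the real test compares attention outputs.
desired = np.full(4, 0.1)
# The last two entries are the max absolute differences reported in the failed jobs.
diffs = np.array([0.0, 0.001, 0.002167, 0.002686])
actual = desired + diffs

old_ok = within_tolerance(actual, desired, atol=0.002)  # previous threshold
new_ok = within_tolerance(actual, desired, atol=0.005)  # threshold in this PR

print("passes with atol=0.002:", old_ok)
print("passes with atol=0.005:", new_ok)

# The same check expressed with NumPy's own assertion helper.
np.testing.assert_allclose(actual, desired, rtol=1e-3, atol=0.005)
```

With these assumed values the allowed error is 0.002 + 0.001 * 0.1 = 0.0021 under the old threshold, which both reported differences exceed, and 0.0051 under the new one.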
Motivation and Context