Adjust test tolerance #19947

Merged
merged 10 commits into from
Mar 19, 2024

Conversation

tianleiwu
Contributor

@tianleiwu tianleiwu commented Mar 16, 2024

Description

Improve the precision of tests.

Changes include:
(1) Update checkers.cc to use consistent default tolerance.
(2) Allow different default tolerances for different providers at runtime (previously, a test's threshold was decided at compile time).
(3) Explicitly set absolute and relative error tolerances for tests that failed to pass the new default thresholds.

Default Thresholds Change

Note that a test passes when abs(expected - value) < absolute + relative * abs(expected).
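The check above can be sketched as a small C++ helper. This is only an illustration of the formula: the name `IsWithinTolerance` and its signature are hypothetical and are not taken from checkers.cc.

```cpp
#include <cmath>

// Hypothetical helper (not onnxruntime's actual code): returns true when
// `actual` lies within the combined absolute + relative tolerance of `expected`.
inline bool IsWithinTolerance(double expected, double actual,
                              double absolute, double relative) {
  // The allowed error grows with the magnitude of the expected value.
  const double bound = absolute + relative * std::abs(expected);
  return std::abs(expected - actual) < bound;
}
```

For example, with absolute = 0.00001 and relative = 0.0001, an expected value of 100 allows an error of up to 0.01001, while an expected value of 0 allows only 0.00001.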

Default test thresholds when both absolute and relative tolerance are not set:

| type | provider | absolute (before) | absolute (after) | relative (before) | relative (after) |
| -- | -- | -- | -- | -- | -- |
| double | CPU | 0.001 | 0.00001 | 0 | 0.00001 |
| double | CUDA | 0.005 | 0.00001 | 0 | 0.00001 |
| double | TRT | 0.005 | 0.00001 | 0 | 0.00001 |
| double | ROCM | 0.005 | 0.00001 | 0 | 0.00001 |
| double | DML | 0.005 | 0.00001 | 0 | 0.00001 |
|   |   |   |   |   |   |
| float | CPU | 0.0001 | 0.00001 | 0 | 0.0001 |
| float | CUDA | 0.005 | 0.00001 | 0 | 0.0001 |
| float | TRT | 0.005 | 0.00001 | 0 | 0.0001 |
| float | ROCM | 0.005 | 0.00001 | 0 | 0.0001 |
| float | DML | 0.005 | 0.00001 | 0 | 0.0001 |
| float | Training* | 0.005 | 0.001 | 0 | 0.0001 |
|   |   |   |   |   |   |
| half | CPU | 0.001 | 0.0025 | 0 | 0.001 |
| half | CUDA | 0.005 | 0.0025 | 0 | 0.001 |
| half | TRT | 0.005 | 0.0025 | 0 | 0.001 |
| half | ROCM | 0.005 | 0.0025 | 0 | 0.001 |
| half | DML | 0.02 | 0.005 | 0 | 0.001 |
| half | Training* | 0.005 | 0.005 | 0 | 0.001 |
|   |   |   |   |   |   |
| bfloat16 | CPU | 0.0001 | 0.02 | 0 | 0.01 |
| bfloat16 | CUDA | 0.0001 | 0.02 | 0.05 | 0.01 |
| bfloat16 | TRT | 0.0001 | 0.02 | 0.05 | 0.01 |
| bfloat16 | ROCM | 0.0001 | 0.02 | 0.05 | 0.01 |
| bfloat16 | DML | 0.0001 | 0.02 | 0.05 | 0.01 |
| bfloat16 | Training* | 0.0001 | 0.02 | 0.05 | 0.01 |

*Training means that the build flag ENABLE_TRAINING_CORE is defined. The provider can be any of those listed.
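The "after" columns of the table can be read as a lookup keyed on data type, provider, and whether the build is a training build. The sketch below encodes that reading. `ElemType`, `Tolerance`, and `DefaultTolerance` are hypothetical names chosen for illustration and do not reflect how checkers.cc is actually organized. The table has no Training row for double, so the sketch assumes the inference values apply there.

```cpp
#include <string>

// Hypothetical types for illustration; the real test utilities may differ.
enum class ElemType { kDouble, kFloat, kHalf, kBFloat16 };

struct Tolerance {
  double absolute;
  double relative;
};

// Returns the "after" defaults from the table.
// `provider` is a short label such as "CPU", "CUDA", "TRT", "ROCM" or "DML".
// `training_build` models ENABLE_TRAINING_CORE being defined.
inline Tolerance DefaultTolerance(ElemType type, const std::string& provider,
                                  bool training_build) {
  switch (type) {
    case ElemType::kDouble:
      // Assumption: no Training row exists for double, so reuse inference values.
      return {1e-5, 1e-5};
    case ElemType::kFloat:
      // Training builds use a looser absolute tolerance.
      return {training_build ? 1e-3 : 1e-5, 1e-4};
    case ElemType::kHalf:
      // DML and training builds use 0.005; other providers use 0.0025.
      return {(training_build || provider == "DML") ? 0.005 : 0.0025, 1e-3};
    case ElemType::kBFloat16:
      // Same defaults for every provider and for training builds.
      return {0.02, 0.01};
  }
  return {0.0, 0.0};  // Not reached for valid enum values.
}
```

Because the provider is an ordinary runtime argument here, a single test binary can apply different defaults to tests that run on different providers.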

Threshold for provider

Previously, the threshold might change according to build flags:

#if defined(USE_CUDA) || defined(USE_ROCM) || defined(USE_DML)
  constexpr float threshold = 0.005f;
#else
  constexpr float threshold = 0.0001f;
#endif

Under that scheme, a CPU-only build used a threshold of 0.0001, but a CUDA build used 0.005 even for the CPU provider (some tests in a CUDA build actually run with the CPU provider).

After this change, the threshold depends only on the data type and the provider used in the test. For non-training builds, build flags no longer affect it.
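To make the difference concrete, the sketch below compares the old and new error bounds for a float test on a non-training build. Both function names are hypothetical. The `gpu_build` parameter stands in for USE_CUDA, USE_ROCM, or USE_DML being defined, since a preprocessor flag cannot be toggled inside one program.

```cpp
#include <cmath>

// Models the old compile-time rule: the bound depended on how the binary
// was built, not on which provider actually ran the test.
constexpr double LegacyFloatBound(bool gpu_build) {
  return gpu_build ? 0.005 : 0.0001;
}

// New default bound for float on non-training builds ("after" columns):
// absolute 0.00001 plus relative 0.0001 scaled by |expected|.
inline double NewFloatBound(double expected) {
  return 1e-5 + 1e-4 * std::abs(expected);
}
```

For an expected value of 1.0 on the CPU provider, the old bound was 0.0001 in a CPU-only build and 0.005 in a CUDA build, while the new bound is about 0.00011 in either build.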

Default thresholds for training may differ from those for inference (see the table above). Several factors contribute: training has gradient outputs; TF32 is not disabled in training; and some training tests run multiple iterations, so error may accumulate. Setting different thresholds based on these factors could be a future task.

Motivation and Context

@tianleiwu tianleiwu marked this pull request as draft March 16, 2024 01:43
@tianleiwu tianleiwu force-pushed the tlwu/default_test_threshold branch 2 times, most recently from 2f5575a to a16da8a Compare March 18, 2024 06:36
@tianleiwu tianleiwu force-pushed the tlwu/default_test_threshold branch from a16da8a to 5e2a247 Compare March 18, 2024 07:05
@tianleiwu tianleiwu changed the title Adjust test tolerance and disable tf32 in tests Adjust test tolerance Mar 18, 2024
@tianleiwu tianleiwu marked this pull request as ready for review March 18, 2024 07:22
@tianleiwu tianleiwu marked this pull request as draft March 18, 2024 16:35
@tianleiwu tianleiwu marked this pull request as ready for review March 19, 2024 16:41
snnn previously approved these changes Mar 19, 2024
@tianleiwu tianleiwu requested a review from snnn March 19, 2024 20:38
@tianleiwu tianleiwu merged commit 597e828 into main Mar 19, 2024
93 of 95 checks passed
@tianleiwu tianleiwu deleted the tlwu/default_test_threshold branch March 19, 2024 22:50
TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this pull request May 7, 2024