Inconsistent handling of tolerance can lead to cryptic failure #190

RossBoylan · 2024-03-21T22:01:10Z

Symptom

> expect_equal(0.105487503092834, 0.10548753, tolerance = 1.5e-7)

Error: 0.105487503092834 (`actual`) not equal to 0.10548753 (`expected`).

actual != expected but don't know how to show the difference

Encountered while using testthat but debugging shows the problem lies in waldo, and so filing this report here.

Summary Analysis

The outer calls lead to waldo's num_equal which applies the tolerance to $|x-y|/|y| < \delta$ as long as $|y|>\delta$ where $\delta$ is the tolerance. Equivalently, it checks $|x-y|<\delta |y|$. Having discovered the difference, it then turns it over to compare_numeric() which, I think, attempts to show only as many digits as are necessary to show the difference. However, the logic here ignores the fact that the absolute difference that causes the test to fail is usually $\delta |y|$, and just uses $\delta$.

With the values shown above this means it's one digit short of what it needs to show the difference; since the numbers agree before that point there are no differences in the formatted values and so the list of (string) differences to be shown is empty. The result is that compare_numeric() reports the "don't know how to show the difference" error.
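The arithmetic can be checked in base R without waldo. This is only a sketch: decimals_for() and fmt() are hypothetical helpers written for this illustration, not waldo's own digits() or formatting code, and the use of fixed decimal places is an assumption about how the digits end up being applied.

```r
x <- 0.105487503092834   # actual
y <- 0.10548753          # expected
tol <- 1.5e-7

abs_diff <- abs(x - y)          # about 2.7e-8, smaller than tol
rel_diff <- abs_diff / abs(y)   # about 2.55e-7, larger than tol

# Relative test as described above: the values are reported as different.
values_differ <- rel_diff >= tol

# Hypothetical helper: decimal places needed to resolve a difference of size d.
decimals_for <- function(d) ceiling(-log10(d))

n_unscaled <- decimals_for(tol)           # based on delta alone
n_scaled   <- decimals_for(tol * abs(y))  # based on delta * |y|

# Hypothetical helper: fixed-point formatting with n decimal places.
fmt <- function(v, n) formatC(v, digits = n, format = "f")

fmt(c(x, y), n_unscaled)  # the two strings are identical
fmt(c(x, y), n_scaled)    # the two strings differ in the last place
```

Under these assumptions the unscaled count is one decimal place short, which matches the symptom: the comparison fails but the formatted strings show no difference.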

Possible Fix

In min_digits(),

waldo/R/compare-value.R

Line 161 in e31e97d

n <- min(n, digits(tolerance))

use instead

  if (!is.null(tolerance)) {
    if (abs(y) > tolerance)
      tolerance <- tolerance * abs(y)
    n <- min(n, digits(tolerance))
  }

Although this may solve the immediate problem, it does not address the fact that the logic for interpreting the tolerance is living in at least 2 separate places, which seems undesirable.
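One way to keep the interpretation in a single place would be a shared helper that both the equality test and the digit selection call. The following is a rough sketch only: effective_tolerance(), num_equal_sketch() and min_digits_sketch() are hypothetical names, not waldo functions, and the sketch averages over all elements rather than only those that differ.

```r
# Hypothetical helper: the absolute threshold implied by a relative tolerance.
# Falls back to the raw tolerance when mean |y| is not larger than it.
effective_tolerance <- function(y, tolerance) {
  scale <- mean(abs(y))
  if (is.finite(scale) && scale > tolerance) tolerance * scale else tolerance
}

# Equality test expressed through the shared helper.
num_equal_sketch <- function(x, y, tolerance) {
  mean(abs(x - y)) < effective_tolerance(y, tolerance)
}

# Digit selection expressed through the same helper, so the two cannot drift.
min_digits_sketch <- function(x, y, tolerance) {
  needed <- function(d) ceiling(-log10(d))
  min(needed(mean(abs(x - y))), needed(effective_tolerance(y, tolerance)))
}

x <- 0.105487503092834
y <- 0.10548753
num_equal_sketch(x, y, tolerance = 1.5e-7)   # not equal under the relative test
min_digits_sketch(x, y, tolerance = 1.5e-7)  # enough places to show the difference
```

With this shape, any change to how the tolerance is scaled would apply to the comparison and to the display at the same time.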

Fuller Analysis

Here's the call stack, with the innermost call on the top

num_equal takes a tolerance arg -> FALSE
3.2. compare_numeric produces
actual != expected but don't know how to show the difference
3.1. num_equal says they don't match even though the difference is -2.7e-8 and tol is 1.5e-7. x is 0.105, and so it must be doing relative tolerance. (It divides the average absolute difference, taken over the elements that differ, by the average |y| over the same elements, unless that average is below the tolerance.)
x = 0.105487503092834
y = 0.10548753
3. compare_vector
2. compare_by_attr -> empty string

1. compare_terminate -> empty string
compare_structure
waldo::compare
waldo_compare
expect_waldo_equal
expect_equal

The numbers indicate calls made in sequence: compare_structure() first called compare_terminate() which returned an empty string, then compare_by_attr() and finally compare_vector(). The latter first called num_equal() which said the arguments were not equal, and then compare_numeric() which first decided there were differences and then, after processing them, decided there weren't any.

The formatting weirdness between the stack line starting with 2. and the one with 1. is not intentional or significant; I just don't know how to prevent it.

Related Issues

As my note on num_equal() suggests, I found the interpretation of the tolerance surprising, consistent with #188 (at least with scalar comparisons I wasn't exposed to the average-difference approach that it apparently uses). At least in waldo it is documented; in testthat the tolerance parameter is named, but absolutely no information on its exact meaning appears.
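For readers who have not met the average-difference convention, base R's all.equal() behaves in a similar way and can illustrate it. Whether waldo's num_equal() agrees with all.equal() in every detail is an assumption here; the vectors below are made up for the illustration.

```r
x <- c(1.0, 100.0)  # actual
y <- c(1.1, 100.1)  # expected
tol <- 0.01

# Element by element, the first pair is off by about 9%, well above tol.
per_element <- abs(x - y) / abs(y)

# Pooled: mean absolute difference over mean |y| is about 0.002, below tol.
pooled <- mean(abs(x - y)) / mean(abs(y))

# all.equal(target, current) uses the pooled (mean relative) difference,
# so it accepts the pair of vectors at this tolerance.
pooled_equal <- isTRUE(all.equal(y, x, tolerance = tol))
```

A large value in the vector can therefore mask a comparatively large relative error in a small one, which is part of what makes the tolerance hard to reason about beyond the scalar case.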

hadley · 2024-05-06T15:39:50Z

Closing since it's tracked in waldo

qmarcou mentioned this issue Apr 23, 2024

unstable behavior of expect_equal on two numeric arrays depending on tolerance value r-lib/testthat#1953

Closed

hadley closed this as completed May 6, 2024
