Denormalization causes loss of accuracy in `*` (#42)
Hey @haampie, thank you for bringing this to my attention! This looks like an unfortunate case where my lax approach to renormalization breaks things... At the moment I'm not sure if there's a simple tweak that can fix this without having performance implications in other cases, but I'll certainly give it a try. In the meantime, can you provide the full …
I've finally found a rotation that produces a non-normalized multifloat:

```julia
julia> c = Float64x4((1.0, -1.2814460661601042e-50, -1.1456058604534196e-66, -4.018828477571062e-83))

julia> s = Float64x4((1.6009035362320234e-25, -8.470934410604026e-42, 4.900032439512285e-58, -3.05869844561302e-74))

julia> c*c + s*s - 1 # check that the rotation preserves the norm
-1.15647600223948327554941871035198256900584233789933e-64

julia> x = Float64x4((-2.3017404993032726e-25, -1.8187505516645134e-41, 4.637866565216834e-58, -2.3866542964252726e-74))

julia> y = Float64x4((1.4377758854357827, -9.834411896819007e-17, 4.676677591256931e-33, 7.260162680046171e-50))

julia> z = c * x + s * y
-1.8321827051883379215672299273506959629434104035485e-50

julia> z._limbs
(-1.832182705188338e-50, 6.742120038269503e-67, -5.263781917376804e-74, 0.0)

julia> z + 0.0 === z
false
```
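The failing condition above can also be checked directly on the limb tuple. Here is a minimal sketch of such a check (an illustrative definition, not a MultiFloats.jl API): a representation is treated as normalized when no limb is large enough to perturb the limb before it.

```julia
# Illustrative check (not part of MultiFloats.jl): treat a limb tuple as
# normalized when adding any limb to the previous one leaves the previous
# limb unchanged, i.e. the limbs do not overlap.
is_normalized(limbs::NTuple{N,Float64}) where {N} =
    all(limbs[i] + limbs[i+1] == limbs[i] for i in 1:N-1)

is_normalized(z._limbs)   # false here: the third limb overlaps the second
```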
Hey @haampie! I had a chance to dig into this today, and unfortunately, I don't think it's possible to fix this issue without a significant performance impact on MultiFloats.jl. First, just to make sure we're on the same page, I've verified that there is no loss of accuracy occurring in the calculation:

```julia
using MultiFloats

# cover full Float64 exponent range, including subnormals
setprecision(exponent(floatmax(Float64)) - exponent(floatmin(Float64)) + precision(Float64))

c = Float64x4((1.0, -1.2814460661601042e-50, -1.1456058604534196e-66, -4.018828477571062e-83))
s = Float64x4((1.6009035362320234e-25, -8.470934410604026e-42, 4.900032439512285e-58, -3.05869844561302e-74))
x = Float64x4((-2.3017404993032726e-25, -1.8187505516645134e-41, 4.637866565216834e-58, -2.3866542964252726e-74))
y = Float64x4((1.4377758854357827, -9.834411896819007e-17, 4.676677591256931e-33, 7.260162680046171e-50))

cx = c * x
cx_big = big(c) * big(x)
println("Accurate bits in cx: ", round(-log2(abs(cx_big - cx) / abs(cx_big))))

sy = s * y
sy_big = big(s) * big(y)
println("Accurate bits in sy: ", round(-log2(abs(sy_big - sy) / abs(sy_big))))

cxpsy = cx + sy
cxpsy_big = big(cx) + big(sy)
println("Accurate bits in cx+sy: ", round(-log2(abs(cxpsy - cxpsy_big) / abs(cxpsy_big))))
```

You should get this output:
So, the pathology here is not in the value of `cx + sy` itself, but in how that value ends up represented across its limbs.
You can see that the first limbs of `cx` and `sy` are equal, and their second limbs are nearly equal. These values propagate through …
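To see the cancellation concretely, the magnitudes can be inspected directly (the values in the comments are approximate, inferred from the numbers quoted above rather than copied from the original output):

```julia
cx, sy = c * x, s * y
cx._limbs[1]           # ≈ -2.3017e-25
sy._limbs[1]           # ≈ +2.3017e-25, equal in magnitude and opposite in sign
(cx + sy)._limbs[1]    # ≈ -1.83e-50, so roughly 25 decimal digits cancel in the addition
```

After that much cancellation, the surviving bits of `cx + sy` come almost entirely from the lower limbs of `cx` and `sy`, which is where the denormalized representation shows up.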
After defining this function, you should see that …
Here, the first and second components are unexpectedly small compared to the input operands, and this denormalization is not corrected by the default …
I've designed … I think the easiest way to fix this is to manually call … In the next release of MultiFloats.jl, I've added an overload for … If this is unacceptable for your application, I can also provide a …
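Since the exact call is not preserved above, here is a hedged sketch of the kind of manual workaround being discussed, built only on the `z + 0.0 === z` observation earlier in the thread (the helper name is illustrative, not a MultiFloats.jl function):

```julia
# Illustrative helper: push a possibly denormalized result back through an
# addition with 0.0, which (per the check above) yields a different, and
# presumably canonical, representation for a denormalized input.
force_renorm(v) = v + 0.0

z = force_renorm(c * x + s * y)   # afterwards z + 0.0 === z should hold
```

In a QR sweep this would be applied to the updated entries right after each rotation, at the cost of one extra addition per element.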
A safer type for `*`, `/`, and `+` …
@edljk Sure thing! While I'm working on … For example, if you're having problems with …
A quick update for those following this thread: I've been doing a survey of the literature on extended-precision floating-point arithmetic, and it seems that solving these issues with destructive cancellation is quite a bit more involved than just renormalizing after every add/sub. With inspiration from the following papers: …

I'm working on devising new … Anyway, please be reassured that I'm actively working on this issue, but what I initially thought would be a quick fix has turned into an area of active research.
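For context, the renormalization schemes in this literature are built from error-free transformations such as TwoSum; a minimal sketch of that building block (the standard textbook algorithm, not MultiFloats.jl internals) looks like this:

```julia
# Error-free transformation: returns (s, e) with s == fl(a + b) and a + b == s + e exactly.
function two_sum(a::Float64, b::Float64)
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e
end

# One sweep of two_sum over a limb tuple pushes each partial sum's rounding
# error into the next limb while preserving the exact total; full
# renormalization schemes chain several such sweeps plus a cleanup pass.
function sweep(limbs::NTuple{4,Float64})
    s1, e1 = two_sum(limbs[1], limbs[2])
    s2, e2 = two_sum(e1, limbs[3])
    s3, e3 = two_sum(e2, limbs[4])
    return (s1, s2, s3, e3)
end
```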
After running the QR algorithm for a bunch of iterations, in 99% of the cases I'm hitting numbers like these, which lose precision when multiplied:
Relative errors:
I think they're not normalized.
Example. Input matrix:
Apply a double shift of the QR algorithm and move the "bulge" to the last two rows; it ends up looking like this:
Here accuracy is still fine:
But the last Givens rotation, the one that zeros out the -3.0e-51 value, is completely inaccurate:
That's probably because the numbers from which the rotation is computed are not normalized:
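As a rough illustration of why that matters, here is the textbook construction of a Givens rotation (not what `LinearAlgebra.givensAlgorithm` actually does internally; the real routine also rescales to avoid overflow): the coefficients are ratios of the two inputs, so any error already baked into a and b flows directly into c and s.

```julia
# Illustrative only: choose c, s so that [c s; -s c] * [a; b] == [r; 0].
function naive_givens(a::T, b::T) where {T}
    r = sqrt(a * a + b * b)   # a robust implementation would rescale a and b first
    return a / r, b / r, r
end
```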
If I normalize them "by hand", it looks like accuracy is restored:
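A minimal sketch of one way to normalize such values "by hand" is a round trip through BigFloat, assuming conversions between Float64x4 and BigFloat at sufficient precision (this is a guess at the kind of fix meant here, not necessarily the original code):

```julia
# Hedged sketch: rebuild c and s from a high-precision BigFloat value, which
# forces a canonical limb decomposition (assumes Float64x4(::BigFloat) exists
# and that the BigFloat precision is generous enough).
setprecision(BigFloat, 512)
renorm_by_hand(v) = Float64x4(big(v))

c, s = renorm_by_hand(c), renorm_by_hand(s)
```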
But... I can't write my algorithm like that.
How can I avoid loss of accuracy?