IEEE precisions

This page collects useful information about the IEEE 754 floating-point standard.

Round-off error

The relative round-off error depends on the number of bits s used for the significand and on the rounding mode:

with round-to-nearest mode, the result is off by at most half a unit in the last place, which is equivalent to having one additional bit, so the error is at most 2^{-(s+1)}
with round-to-zero mode, the error is at most 2^{-s}
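
As a rough illustration of the two bounds, the following Python sketch evaluates them for a given s and checks the round-to-nearest bound against one stored value. The helper names `error_bound` and `decimal_digits` are hypothetical, and the sketch assumes that Python's `float` is an IEEE 754 double (s = 52) rounded to nearest, which holds on common platforms.

```python
import math
from fractions import Fraction


def error_bound(s, mode="nearest"):
    """Bound on the relative round-off error for s significand bits."""
    if mode == "nearest":
        return 2.0 ** -(s + 1)
    if mode == "zero":
        return 2.0 ** -s
    raise ValueError(f"unknown rounding mode: {mode}")


def decimal_digits(error):
    """Number of decimal digits corresponding to a relative error."""
    return -math.log10(error)


# Bounds for double precision (s = 52).
for mode in ("nearest", "zero"):
    bound = error_bound(52, mode)
    print(f"{mode}: {bound:.2e} ({decimal_digits(bound):.2f} digits)")

# Measure the actual relative error made when 1/10 is stored as a double.
exact = Fraction(1, 10)
stored = Fraction(0.1)  # exact rational value of the stored double
measured = abs(stored - exact) / exact
print(measured <= Fraction(error_bound(52)))
```

The exact rational arithmetic from `fractions` is used so that the measurement itself does not introduce further rounding.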

Range

The range depends on the number of exponent bits e. These bits encode an unsigned integer u, and the fixed-point base-2 number 1.s...s (where s...s are the significand bits) is multiplied by 2^{u - bias}, where bias = 2^{e-1} - 1. The largest value of u (all bits set) is reserved for infinities and NaNs, and the smallest (u = 0) for zero and denormalized numbers. The largest usable exponent is therefore reached for u = 2^e - 2, which gives u - bias = 2^e - 2 - (2^{e-1} - 1) = 2^e - 2^{e-1} - 1 = 2^{e-1} - 1. The largest representable value is thus:

max(e, s) = 2^{2^{e-1} - 1} * 1.1...1 = 2^{2^{e-1} - 1} * (2 - 2^{-s})
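
The encoding described above can be inspected directly. The sketch below splits a double (e = 11, s = 52) into its sign, biased exponent u, and significand field, then rebuilds the value as 1.s...s * 2^{u - bias}. The helpers `decode` and `reconstruct` are hypothetical names introduced for illustration, and `reconstruct` only handles normalized numbers.

```python
import struct
import sys

E_BITS, S_BITS = 11, 52           # double precision
BIAS = 2 ** (E_BITS - 1) - 1      # 1023


def decode(x):
    """Split an IEEE double into (sign, biased exponent u, significand field)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    sign = bits >> (E_BITS + S_BITS)
    u = (bits >> S_BITS) & (2 ** E_BITS - 1)
    frac = bits & (2 ** S_BITS - 1)
    return sign, u, frac


def reconstruct(sign, u, frac):
    """Rebuild a normalized value as (-1)^sign * 1.frac * 2^(u - bias)."""
    significand = 1 + frac / 2 ** S_BITS
    return (-1) ** sign * significand * 2.0 ** (u - BIAS)


sign, u, frac = decode(sys.float_info.max)
print(u == 2 ** E_BITS - 2)        # largest exponent not reserved
print(frac == 2 ** S_BITS - 1)     # significand bits all set: 1.1...1
print(reconstruct(sign, u, frac) == sys.float_info.max)
```

For the largest finite double, the decoded fields match the derivation: u = 2^e - 2 and a significand of all ones.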

The smallest exponent of a normalized number is reached for u = 1, so u - bias = 1 - (2^{e-1} - 1) = -(2^{e-1} - 2). Thus, the smallest positive normalized value is:

min(e, s) = 2^{-(2^{e-1} - 2)} * 1.0...0 = 2^{-(2^{e-1} - 2)}
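
The two range formulas are easy to evaluate for any choice of e and s. The following sketch is one possible way to do so. The function names `largest` and `smallest_normal` are hypothetical, and the comparison against `sys.float_info` assumes that Python's `float` is an IEEE 754 double.

```python
import math
import sys


def largest(e, s):
    """max(e, s) = 2^(2^(e-1) - 1) * (2 - 2^-s)."""
    return math.ldexp(2 - 2.0 ** -s, 2 ** (e - 1) - 1)


def smallest_normal(e, s):
    """min(e, s) = 2^-(2^(e-1) - 2); the result does not depend on s."""
    return math.ldexp(1.0, -(2 ** (e - 1) - 2))


# Double precision: compare with the limits reported by the interpreter.
print(largest(11, 52) == sys.float_info.max)
print(smallest_normal(11, 52) == sys.float_info.min)

# Half precision (e = 5, s = 10).
print(largest(5, 10), smallest_normal(5, 10))
```

`math.ldexp` scales by a power of two directly, which avoids forming an intermediate power that could overflow for large exponents.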

Table of useful properties

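R2N and R2Z stand for round-to-nearest and round-to-zero. The digits columns give the number of decimal digits, -log10(error), and #bits = 1 + e + s includes the sign bit.

name	#bits	e	s	R2N error	R2N digits	R2Z error	R2Z digits	min	max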
double	64	11	52	1.11e-16	15.95	2.22e-16	15.65	2.23e-308	1.80e+308
	32	11	20	4.77e-7	6.32	9.54e-7	6.02	2.23e-308	1.80e+308
	16	11	4	0.03125	1.51	0.0625	1.20	2.23e-308	1.74e+308
single	32	8	23	5.96e-8	7.22	1.19e-7	6.92	1.18e-38	3.40e+38
	16	8	7	0.00391	2.41	0.0078125	2.11	1.18e-38	3.39e+38
half	16	5	10	0.00048828125	3.31	0.0009765625	3.01	6.10e-5	6.55e+4
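
To extend the table to other combinations of e and s, the rows can be recomputed from the formulas in the previous sections. The sketch below is one way to do that. The function `row` and its dictionary keys are hypothetical, and the printed formatting is only an approximation of the table layout.

```python
import math


def row(e, s):
    """Compute the table columns for e exponent bits and s significand bits."""
    r2n = 2.0 ** -(s + 1)   # round-to-nearest error bound
    r2z = 2.0 ** -s         # round-to-zero error bound
    return {
        "bits": 1 + e + s,  # sign bit + exponent + significand
        "r2n_error": r2n,
        "r2n_digits": -math.log10(r2n),
        "r2z_error": r2z,
        "r2z_digits": -math.log10(r2z),
        "min": math.ldexp(1.0, -(2 ** (e - 1) - 2)),
        "max": math.ldexp(2 - r2z, 2 ** (e - 1) - 1),
    }


# The (e, s) pairs listed in the table.
for e, s in [(11, 52), (11, 20), (11, 4), (8, 23), (8, 7), (5, 10)]:
    r = row(e, s)
    print(
        f"{r['bits']}\t{e}\t{s}\t"
        f"{r['r2n_error']:.2e}\t{r['r2n_digits']:.2f}\t"
        f"{r['r2z_error']:.2e}\t{r['r2z_digits']:.2f}\t"
        f"{r['min']:.2e}\t{r['max']:.2e}"
    )
```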
