Draft floating point Boogie language extension

DRAFT: Floating point type support in the Boogie IVL

Summary

This wiki details the proposed syntax for the floating-point extension for Boogie. The float type proposed is the IEEE-754 floating point, a standard which can be found here.

The proposed floating point syntax is based on the SMT-LIB standard, a summary of which can be found here.

The original proposal and discussion history for this extension can be found here. The syntax for floating-point constants was later updated in this pull request.

Type declarations

A floating point type with 'sig' significand bits (including the hidden bit) and 'exp' exponent bits can be declared using the syntax:

float[sig]e[exp]

For example, a variable 'name' with type equivalent to IEEE-binary32 could be declared with:

var name : float24e8;

Constant declarations

A floating point constant can be declared with the following syntax:

(-)0x[sig]e[exp]f[sigSize]e[expSize]

In the above line:

sig = hexdigit {hexdigit} '.' hexdigit {hexdigit}

exp = digit {digit}

sigSize = digit {digit}

expSize = digit {digit}

A floating-point number written using the above syntax is equivalent to the value (-)(sig * 16^exp). As long as the the equivalent value fits in a floating-point variable with 'sigSize' significand bits and 'expSize' exponent bits, there are no restrictions on the values of 'sig' and 'exp'. This allows the same number to be written in multiple ways. For example, the number 10 may be written as 0xA0.0e-1f24e8, 0xA.0e0f24e8, 0x0.Ae1f24e8, 0x0.0Ae2f24e8, etc.

As another example, we can assign the constant -2.25 to var 'name' by writing:

var name : float24e8;
name := -0x2.4e0f24e8;

There are several constants that must be declared using a special syntax:

Value	Declaration
NaN	`0NaN[sigSize]e[expSize]`
+infinity	`0+oo[sigSize]e[expSize]`
-infinity	`0-oo[sigSize]e[expSize]`

It is disallowed to use synonyms of these special values (for example, 0x1.0e32f24e8 is equivalent to +infinity but is a syntax error).

Built-in operators

The following operators are built into Boogie with the following syntax, where x and y are floating point values with identical exponent and significand sizes. Note that each of these operators is assumed to use the round to nearest (RNE) rounding mode. For an explanation on how to use different rounding modes, see the section "SMT-LIBv2 operators via 'builtin' attribute." Note also that the data type returned by each operation is listed in the third column; when a function returns data of type float, that return value is assumed to have the same exponent and significand bit sizes as x and y. Boogie operator == is not overloaded to IEEE 754-2008 equality which is captured via the built-in function fp.eq.

Operator	Boogie Equivalent	Return Type
fp.neg(x)	-x	float
fp.add(x, y)	x + y	float
fp.sub(x, y)	x - y	float
fp.mul(x, y)	x * y	float
fp.div(x, y)	x / y	float
fp.leq(x, y)	x <= y	boolean
fp.lt(x, y)	x < y	boolean
fp.geq(x, y)	x >= y	boolean
fp.gt(x, y)	x > y	boolean

Rounding mode

The rmode type allows the ability to store one of the five below rounding modes:

Rounding Mode	Acronym
roundNearestTiesToEven	RNE
roundNearestTiesToAway	RNA
roundTowardPositive	RTP
roundTowardNegative	RTN
roundTowardZero	RTZ

For example, the Round Toward Positive rounding mode may be assigned to var 'mode' by writing:

var mode: rmode;
mode := RTP;

Or, alternatively, the full name may be used:

var mode: rmode;
mode := roundTowardPositive;

SMT-LIBv2 operators via "builtin" attribute

Some of the operators listed in the "Built-in operators" section can be modified to use one of the five rounding modes listed in the "Rounding mode" section.

This is done by creating a new function with 'function_name' that takes in type arguments 'args', along with an rmode variable, and returns a value of type 'return_type', with the syntax below:

function {:builtin "fp.op"} 'function_name'(rmode, args) returns(return_type);

For example, the following code demonstrates how to declare the "fp.add" operation and use it with both the RTZ and RTN rounding modes:

function {:builtin "fp.add"} FP_ADD_FLOAT32(rmode, float24e8, float24e8) returns(float24e8);

.
.
.
.

var x : float24e8;
var y : float24e8;
var z : float24e8;

z := FP_ADD_FLOAT32(RTZ, x, y);
z := FP_ADD_FLOAT32(RTN, x, y);

Or, alternatively:

function {:builtin "fp.add"} FP_ADD_FLOAT32(rmode, float24e8, float24e8) returns(float24e8);

.
.
.
.

var mode : rmode;
var x : float24e8;
var y : float24e8;
var z : float24e8;

mode := RTZ;
z := FP_ADD_FLOAT32(mode, x, y);
mode := RTN;
z := FP_ADD_FLOAT32(mode, x, y);

The "builtin" attribute can also be used to generate the floating point operators listed below. Descriptions of these functions can be found in the SMT-LIB standard. Note that 'x', 'y', and 'z' are all floating points with equal exponent and significand bit sizes. Note also that when functions return a value of type float, that value is assumed to have the same exponent and significand bit sizes as the arguments to the function.

Operators	Return Type
fp.abs(x)	float
fp.fma(x, y, z)	float
fp.sqrt(x)	float
fp.rem(x, y)	float
fp.roundToIntegral(x)	float
fp.min(x, y)	float
fp.max(x, y)	float
fp.isNormal(x)	boolean
fp.isSubnormal(x)	boolean
fp.isZero(x)	boolean
fp.isInfinite(x)	boolean
fp.isNaN(x)	boolean
fp.isNegative(x)	boolean
fp.isPositive(x)	boolean

The syntax for creating functions for these operators is identical to the syntax for creating functions for the built-in operators.

Alternatively, the above operators (and the built-in operators) may have functions declared that do not accept an "rmode" variable, instead conforming to a specific rounding mode that is specified as part of the "builtin" attribute. For example:

function {:builtin "fp.sqrt RNA"} FP_SQRT_FLOAT32_RNA(float24e8) returns(float24e8);
function {:builtin "fp.sqrt RTP"} FP_SQRT_FLOAT32_RTP(float24e8) returns(float24e8);
.
.
.
.

var x : float24e8;
x := FP_SQRT_FLOAT32_RNA(x); //want to use RNA rounding mode
x := FP_SQRT_FLOAT32_RTP(x); //want to use RTP rounding mode

Type conversion

Conversion to and from floating point values in Boogie is also done using the "builtin" attribute. The following conversions are well defined, where the argument given to an operator is the data type given to the function. Note that when float is given as an argument or return type, the number of exponent and significand bits are defined as part of the function declaration. Also note that a rounding mode may be defined for each conversion to a floating point type:

Operator	Return Type
to_fp(real)	float
to_fp(float)	float
to_fp(bit_vec)*	float
to_fp_unsigned(bit_vec)	float
fp.to_ubv(float)	bit_vec
fp.to_sbv(float)	bit_vec
fp.to_real(float)	real

*When converting from a bit vector to a floating point value using this builtin definition, the bitvector must be the same size as the resulting floating point value and, in particular, the first 'exp' bits of the bitvector become the exponent bits of the resulting floating point while the remaining bits become the significand of the resulting floating point. To treat the given bitvector as a signed two's complement integer, a special attribute 'ai' must be given with the value "True." For example, to convert a size 16 bitvector to a IEEE-binary16 floating point value directly, one could declare the following function:

function {:builtin "(_ to_fp 5 11)" } TO_FLOAT16_BITVECTOR(rmode, bv16) returns(float11e5);

While to convert a bitvector representing signed two's complement integer to a IEEE-binary16 floating point using the "RNA" rounding mode, one could declare the following function:

function {:builtin "(_ to_fp 5 11) RNA" :ai "True" } TO_FLOAT16_SINTEGER_RNA(rmode, bv16) returns(float11e5);

Provide feedback

Saved searches

Use saved searches to filter your results more quickly