Floating-point numbers are a an approximation of the real numbers that are efficiently implemented in computer hardware.
Computations that use floating-point numbers are very efficient; however, the nature of the way that they approximate the real numbers is complex, with many corner cases.
The IEEE 754 standard, which defines the floating-point format that is used on modern computers, allows hardware designers to make certain choices, and real systems differ in these small details.
For example, there are many distinct bit representations of NaN, the indicator that a result is undefined, and some platforms differ with respect to whichNaN is returned from adding two NaNs.
Lean exposes the underlying platform's floating-point values for use in programming, but they are not encoded in Lean's logic.
They are represented by an opaque type.
This means that the kernel is not capable of computing with or reasoning about floating-point values without additional axioms.
A consequence of this is that equality of floating-point numbers is not decidable.
Furthermore, comparisons between floating-point values are decidable, but the code that does so is opaque; in practice, the decision procedure can only be used in compiled code.
Lean provides two floating-point types: Float represents 64-bit floating point values, while Float32 represents 32-bit floating point values.
The precision of Float does not vary based on the platform that Lean is running on.
Float corresponds to the IEEE 754 binary64 format (double in C or f64 in Rust).
Floating-point numbers are a finite representation of a subset of the real numbers, extended with
extra βsentinelβ values that represent undefined and infinite results as well as separate positive
and negative zeroes. Arithmetic on floating-point numbers approximates the corresponding operations
on the real numbers by rounding the results to numbers that are representable, propagating error and
infinite values.
Floating-point numbers include subnormal numbers.
Their special values are:
NaN, which denotes a class of βnot a numberβ values that result from operations such as
dividing zero by zero, and
Inf and -Inf, which represent positive and infinities that result from dividing non-zero
values by zero.
Float32 corresponds to the IEEE 754 binary32 format (float in C or f32 in Rust).
Floating-point numbers are a finite representation of a subset of the real numbers, extended with
extra βsentinelβ values that represent undefined and infinite results as well as separate positive
and negative zeroes. Arithmetic on floating-point numbers approximates the corresponding operations
on the real numbers by rounding the results to numbers that are representable, propagating error and
infinite values.
Floating-point numbers include subnormal numbers.
Their special values are:
NaN, which denotes a class of βnot a numberβ values that result from operations such as
dividing zero by zero, and
Inf and -Inf, which represent positive and infinities that result from dividing non-zero
values by zero.
No Kernel Reasoning About Floating-Point Numbers
The Lean kernel can compare expressions of type Float for syntactic equality, so 0.0 is definitionally equal to itself.
This tactic uses the axiom Lean.ofReduceBool, which states that the Lean compiler, interpreter and the low-level implementations of built-in operators are trusted in addition to the kernel.
'Float.zero_eq_zero_plus_zero' depends on axioms: [propext, Classical.choice, Lean.ofReduceBool]#printaxiomsFloat.zero_eq_zero_plus_zero
'Float.zero_eq_zero_plus_zero' depends on axioms: [propext, Classical.choice, Lean.ofReduceBool]
Floating-Point Equality Is Not Reflexive
Floating-point operations may produce NaN values that indicate an undefined result.
These values are not comparable with each other; in particular, all comparisons involving NaN will return false, including equality.
Applying a function to two equal floating-point numbers may not result in equal numbers.
In particular, positive and negative zero are distinct values that are equated by floating-point equality, but division by positive or negative zero yields positive or negative infinite values.
Lean does not have dedicated floating-point literals.
Instead, floating-point literals are resolved via the appropriate instances of the OfScientific and Neg type classes.
Checks whether a floating-point number is finite, that is, whether it is normal, subnormal, or zero,
but not infinite or NaN.
This function does not reduce in the kernel. It is compiled to the C operator isfinite.
18.6.2.2.Β Syntax
These operations exist to support the OfScientificFloat and OfScientificFloat32 instances and are normally invoked indirectly as a result of a literal value.
Converts a floating-point number to an 8-bit unsigned integer.
If the given Float is non-negative, truncates the value to a positive integer, rounding down and
clamping to the range of UInt8. Returns 0 if the Float is negative or NaN, and returns the
largest UInt8 value (i.e. UInt8.size-1) if the float is larger than it.
Truncates a floating-point number to the nearest 8-bit signed integer, rounding towards zero.
If the Float is larger than the maximum value for Int8 (including Inf), returns the maximum value of
Int8 (i.e. Int8.maxValue). If it is smaller than the minimum value for Int8 (including -Inf),
returns the minimum value of Int8 (i.e. Int8.minValue). If it is NaN, returns 0.
Converts a floating-point number to an 8-bit unsigned integer.
If the given Float32 is non-negative, truncates the value to a positive integer, rounding down and
clamping to the range of UInt8. Returns 0 if the Float32 is negative or NaN, and returns the
largest UInt8 value (i.e. UInt8.size-1) if the float is larger than it.
Truncates a floating-point number to the nearest 8-bit signed integer, rounding towards zero.
If the Float is larger than the maximum value for Int8 (including Inf), returns the maximum value of
Int8 (i.e. Int8.maxValue). If it is smaller than the minimum value for Int8 (including -Inf),
returns the minimum value of Int8 (i.e. Int8.minValue). If it is NaN, returns 0.
Converts a floating-point number to a 16-bit unsigned integer.
If the given Float is non-negative, truncates the value to a positive integer, rounding down and
clamping to the range of UInt16. Returns 0 if the Float is negative or NaN, and returns the
largest UInt16 value (i.e. UInt16.size-1) if the float is larger than it.
Truncates a floating-point number to the nearest 16-bit signed integer, rounding towards zero.
If the Float is larger than the maximum value for Int16 (including Inf), returns the maximum
value of Int16 (i.e. Int16.maxValue). If it is smaller than the minimum value for Int16
(including -Inf), returns the minimum value of Int16 (i.e. Int16.minValue). If it is NaN,
returns 0.
Converts a floating-point number to a 16-bit unsigned integer.
If the given Float32 is non-negative, truncates the value to a positive integer, rounding down and
clamping to the range of UInt16. Returns 0 if the Float32 is negative or NaN, and returns
the largest UInt16 value (i.e. UInt16.size-1) if the float is larger than it.
Truncates a floating-point number to the nearest 16-bit signed integer, rounding towards zero.
If the Float is larger than the maximum value for Int16 (including Inf), returns the maximum
value of Int16 (i.e. Int16.maxValue). If it is smaller than the minimum value for Int16
(including -Inf), returns the minimum value of Int16 (i.e. Int16.minValue). If it is NaN,
returns 0.
Converts a floating-point number to a 32-bit unsigned integer.
If the given Float is non-negative, truncates the value to a positive integer, rounding down and
clamping to the range of UInt32. Returns 0 if the Float is negative or NaN, and returns the
largest UInt32 value (i.e. UInt32.size-1) if the float is larger than it.
Converts a floating-point number to a 32-bit unsigned integer.
If the given Float32 is non-negative, truncates the value to a positive integer, rounding down and
clamping to the range of UInt32. Returns 0 if the Float32 is negative or NaN, and returns
the largest UInt32 value (i.e. UInt32.size-1) if the float is larger than it.
Truncates a floating-point number to the nearest 32-bit signed integer, rounding towards zero.
If the Float is larger than the maximum value for Int32 (including Inf), returns the maximum
value of Int32 (i.e. Int32.maxValue). If it is smaller than the minimum value for Int32
(including -Inf), returns the minimum value of Int32 (i.e. Int32.minValue). If it is NaN,
returns 0.
Truncates a floating-point number to the nearest 32-bit signed integer, rounding towards zero.
If the Float is larger than the maximum value for Int32 (including Inf), returns the maximum
value of Int32 (i.e. Int32.maxValue). If it is smaller than the minimum value for Int32
(including -Inf), returns the minimum value of Int32 (i.e. Int32.minValue). If it is NaN,
returns 0.
Converts a floating-point number to a 64-bit unsigned integer.
If the given Float is non-negative, truncates the value to a positive integer, rounding down and
clamping to the range of UInt64. Returns 0 if the Float is negative or NaN, and returns the
largest UInt64 value (i.e. UInt64.size-1) if the float is larger than it.
Truncates a floating-point number to the nearest 64-bit signed integer, rounding towards zero.
If the Float is larger than the maximum value for Int64 (including Inf), returns the maximum
value of Int64 (i.e. Int64.maxValue). If it is smaller than the minimum value for Int64
(including -Inf), returns the minimum value of Int64 (i.e. Int64.minValue). If it is NaN,
returns 0.
Converts a floating-point number to a 64-bit unsigned integer.
If the given Float32 is non-negative, truncates the value to a positive integer, rounding down and
clamping to the range of UInt64. Returns 0 if the Float32 is negative or NaN, and returns
the largest UInt64 value (i.e. UInt64.size-1) if the float is larger than it.
Truncates a floating-point number to the nearest 64-bit signed integer, rounding towards zero.
If the Float is larger than the maximum value for Int64 (including Inf), returns the maximum
value of Int64 (i.e. Int64.maxValue). If it is smaller than the minimum value for Int64
(including -Inf), returns the minimum value of Int64 (i.e. Int64.minValue). If it is NaN,
returns 0.
Converts a floating-point number to a word-sized unsigned integer.
If the given Float is non-negative, truncates the value to a positive integer, rounding down and
clamping to the range of USize. Returns 0 if the Float is negative or NaN, and returns the
largest USize value (i.e. USize.size-1) if the float is larger than it.
Converts a floating-point number to a word-sized unsigned integer.
If the given Float32 is non-negative, truncates the value to a positive integer, rounding down and
clamping to the range of USize. Returns 0 if the Float32 is negative or NaN, and returns the
largest USize value (i.e. USize.size-1) if the float is larger than it.
Truncates a floating-point number to the nearest word-sized signed integer, rounding towards zero.
If the Float is larger than the maximum value for ISize (including Inf), returns the maximum
value of ISize (i.e. ISize.maxValue). If it is smaller than the minimum value for ISize
(including -Inf), returns the minimum value of ISize (i.e. ISize.minValue). If it is NaN,
returns 0.
Truncates a floating-point number to the nearest word-sized signed integer, rounding towards zero.
If the Float is larger than the maximum value for ISize (including Inf), returns the maximum
value of ISize (i.e. ISize.maxValue). If it is smaller than the minimum value for ISize
(including -Inf), returns the minimum value of ISize (i.e. ISize.minValue). If it is NaN,
returns 0.
Converts an integer into the closest-possible 64-bit floating-point number, or positive or negative
infinite floating-point value if the range of Float is exceeded.
Converts an integer into the closest-possible 32-bit floating-point number, or positive or negative
infinite floating-point value if the range of Float32 is exceeded.
Converts a natural number into the closest-possible 64-bit floating-point number, or an infinite
floating-point value if the range of Float is exceeded.
Converts a natural number into the closest-possible 32-bit floating-point number, or an infinite
floating-point value if the range of Float32 is exceeded.
Splits the given float x into a significand/exponent pair (s,i) such that x=s*2^i where
sβ(-1;-0.5]βͺ[0.5;1). Returns an undefined value if x is not finite.
This function does not reduce in the kernel. It is implemented in compiled code by the C function
frexp.
Splits the given float x into a significand/exponent pair (s,i) such that x=s*2^i where
sβ(-1;-0.5]βͺ[0.5;1). Returns an undefined value if x is not finite.
This function does not reduce in the kernel. It is implemented in compiled code by the C function
frexp.
Checks whether two floating-point numbers are equal according to IEEE 754.
Floating-point equality does not correspond with propositional equality. In particular, it is not
reflexive since NaN!=NaN, and it is not a congruence because 0.0==-0.0, but
1.0/0.0!=1.0/-0.0.
This function does not reduce in the kernel. It is compiled to the C equality operator.
Checks whether two floating-point numbers are equal according to IEEE 754.
Floating-point equality does not correspond with propositional equality. In particular, it is not
reflexive since NaN!=NaN, and it is not a congruence because 0.0==-0.0, but
1.0/0.0!=1.0/-0.0.
This function does not reduce in the kernel. It is compiled to the C equality operator.
18.6.2.4.1.Β Inequalities
The decision procedures for inequalities are opaque constants in the logic.
They can only be used via the Lean.ofReduceBool axiom, e.g. via the native_decide tactic.