14.6.Β Floating-Point NumbersπŸ”—

Floating-point numbers are a an approximation of the real numbers that are efficiently implemented in computer hardware. Computations that use floating-point numbers are very efficient; however, the nature of the way that they approximate the real numbers is complex, with many corner cases. The IEEE 754 standard, which defines the floating-point format that is used on modern computers, allows hardware designers to make certain choices, and real systems differ in these small details. For example, there are many distinct bit representations of NaN, the indicator that a result is undefined, and some platforms differ with respect to which NaN is returned from adding two NaNs.

Lean exposes the underlying platform's floating-point values for use in programming, but they are not encoded in Lean's logic. They are represented by an opaque type. This means that the kernel is not capable of computing with or reasoning about floating-point values without additional axioms. A consequence of this is that equality of floating-point numbers is not decidable. Furthermore, comparisons between floating-point values are decidable, but the code that does so is opaque; in practice, the decision procedure can only be used in compiled code.

Lean provides two floating-point types: Float represents 64-bit floating point values, while Float32 represents 32-bit floating point values. The precision of Float does not vary based on the platform that Lean is running on.

πŸ”—structure
Float : Type

Native floating point type, corresponding to the IEEE 754 binary64 format (double in C or f64 in Rust).

Constructor

Float.mk

Fields

val : floatSpec.float
πŸ”—structure
Float32 : Type

Native floating point type, corresponding to the IEEE 754 binary32 format (float in C or f32 in Rust).

Constructor

Float32.mk

Fields

val : float32Spec.float
No Kernel Reasoning About Floating-Point Numbers

The Lean kernel can compare expressions of type Float for syntactic equality, so 0.0 is definitionally equal to itself.

example : (0.0 : Float) = (0.0 : Float) := ⊒ 0.0 = 0.0 All goals completed! πŸ™

Terms that require reduction to become syntactically equal cannot be checked by the kernel:

example : (0.0 : Float) = (0.0 + 0.0 : Float) := ⊒ 0.0 = 0.0 + 0.0 ⊒ 0.0 = 0.0 + 0.0
tactic 'rfl' failed, the left-hand side
  0.0
is not definitionally equal to the right-hand side
  0.0 + 0.0
⊒ 0.0 = 0.0 + 0.0

Similarly, the kernel cannot evaluate Bool-valued comparisons of floating-point numbers while checking definitional equality:

theorem Float.zero_eq_zero_plus_zero : ((0.0 : Float) == (0.0 + 0.0 : Float)) = true := ⊒ (0.0 == 0.0 + 0.0) = true ⊒ (0.0 == 0.0 + 0.0) = true
tactic 'rfl' failed, the left-hand side
  0.0 == 0.0 + 0.0
is not definitionally equal to the right-hand side
  true
⊒ (0.0 == 0.0 + 0.0) = true

However, the native_decide tactic can invoke the underlying platform's floating-point primitives that are used by Lean for run-time programs:

theorem Float.zero_eq_zero_plus_zero : ((0.0 : Float) == (0.0 + 0.0 : Float)) = true := ⊒ (0.0 == 0.0 + 0.0) = true All goals completed! πŸ™

This tactic uses the axiom Lean.ofReduceBool, which states that the Lean compiler, interpreter and the low-level implementations of built-in operators are trusted in addition to the kernel.

'Float.zero_eq_zero_plus_zero' depends on axioms: [propext, Classical.choice, Lean.ofReduceBool]#print axioms Float.zero_eq_zero_plus_zero
'Float.zero_eq_zero_plus_zero' depends on axioms: [propext, Classical.choice, Lean.ofReduceBool]
Floating-Point Equality Is Not Reflexive

Floating-point operations may produce NaN values that indicate an undefined result. These values are not comparable with each other; in particular, all comparisons involving NaN will return false, including equality.

false#eval ((0.0 : Float) / 0.0) == ((0.0 : Float) / 0.0)
Floating-Point Equality Is Not a Congruence

Applying a function to two equal floating-point numbers may not result in equal numbers. In particular, positive and negative zero are distinct values that are equated by floating-point equality, but division by positive or negative zero yields positive or negative infinite values.

def neg0 : Float := -0.0 def pos0 : Float := 0.0 (true, false)#eval (neg0 == pos0, 1.0 / neg0 == 1.0 / pos0)
(true, false)

14.6.1.Β Syntax

Lean does not have dedicated floating-point literals. Instead, floating-point literals are resolved via the appropriate instances of the OfScientific and Neg type classes.

Floating-Point Literals

The term

(-2.523 : Float)

is syntactic sugar for

(Neg.neg (OfScientific.ofScientific 22523 true 4) : Float)

and the term

(413.52 : Float32)

is syntactic sugar for

(OfScientific.ofScientific 41352 true 2 : Float32)

14.6.2.Β API ReferenceπŸ”—

14.6.2.1.Β Properties

Floating-point numbers fall into one of three categories:

  • Finite numbers are ordinary floating-point values.

  • Infinities, which may be positive or negative, result from division by zero.

  • NaNs, which are not numbers, result from other undefined operations, such as the square root of a negative number.

πŸ”—opaque
Float.isInf : Float β†’ Bool
πŸ”—opaque
Float32.isInf : Float32 β†’ Bool
πŸ”—opaque
Float.isNaN : Float β†’ Bool
πŸ”—opaque
Float32.isNaN : Float32 β†’ Bool
πŸ”—opaque
Float.isFinite : Float β†’ Bool
πŸ”—opaque
Float32.isFinite : Float32 β†’ Bool

14.6.2.2.Β Syntax

These operations exist to support the OfScientific Float and OfScientific Float32 instances and are normally invoked indirectly as a result of a literal value.

πŸ”—opaque
Float.ofScientific (m : Nat) (s : Bool)
  (e : Nat) : Float
πŸ”—opaque
Float32.ofScientific (m : Nat) (s : Bool)
  (e : Nat) : Float32

14.6.2.3.Β Conversions

πŸ”—opaque
Float.toBits : Float β†’ UInt64

Raw transmutation to UInt64.

Floats and UInts have the same endianness on all supported platforms. IEEE 754 very precisely specifies the bit layout of floats.

Note that this function is distinct from Float.toUInt64, which attempts to preserve the numeric value, and not the bitwise value.

πŸ”—opaque
Float32.toBits : Float32 β†’ UInt32

Raw transmutation to UInt32.

Float32s and UInts have the same endianness on all supported platforms. IEEE 754 very precisely specifies the bit layout of floats.

Note that this function is distinct from Float32.toUInt32, which attempts to preserve the numeric value, and not the bitwise value.

πŸ”—opaque
Float.ofBits : UInt64 β†’ Float

Raw transmutation from UInt64.

Floats and UInts have the same endianness on all supported platforms. IEEE 754 very precisely specifies the bit layout of floats.

πŸ”—opaque
Float32.ofBits : UInt32 β†’ Float32

Raw transmutation from UInt32.

Float32s and UInts have the same endianness on all supported platforms. IEEE 754 very precisely specifies the bit layout of floats.

πŸ”—opaque
Float.toFloat32 : Float β†’ Float32
πŸ”—opaque
Float32.toFloat : Float32 β†’ Float
πŸ”—opaque
Float.toString : Float β†’ String
πŸ”—opaque
Float32.toString : Float32 β†’ String
πŸ”—opaque
Float.toUInt8 : Float β†’ UInt8

If the given float is non-negative, truncates the value to the nearest non-negative integer. If negative or NaN, returns 0. If larger than the maximum value for UInt8 (including Inf), returns the maximum value of UInt8 (i.e. UInt8.size - 1).

πŸ”—opaque
Float.toInt8 : Float β†’ Int8

Truncates the value to the nearest integer, rounding towards zero. If NaN, returns 0. If larger than the maximum value for Int8 (including Inf), returns the maximum value of Int8 (i.e. Int8.maxValue). If smaller than the minimum value for Int8 (including -Inf), returns the minimum value of Int8 (i.e. Int8.minValue).

πŸ”—opaque
Float32.toUInt8 : Float32 β†’ UInt8

If the given float is non-negative, truncates the value to the nearest non-negative integer. If negative or NaN, returns 0. If larger than the maximum value for UInt8 (including Inf), returns the maximum value of UInt8 (i.e. UInt8.size - 1).

πŸ”—opaque
Float32.toInt8 : Float32 β†’ Int8

Truncates the value to the nearest integer, rounding towards zero. If NaN, returns 0. If larger than the maximum value for Int8 (including Inf), returns the maximum value of Int8 (i.e. Int8.maxValue). If smaller than the minimum value for Int8 (including -Inf), returns the minimum value of Int8 (i.e. Int8.minValue).

πŸ”—opaque
Float.toUInt16 : Float β†’ UInt16

If the given float is non-negative, truncates the value to the nearest non-negative integer. If negative or NaN, returns 0. If larger than the maximum value for UInt16 (including Inf), returns the maximum value of UInt16 (i.e. UInt16.size - 1).

πŸ”—opaque
Float.toInt16 : Float β†’ Int16

Truncates the value to the nearest integer, rounding towards zero. If NaN, returns 0. If larger than the maximum value for Int16 (including Inf), returns the maximum value of Int16 (i.e. Int16.maxValue). If smaller than the minimum value for Int16 (including -Inf), returns the minimum value of Int16 (i.e. Int16.minValue).

πŸ”—opaque
Float32.toUInt16 : Float32 β†’ UInt16

If the given float is non-negative, truncates the value to the nearest non-negative integer. If negative or NaN, returns 0. If larger than the maximum value for UInt16 (including Inf), returns the maximum value of UInt16 (i.e. UInt16.size - 1).

πŸ”—opaque
Float32.toInt16 : Float32 β†’ Int16

Truncates the value to the nearest integer, rounding towards zero. If NaN, returns 0. If larger than the maximum value for Int16 (including Inf), returns the maximum value of Int16 (i.e. Int16.maxValue). If smaller than the minimum value for Int16 (including -Inf), returns the minimum value of Int16 (i.e. Int16.minValue).

πŸ”—opaque
Float.toUInt32 : Float β†’ UInt32

If the given float is non-negative, truncates the value to the nearest non-negative integer. If negative or NaN, returns 0. If larger than the maximum value for UInt32 (including Inf), returns the maximum value of UInt32 (i.e. UInt32.size - 1).

πŸ”—opaque
Float32.toUInt32 : Float32 β†’ UInt32

If the given float is non-negative, truncates the value to the nearest non-negative integer. If negative or NaN, returns 0. If larger than the maximum value for UInt32 (including Inf), returns the maximum value of UInt32 (i.e. UInt32.size - 1).

πŸ”—opaque
Float.toInt32 : Float β†’ Int32

Truncates the value to the nearest integer, rounding towards zero. If NaN, returns 0. If larger than the maximum value for Int32 (including Inf), returns the maximum value of Int32 (i.e. Int32.maxValue). If smaller than the minimum value for Int32 (including -Inf), returns the minimum value of Int32 (i.e. Int32.minValue).

πŸ”—opaque
Float32.toInt32 : Float32 β†’ Int32

Truncates the value to the nearest integer, rounding towards zero. If NaN, returns 0. If larger than the maximum value for Int32 (including Inf), returns the maximum value of Int32 (i.e. Int32.maxValue). If smaller than the minimum value for Int32 (including -Inf), returns the minimum value of Int32 (i.e. Int32.minValue).

πŸ”—opaque
Float.toUInt64 : Float β†’ UInt64

If the given float is non-negative, truncates the value to the nearest non-negative integer. If negative or NaN, returns 0. If larger than the maximum value for UInt64 (including Inf), returns the maximum value of UInt64 (i.e. UInt64.size - 1).

πŸ”—opaque
Float.toInt64 : Float β†’ Int64

Truncates the value to the nearest integer, rounding towards zero. If NaN, returns 0. If larger than the maximum value for Int64 (including Inf), returns the maximum value of Int64 (i.e. Int64.maxValue). If smaller than the minimum value for Int64 (including -Inf), returns the minimum value of Int64 (i.e. Int64.minValue).

πŸ”—opaque
Float32.toUInt64 : Float32 β†’ UInt64

If the given float is non-negative, truncates the value to the nearest non-negative integer. If negative or NaN, returns 0. If larger than the maximum value for UInt64 (including Inf), returns the maximum value of UInt64 (i.e. UInt64.size - 1).

πŸ”—opaque
Float32.toInt64 : Float32 β†’ Int64

Truncates the value to the nearest integer, rounding towards zero. If NaN, returns 0. If larger than the maximum value for Int64 (including Inf), returns the maximum value of Int64 (i.e. Int64.maxValue). If smaller than the minimum value for Int64 (including -Inf), returns the minimum value of Int64 (i.e. Int64.minValue).

πŸ”—opaque
Float.toUSize : Float β†’ USize

If the given float is non-negative, truncates the value to the nearest non-negative integer. If negative or NaN, returns 0. If larger than the maximum value for USize (including Inf), returns the maximum value of USize (i.e. USize.size - 1). This value is platform dependent).

πŸ”—opaque
Float32.toUSize : Float32 β†’ USize

If the given float is non-negative, truncates the value to the nearest non-negative integer. If negative or NaN, returns 0. If larger than the maximum value for USize (including Inf), returns the maximum value of USize (i.e. USize.size - 1). This value is platform dependent).

πŸ”—opaque
Float.toISize : Float β†’ ISize

Truncates the value to the nearest integer, rounding towards zero. If NaN, returns 0. If larger than the maximum value for ISize (including Inf), returns the maximum value of ISize (i.e. ISize.maxValue). If smaller than the minimum value for ISize (including -Inf), returns the minimum value of ISize (i.e. ISize.minValue).

πŸ”—opaque
Float32.toISize : Float32 β†’ ISize

Truncates the value to the nearest integer, rounding towards zero. If NaN, returns 0. If larger than the maximum value for ISize (including Inf), returns the maximum value of ISize (i.e. ISize.maxValue). If smaller than the minimum value for ISize (including -Inf), returns the minimum value of ISize (i.e. ISize.minValue).

πŸ”—def
Float.ofInt : Int β†’ Float
πŸ”—def
Float32.ofInt : Int β†’ Float
πŸ”—def
Float.ofNat (n : Nat) : Float
πŸ”—def
Float32.ofNat (n : Nat) : Float32
πŸ”—def
Float.ofBinaryScientific (m : Nat) (e : Int) :
  Float

Computes m * 2^e.

πŸ”—def
Float32.ofBinaryScientific (m : Nat) (e : Int) :
  Float32

Computes m * 2^e.

πŸ”—opaque
Float.frExp : Float β†’ Float Γ— Int

Splits the given float x into a significand/exponent pair (s, i) such that x = s * 2^i where s ∈ (-1;-0.5] βˆͺ [0.5; 1). Returns an undefined value if x is not finite.

πŸ”—opaque
Float32.frExp : Float32 β†’ Float32 Γ— Int

Splits the given float x into a significand/exponent pair (s, i) such that x = s * 2^i where s ∈ (-1;-0.5] βˆͺ [0.5; 1). Returns an undefined value if x is not finite.

14.6.2.4.Β Comparisons

πŸ”—opaque
Float.beq (a b : Float) : Bool

Note: this is not reflexive since NaN != NaN.

πŸ”—opaque
Float32.beq (a b : Float32) : Bool

Note: this is not reflexive since NaN != NaN.

14.6.2.4.1.Β Inequalities

The decision procedures for inequalities are opaque constants in the logic. They can only be used via the Lean.ofReduceBool axiom, e.g. via the native_decide tactic.

πŸ”—def
Float.le : Float β†’ Float β†’ Prop
πŸ”—def
Float32.le : Float32 β†’ Float32 β†’ Prop
πŸ”—def
Float.lt : Float β†’ Float β†’ Prop
πŸ”—def
Float32.lt : Float32 β†’ Float32 β†’ Prop
πŸ”—opaque
Float.decLe (a b : Float) : Decidable (a ≀ b)
πŸ”—opaque
Float32.decLe (a b : Float32) :
  Decidable (a ≀ b)
πŸ”—opaque
Float.decLt (a b : Float) : Decidable (a < b)
πŸ”—opaque
Float32.decLt (a b : Float32) :
  Decidable (a < b)

14.6.2.5.Β Arithmetic

Arithmetic operations on floating-point values are typically invoked via the Add Float, Sub Float, Mul Float, Div Float, and HomogeneousPow Float instances, along with the corresponding Float32 instances.

πŸ”—opaque
Float.add : Float β†’ Float β†’ Float
πŸ”—opaque
Float32.add : Float32 β†’ Float32 β†’ Float32
πŸ”—opaque
Float.sub : Float β†’ Float β†’ Float
πŸ”—opaque
Float32.sub : Float32 β†’ Float32 β†’ Float32
πŸ”—opaque
Float.mul : Float β†’ Float β†’ Float
πŸ”—opaque
Float32.mul : Float32 β†’ Float32 β†’ Float32
πŸ”—opaque
Float.div : Float β†’ Float β†’ Float
πŸ”—opaque
Float32.div : Float32 β†’ Float32 β†’ Float32
πŸ”—opaque
Float.pow : Float β†’ Float β†’ Float
πŸ”—opaque
Float32.pow : Float32 β†’ Float32 β†’ Float32
πŸ”—opaque
Float.exp : Float β†’ Float
πŸ”—opaque
Float32.exp : Float32 β†’ Float32
πŸ”—opaque
Float.exp2 : Float β†’ Float
πŸ”—opaque
Float32.exp2 : Float32 β†’ Float32

14.6.2.5.1.Β Roots

Computing the square root of a negative number yields NaN.

πŸ”—opaque
Float.sqrt : Float β†’ Float
πŸ”—opaque
Float32.sqrt : Float32 β†’ Float32
πŸ”—opaque
Float.cbrt : Float β†’ Float
πŸ”—opaque
Float32.cbrt : Float32 β†’ Float32

14.6.2.6.Β Logarithms

πŸ”—opaque
Float.log : Float β†’ Float
πŸ”—opaque
Float32.log : Float32 β†’ Float32
πŸ”—opaque
Float.log10 : Float β†’ Float
πŸ”—opaque
Float32.log10 : Float32 β†’ Float32
πŸ”—opaque
Float.log2 : Float β†’ Float
πŸ”—opaque
Float32.log2 : Float32 β†’ Float32

14.6.2.7.Β Scaling

πŸ”—opaque
Float.scaleB (x : Float) (i : Int) : Float

Efficiently computes x * 2^i.

πŸ”—opaque
Float32.scaleB (x : Float32) (i : Int) : Float32

Efficiently computes x * 2^i.

14.6.2.8.Β Rounding

πŸ”—opaque
Float.round : Float β†’ Float
πŸ”—opaque
Float32.round : Float32 β†’ Float32
πŸ”—opaque
Float.floor : Float β†’ Float
πŸ”—opaque
Float32.floor : Float32 β†’ Float32
πŸ”—opaque
Float.ceil : Float β†’ Float
πŸ”—opaque
Float32.ceil : Float32 β†’ Float32

14.6.2.9.Β Trigonometry

14.6.2.9.1.Β Sine

πŸ”—opaque
Float.sin : Float β†’ Float
πŸ”—opaque
Float32.sin : Float32 β†’ Float32
πŸ”—opaque
Float.sinh : Float β†’ Float
πŸ”—opaque
Float32.sinh : Float32 β†’ Float32
πŸ”—opaque
Float.asin : Float β†’ Float
πŸ”—opaque
Float32.asin : Float32 β†’ Float32
πŸ”—opaque
Float.asinh : Float β†’ Float
πŸ”—opaque
Float32.asinh : Float32 β†’ Float32

14.6.2.9.2.Β Cosine

πŸ”—opaque
Float.cos : Float β†’ Float
πŸ”—opaque
Float32.cos : Float32 β†’ Float32
πŸ”—opaque
Float.cosh : Float β†’ Float
πŸ”—opaque
Float32.cosh : Float32 β†’ Float32
πŸ”—opaque
Float.acos : Float β†’ Float
πŸ”—opaque
Float32.acos : Float32 β†’ Float32
πŸ”—opaque
Float.acosh : Float β†’ Float
πŸ”—opaque
Float32.acosh : Float32 β†’ Float32

14.6.2.9.3.Β Tangent

πŸ”—opaque
Float.tan : Float β†’ Float
πŸ”—opaque
Float32.tan : Float32 β†’ Float32
πŸ”—opaque
Float.tanh : Float β†’ Float
πŸ”—opaque
Float32.tanh : Float32 β†’ Float32
πŸ”—opaque
Float.atan : Float β†’ Float
πŸ”—opaque
Float32.atan : Float32 β†’ Float32
πŸ”—opaque
Float.atanh : Float β†’ Float
πŸ”—opaque
Float32.atanh : Float32 β†’ Float32
πŸ”—opaque
Float.atan2 : Float β†’ Float β†’ Float
πŸ”—opaque
Float32.atan2 : Float32 β†’ Float32 β†’ Float32

14.6.2.10.Β Negation and Absolute Value

πŸ”—opaque
Float.abs : Float β†’ Float
πŸ”—opaque
Float32.abs : Float32 β†’ Float32
πŸ”—opaque
Float.neg : Float β†’ Float
πŸ”—opaque
Float32.neg : Float32 β†’ Float32