19.7. Characters🔗

Characters are represented by the type Char, which may be any Unicode scalar value. While strings are UTF-8-encoded arrays of bytes, characters are represented by full 32-bit values. Lean provides special syntax for character literals.

19.7.1. Logical Model🔗

From the perspective of Lean's logic, characters consist of a 32-bit unsigned integer paired with a proof that it is a valid Unicode scalar value.

structure

Char : Type
Char : Type

Characters are Unicode scalar values.

Constructor

Char.mk

Fields

val : UInt32

The underlying Unicode scalar value as a UInt32.

valid : self.val.isValidChar

The value must be a legal scalar value.

19.7.2. Run-Time Representation🔗

As a trivial wrapper, characters are represented identically to UInt32. In particular, characters are represented as 32-bit immediate values in monomorphic contexts. In other words, a field of a constructor or structure of type Char does not require indirection to access. In polymorphic contexts, characters are boxed.

19.7.3. Syntax🔗

Character literals consist of a single character or an escape sequence enclosed in single quotes (', Unicode 'APOSTROPHE' (U+0027)). Between these single quotes, the character literal may contain character other that ', including newlines, which are included literally (with the caveat that all newlines in a Lean source file are interpreted as '\n', regardless of file encoding and platform). Special characters may be escaped with a backslash, so '\'' is a character literal that contains a single quote. The following forms of escape sequences are accepted:

\r, \n, \t, \\, \", \': These escape sequences have the usual meaning, mapping to CR, LF, tab, backslash, double quote, and single quote, respectively.
\xNN: When NN is a sequence of two hexadecimal digits, this escape denotes the character whose Unicode code point is indicated by the two-digit hexadecimal code.
\uNNNN: When NN is a sequence of two hexadecimal digits, this escape denotes the character whose Unicode code point is indicated by the four-digit hexadecimal code.

19.7.4. API Reference🔗

19.7.4.1. Conversions

def

Char.ofNat (n : Nat) : Char
Char.ofNat (n : Nat) : Char

Converts a Nat into a Char. If the Nat does not encode a valid Unicode scalar value, '\0' is returned instead.

def

Char.toNat (c : Char) : Nat
Char.toNat (c : Char) : Nat

The character's Unicode code point as a Nat.

def

Char.isValidCharNat (n : Nat) : Prop
Char.isValidCharNat (n : Nat) : Prop

True for natural numbers that are valid Unicode scalar values.

def

Char.ofUInt8 (n : UInt8) : Char
Char.ofUInt8 (n : UInt8) : Char

Converts an 8-bit unsigned integer into a character.

The integer's value is interpreted as a Unicode code point.

def

Char.toUInt8 (c : Char) : UInt8
Char.toUInt8 (c : Char) : UInt8

Converts a character into a UInt8 that contains its code point.

If the code point is larger than 255, it is truncated (reduced modulo 256).

There are two ways to convert a character to a string. Char.toString converts a character to a singleton string that consists of only that character, while Char.quote converts the character to a string representation of the corresponding character literal.

def

Char.toString (c : Char) : String
Char.toString (c : Char) : String

Constructs a singleton string that contains only the provided character.

Examples:

'L'.toString = "L"
'"'.toString = "\""

def

Char.quote (c : Char) : String
Char.quote (c : Char) : String

Quotes the character to its representation as a character literal, surrounded by single quotes and escaped as necessary.

Examples:

'L'.quote = "'L'"
'"'.quote = "'\\\"'"

From Characters to Strings

Char.toString produces a string that contains only the character in question:

"e"#eval 'e'.toString

"e"

"e"#eval '\x65'.toString

"e"

"\""#eval '"'.toString

"\""

Char.quote produces a string that contains a character literal, suitably escaped:

"'e'"#eval 'e'.quote

"'e'"

"'e'"#eval '\x65'.quote

"'e'"

"'\\\"'"#eval '"'.quote

"'\\\"'"

19.7.4.2. Character Classes🔗

def

Char.isAlpha (c : Char) : Bool
Char.isAlpha (c : Char) : Bool

Returns true if the character is an ASCII letter.

The ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.

def

Char.isAlphanum (c : Char) : Bool
Char.isAlphanum (c : Char) : Bool

Returns true if the character is an ASCII letter or digit.

The ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz. The ASCII digits are the following: 0123456789.

def

Char.isDigit (c : Char) : Bool
Char.isDigit (c : Char) : Bool

Returns true if the character is an ASCII digit.

The ASCII digits are the following: 0123456789.

def

Char.isLower (c : Char) : Bool
Char.isLower (c : Char) : Bool

Returns true if the character is a lowercase ASCII letter.

The lowercase ASCII letters are the following: abcdefghijklmnopqrstuvwxyz.

def

Char.isUpper (c : Char) : Bool
Char.isUpper (c : Char) : Bool

Returns true if the character is a uppercase ASCII letter.

The uppercase ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZ.

def

Char.isWhitespace (c : Char) : Bool
Char.isWhitespace (c : Char) : Bool

Returns true if the character is a space (' ', U+0020), a tab ('\t', U+0009), a carriage return ('\r', U+000D), or a newline ('\n', U+000A).

19.7.4.3. Case Conversion

def

Char.toUpper (c : Char) : Char
Char.toUpper (c : Char) : Char

Converts a lowercase ASCII letter to the corresponding uppercase letter. Letters outside the ASCII alphabet are returned unchanged.

The lowercase ASCII letters are the following: abcdefghijklmnopqrstuvwxyz.

def

Char.toLower (c : Char) : Char
Char.toLower (c : Char) : Char

Converts an uppercase ASCII letter to the corresponding lowercase letter. Letters outside the ASCII alphabet are returned unchanged.

The uppercase ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZ.

19.7.4.4. Comparisons

def

Char.le (a b : Char) : Prop
Char.le (a b : Char) : Prop

One character is less than or equal to another if its code point is less than or equal to the other's.

def

Char.lt (a b : Char) : Prop
Char.lt (a b : Char) : Prop

One character is less than another if its code point is strictly less than the other's.

19.7.4.5. Unicode

def

Char.utf8Size (c : Char) : Nat
Char.utf8Size (c : Char) : Nat

Returns the number of bytes required to encode this Char in UTF-8.

def

Char.utf16Size (c : Char) : UInt32
Char.utf16Size (c : Char) : UInt32

Returns the number of bytes required to encode this Char in UTF-16.