The Lean Language Reference

18.7. Characters🔗

Characters are represented by the type Char, which may be any Unicode scalar value. While strings are UTF-8-encoded arrays of bytes, characters are represented by full 32-bit values. Lean provides special syntax for character literals.

18.7.1. Logical Model🔗

From the perspective of Lean's logic, characters consist of a 32-bit unsigned integer paired with a proof that it is a valid Unicode scalar value.

🔗structure
Char : Type

Characters are Unicode scalar values.

Constructor

Char.mk

Fields

val : UInt32

The underlying Unicode scalar value as a UInt32.

valid : self.val.isValidChar

The value must be a legal scalar value.

18.7.2. Run-Time Representation🔗

As a trivial wrapper, characters are represented identically to UInt32. In particular, characters are represented as 32-bit immediate values in monomorphic contexts. In other words, a field of a constructor or structure of type Char does not require indirection to access. In polymorphic contexts, characters are boxed.

18.7.3. Syntax🔗

Character literals consist of a single character or an escape sequence enclosed in single quotes (', Unicode 'APOSTROPHE' (U+0027)). Between these single quotes, the character literal may contain character other that ', including newlines, which are included literally (with the caveat that all newlines in a Lean source file are interpreted as '\n', regardless of file encoding and platform). Special characters may be escaped with a backslash, so '\'' is a character literal that contains a single quote. The following forms of escape sequences are accepted:

\r, \n, \t, \\, \", \'

These escape sequences have the usual meaning, mapping to CR, LF, tab, backslash, double quote, and single quote, respectively.

\xNN

When NN is a sequence of two hexadecimal digits, this escape denotes the character whose Unicode code point is indicated by the two-digit hexadecimal code.

\uNNNN

When NN is a sequence of two hexadecimal digits, this escape denotes the character whose Unicode code point is indicated by the four-digit hexadecimal code.

18.7.4. API Reference🔗

18.7.4.1. Conversions

🔗def
Char.ofNat (n : Nat) : Char

Converts a Nat into a Char. If the Nat does not encode a valid Unicode scalar value, '\0' is returned instead.

🔗def
Char.toNat (c : Char) : Nat

The character's Unicode code point as a Nat.

🔗def
Char.isValidCharNat (n : Nat) : Prop

True for natural numbers that are valid Unicode scalar values.

🔗def
Char.ofUInt8 (n : UInt8) : Char

Converts an 8-bit unsigned integer into a character.

The integer's value is interpreted as a Unicode code point.

🔗def
Char.toUInt8 (c : Char) : UInt8

Converts a character into a UInt8 that contains its code point.

If the code point is larger than 255, it is truncated (reduced modulo 256).

There are two ways to convert a character to a string. Char.toString converts a character to a singleton string that consists of only that character, while Char.quote converts the character to a string representation of the corresponding character literal.

🔗def
Char.toString (c : Char) : String

Constructs a singleton string that contains only the provided character.

Examples:

🔗def
Char.quote (c : Char) : String

Quotes the character to its representation as a character literal, surrounded by single quotes and escaped as necessary.

Examples:

From Characters to Strings

Char.toString produces a string that contains only the character in question:

"e"#eval 'e'.toString
"e"
"e"#eval '\x65'.toString
"e"
"\""#eval '"'.toString
"\""

Char.quote produces a string that contains a character literal, suitably escaped:

"'e'"#eval 'e'.quote
"'e'"
"'e'"#eval '\x65'.quote
"'e'"
"'\\\"'"#eval '"'.quote
"'\\\"'"

18.7.4.2. Character Classes🔗

🔗def
Char.isAlpha (c : Char) : Bool

Returns true if the character is an ASCII letter.

The ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.

🔗def
Char.isAlphanum (c : Char) : Bool

Returns true if the character is an ASCII letter or digit.

The ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz. The ASCII digits are the following: 0123456789.

🔗def
Char.isDigit (c : Char) : Bool

Returns true if the character is an ASCII digit.

The ASCII digits are the following: 0123456789.

🔗def
Char.isLower (c : Char) : Bool

Returns true if the character is a lowercase ASCII letter.

The lowercase ASCII letters are the following: abcdefghijklmnopqrstuvwxyz.

🔗def
Char.isUpper (c : Char) : Bool

Returns true if the character is a uppercase ASCII letter.

The uppercase ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZ.

🔗def
Char.isWhitespace (c : Char) : Bool

Returns true if the character is a space (' ', U+0020), a tab ('\t', U+0009), a carriage return ('\r', U+000D), or a newline ('\n', U+000A).

18.7.4.3. Case Conversion

🔗def
Char.toUpper (c : Char) : Char

Converts a lowercase ASCII letter to the corresponding uppercase letter. Letters outside the ASCII alphabet are returned unchanged.

The lowercase ASCII letters are the following: abcdefghijklmnopqrstuvwxyz.

🔗def
Char.toLower (c : Char) : Char

Converts an uppercase ASCII letter to the corresponding lowercase letter. Letters outside the ASCII alphabet are returned unchanged.

The uppercase ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZ.

18.7.4.4. Comparisons

🔗def
Char.le (a b : Char) : Prop

One character is less than or equal to another if its code point is less than or equal to the other's.

🔗def
Char.lt (a b : Char) : Prop

One character is less than another if its code point is strictly less than the other's.

18.7.4.5. Unicode

🔗def
Char.utf8Size (c : Char) : Nat

Returns the number of bytes required to encode this Char in UTF-8.

🔗def
Char.utf16Size (c : Char) : UInt32

Returns the number of bytes required to encode this Char in UTF-16.