The Lean Language Reference

20.7.Β CharactersπŸ”—

Characters are represented by the type Char, which may be any Unicode scalar value. While strings are UTF-8-encoded arrays of bytes, characters are represented by full 32-bit values. Lean provides special syntax for character literals.

20.7.1.Β Logical ModelπŸ”—

From the perspective of Lean's logic, characters consist of a 32-bit unsigned integer paired with a proof that it is a valid Unicode scalar value.

πŸ”—structure
Char : Type
Char : Type

Characters are Unicode scalar values.

Constructor

Char.mk

Fields

val : UInt32

The underlying Unicode scalar value as a UInt32.

valid : self.val.isValidChar

The value must be a legal scalar value.

20.7.2.Β Run-Time RepresentationπŸ”—

As a trivial wrapper, characters are represented identically to UInt32. In particular, characters are represented as 32-bit immediate values in monomorphic contexts. In other words, a field of a constructor or structure of type Char does not require indirection to access. In polymorphic contexts, characters are boxed.

20.7.3.Β SyntaxπŸ”—

Character literals consist of a single character or an escape sequence enclosed in single quotes (', Unicode 'APOSTROPHE' (U+0027)). Between these single quotes, the character literal may contain character other that ', including newlines, which are included literally (with the caveat that all newlines in a Lean source file are interpreted as '\n', regardless of file encoding and platform). Special characters may be escaped with a backslash, so '\'' is a character literal that contains a single quote. The following forms of escape sequences are accepted:

\r, \n, \t, \\, \", \'

These escape sequences have the usual meaning, mapping to CR, LF, tab, backslash, double quote, and single quote, respectively.

\xNN

When NN is a sequence of two hexadecimal digits, this escape denotes the character whose Unicode code point is indicated by the two-digit hexadecimal code.

\uNNNN

When NN is a sequence of two hexadecimal digits, this escape denotes the character whose Unicode code point is indicated by the four-digit hexadecimal code.

20.7.4.Β API ReferenceπŸ”—

20.7.4.1.Β ConversionsπŸ”—

πŸ”—def

Converts a Nat into a Char. If the Nat does not encode a valid Unicode scalar value, '\0' is returned instead.

πŸ”—def

The character's Unicode code point as a Nat.

πŸ”—def

True for natural numbers that are valid Unicode scalar values.

πŸ”—def

Converts an 8-bit unsigned integer into a character.

The integer's value is interpreted as a Unicode code point.

πŸ”—def

Converts a character into a UInt8 that contains its code point.

If the code point is larger than 255, it is truncated (reduced modulo 256).

There are two ways to convert a character to a string. Char.toString converts a character to a singleton string that consists of only that character, while Char.quote converts the character to a string representation of the corresponding character literal.

πŸ”—def

Constructs a singleton string that contains only the provided character.

Examples:

πŸ”—def

Quotes the character to its representation as a character literal, surrounded by single quotes and escaped as necessary.

Examples:

From Characters to Strings

Char.toString produces a string that contains only the character in question:

"e"#eval 'e'.toString
"e"
"e"#eval '\x65'.toString
"e"
"\""#eval '"'.toString
"\""

Char.quote produces a string that contains a character literal, suitably escaped:

"'e'"#eval 'e'.quote
"'e'"
"'e'"#eval '\x65'.quote
"'e'"
"'\\\"'"#eval '"'.quote
"'\\\"'"

20.7.4.2.Β Character ClassesπŸ”—

πŸ”—def

Returns true if the character is an ASCII letter.

The ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.

πŸ”—def

Returns true if the character is an ASCII letter or digit.

The ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz. The ASCII digits are the following: 0123456789.

πŸ”—def

Returns true if the character is an ASCII digit.

The ASCII digits are the following: 0123456789.

πŸ”—def

Returns true if the character is a lowercase ASCII letter.

The lowercase ASCII letters are the following: abcdefghijklmnopqrstuvwxyz.

πŸ”—def

Returns true if the character is a uppercase ASCII letter.

The uppercase ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZ.

πŸ”—def

Returns true if the character is a space (' ', U+0020), a tab ('\t', U+0009), a carriage return ('\r', U+000D), or a newline ('\n', U+000A).

20.7.4.3.Β Case ConversionπŸ”—

πŸ”—def

Converts a lowercase ASCII letter to the corresponding uppercase letter. Letters outside the ASCII alphabet are returned unchanged.

The lowercase ASCII letters are the following: abcdefghijklmnopqrstuvwxyz.

πŸ”—def

Converts an uppercase ASCII letter to the corresponding lowercase letter. Letters outside the ASCII alphabet are returned unchanged.

The uppercase ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZ.

20.7.4.4.Β ComparisonsπŸ”—

πŸ”—def
Char.le (a b : Char) : Prop
Char.le (a b : Char) : Prop

One character is less than or equal to another if its code point is less than or equal to the other's.

πŸ”—def
Char.lt (a b : Char) : Prop
Char.lt (a b : Char) : Prop

One character is less than another if its code point is strictly less than the other's.

20.7.4.5.Β UnicodeπŸ”—

πŸ”—def

Returns the number of bytes required to encode this Char in UTF-8.

πŸ”—def

Returns the number of bytes required to encode this Char in UTF-16.