Characters are Unicode scalar values.
Constructor
Char.mk
Fields
val : UInt32
The underlying Unicode scalar value as a UInt32
.
valid : self.val.isValidChar
The value must be a legal scalar value.
Characters are represented by the type Char
, which may be any Unicode scalar value.
While strings are UTF-8-encoded arrays of bytes, characters are represented by full 32-bit values.
Lean provides special syntax for character literals.
From the perspective of Lean's logic, characters consist of a 32-bit unsigned integer paired with a proof that it is a valid Unicode scalar value.
Char : Type
Characters are Unicode scalar values.
Char.mk
val : UInt32
The underlying Unicode scalar value as a UInt32
.
valid : self.val.isValidChar
The value must be a legal scalar value.
As a trivial wrapper, characters are represented identically to UInt32
.
In particular, characters are represented as 32-bit immediate values in monomorphic contexts.
In other words, a field of a constructor or structure of type Char
does not require indirection to access.
In polymorphic contexts, characters are boxed.
Character literals consist of a single character or an escape sequence enclosed in single quotes ('
, Unicode 'APOSTROPHE' (U+0027)
).
Between these single quotes, the character literal may contain character other that '
, including newlines, which are included literally (with the caveat that all newlines in a Lean source file are interpreted as '\n'
, regardless of file encoding and platform).
Special characters may be escaped with a backslash, so '\''
is a character literal that contains a single quote.
The following forms of escape sequences are accepted:
\r
, \n
, \t
, \\
, \"
, \'
These escape sequences have the usual meaning, mapping to CR
, LF
, tab, backslash, double quote, and single quote, respectively.
\xNN
When NN
is a sequence of two hexadecimal digits, this escape denotes the character whose Unicode code point is indicated by the two-digit hexadecimal code.
\uNNNN
When NN
is a sequence of two hexadecimal digits, this escape denotes the character whose Unicode code point is indicated by the four-digit hexadecimal code.
Char.isValidCharNat (n : Nat) : Prop
True for natural numbers that are valid Unicode scalar values.
Char.ofUInt8 (n : UInt8) : Char
Converts an 8-bit unsigned integer into a character.
The integer's value is interpreted as a Unicode code point.
Char.toUInt8 (c : Char) : UInt8
Converts a character into a UInt8
that contains its code point.
If the code point is larger than 255, it is truncated (reduced modulo 256).
There are two ways to convert a character to a string.
Char.toString
converts a character to a singleton string that consists of only that character, while Char.quote
converts the character to a string representation of the corresponding character literal.
Char.toString
produces a string that contains only the character in question:
#eval 'e'.toString
"e"
#eval '\x65'.toString
"e"
#eval '"'.toString
"\""
Char.quote
produces a string that contains a character literal, suitably escaped:
#eval 'e'.quote
"'e'"
#eval '\x65'.quote
"'e'"
#eval '"'.quote
"'\\\"'"
Char.isAlpha (c : Char) : Bool
Returns true
if the character is an ASCII letter.
The ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
.
Char.isAlphanum (c : Char) : Bool
Returns true
if the character is an ASCII letter or digit.
The ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
.
The ASCII digits are the following: 0123456789
.
Char.isDigit (c : Char) : Bool
Returns true
if the character is an ASCII digit.
The ASCII digits are the following: 0123456789
.
Char.isLower (c : Char) : Bool
Returns true
if the character is a lowercase ASCII letter.
The lowercase ASCII letters are the following: abcdefghijklmnopqrstuvwxyz
.
Char.le (a b : Char) : Prop
One character is less than or equal to another if its code point is less than or equal to the other's.
Char.lt (a b : Char) : Prop
One character is less than another if its code point is strictly less than the other's.