9.7. Strings๐Ÿ”—

Strings represent Unicode text. Strings are specially supported by Lean:

  • They have a logical model that specifies their behavior in terms of lists of characters, which specifies the meaning of each operation on strings.

  • They have an optimized run-time representation in compiled code, as packed arrays of bytes that encode the string as UTF-8, and the Lean runtime specially optimizes string operations.

  • There is string literal syntax for writing strings.

The fact that strings are internally represented as UTF-8-encoded byte arrays is visible in the API:

  • There is no operation to project a particular character out of the string, as this would be a performance trap. Use a String.Iterator in a loop instead of a Nat.

  • Strings are indexed by String.Pos, which internally records byte counts rather than character counts, and thus takes constant time. Aside from 0, these should not be constructed directly, but rather updated using String.next and String.prev.

9.7.1. Logical Model

๐Ÿ”—structure
String : Type

String is the type of (UTF-8 encoded) strings.

The compiler overrides the data representation of this type to a byte sequence, and both String.utf8ByteSize and String.length are cached and O(1).

Constructor

String.mk (data : List Char) : String

Pack a List Char into a String. This function is overridden by the compiler and is O(n) in the length of the list.

Fields

data:List Char

Unpack String into a List Char. This function is overridden by the compiler and is O(n) in the length of the list.

The logical model of strings in Lean is a structure that contains a single field, which is a list of characters. This is convenient when specifying and proving properties of string-processing functions at a low level.

9.7.2. Run-Time Representation๐Ÿ”—

Memory layout of strings

Strings are represented as dynamic arrays of bytes, encoded in UTF-8. After the object header, a string contains:

byte count

The number of bytes that currently contain valid string data

capacity

The number of bytes presently allocated for the string

length

The length of the encoded string, which may be shorter than the byte count due to UTF-8 multibyte characters

data

The actual character data in the string, null-terminated

Many string functions in the Lean runtime check whether they have exclusive access to their argument by consulting the reference count in the object header. If they do, and the string's capacity is sufficient, then the existing string can be mutated rather than allocating fresh memory. Otherwise, a new string must be allocated.

9.7.2.1. Performance Notes๐Ÿ”—

Despite the fact that they appear to be an ordinary constructor and projection, String.mk and String.data take time linear in the length of the string. This is because they must implement the conversions between lists of characters and packed arrays of bytes, which must necessarily visit each character.

9.7.3. Syntax๐Ÿ”—

Lean has three kinds of string literals: ordinary string literals, interpolated string literals, and raw string literals.

9.7.3.1. String Literals๐Ÿ”—

String literals begin and end with a double-quote character ". Between these characters, they may contain any other character, including newlines, which are included literally (with the caveat that all newlines in a Lean source file are interpreted as '\n', regardless of file encoding and platform). Special characters that cannot otherwise be written in string literals may be escaped with a backslash, so "\"Quotes\"" is a string literal that begins and ends with double quotes. The following forms of escape sequences are accepted:

\r, \n, \t, \\, \", \'

These escape sequences have the usual meaning, mapping to CR, LF, tab, backslash, double quote, and single quote, respectively.

\xNN

When NN is a sequence of two hexadecimal digits, this escape denotes the character whose Unicode codepoint is indicated by the two-digit hexadecimal code.

\uNNNN

When NN is a sequence of two hexadecimal digits, this escape denotes the character whose Unicode codepoint is indicated by the four-digit hexadecimal code.

String literals may contain gaps. A gap is indicated by an escaped newline, with no intervening characters between the escaping backslash and the newline. In this case, the string denoted by the literal is missing the newline and all leading whitespace from the next line. String gaps may not precede lines that contain only whitespace.

Here, str1 and str2 are the same string:

def str1 := "String with \ a gap" def str2 := "String with a gap" example : str1 = str2 := rfl

If the line following the gap is empty, the string is rejected:

def str3 := "String with \unexpected additional newline in string gap
             a gap"

The parser error is:

<example>:2:0: unexpected additional newline in string gap

9.7.3.2. Interpolated Strings๐Ÿ”—

Preceding a string literal with s! causes it to be processed as an interpolated string, in which regions of the string surrounded by { and } characters are parsed and interpreted as Lean expressions. Interpolated strings are interpreted by appending the string that precedes the interpolation, the expression (with an added call to toString surrounding it), and the string that follows the interpolation.

For example:

example : s!"1 + 1 = {1 + 1}\n" = "1 + 1 = " ++ toString (1 + 1) ++ "\n" := rfl

Preceding a literal with m! causes the interpolation to result in an instance of MessageData, the compiler's internal data structure for messages to be shown to users.

9.7.3.3. Raw String Literals๐Ÿ”—

In raw string literals, there are no escape sequences or gaps, and each character denotes itself exactly. Raw string literals are preceded by r, followed by zero or more hash characters (#) and a double quote ". The string literal is completed at a double quote that is followed by the same number of hash characters. For example, they can be used to avoid the need to double-escape certain characters:

example : r"\t" = "\\t" := rfl "Write backslash in a string using '\\\\\\\\'"#eval r"Write backslash in a string using '\\\\'"

The #eval yields:

"Write backslash in a string using '\\\\\\\\'"

Including hash marks allows the strings to contain unescaped quotes:

example : r#"This is "literally" quoted"# = "This is \"literally\" quoted" := rfl

Adding sufficiently many hash marks allows any raw literal to be written literally:

example : r##"This is r#"literally"# quoted"## = "This is r#\"literally\"# quoted" := rfl

9.7.4. API Reference๐Ÿ”—

9.7.4.1. Constructing๐Ÿ”—

๐Ÿ”—def
String.singleton (c : Char) : String
๐Ÿ”—def
String.append : String โ†’ String โ†’ String

Appends two strings.

The internal implementation uses dynamic arrays and will perform destructive updates if the string is not shared.

Example: "abc".append "def" = "abcdef"

๐Ÿ”—def
String.join (l : List String) : String
๐Ÿ”—def
String.intercalate (s : String)
    : List String โ†’ String

9.7.4.2. Conversions๐Ÿ”—

๐Ÿ”—def
String.toList (s : String) : List Char

Converts a string to a list of characters.

Even though the logical model of strings is as a structure that wraps a list of characters, this operation takes time and space linear in the length of the string, because the compiler uses an optimized representation as dynamic arrays.

Example: "abc".toList = ['a', 'b', 'c']

๐Ÿ”—def
String.isNat (s : String) : Bool
๐Ÿ”—def
String.toNat? (s : String) : Option Nat
๐Ÿ”—def
String.toNat! (s : String) : Nat

Interpret the string as the decimal representation of a natural number.

Panics if the string is not a string of digits.

๐Ÿ”—def
String.isInt (s : String) : Bool
๐Ÿ”—def
String.toInt? (s : String) : Option Int
๐Ÿ”—def
String.toInt! (s : String) : Int
๐Ÿ”—def
String.toFormat (s : String) : Lean.Format

9.7.4.3. Properties๐Ÿ”—

๐Ÿ”—def
String.isEmpty (s : String) : Bool
๐Ÿ”—def
String.length : String โ†’ Nat

Returns the length of a string in Unicode code points.

Examples:

  • "".length = 0

  • "abc".length = 3

  • "Lโˆƒโˆ€N".length = 4

๐Ÿ”—def
String.str : String โ†’ Char โ†’ String

9.7.4.4. Positions๐Ÿ”—

๐Ÿ”—structure
String.Pos : Type

A byte position in a String. Internally, Strings are UTF-8 encoded. Codepoint positions (counting the Unicode codepoints rather than bytes) are represented by plain Nats instead. Indexing a String by a byte position is constant-time, while codepoint positions need to be translated internally to byte positions in linear-time.

A byte position p is valid for a string s if 0 โ‰ค p โ‰ค s.endPos and p lies on a UTF8 byte boundary.

Constructor

String.Pos.mk (byteIdx : Nat) : String.Pos

Fields

byteIdx:Nat

Get the underlying byte index of a String.Pos

๐Ÿ”—def
String.Pos.isValid (s : String) (p : String.Pos)
    : Bool

Returns true if p is a valid UTF-8 position in the string s, meaning that p โ‰ค s.endPos and p lies on a UTF-8 character boundary. This has an O(1) implementation in the runtime.

๐Ÿ”—def
String.atEnd : String โ†’ String.Pos โ†’ Bool

Returns true if a specified position is greater than or equal to the position which points to the end of a string. Otherwise, returns false.

Examples: Given def abc := "abc" and def lean := "Lโˆƒโˆ€N",

  • (0 |> abc.next |> abc.next |> abc.atEnd) = false

  • (0 |> abc.next |> abc.next |> abc.next |> abc.next |> abc.atEnd) = true

  • (0 |> lean.next |> lean.next |> lean.next |> lean.next |> lean.atEnd) = true

Because "Lโˆƒโˆ€N" contains multi-byte characters, lean.next (lean.next 0) is not equal to abc.next (abc.next 0).

๐Ÿ”—def
String.endPos (s : String) : String.Pos

A String.Pos pointing at the end of this string.

๐Ÿ”—def
String.next (s : String) (p : String.Pos)
    : String.Pos

Returns the next position in a string after position p. If p is not a valid position or p = s.endPos, the result is unspecified.

Examples: Given def abc := "abc" and def lean := "Lโˆƒโˆ€N",

  • abc.get (0 |> abc.next) = 'b'

  • lean.get (0 |> lean.next |> lean.next) = 'โˆ€'

Cases where the result is unspecified:

  • "abc".next โŸจ3โŸฉ, since 3 = s.endPos

  • "Lโˆƒโˆ€N".next โŸจ2โŸฉ, since 2 points into the middle of a multi-byte UTF-8 character

๐Ÿ”—def
String.next' (s : String) (p : String.Pos)
    (h : ยฌs.atEnd p = true) : String.Pos

Returns the next position in a string after position p. If p is not a valid position, the result is unspecified.

Requires evidence, h, that p is within bounds instead of performing a runtime bounds check as in next.

Examples:

  • let abc := "abc"; abc.get (abc.next' 0 (by decide)) = 'b'

A typical pattern combines next' with a dependent if-else expression to avoid the overhead of an additional bounds check. For example:

def next? (s: String) (p : String.Pos) : Option Char :=
  if h : s.atEnd p then none else s.get (s.next' p h)
๐Ÿ”—def
String.nextWhile (s : String) (p : Char โ†’ Bool)
    (i : String.Pos) : String.Pos
๐Ÿ”—def
String.nextUntil (s : String) (p : Char โ†’ Bool)
    (i : String.Pos) : String.Pos
๐Ÿ”—def
String.prev : String โ†’ String.Pos โ†’ String.Pos

Returns the position in a string before a specified position, p. If p = โŸจ0โŸฉ, returns 0. If p is not a valid position, the result is unspecified.

Examples: Given def abc := "abc" and def lean := "Lโˆƒโˆ€N",

  • abc.get (abc.endPos |> abc.prev) = 'c'

  • lean.get (lean.endPos |> lean.prev |> lean.prev |> lean.prev) = 'โˆƒ'

  • "Lโˆƒโˆ€N".prev โŸจ3โŸฉ is unspecified, since byte 3 occurs in the middle of the multi-byte character 'โˆƒ'.

๐Ÿ”—def
String.Pos.min (pโ‚ pโ‚‚ : String.Pos) : String.Pos

9.7.4.5. Lookups and Modifications๐Ÿ”—

๐Ÿ”—def
String.get (s : String) (p : String.Pos) : Char

Returns the character at position p of a string. If p is not a valid position, returns (default : Char).

See utf8GetAux for the reference implementation.

Examples:

  • "abc".get โŸจ1โŸฉ = 'b'

  • "abc".get โŸจ3โŸฉ = (default : Char) = 'A'

Positions can also be invalid if a byte index points into the middle of a multi-byte UTF-8 character. For example,"Lโˆƒโˆ€N".get โŸจ2โŸฉ = (default : Char) = 'A'.

๐Ÿ”—def
String.get? : String โ†’ String.Pos โ†’ Option Char

Returns the character at position p. If p is not a valid position, returns none.

Examples:

  • "abc".get? โŸจ1โŸฉ = some 'b'

  • "abc".get? โŸจ3โŸฉ = none

Positions can also be invalid if a byte index points into the middle of a multi-byte UTF-8 character. For example, "Lโˆƒโˆ€N".get? โŸจ2โŸฉ = none

๐Ÿ”—def
String.get! (s : String) (p : String.Pos) : Char

Returns the character at position p of a string. If p is not a valid position, returns (default : Char) and produces a panic error message.

Examples:

  • "abc".get! โŸจ1โŸฉ = 'b'

  • "abc".get! โŸจ3โŸฉ panics

Positions can also be invalid if a byte index points into the middle of a multi-byte UTF-8 character. For example, "Lโˆƒโˆ€N".get! โŸจ2โŸฉ panics.

๐Ÿ”—def
String.get' (s : String) (p : String.Pos)
    (h : ยฌs.atEnd p = true) : Char

Returns the character at position p of a string. If p is not a valid position, returns (default : Char).

Requires evidence, h, that p is within bounds instead of performing a runtime bounds check as in get.

Examples:

  • "abc".get' 0 (by decide) = 'a'

  • let lean := "Lโˆƒโˆ€N"; lean.get' (0 |> lean.next |> lean.next) (by decide) = 'โˆ€'

A typical pattern combines get' with a dependent if-else expression to avoid the overhead of an additional bounds check. For example:

def getInBounds? (s : String) (p : String.Pos) : Option Char :=
  if h : s.atEnd p then none else some (s.get' p h)

Even with evidence of ยฌ s.atEnd p, p may be invalid if a byte index points into the middle of a multi-byte UTF-8 character. For example, "Lโˆƒโˆ€N".get' โŸจ2โŸฉ (by decide) = (default : Char).

๐Ÿ”—def
String.extract
    : String โ†’ String.Pos โ†’ String.Pos โ†’ String
๐Ÿ”—def
String.take (s : String) (n : Nat) : String
๐Ÿ”—def
String.takeWhile (s : String) (p : Char โ†’ Bool)
    : String
๐Ÿ”—def
String.takeRight (s : String) (n : Nat) : String
๐Ÿ”—def
String.takeRightWhile (s : String)
    (p : Char โ†’ Bool) : String
๐Ÿ”—def
String.drop (s : String) (n : Nat) : String
๐Ÿ”—def
String.dropWhile (s : String) (p : Char โ†’ Bool)
    : String
๐Ÿ”—def
String.dropRight (s : String) (n : Nat) : String
๐Ÿ”—def
String.dropRightWhile (s : String)
    (p : Char โ†’ Bool) : String
๐Ÿ”—def
String.trim (s : String) : String
๐Ÿ”—def
String.trimLeft (s : String) : String
๐Ÿ”—def
String.trimRight (s : String) : String
๐Ÿ”—def
String.removeLeadingSpaces (s : String) : String
๐Ÿ”—def
String.set : String โ†’ String.Pos โ†’ Char โ†’ String

Replaces the character at a specified position in a string with a new character. If the position is invalid, the string is returned unchanged.

If both the replacement character and the replaced character are ASCII characters and the string is not shared, destructive updates are used.

Examples:

  • "abc".set โŸจ1โŸฉ 'B' = "aBc"

  • "abc".set โŸจ3โŸฉ 'D' = "abc"

  • "Lโˆƒโˆ€N".set โŸจ4โŸฉ 'X' = "LโˆƒXN"

Because 'โˆƒ' is a multi-byte character, the byte index 2 in Lโˆƒโˆ€N is an invalid position, so "Lโˆƒโˆ€N".set โŸจ2โŸฉ 'X' = "Lโˆƒโˆ€N".

๐Ÿ”—def
String.modify (s : String) (i : String.Pos)
    (f : Char โ†’ Char) : String

Replaces the character at position p in the string s with the result of applying f to that character. If p is an invalid position, the string is returned unchanged.

Examples:

  • abc.modify โŸจ1โŸฉ Char.toUpper = "aBc"

  • abc.modify โŸจ3โŸฉ Char.toUpper = "abc"

๐Ÿ”—def
String.front (s : String) : Char

Returns the first character in s. If s = "", returns (default : Char).

Examples:

  • "abc".front = 'a'

  • "".front = (default : Char)

๐Ÿ”—def
String.back (s : String) : Char

Returns the last character in s. If s = "", returns (default : Char).

Examples:

  • "abc".back = 'c'

  • "".back = (default : Char)

๐Ÿ”—def
String.posOf (s : String) (c : Char)
    : String.Pos

Returns the position of the first occurrence of a character, c, in s. If s does not contain c, returns s.endPos.

Examples:

  • "abba".posOf 'a' = โŸจ0โŸฉ

  • "abba".posOf 'z' = โŸจ4โŸฉ

  • "Lโˆƒโˆ€N".posOf 'โˆ€' = โŸจ4โŸฉ

๐Ÿ”—def
String.revPosOf (s : String) (c : Char)
    : Option String.Pos

Returns the position of the last occurrence of a character, c, in s. If s does not contain c, returns none.

Examples:

  • "abba".posOf 'a' = some โŸจ3โŸฉ

  • "abba".posOf 'z' = none

  • "Lโˆƒโˆ€N".posOf 'โˆ€' = some โŸจ4โŸฉ

๐Ÿ”—def
String.contains (s : String) (c : Char) : Bool
๐Ÿ”—def
String.offsetOfPos (s : String)
    (pos : String.Pos) : Nat
๐Ÿ”—def
String.replace (s pattern replacement : String)
    : String

Replace all occurrences of pattern in s with replacement.

๐Ÿ”—def
String.findLineStart (s : String)
    (pos : String.Pos) : String.Pos

Return the beginning of the line that contains character pos.

๐Ÿ”—def
String.find (s : String) (p : Char โ†’ Bool)
    : String.Pos
๐Ÿ”—def
String.revFind (s : String) (p : Char โ†’ Bool)
    : Option String.Pos

9.7.4.6. Folds and Aggregation๐Ÿ”—

๐Ÿ”—def
String.map (f : Char โ†’ Char) (s : String)
    : String
๐Ÿ”—def
String.foldl.{u} {ฮฑ : Type u} (f : ฮฑ โ†’ Char โ†’ ฮฑ)
    (init : ฮฑ) (s : String) : ฮฑ
๐Ÿ”—def
String.foldr.{u} {ฮฑ : Type u} (f : Char โ†’ ฮฑ โ†’ ฮฑ)
    (init : ฮฑ) (s : String) : ฮฑ
๐Ÿ”—def
String.all (s : String) (p : Char โ†’ Bool) : Bool
๐Ÿ”—def
String.any (s : String) (p : Char โ†’ Bool) : Bool

9.7.4.7. Comparisons๐Ÿ”—

๐Ÿ”—def
String.le (a b : String) : Prop
๐Ÿ”—def
String.firstDiffPos (a b : String) : String.Pos

Returns the first position where the two strings differ.

๐Ÿ”—def
String.substrEq (s1 : String)
    (off1 : String.Pos) (s2 : String)
    (off2 : String.Pos) (sz : Nat) : Bool

Return true iff the substring of byte size sz starting at position off1 in s1 is equal to that starting at off2 in s2.. False if either substring of that byte size does not exist.

๐Ÿ”—def
String.isPrefixOf (p s : String) : Bool

Return true iff p is a prefix of s

๐Ÿ”—def
String.startsWith (s pre : String) : Bool
๐Ÿ”—def
String.endsWith (s post : String) : Bool
๐Ÿ”—def
String.decEq (sโ‚ sโ‚‚ : String)
    : Decidable (sโ‚ = sโ‚‚)

Decides equality on String. This function is overridden with a native implementation.

๐Ÿ”—
String.hash (s : String) : UInt64

An opaque string hash function.

9.7.4.8. Manipulation๐Ÿ”—

๐Ÿ”—def
String.split (s : String) (p : Char โ†’ Bool)
    : List String
๐Ÿ”—def
String.splitOn (s : String)
  (sep : String := " ") : List String

Splits a string s on occurrences of the separator sep. When sep is empty, it returns [s]; when sep occurs in overlapping patterns, the first match is taken. There will always be exactly n+1 elements in the returned list if there were n nonoverlapping matches of sep in the string. The default separator is " ". The separators are not included in the returned substrings.

"here is some text ".splitOn = ["here", "is", "some", "text", ""]
"here is some text ".splitOn "some" = ["here is ", " text "]
"here is some text ".splitOn "" = ["here is some text "]
"ababacabac".splitOn "aba" = ["", "bac", "c"]
๐Ÿ”—def
String.push : String โ†’ Char โ†’ String

Pushes a character onto the end of a string.

The internal implementation uses dynamic arrays and will perform destructive updates if the string is not shared.

Example: "abc".push 'd' = "abcd"

๐Ÿ”—def
String.pushn (s : String) (c : Char) (n : Nat)
    : String
๐Ÿ”—def
String.capitalize (s : String) : String
๐Ÿ”—def
String.decapitalize (s : String) : String
๐Ÿ”—def
String.toUpper (s : String) : String
๐Ÿ”—def
String.toLower (s : String) : String

9.7.4.9. Iterators๐Ÿ”—

Fundamentally, a String.Iterator is a pair of a string and a valid position in the string. Iterators provide functions for getting the current character (curr), replacing the current character (setCurr), checking whether the iterator can move to the left or the right (hasPrev and hasNext, respectively), and moving the iterator (prev and next, respectively). Clients are responsible for checking whether they've reached the beginning or end of the string; otherwise, the iterator ensures that its position always points at a character.

๐Ÿ”—structure
String.Iterator : Type

Iterator over the characters (Char) of a String.

Typically created by s.iter, where s is a String.

An iterator is valid if the position i is valid for the string s, meaning 0 โ‰ค i โ‰ค s.endPos and i lies on a UTF8 byte boundary. If i = s.endPos, the iterator is at the end of the string.

Most operations on iterators return arbitrary values if the iterator is not valid. The functions in the String.Iterator API should rule out the creation of invalid iterators, with two exceptions:

  • Iterator.next iter is invalid if iter is already at the end of the string (iter.atEnd is true), and

  • Iterator.forward iter n/Iterator.nextn iter n is invalid if n is strictly greater than the number of remaining characters.

Constructor

String.Iterator.mk (s : String) (i : String.Pos) : String.Iterator

Fields

i:String.Pos

The current position.

This position is not necessarily valid for the string, for instance if one keeps calling Iterator.next when Iterator.atEnd is true. If the position is not valid, then the current character is (default : Char), similar to String.get on an invalid position.

s:String

The string the iterator is for.

๐Ÿ”—def
String.iter (s : String) : String.Iterator

Creates an iterator at the beginning of a string.

๐Ÿ”—def
String.mkIterator (s : String) : String.Iterator

Creates an iterator at the beginning of a string.

๐Ÿ”—def
String.Iterator.curr : String.Iterator โ†’ Char

The character at the current position.

On an invalid position, returns (default : Char).

๐Ÿ”—def
String.Iterator.hasNext : String.Iterator โ†’ Bool

True if the iterator is not past the string's last character.

๐Ÿ”—def
String.Iterator.next
    : String.Iterator โ†’ String.Iterator

Moves the iterator's position forward by one character, unconditionally.

It is only valid to call this function if the iterator is not at the end of the string, i.e. Iterator.atEnd is false; otherwise, the resulting iterator will be invalid.

๐Ÿ”—def
String.Iterator.forward
    : String.Iterator โ†’ Nat โ†’ String.Iterator

Moves the iterator's position several characters forward.

The resulting iterator is only valid if the number of characters to skip is less than or equal to the number of characters left in the iterator.

๐Ÿ”—def
String.Iterator.nextn
    : String.Iterator โ†’ Nat โ†’ String.Iterator

Moves the iterator's position several characters forward.

The resulting iterator is only valid if the number of characters to skip is less than or equal to the number of characters left in the iterator.

๐Ÿ”—def
String.Iterator.hasPrev : String.Iterator โ†’ Bool

True if the position is not zero.

๐Ÿ”—def
String.Iterator.prev
    : String.Iterator โ†’ String.Iterator

Decreases the iterator's position.

If the position is zero, this function is the identity.

๐Ÿ”—def
String.Iterator.prevn
    : String.Iterator โ†’ Nat โ†’ String.Iterator

Moves the iterator's position several characters back.

If asked to go back more characters than available, stops at the beginning of the string.

๐Ÿ”—def
String.Iterator.atEnd : String.Iterator โ†’ Bool

True if the iterator is past the string's last character.

๐Ÿ”—def
String.Iterator.toEnd
    : String.Iterator โ†’ String.Iterator

Moves the iterator's position to the end of the string.

Note that i.toEnd.atEnd is always true.

๐Ÿ”—def
String.Iterator.setCurr
    : String.Iterator โ†’ Char โ†’ String.Iterator

Replaces the current character in the string.

Does nothing if the iterator is at the end of the string. If the iterator contains the only reference to its string, this function will mutate the string in-place instead of allocating a new one.

๐Ÿ”—def
String.Iterator.extract
    : String.Iterator โ†’ String.Iterator โ†’ String

Extracts the substring between the positions of two iterators.

Returns the empty string if the iterators are for different strings, or if the position of the first iterator is past the position of the second iterator.

๐Ÿ”—def
String.Iterator.remainingToString
    : String.Iterator โ†’ String

The remaining characters in an iterator, as a string.

๐Ÿ”—def
String.Iterator.remainingBytes
    : String.Iterator โ†’ Nat

Number of bytes remaining in the iterator.

๐Ÿ”—def
String.Iterator.pos (self : String.Iterator)
    : String.Pos

The current position.

This position is not necessarily valid for the string, for instance if one keeps calling Iterator.next when Iterator.atEnd is true. If the position is not valid, then the current character is (default : Char), similar to String.get on an invalid position.

9.7.4.10. Substrings๐Ÿ”—

๐Ÿ”—def
String.toSubstring (s : String) : Substring

Convert a String into a Substring denoting the entire string.

๐Ÿ”—def
String.toSubstring' (s : String) : Substring

String.toSubstring without [inline] annotation.

9.7.4.11. Proof Automation๐Ÿ”—

๐Ÿ”—def
String.reduceGT : Lean.Meta.Simp.Simproc
๐Ÿ”—def
String.reduceGE : Lean.Meta.Simp.Simproc
๐Ÿ”—def
String.reduceBinPred (declName : Lean.Name)
    (arity : Nat) (op : String โ†’ String โ†’ Bool)
    (e : Lean.Expr)
    : Lean.Meta.SimpM Lean.Meta.Simp.Step
๐Ÿ”—def
String.reduceLT : Lean.Meta.Simp.Simproc
๐Ÿ”—def
String.reduceLE : Lean.Meta.Simp.Simproc
๐Ÿ”—def
String.reduceBEq : Lean.Meta.Simp.DSimproc
๐Ÿ”—def
String.reduceEq : Lean.Meta.Simp.Simproc
๐Ÿ”—def
String.reduceAppend : Lean.Meta.Simp.DSimproc
๐Ÿ”—def
String.reduceMk : Lean.Meta.Simp.DSimproc
๐Ÿ”—def
String.reduceBoolPred (declName : Lean.Name)
    (arity : Nat) (op : String โ†’ String โ†’ Bool)
    (e : Lean.Expr)
    : Lean.Meta.SimpM Lean.Meta.Simp.DStep
๐Ÿ”—def
String.reduceBNe : Lean.Meta.Simp.DSimproc
๐Ÿ”—def
String.reduceNe : Lean.Meta.Simp.Simproc

9.7.4.12. Metaprogramming๐Ÿ”—

๐Ÿ”—def
String.toName (s : String) : Lean.Name

Converts a String to a hierarchical Name after splitting it at the dots.

"a.b".toName is the name a.b, not ยซa.bยป. For the latter, use Name.mkSimple.

๐Ÿ”—def
String.toFileMap (s : String) : Lean.FileMap
๐Ÿ”—def
String.quote (s : String) : String
๐Ÿ”—def
String.fromExpr? (e : Lean.Expr)
    : Lean.Meta.SimpM (Option String)

9.7.4.13. Encodings๐Ÿ”—

๐Ÿ”—def
String.utf16PosToCodepointPos (s : String)
    (pos : Nat) : Nat
๐Ÿ”—def
String.utf16PosToCodepointPosFrom (s : String)
    (utf16pos : Nat) (off : String.Pos) : Nat

Computes the position of the Unicode codepoint at UTF-16 offset utf16pos in the substring of s starting at UTF-8 offset off.

๐Ÿ”—def
String.codepointPosToUtf16Pos (s : String)
    (pos : Nat) : Nat
๐Ÿ”—def
String.codepointPosToUtf16PosFrom (s : String)
    (n : Nat) (off : String.Pos) : Nat

Computes the UTF-16 offset of the n-th Unicode codepoint in the substring of s starting at UTF-8 offset off. Yes, this is actually useful.

๐Ÿ”—def
String.getUtf8Byte (s : String) (n : Nat)
    (h : n < s.utf8ByteSize) : UInt8

Accesses a byte in the UTF-8 encoding of the String. O(1)

๐Ÿ”—def
String.utf8ByteSize : String โ†’ Nat

The UTF-8 byte length of this string. This is overridden by the compiler to be cached and O(1).

๐Ÿ”—def
String.utf8EncodeChar (c : Char) : List UInt8
๐Ÿ”—def
String.utf8DecodeChar? (a : ByteArray) (i : Nat)
    : Option Char
๐Ÿ”—def
String.fromUTF8 (a : ByteArray)
    (h : String.validateUTF8 a = true) : String

Converts a UTF-8 encoded ByteArray string to String.

๐Ÿ”—def
String.fromUTF8? (a : ByteArray) : Option String

Converts a UTF-8 encoded ByteArray string to String, or returns none if a is not properly UTF-8 encoded.

๐Ÿ”—def
String.fromUTF8! (a : ByteArray) : String

Converts a UTF-8 encoded ByteArray string to String, or panics if a is not properly UTF-8 encoded.

๐Ÿ”—def
String.toUTF8 (a : String) : ByteArray

Converts the given String to a UTF-8 encoded byte array.

๐Ÿ”—def
String.validateUTF8 (a : ByteArray) : Bool

Returns true if the given byte array consists of valid UTF-8.

๐Ÿ”—def
String.utf16Length (s : String) : Nat
๐Ÿ”—def
String.codepointPosToUtf8PosFrom (s : String)
    : String.Pos โ†’ Nat โ†’ String.Pos

Starting at utf8pos, finds the UTF-8 offset of the p-th codepoint.

๐Ÿ”—def
String.crlfToLf (text : String) : String

Replaces each \r\n with \n to normalize line endings, but does not validate that there are no isolated \r characters. It is an optimized version of String.replace text "\r\n" "\n".

9.7.5. FFI๐Ÿ”—

๐Ÿ”—def
String.csize (c : Char) : Nat
FFI type
typedef struct {
    lean_object m_header;
    /* byte length including '\0' terminator */
    size_t      m_size;
    size_t      m_capacity;
    /* UTF8 length */
    size_t      m_length;
    char        m_data[0];
} lean_string_object;

The representation of strings in C. See the description of run-time Strings for more details.

FFI function
bool lean_is_string(lean_object * o)

Returns true if o is a string, or false otherwise.

FFI function
lean_string_object * lean_to_string(lean_object * o)

Performs a runtime check that o is indeed a string. If o is not a string, an assertion fails.