The Lean Language Reference

19.4. Defining New Syntax🔗

Lean's uniform representation of syntax is very general and flexible. This means that extensions to Lean's parser do not require extensions to the representation of parsed syntax.

19.4.1. Syntax Model🔗

Lean's parser produces a concrete syntax tree, of type Lean.Syntax. Lean.Syntax is an inductive type that represents all of Lean's syntax, including commands, terms, tactics, and any custom extensions. All of these are represented by a few basic building blocks:

Atoms

Atoms are the fundamental terminals of the grammar, including literals (such as those for characters and numbers), parentheses, operators, and keywords.

Identifiers

Identifiers represent names, such as x, Nat, or Nat.add. Identifier syntax includes a list of pre-resolved names that the identifier might refer to.

Nodes

Nodes represent the parsing of nonterminals. Nodes contain a syntax kind, which identifies the syntax rule that the node results from, along with an array of child Syntax values.

Missing Syntax

When the parser encounters an error, it returns a partial result, so Lean can provide some feedback about partially-written programs or programs that contain mistakes. Partial results contain one or more instances of missing syntax.

Atoms and identifiers are collectively referred to as tokens.
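These building blocks can be assembled by hand. The following sketch (assuming an environment with import Lean) constructs the syntax tree for the term x + 2 directly from the constructors; the kind name «term_+_» is the conventional internal name for the + notation:

```lean
open Lean in
def handMade : Syntax :=
  .node .none `«term_+_» #[
    .ident .none "x".toSubstring `x [],   -- identifier token
    .atom .none "+",                      -- atom token for the operator
    .node .none `num #[.atom .none "2"]]  -- node wrapping a numeric literal atom
```

A tree like this is what the parser would produce for the same input, except that parsed tokens additionally carry source information.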

🔗inductive type
Lean.Syntax : Type

Lean syntax trees.

Syntax trees are used pervasively throughout Lean: they are produced by the parser, transformed by the macro expander, and elaborated. They are also produced by the delaborator and presented to users.

Constructors

missing : Lean.Syntax

A portion of the syntax tree that is missing because of a parse error.

The indexing operator on Syntax also returns Syntax.missing when the index is out of bounds.

node (info : Lean.SourceInfo) (kind : Lean.SyntaxNodeKind)
  (args : Array Lean.Syntax) : Lean.Syntax

A node in the syntax tree that may have further syntax as child nodes. The node's kind determines its interpretation.

For nodes produced by the parser, the info field is typically Lean.SourceInfo.none, and source information is stored in the corresponding fields of identifiers and atoms. This field is used in two ways:

  1. The delaborator uses it to associate nodes with metadata that are used to implement interactive features.

  2. Nodes created by quotations use the field to mark the syntax as synthetic (storing the result of Lean.SourceInfo.fromRef) even when its leading or trailing tokens are not.

atom (info : Lean.SourceInfo) (val : String) : Lean.Syntax

A non-identifier atomic component of syntax.

All of the following are atoms:

  • keywords, such as def, fun, and inductive

  • literals, such as numeric or string literals

  • punctuation and delimiters, such as (, ), and =>.

Identifiers are represented by the Lean.Syntax.ident constructor. Atoms also correspond to quoted strings inside syntax declarations.

ident (info : Lean.SourceInfo) (rawVal : Substring)
  (val : Lean.Name)
  (preresolved : List Lean.Syntax.Preresolved) : Lean.Syntax

An identifier.

In addition to source information, identifiers have the following fields:

  • rawVal is the literal substring from the input file

  • val is the parsed Lean name, potentially including macro scopes.

  • preresolved is the list of possible declarations this could refer to, populated by quotations.

🔗inductive type
Lean.Syntax.Preresolved : Type

A possible binding of an identifier in the context in which it was quoted.

Identifiers in quotations may refer to either global declarations or to namespaces that are in scope at the site of the quotation. These are saved in the Syntax.ident constructor and are part of the implementation of hygienic macros.

Constructors

«namespace» (ns : Lean.Name) : Lean.Syntax.Preresolved

A potential namespace reference

decl (n : Lean.Name) (fields : List String) :
  Lean.Syntax.Preresolved

A potential global constant or section variable reference, with additional field accesses

19.4.2. Syntax Node Kinds

Syntax node kinds typically identify the parser that produced the node. This is one place where the names given to operators or notations (or their automatically-generated internal names) occur. While only nodes contain a field that identifies their kind, identifiers have the kind identKind by convention, while atoms have their internal string as their kind by convention. Lean's parser wraps each keyword atom KW in a singleton node whose kind is `token.KW. The kind of a syntax value can be extracted using Syntax.getKind.
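As a sketch of kind extraction (assuming Lean's command elaboration API is imported), quoting a term and calling Syntax.getKind reports the name of the syntax rule that produced the node:

```lean
open Lean Elab Command in
#eval show CommandElabM Unit from do
  let stx ← `(2 + 2)
  -- For this term, the kind is the internal name of the `+` notation's parser.
  logInfo m!"{stx.raw.getKind}"
```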

🔗def
Lean.SyntaxNodeKind : Type

Specifies the interpretation of a Syntax.node value. An abbreviation for Name.

Node kinds may be any name, and do not need to refer to declarations in the environment. Conventionally, however, a node's kind corresponds to the Parser or ParserDesc declaration that produces it. There are also a number of built-in node kinds that are used by the parsing infrastructure, such as nullKind and choiceKind; these do not correspond to parser declarations.

🔗def
Lean.Syntax.isOfKind (stx : Lean.Syntax)
  (k : Lean.SyntaxNodeKind) : Bool

Checks whether syntax has the given kind or pseudo-kind.

“Pseudo-kinds” are kinds that are assigned by convention to non-Syntax.node values: identKind for Syntax.ident, `missing for Syntax.missing, and the atom's string literal for atoms.

🔗def
Lean.Syntax.getKind (stx : Lean.Syntax) :
  Lean.SyntaxNodeKind

Gets the kind of a Syntax.node value, or the pseudo-kind of any other Syntax value.

“Pseudo-kinds” are kinds that are assigned by convention to non-Syntax.node values: identKind for Syntax.ident, `missing for Syntax.missing, and the atom's string literal for atoms.

🔗def
Lean.Syntax.setKind (stx : Lean.Syntax)
  (k : Lean.SyntaxNodeKind) : Lean.Syntax

Changes the kind at the root of a Syntax.node to k.

Returns all other Syntax values unchanged.

19.4.3. Token and Literal Kinds

A number of named kinds are associated with the basic tokens produced by the parser. Typically, a single-token syntax production consists of a node that contains a single atom; the kind saved in the node allows the value to be recognized. Atoms for literals are not interpreted by the parser: string atoms include their leading and trailing double-quote characters along with any escape sequences contained within, and hexadecimal numerals are saved as a string that begins with "0x". Helpers such as Lean.TSyntax.getString are provided to perform this decoding on demand.
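For instance, a hexadecimal numeral's atom stores the literal text verbatim, and decoding happens only when requested. A minimal sketch, using Syntax.mkNumLit to build the singleton node that the parser would otherwise produce:

```lean
open Lean in
-- The atom stores the string "0x10" unchanged; getNat decodes it to 16.
#eval (Syntax.mkNumLit "0x10").getNat
```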

🔗def
Lean.identKind : Lean.SyntaxNodeKind

The pseudo-kind assigned to identifiers: `ident.

The name `ident is not actually used as a kind for Syntax.node values. It is used by convention as the kind of Syntax.ident values.

🔗def
Lean.strLitKind : Lean.SyntaxNodeKind

`str is the node kind of string literals like "foo".

🔗def
Lean.interpolatedStrKind : Lean.SyntaxNodeKind

`interpolatedStrKind is the node kind of an interpolated string literal like "value = {x}" in s!"value = {x}".

🔗def
Lean.interpolatedStrLitKind :
  Lean.SyntaxNodeKind

`interpolatedStrLitKind is the node kind of interpolated string literal fragments like "value = { and }" in s!"value = {x}".

🔗def
Lean.charLitKind : Lean.SyntaxNodeKind

`char is the node kind of character literals like 'A'.

🔗def
Lean.numLitKind : Lean.SyntaxNodeKind

`num is the node kind of number literals like 42.

🔗def
Lean.scientificLitKind : Lean.SyntaxNodeKind

`scientific is the node kind of floating point literals like 1.23e-3.

🔗def
Lean.nameLitKind : Lean.SyntaxNodeKind

`name is the node kind of name literals like `foo.

🔗def
Lean.fieldIdxKind : Lean.SyntaxNodeKind

`fieldIdx is the node kind of projection indices like the 2 in x.2.

19.4.4. Internal Kinds

🔗def
Lean.groupKind : Lean.SyntaxNodeKind

The `group kind is used for nodes that result from Lean.Parser.group. This avoids confusion with the null kind when used inside optional.

🔗def
Lean.nullKind : Lean.SyntaxNodeKind

`null is the “fallback” kind, used when no other kind applies. Null nodes result from repetition operators, and empty null nodes represent the failure of an optional parse.

The null kind is used for raw list parsers like many.

🔗def
Lean.choiceKind : Lean.SyntaxNodeKind

The `choice kind is used to represent ambiguous parse results.

The parser prioritizes longer matches over shorter ones, but there is not always a unique longest match. All the parse results are saved, and the determination of which to use is deferred until typing information is available.

🔗def
Lean.hygieneInfoKind : Lean.SyntaxNodeKind

`hygieneInfo is the node kind of the Lean.Parser.hygieneInfo parser, which produces an “invisible token” that captures the hygiene information at the current point without parsing anything.

They can be used to generate identifiers (with Lean.HygieneInfo.mkIdent) as if they were introduced in a macro's input, rather than by its implementation.

19.4.5. Source Positions🔗

Atoms, identifiers, and nodes optionally contain source information that tracks their correspondence with the original file. The parser saves source information for all tokens, but not for nodes; position information for parsed nodes is reconstructed from their first and last tokens. Not all Syntax data results from the parser: it may be the result of macro expansion, in which case it typically contains a mix of generated and parsed syntax, or it may be the result of delaborating an internal term to display it to a user. In these use cases, nodes may themselves contain source information.

Source information comes in two varieties:

Original

Original source information comes from the parser. In addition to the original source location, it also contains leading and trailing whitespace that was skipped by the parser, which allows the original string to be reconstructed. This whitespace is saved as offsets into the string representation of the original source code (that is, as Substring) to avoid having to allocate copies of substrings.

Synthetic

Synthetic source information comes from metaprograms (including macros) or from Lean's internals. Because there is no original string to be reconstructed, it does not save leading and trailing whitespace. Synthetic source positions are used to provide accurate feedback even when terms have been automatically transformed, as well as to track the correspondence between elaborated expressions and their presentation in Lean's output. A synthetic position may be marked canonical, in which case some operations that would ordinarily ignore synthetic positions will treat it as if it were not.

🔗inductive type
Lean.SourceInfo : Type

Source information that relates syntax to the context that it came from.

The primary purpose of SourceInfo is to relate the output of the parser and the macro expander to the original source file. When produced by the parser, Syntax.node does not carry source info; the parser associates it only with atoms and identifiers. If a Syntax.node is introduced by a quotation, then it has synthetic source info that both associates it with an original reference position and indicates that the original atoms in it may not originate from the Lean file under elaboration.

Source info is also used to relate Lean's output to the internal data that it represents; this is the basis for many interactive features. When used this way, it can occur on Syntax.node as well.

Constructors

original (leading : Substring) (pos : String.Pos)
  (trailing : Substring) (endPos : String.Pos) :
  Lean.SourceInfo

A token produced by the parser from original input that includes both leading and trailing whitespace as well as position information.

The leading whitespace is inferred after parsing by Syntax.updateLeading. This is because the “preceding token” is not well-defined during parsing, especially in the presence of backtracking.

synthetic (pos endPos : String.Pos)
  (canonical : Bool := false) : Lean.SourceInfo

Synthetic syntax is syntax that was produced by a metaprogram or by Lean itself (e.g. by a quotation). Synthetic syntax is annotated with a source span from the original syntax, which relates it to the source file.

The delaborator uses this constructor to store an encoded indicator of which core language expression gave rise to the syntax.

The canonical flag on synthetic syntax is enabled for syntax that is not literally part of the original input syntax but should be treated “as if” the user really wrote it for the purpose of hovers and error messages. This is usually used on identifiers in order to connect the binding site to the user's original syntax even if the name of the identifier changes during expansion, as well as on tokens that should receive targeted messages.

Generally speaking, a macro expansion should only use a given piece of input syntax in a single canonical token. An exception to this rule is when the same identifier is used to declare two binders, as in the macro expansion for dependent if:

`(if $h : $cond then $t else $e) ~>
`(dite $cond (fun $h => $t) (fun $h => $e))

In these cases, if the user hovers over h they will see information about both binding sites.

none : Lean.SourceInfo

A synthesized token without position information.

19.4.6. Inspecting Syntax

There are three primary ways to inspect Syntax values:

The Repr Instance

The Repr Syntax instance produces a very detailed representation of syntax in terms of the constructors of the Syntax type.

The ToString Instance

The ToString Syntax instance produces a compact view, representing certain syntax kinds with particular conventions that can make it easier to read at a glance. This instance suppresses source position information.

The Pretty Printer

Lean's pretty printer attempts to render the syntax as it would look in a source file, but fails if the nesting structure of the syntax doesn't match the expected shape.

Representing Syntax as Constructors

The Repr instance's representation of syntax can be inspected by quoting it in the context of #eval, which can run actions in the command elaboration monad CommandElabM. To reduce the size of the example output, the helper removeSourceInfo is used to remove source information prior to display.

partial def removeSourceInfo : Syntax → Syntax
  | .atom _ str => .atom .none str
  | .ident _ str x pre => .ident .none str x pre
  | .node _ k children => .node .none k (children.map removeSourceInfo)
  | .missing => .missing

#eval do
  let stx ← `(2 + $(.missing))
  logInfo (repr (removeSourceInfo stx.raw))
Lean.Syntax.node
  (Lean.SourceInfo.none)
  `«term_+_»
  #[Lean.Syntax.node (Lean.SourceInfo.none) `num #[Lean.Syntax.atom (Lean.SourceInfo.none) "2"],
    Lean.Syntax.atom (Lean.SourceInfo.none) "+", Lean.Syntax.missing]

In the second example, macro scopes inserted by quotation are visible on the call to List.length.

#eval do
  let stx ← `(List.length ["Rose", "Daffodil", "Lily"])
  logInfo (repr (removeSourceInfo stx.raw))

The contents of the pre-resolved identifier List.length are visible here:

Lean.Syntax.node
  (Lean.SourceInfo.none)
  `Lean.Parser.Term.app
  #[Lean.Syntax.ident
      (Lean.SourceInfo.none)
      "List.length".toSubstring
      (Lean.Name.mkNum `List.length._@.Manual.NotationsMacros.SyntaxDef._hyg 2)
      [Lean.Syntax.Preresolved.decl `List.length [], Lean.Syntax.Preresolved.namespace `List.length],
    Lean.Syntax.node
      (Lean.SourceInfo.none)
      `null
      #[Lean.Syntax.node
          (Lean.SourceInfo.none)
          `«term[_]»
          #[Lean.Syntax.atom (Lean.SourceInfo.none) "[",
            Lean.Syntax.node
              (Lean.SourceInfo.none)
              `null
              #[Lean.Syntax.node (Lean.SourceInfo.none) `str #[Lean.Syntax.atom (Lean.SourceInfo.none) "\"Rose\""],
                Lean.Syntax.atom (Lean.SourceInfo.none) ",",
                Lean.Syntax.node (Lean.SourceInfo.none) `str #[Lean.Syntax.atom (Lean.SourceInfo.none) "\"Daffodil\""],
                Lean.Syntax.atom (Lean.SourceInfo.none) ",",
                Lean.Syntax.node (Lean.SourceInfo.none) `str #[Lean.Syntax.atom (Lean.SourceInfo.none) "\"Lily\""]],
            Lean.Syntax.atom (Lean.SourceInfo.none) "]"]]]

The ToString instance represents the constructors of Syntax as follows:

  • The ident constructor is represented as the underlying name. Source information and pre-resolved names are not shown.

  • The atom constructor is represented as a string.

  • The missing constructor is represented by <missing>.

  • The representation of the node constructor depends on the kind. If the kind is `null, then the node is represented by its child nodes in order, enclosed in square brackets. Otherwise, the node is represented by its kind followed by its child nodes, all enclosed in parentheses.

Syntax as Strings

The string representation of syntax can be inspected by quoting it in the context of #eval, which can run actions in the command elaboration monad CommandElabM.

#eval do
  let stx ← `(2 + $(.missing))
  logInfo (toString stx)
(«term_+_» (num "2") "+" <missing>)

In the second example, macro scopes inserted by quotation are visible on the call to List.length.

#eval do
  let stx ← `(List.length ["Rose", "Daffodil", "Lily"])
  logInfo (toString stx)
(Term.app
 `List.length._@.Manual.NotationsMacros.SyntaxDef._hyg.2
 [(«term[_]» "[" [(str "\"Rose\"") "," (str "\"Daffodil\"") "," (str "\"Lily\"")] "]")])

Pretty printing syntax is typically most useful when including it in a message to a user. Normally, Lean automatically invokes the pretty printer when necessary. However, ppTerm can be explicitly invoked if needed.

Pretty-Printed Syntax

The pretty-printed representation of syntax can be inspected by quoting it in the context of #eval, which can run actions in the command elaboration monad CommandElabM. Because new syntax declarations also equip the pretty printer with instructions for displaying them, the pretty printer requires a configuration object. This context can be constructed with a helper:

def getPPContext : CommandElabM PPContext := do
  return {
    env := (← getEnv), opts := (← getOptions),
    currNamespace := (← getCurrNamespace), openDecls := (← getOpenDecls)
  }

#eval show CommandElabM Unit from do
  let stx ← `(2 + 5)
  let fmt ← ppTerm (← getPPContext) stx
  logInfo fmt
2 + 5

In the second example, the macro scopes inserted on List.length by quotation cause it to be displayed with a dagger (✝).

#eval do
  let stx ← `(List.length ["Rose", "Daffodil", "Lily"])
  let fmt ← ppTerm (← getPPContext) stx
  logInfo fmt
List.length✝ ["Rose", "Daffodil", "Lily"]

Pretty printing wraps lines and inserts indentation automatically. A coercion typically converts the pretty printer's output to the type expected by logInfo, using a default layout width. The width can be controlled by explicitly calling pretty with a named argument.

#eval do
  let flowers := #["Rose", "Daffodil", "Lily"]
  let manyFlowers := flowers ++ flowers ++ flowers
  let stx ← `(List.length [$(manyFlowers.map (quote (k := `term))),*])
  let fmt ← ppTerm (← getPPContext) stx
  logInfo (fmt.pretty (width := 40))
List.length✝
  ["Rose", "Daffodil", "Lily", "Rose",
    "Daffodil", "Lily", "Rose",
    "Daffodil", "Lily"]

19.4.7. Typed Syntax🔗

Syntax may additionally be annotated with a type that specifies which syntax category it belongs to. The TSyntax structure contains a type-level list of syntax categories along with a syntax tree. The list of syntax categories typically contains precisely one element, in which case the list structure itself is not shown.

🔗structure
Lean.TSyntax (ks : Lean.SyntaxNodeKinds) : Type

Typed syntax, which tracks the potential kinds of the Syntax it contains.

While syntax quotations produce or expect TSyntax values of the correct kinds, this is not otherwise enforced; it can easily be circumvented by direct use of the constructor.

Constructor

Lean.TSyntax.mk

Fields

raw : Lean.Syntax

The underlying Syntax value.

🔗def
Lean.SyntaxNodeKinds : Type

SyntaxNodeKinds is a set of SyntaxNodeKind, implemented as a list.

Singleton SyntaxNodeKinds are extremely common. They are written as name literals, rather than as lists; list syntax is required only for empty or non-singleton sets of kinds.

Quasiquotations prevent the substitution of typed syntax that does not come from the correct syntactic category. For many of Lean's built-in syntactic categories, there is a set of coercions that appropriately wrap one kind of syntax for another category, such as a coercion from the syntax of string literals to the syntax of terms. Additionally, many helper functions that are only valid on some syntactic categories are defined for the appropriate typed syntax only.

The constructor of TSyntax is public, and nothing prevents users from constructing values that break internal invariants. The use of TSyntax should be seen as a way to reduce common mistakes, rather than rule them out entirely.
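For example, nothing prevents wrapping missing syntax in a term-typed TSyntax by invoking the constructor directly. This sketch is type-correct Lean, but it violates the convention that a `term wrapper contains syntax produced by the term parser:

```lean
open Lean in
-- Type-correct, but the wrapped syntax was never produced by the term parser.
def bogus : TSyntax `term := ⟨.missing⟩
```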

In addition to TSyntax, there are types that represent arrays of syntax, with or without separators. These correspond to repeated elements in syntax declarations or antiquotations.

🔗def
Lean.TSyntaxArray (ks : Lean.SyntaxNodeKinds) :
  Type

An array of syntaxes of kind ks.

🔗structure
Lean.Syntax.TSepArray
  (ks : Lean.SyntaxNodeKinds) (sep : String) :
  Type

An array of syntax elements that alternate with the given separator. Each syntax element has a kind drawn from ks.

Separator arrays result from repetition operators such as ,*. Coercions to and from Array (TSyntax ks) insert or remove separators as required. The untyped equivalent is Lean.Syntax.SepArray.
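The coercions can be sketched as follows (assuming an environment with Lean's elaboration API imported): an Array of typed terms spliced with the ,* antiquotation is coerced to a TSepArray, which inserts the "," separator atoms between the elements:

```lean
open Lean Elab Command in
#eval show CommandElabM Unit from do
  let xs : Array (TSyntax `term) := #[← `(1), ← `(2), ← `(3)]
  -- The `,*` splice coerces the array to a TSepArray, adding "," separators.
  let stx ← `([$xs,*])
  logInfo (toString stx.raw)
```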

Constructor

Lean.Syntax.TSepArray.mk

Fields

elemsAndSeps : Array Lean.Syntax

The array of elements and separators, ordered like #[el1, sep1, el2, sep2, el3].

19.4.8. Aliases

A number of aliases are provided for commonly-used typed syntax varieties. These aliases allow code to be written at a higher level of abstraction.

🔗def
Lean.Syntax.Term : Type

Syntax that represents a Lean term.

🔗def
Lean.Syntax.Command : Type

Syntax that represents a command.

🔗def
Lean.Syntax.Level : Type

Syntax that represents a universe level.

🔗def
Lean.Syntax.Tactic : Type

Syntax that represents a tactic.

🔗def
Lean.Syntax.Prec : Type

Syntax that represents a precedence (e.g. for an operator).

🔗def
Lean.Syntax.Prio : Type

Syntax that represents a priority (e.g. for an instance declaration).

🔗def
Lean.Syntax.Ident : Type

Syntax that represents an identifier.

🔗def
Lean.Syntax.StrLit : Type

Syntax that represents a string literal.

🔗def
Lean.Syntax.CharLit : Type

Syntax that represents a character literal.

🔗def
Lean.Syntax.NameLit : Type

Syntax that represents a quoted name literal that begins with a back-tick.

🔗def
Lean.Syntax.NumLit : Type

Syntax that represents a numeric literal.

🔗def
Lean.Syntax.ScientificLit : Type

Syntax that represents a scientific numeric literal that may have decimal and exponential parts.

🔗def
Lean.Syntax.HygieneInfo : Type

Syntax that represents macro hygiene info.

19.4.9. Helpers for Typed Syntax🔗

For literals, Lean's parser produces a singleton node that contains an atom. The inner atom contains a string with source information, while the node's kind specifies how the atom is to be interpreted. This may involve decoding string escape sequences or interpreting base-16 numeric literals. The helpers in this section perform the correct interpretation.
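As a sketch of this decoding, Syntax.mkStrLit produces the same singleton node that the parser would, with quotation marks and escape sequences intact in the atom, and TSyntax.getString inverts the quoting:

```lean
open Lean in
-- The atom stores the text with quotes and a `\n` escape sequence;
-- getString decodes it back to the original two-line string.
#eval (Syntax.mkStrLit "line\nbreak").getString
```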

🔗def
Lean.TSyntax.getId (s : Lean.Ident) : Lean.Name

Extracts the parsed name from the syntax of an identifier.

Returns Name.anonymous if the syntax is malformed.

🔗def
Lean.TSyntax.getName (s : Lean.NameLit) :
  Lean.Name

Decodes a quoted name literal, returning the name.

Returns Lean.Name.anonymous if the syntax is malformed.

🔗def
Lean.TSyntax.getNat (s : Lean.NumLit) : Nat

Interprets a numeric literal as a natural number.

Returns 0 if the syntax is malformed.

🔗def
Lean.TSyntax.getScientific
  (s : Lean.ScientificLit) : Nat × Bool × Nat

Extracts the components of a scientific numeric literal.

Returns a triple (n, sign, e) : Nat × Bool × Nat; the number's value is given by:

if sign then n * 10 ^ (-e) else n * 10 ^ e

Returns (0, false, 0) if the syntax is malformed.
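For example (a sketch using Syntax.mkScientificLit to build the literal node), the literal 1.5 is stored as mantissa 15 with a negated exponent of 1, since 1.5 = 15 * 10 ^ (-1):

```lean
open Lean in
-- Decomposes "1.5" into its mantissa, exponent sign, and exponent.
#eval (Syntax.mkScientificLit "1.5").getScientific
```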

🔗def
Lean.TSyntax.getString (s : Lean.StrLit) :
  String

Decodes a string literal, removing quotation marks and unescaping escaped characters.

Returns "" if the syntax is malformed.

🔗def
Lean.TSyntax.getChar (s : Lean.CharLit) : Char

Decodes a character literal.

Returns (default : Char) if the syntax is malformed.

🔗def
Lean.TSyntax.getHygieneInfo
  (s : Lean.HygieneInfo) : Lean.Name

Decodes macro hygiene information.

19.4.10. Syntax Categories🔗

Lean's parser contains a table of syntax categories, which correspond to nonterminals in a context-free grammar. Some of the most important categories are terms, commands, universe levels, priorities, precedences, and the categories that represent tokens such as literals. Typically, each syntax kind corresponds to a category. New categories can be declared using the declare_syntax_cat command.

The leading identifier behavior is an advanced feature that usually does not need to be modified. It controls the behavior of the parser when it encounters an identifier, and can sometimes cause the identifier to be treated as a non-reserved keyword instead. This is used to avoid turning the name of every tactic into a reserved keyword.
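For instance, a new category can opt into this keyword punning when it is declared. The following sketch (the category and keyword names are illustrative) uses the behavior option of declare_syntax_cat:

```lean
-- `behavior := symbol` lets "greet" act as a keyword in this category
-- while remaining usable as an ordinary identifier elsewhere.
declare_syntax_cat greeting (behavior := symbol)
syntax "greet" ident : greeting
```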

🔗inductive type
Lean.Parser.LeadingIdentBehavior : Type

Specifies how the parsing table lookup function behaves for identifiers.

The function Lean.Parser.prattParser uses two tables: one each for leading and trailing parsers. These tables map tokens to parsers. Because keyword tokens are distinct from identifier tokens, keywords and identifiers cannot be confused, even when they are syntactically identical. Specifying an alternative leading identifier behavior allows greater flexibility and makes it possible to avoid reserved keywords in some situations.

When the leading token is syntactically an identifier, the current syntax category's LeadingIdentBehavior specifies how the parsing table lookup function behaves, and allows controlled “punning” between identifiers and keywords. This feature is used to avoid creating a reserved symbol for each built-in tactic (e.g., apply or assumption). As a result, tactic names can be used as identifiers.

Constructors

default : Lean.Parser.LeadingIdentBehavior

If the leading token is an identifier, then the parser just executes the parsers associated with the auxiliary token “ident”, which parses identifiers.

symbol : Lean.Parser.LeadingIdentBehavior

If the leading token is an identifier <foo>, and there are parsers P associated with the token <foo>, then the parser executes P. Otherwise, it executes only the parsers associated with the auxiliary token “ident”, which parses identifiers.

both : Lean.Parser.LeadingIdentBehavior

If the leading token is an identifier <foo>, then it executes the parsers associated with token <foo> and parsers associated with the auxiliary token “ident”, which parses identifiers.
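As an illustration, a new category can opt in to the symbol behavior when it is declared, so that its keywords remain usable as ordinary identifiers elsewhere. The category name here is hypothetical:

```lean
-- Hypothetical category: leading identifiers are first looked up as
-- symbols in this category's parsing table, falling back to the
-- `ident` parsers, so its keywords are not reserved.
declare_syntax_cat queryOp (behavior := symbol)
```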

19.4.11. Syntax Rules🔗

Each syntax category is associated with a set of syntax rules, which correspond to productions in a context-free grammar. Syntax rules can be defined using the syntax command.

Syntax Rules

command ::= ...
    | docComment?
      attributes?
      attrKind
      syntax(:prec)? ((name := ident))? ((priority := prio))? stx* : ident

As with operator and notation declarations, the contents of the documentation comments are shown to users while they interact with the new syntax. Attributes may be added to invoke compile-time metaprograms on the resulting definition.

Syntax rules interact with section scopes in the same manner as attributes, operators, and notations. By default, syntax rules are available to the parser in any module that transitively imports the one in which they are established, but they may be declared scoped or local to restrict their availability either to contexts in which the current namespace has been opened or to the current section scope, respectively.
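For instance, a rule can be marked scoped so that it only parses where its namespace is open. The namespace and notation below are illustrative, and no elaborator is attached:

```lean
namespace Intervals
-- Only parses in contexts where `Intervals` is open
-- (or activated with `open scoped Intervals`).
scoped syntax "interval[" term ", " term "]" : term
end Intervals
```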

When multiple syntax rules for a category can match the current input, the local longest-match rule is used to select one of them. Like notations and operators, if there is a tie for the longest match then the declared priorities are used to determine which parse result applies. If this still does not resolve the ambiguity, then all the results that tied are saved. The elaborator is expected to attempt all of them, succeeding when exactly one can be elaborated.

The syntax rule's precedence, written immediately after the syntax keyword, restricts the parser to use this new syntax only when the precedence context is at least the provided value. Just as with operators and notations, syntax rules may be manually provided with a name; if they are not, an otherwise-unused name is generated. Whether provided or generated, this name is used as the syntax kind in the resulting node.

The body of a syntax declaration is even more flexible than that of a notation. String literals specify atoms to match. Subterms may be drawn from any syntax category, rather than just terms, and they may be optional or repeated, with or without interleaved comma separators. Identifiers in syntax rules indicate syntax categories, rather than naming subterms as they do in notations.

Finally, the syntax rule specifies which syntax category it extends. It is an error to declare a syntax rule in a nonexistent category.
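For example, the following rule (all names hypothetical) extends the term category at precedence 65, names the resulting syntax kind explicitly, and requires a higher precedence on the right-hand subterm, making the operator left-associative:

```lean
-- New infix syntax for terms; nodes produced by this rule have
-- kind `myOplus`. There is no elaborator, so it only parses.
syntax:65 (name := myOplus) term:65 " ~+~ " term:66 : term
```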

Syntax Specifiers

The syntactic category stx is the grammar of specifiers that may occur in the body of a syntax command.

String literals are parsed as atoms, including keywords such as if, #eval, or where:

stx ::=
    str

Leading and trailing spaces in the strings do not affect parsing, but they cause Lean to insert spaces in the corresponding position when displaying the syntax in proof states and error messages. Ordinarily, valid identifiers occurring as atoms in syntax rules become reserved keywords. Preceding a string literal with an ampersand (&) suppresses this behavior:

stx ::= ...
    | &str
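For example, a tactic-like rule can use an ampersand so that its keyword does not become reserved. The tactic name is hypothetical and has no elaborator:

```lean
-- Because of `&`, `probe` is not a reserved keyword, so it can
-- still be used as an ordinary identifier elsewhere.
syntax (name := probeTac) &"probe " term : tactic
```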

Identifiers specify the syntactic category expected in a given position, and may optionally provide a precedence:

stx ::= ...
    | ident(:prec)?
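As a sketch, a rule may mix categories, drawing one position from num and another from term, with a precedence annotation on the latter. The notation is hypothetical, with no elaborator:

```lean
-- `num` and `term` name syntax categories; `term:max` requires the
-- final position to parse at the highest precedence level.
syntax "repeat# " num " times " term:max : term
```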

The * modifier is the Kleene star, matching zero or more repetitions of the preceding syntax. It can also be written using many.

stx ::= ...
    | stx *

The + modifier matches one or more repetitions of the preceding syntax. It can also be written using many1.

stx ::= ...
    | stx +

The ? modifier makes a subterm optional, and matches zero or one, but not more, repetitions of the preceding syntax. It can also be written as optional.

stx ::= ...
    | stx ?
stx ::= ...
    | optional(stx)

Because ? is an identifier character, ident? does not parse as the ? combinator applied to the ident parser; write ident ? or (ident)? instead.
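As a sketch, an optional position can be expressed by suffixing a parenthesized group with ?. The notation below is hypothetical and has no elaborator:

```lean
-- The group `(" : " term)?` matches zero or one occurrences of an
-- ascription; it produces a null node with either 0 or 2 children.
syntax "letopt " ident (" : " term)? " := " term : term
```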

The ,* modifier matches zero or more repetitions of the preceding syntax with interleaved commas. It can also be written using sepBy.

stx ::= ...
    | stx ,*

The ,+ modifier matches one or more repetitions of the preceding syntax with interleaved commas. It can also be written using sepBy1.

stx ::= ...
    | stx ,+

The ,*,? modifier matches zero or more repetitions of the preceding syntax with interleaved commas, allowing an optional trailing comma after the final repetition. It can also be written using sepBy with the allowTrailingSep modifier.

stx ::= ...
    | stx ,*,?

The ,+,? modifier matches one or more repetitions of the preceding syntax with interleaved commas, allowing an optional trailing comma after the final repetition. It can also be written using sepBy1 with the allowTrailingSep modifier.

stx ::= ...
    | stx ,+,?

The <|> operator, which can be written orelse, matches either syntax. However, if the first branch consumes any tokens, then it is committed to, and failures will not be backtracked:

stx ::= ...
    | stx <|> stx
stx ::= ...
    | orelse(stx, stx)
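For example, a position that accepts either a string or a numeric literal can be written with <|>. The two branches begin with different tokens, so the lack of backtracking is not a problem. The rule is hypothetical, with no elaborator:

```lean
-- `str` and `num` are token categories with disjoint first tokens,
-- so `<|>` can choose between them without backtracking.
syntax "lit! " (str <|> num) : term
```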

The ! operator matches the complement of its argument. If its argument fails, then it succeeds, resetting the parsing state.

stx ::= ...
    | ! stx

Syntax specifiers may be grouped using parentheses.

stx ::= ...
    | (stx)

Repetitions may be defined using many and many1. The latter requires at least one instance of the repeated syntax.

stx ::= ...
    | many(stx)
stx ::= ...
    | many1(stx)

Repetitions with separators may be defined using sepBy and sepBy1, which respectively match zero or more occurrences and one or more occurrences, separated by some other syntax. They come in three varieties:

  • The two-parameter version uses the atom provided in the string literal to parse the separators, and does not allow trailing separators.

  • The three-parameter version uses the third parameter to parse the separators, using the atom for pretty-printing.

  • The four-parameter version optionally allows the separator to occur an extra time at the end of the sequence. The fourth argument must always literally be the keyword allowTrailingSep.

stx ::= ...
    | sepBy(stx, str)
stx ::= ...
    | sepBy(stx, str, stx)
stx ::= ...
    | sepBy(stx, str, stx, allowTrailingSep)
stx ::= ...
    | sepBy1(stx, str)
stx ::= ...
    | sepBy1(stx, str, stx)
stx ::= ...
    | sepBy1(stx, str, stx, allowTrailingSep)
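As an illustration, the three varieties can be written as follows. All notations are hypothetical, with no elaborators:

```lean
-- Two parameters: `", "` is both parsed and used for pretty printing.
syntax "tup(" sepBy(term, ", ") ")" : term
-- Three parameters: separators are parsed with `"; "`, while the
-- atom `";"` is used for pretty printing.
syntax "seq{" sepBy1(term, ";", "; ") "}" : term
-- Four parameters: a trailing separator is also permitted.
syntax "bag[" sepBy(term, ", ", ", ", allowTrailingSep) "]" : term
```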
Parsing Matched Parentheses and Brackets

A language that consists of matched parentheses and brackets can be defined using syntax rules. The first step is to declare a new syntax category:

declare_syntax_cat balanced

Next, rules can be added for parentheses and square brackets. To rule out empty strings, the base cases consist of empty pairs.

syntax "(" ")" : balanced
syntax "[" "]" : balanced
syntax "(" balanced ")" : balanced
syntax "[" balanced "]" : balanced
syntax balanced balanced : balanced

In order to invoke Lean's parser on these rules, there must also be an embedding from the new syntax category into one that may already be parsed:

syntax (name := termBalanced) "balanced " balanced : term

These terms cannot be elaborated, but reaching an elaboration error indicates that parsing succeeded:

/--
error: elaboration function for 'termBalanced' has not been implemented
  balanced ()
-/
#guard_msgs in
example := balanced ()

/--
error: elaboration function for 'termBalanced' has not been implemented
  balanced []
-/
#guard_msgs in
example := balanced []

/--
error: elaboration function for 'termBalanced' has not been implemented
  balanced [[]()([])]
-/
#guard_msgs in
example := balanced [[] () ([])]

Similarly, parsing fails when they are mismatched:

example := balanced [() (]]
<example>:1:25-1:26: unexpected token ']'; expected ')' or balanced
Parsing Comma-Separated Repetitions

A variant of list literals that requires double square brackets and allows a trailing comma can be added with the following syntax:

syntax "[[" term,*,? "]]" : term

Adding a macro that describes how to translate it into an ordinary list literal allows it to be used in tests.

macro_rules | `(term|[[$e:term,*]]) => `([$e,*])

#eval [["Dandelion", "Thistle",]]
["Dandelion", "Thistle"]

19.4.12. Indentation🔗

Internally, the parser maintains a saved source position. Syntax rules may include instructions that interact with these saved positions, causing parsing to fail when a condition is not met. Indentation-sensitive constructs, such as do, save a source position, parse their constituent parts while taking this saved position into account, and then restore the original position.

In particular, indentation sensitivity is specified by combining withPosition or withPositionAfterLinebreak, which save the source position at the start of parsing some other syntax, with colGt, colGe, and colEq, which compare the current column with the column from the most recently saved position. lineEq can also be used to ensure that two positions are on the same line in the source file.

🔗parser alias
withPosition(p)

Arity is sum of arguments' arities

withPosition(p) runs p while setting the "saved position" to the current position. This has no effect on its own, but various other parsers access this position to achieve some composite effect:

  • colGt, colGe, colEq compare the column of the saved position to the current position, used to implement Python-style indentation sensitive blocks

  • lineEq ensures that the current position is still on the same line as the saved position, used to implement composite tokens

The saved position is only available in the read-only state, which is why this is a scoping parser: after the withPosition(..) block the saved position will be restored to its original value.

This parser has the same arity as p - it just forwards the results of p.

🔗parser alias
withoutPosition(p)

Arity is sum of arguments' arities

withoutPosition(p) runs p without the saved position, meaning that position-checking parsers like colGt will have no effect. This is usually used by bracketing constructs like (...) so that the user can locally override whitespace sensitivity.

This parser has the same arity as p - it just forwards the results of p.

🔗parser alias
withPositionAfterLinebreak
  • Arity: 1
  • Automatically wraps arguments in a null node unless there's exactly one
🔗parser alias
colGt
  • Arity: 0
  • Automatically wraps arguments in a null node unless there's exactly one

The colGt parser requires that the next token starts at a strictly greater column than the saved position (see withPosition). This can be used for whitespace-sensitive syntax for the arguments to a tactic, to ensure that the following tactic is not interpreted as an argument.

example (x : False) : False := by
  revert x
  exact id

Here, the revert tactic is followed by a list of colGt ident, because otherwise it would interpret exact as an identifier and try to revert a variable named exact.

This parser has arity 0 - it does not capture anything.

🔗parser alias
colGe
  • Arity: 0
  • Automatically wraps arguments in a null node unless there's exactly one

The colGe parser requires that the next token starts from at least the column of the saved position (see withPosition), but allows it to be more indented. This can be used for whitespace-sensitive syntax to ensure that a block does not go outside a certain indentation scope. For example, it is used in the Lean grammar for else if, to ensure that the else is not less indented than the if it matches with.

This parser has arity 0 - it does not capture anything.

🔗parser alias
colEq
  • Arity: 0
  • Automatically wraps arguments in a null node unless there's exactly one

The colEq parser ensures that the next token starts at exactly the column of the saved position (see withPosition). This can be used to do whitespace sensitive syntax like a by block or do block, where all the lines have to line up.

This parser has arity 0 - it does not capture anything.

🔗parser alias
lineEq
  • Arity: 0
  • Automatically wraps arguments in a null node unless there's exactly one

The lineEq parser requires that the current token is on the same line as the saved position (see withPosition). This can be used to ensure that composite tokens are not "broken up" across different lines. For example, else if is parsed using lineEq to ensure that the two tokens are on the same line.

This parser has arity 0 - it does not capture anything.
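As a sketch, lineEq can require two positions in a rule to stay on one line. The notation below is hypothetical, with no elaborator:

```lean
-- `withPosition` saves the position at the start of the first term;
-- `lineEq` then insists that parsing is still on that same line
-- when the second term begins.
syntax "pair! " withPosition(term lineEq term) : term
```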

Aligned Columns

This syntax for saving notes takes a bulleted list of items, each of which must be aligned at the same column.

syntax "note " ppLine withPosition((colEq "• " str ppLine)+) : term

There is no elaborator or macro associated with this syntax, but the following example is accepted by the parser:

#check
  note
    • "One"
    • "Two"
elaboration function for '«termNote__•__»' has not been implemented
  note
    • "One"
    • "Two"

The syntax does not require that the list is indented with respect to the opening token, which would require an extra withPosition and a colGt.

#check
  note
• "One"
• "Two"
elaboration function for '«termNote__•__»' has not been implemented
  note
    • "One"
    • "Two"

The following examples are not syntactically valid because the columns of the bullet points do not match.

#check
  note
    • "One"
   • "Two"
<example>:4:3-4:4: expected end of input
#check
  note
   • "One"
     • "Two"
<example>:4:5-4:6: expected end of input