20.1. Reference Counting
Lean uses reference counting for memory management. Each allocated object maintains a count of how many other objects refer to it. When a new reference is added, the count is incremented, and when a reference is dropped, the count is decremented. When a reference count reaches zero, the object is no longer reachable and can play no part in the further execution of the program. It is deallocated and all of its references to other objects are dropped, which may trigger further deallocations.
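The runtime performs this bookkeeping in C on raw heap objects. The following toy model (an illustrative sketch, not the actual runtime representation) captures only the counting discipline described above: a fresh object starts with one reference, retaining adds one, and releasing the last reference stands in for deallocation, at which point the object's own references would be dropped as well.
structure ToyRc where
  -- A freshly allocated object has exactly one reference.
  count : Nat := 1

-- Adding a reference increments the count.
def retain (o : ToyRc) : ToyRc :=
  { o with count := o.count + 1 }

-- Dropping a reference decrements the count; `none` stands in for
-- deallocation, which would also release the object's own references.
def release (o : ToyRc) : Option ToyRc :=
  if o.count ≤ 1 then
    none
  else
    some { o with count := o.count - 1 }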
Reference counting provides a number of benefits:
- Re-Use of Memory: If an object's reference count drops to zero just as another object of the same size is about to be allocated, then the original object's memory can be safely re-used for the new object. As a result, many common data-structure traversals (such as List.map) do not need to allocate memory when there is exactly one reference to the data structure being traversed.
- Opportunistic In-Place Updates: Primitive types, such as strings and arrays, may provide operations that copy shared data but modify unshared data in place. As long as they hold the only reference to the value being modified, many operations on these primitive types will modify it rather than copy it. This can lead to substantial performance benefits. Carefully-written Array code avoids the performance overhead of immutable data structures while maintaining the ease of reasoning provided by pure functions (a brief sketch follows this list).
- Predictability: Reference counts are decremented at predictable times. As a result, reference-counted objects can be used to manage other resources, such as file handles. In Lean, a Handle does not need to be explicitly closed because it is closed immediately when it is no longer accessible.
- Simpler FFI: Objects managed with reference counting don't need to be relocated as part of reclaiming unused memory. This greatly simplifies interaction with code written in other languages, such as C.
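As a sketch of the in-place update point above (illustrative code, not from the original text), a loop that overwrites every element of an array avoids copying as long as the array it works on is uniquely referenced:
def zeroAll (arr : Array Nat) : Array Nat := Id.run do
  -- `out` is the only reference to the array at each step, assuming the
  -- caller does not keep another reference to the array it passed in.
  let mut out := arr
  for i in [0:out.size] do
    -- `Array.set!` copies only when the array is shared; with a unique
    -- reference it overwrites the existing storage.
    out := out.set! i 0
  return out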
The traditional drawbacks of reference counting are the performance overhead of updating reference counts and the inability to recognize and deallocate cyclic data. The former drawback is minimized by an analysis based on borrowing that allows many reference count updates to be elided. Nevertheless, multi-threaded code requires that reference count updates be synchronized between threads, which also imposes a substantial overhead. To reduce this overhead, Lean values are partitioned into those which are reachable from multiple threads and those which are not. Single-threaded reference counts can be updated much faster than multi-threaded reference counts, and many values are accessed only on a single thread. Together, these techniques greatly reduce the performance overhead of reference counting. Because Lean cannot create cyclic data, no technique is needed to detect it. Ullrich and de Moura provide more details on the implementation of reference counting in Lean (Sebastian Ullrich and Leonardo de Moura, 2019. "Counting Immutable Beans: Reference Counting Optimized for Purely Functional Programming". In Proceedings of the 31st Symposium on Implementation and Application of Functional Languages, IFL 2019).
20.1.1. Observing Uniqueness
Ensuring that arrays and strings are uniquely referenced is key to writing fast code in Lean.
The primitive dbgTraceIfShared
can be used to check whether a data structure is aliased.
When called, it returns its argument unchanged, printing the provided trace message if the argument's reference count is greater than one.
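As a quick illustration of the call shape (the surrounding definition is hypothetical, not from the original text), the trace message is attached directly to the value whose sharing is in question:
def bumpFirst (xs : Array Nat) : Array Nat :=
  -- Writes "shared RC Array update" to standard error only when `xs` has a
  -- reference count greater than one; otherwise the call is silent and the
  -- array is returned unchanged for `set!` to update.
  (dbgTraceIfShared "Array update" xs).set! 0 42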
Due to the specifics of how #eval is implemented, using dbgTraceIfShared with #eval can be misleading.
Instead, it should be used in code that's explicitly compiled and run.
Observing Uniqueness
This program reads a line of input from the user, printing it after replacing its first character with a space.
Replacing characters in a string uses an in-place update if the string is not shared and the characters are both contained in the 7-bit ASCII subset of Unicode.
The dbgTraceIfShared call prints nothing, indicating that the string will indeed be updated in place rather than copied.
def process (str : String) : IO Unit := do
  IO.println ((dbgTraceIfShared "String update" str).set 0 ' ')

def main : IO Unit := do
  let line := (← (← IO.getStdin).getLine).trim
  process line
When run with this input:
stdin
Here is input.
the program emits:
stdout
ere is input.
with an empty standard error output:
stderr
<empty>
This version of the program retains a reference to the original string, which necessitates copying the string in the call to String.set.
This fact is visible in its standard error output.
def process (str : String) : IO Unit := do
  IO.println ((dbgTraceIfShared "String update" str).set 0 ' ')

def main : IO Unit := do
  let line := (← (← IO.getStdin).getLine).trim
  process line
  IO.println "Original input:"
  IO.println line
When run with this input:
stdin
Here is input.
the program emits:
stdout
ere is input.
Original input:
Here is input.
In its standard error, the message passed to dbgTraceIfShared
is visible.
stderr
shared RC String update
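One way to restore the in-place update in a program like this (an illustrative rearrangement, not part of the original example) is to make the call to process the final use of line, so the extra reference has already been released when String.set runs. Whether the copy is in fact avoided depends on how the compiler inserts reference-count operations, which can be checked with the tracing described in the next section.
def main : IO Unit := do
  let line := (← (← IO.getStdin).getLine).trim
  IO.println "Original input:"
  IO.println line
  -- `line` is not used after this call, so it can be handed to `process`
  -- (defined as above) as a unique reference.
  process line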
20.1.2. Compiler IR
The compiler option trace.compiler.ir.result
can be used to inspect the compiler's intermediate representation (IR) for a function.
In this intermediate representation, reference counting, allocation, and reuse are explicit:
- The isShared operator checks whether a reference count is greater than 1.
- ctor_n allocates the nth constructor of a type.
- proj_n retrieves the nth field from a constructor value.
- set x[n] mutates the nth field of the constructor in x.
- ret x returns the value in x.
The specifics of reference count manipulations can depend on the results of optimization passes such as inlining. While the vast majority of Lean code doesn't require this kind of attention to achieve good performance, knowing how to diagnose unique reference issues can be very important when writing performance-critical code.
trace.compiler.ir.result
Default value: false
(trace) enable/disable tracing for the given module and submodules
Reference Counts in IR
Compiler IR can be used to observe when reference counts are incremented, which can help diagnose situations when a value is expected to have a unique incoming reference, but is in fact shared.
Here, process and process' each take a string as a parameter and modify it with String.set, returning a pair of strings.
While process returns a constant string as the second element of the pair, process' returns the original string.
set_option trace.compiler.ir.result true

def process (str : String) : String × String :=
  (str.set 0 ' ', "")

def process' (str : String) : String × String :=
  (str.set 0 ' ', str)
The IR for process includes no inc or dec instructions.
If the incoming string x_1 is a unique reference, then it is still a unique reference when passed to String.set, which can then use in-place modification:
[result]
def process._closed_1 : obj :=
  let x_1 : obj := "";
  ret x_1
def process (x_1 : obj) : obj :=
  let x_2 : u32 := 32;
  let x_3 : obj := 0;
  let x_4 : obj := String.set x_1 x_3 x_2;
  let x_5 : obj := process._closed_1;
  let x_6 : obj := ctor_0[Prod.mk] x_4 x_5;
  ret x_6
The IR for process', on the other hand, increments the reference count of the string just before calling String.set.
Thus, the modified string x_4 is a copy, regardless of whether the reference to x_1 is unique:
[result]
def process' (x_1 : obj) : obj :=
  let x_2 : u32 := 32;
  let x_3 : obj := 0;
  inc x_1;
  let x_4 : obj := String.set x_1 x_3 x_2;
  let x_5 : obj := ctor_0[Prod.mk] x_4 x_1;
  ret x_5
Memory Re-Use in IR
The function discardElems is a simplified version of List.map that replaces every element in a list with ().
Inspecting its intermediate representation demonstrates that it will re-use the list's memory when its reference is unique.
set_option trace.compiler.ir.result true

def discardElems : List α → List Unit
  | [] => []
  | x :: xs => () :: discardElems xs
This emits the following IR:
[result]
def discardElems._rarg (x_1 : obj) : obj :=
  case x_1 : obj of
  List.nil →
    let x_2 : obj := ctor_0[List.nil];
    ret x_2
  List.cons →
    let x_3 : u8 := isShared x_1;
    case x_3 : u8 of
    Bool.false →
      let x_4 : obj := proj[1] x_1;
      let x_5 : obj := proj[0] x_1;
      dec x_5;
      let x_6 : obj := discardElems._rarg x_4;
      let x_7 : obj := ctor_0[PUnit.unit];
      set x_1[1] := x_6;
      set x_1[0] := x_7;
      ret x_1
    Bool.true →
      let x_8 : obj := proj[1] x_1;
      inc x_8;
      dec x_1;
      let x_9 : obj := discardElems._rarg x_8;
      let x_10 : obj := ctor_0[PUnit.unit];
      let x_11 : obj := ctor_1[List.cons] x_10 x_9;
      ret x_11
def discardElems (x_1 : ◾) : obj :=
  let x_2 : obj := pap discardElems._rarg;
  ret x_2
In the IR, the List.cons case explicitly checks whether the argument value is shared (i.e. whether its reference count is greater than one).
If the reference is unique, the reference count of the discarded list element x_5 is decremented and the constructor value is reused.
If it is shared, a new List.cons is allocated in x_11 for the result.