One-hole contexts
Post Metadata
A one-hole context is a simple data structure which manages term rewriting. Given a term \(t\) and a subterm \(s\) inside \(t\) that we want to rewrite, we can create a one-hole context that represents “\(t\) with a hole at \(s\)”. After separately rewriting \(s\) into \(s’\), we can then plug \(s’\) into that hole, to reconstruct \(t\) but with \(s\) swapped with \(s’\).
It’s fairly straightforward to manually create a one-hole context for an individual term or type, but it’s interesting to see how this construction can be generalized to generic algebraic data types. In this exploration, we’ll work through a few simple examples and touch on a surprising connection between these contexts and the rules of (continuous) differentiation from introductory calculus.
Term rewriting
Suppose we want to simplify an abstract syntax tree.
In particular, let’s say we have a binary Term class where a term is either a Const,
which has no children and contains an integer val,
or an Op, which holds two child terms and a string op.
from dataclasses import dataclass
class Term:
pass
@dataclass
class Const(Term):
val: int
@dataclass
class Op(Term):
op: str
l: Term
r: Term
The expression 50 + ((((10 + 20) + 30) + -30) + -10) can be represented by this syntax tree:
which can be encoded as this term:
Op('+', Const(50),
Op('+', Op('+', Op('+', Op('+',
Const(10), Const(20)), Const(30)), Const(-30)), Const(-10)))
Let’s say we only know how to cancel a Const integer with its negative,
and so we would like to move the 30 / -30 and 10 / -10 close to each other for cancellation.
We might try to right-associate the orange + node, which we’ll call the focus.
How can we rewrite the tree at this focus?
Rewriting mutable terms
If Term is mutable then this is easy: we just directly navigate to the focus (at term.r.l) and perform the rewrites in-place.
def right_associate_plus_inplace(t: Term):
match t:
case Op('+', Op('+', l, m), r):
t.l = l
t.r = Op('+', m, r)
case _:
raise NotImplementedError
term = Op('+', Const(50), Op('+', Op('+', Op('+', Op('+',
Const(10), Const(20)), Const(30)), Const(-30)), Const(-10)))
right_associate_plus_inplace(term.r.l)
right_associate_plus_inplace(term.r.l)
After right-associating at the focus twice, we get:
This is great if we have mutable data structures where we just want to perform in-place updates.
Rewriting immutable terms
However, what if Term is immutable?
This immutability could be helpful, for example, if we want to roll back rewrites or reuse old versions of the term.
An immediate way we could implement this on an immutable data structure is to rebuild the entire term each time we want to rewrite it.
In order to reconstruct a Term after rewriting, we have to keep a log of the operands we pass by as we navigate from the root to the specific subtree that we want to rewrite at (i.e, the focus).
So if we are navigating into the left subtree of an Op, we have to record: (1) the fact that we went into the left subtree and not the right one; (2) the op value of the node; and (3) the right subtree that we skipped.
Once we are done rewriting, we can reconstruct the entire tree by going back up the log to rebuild each Op from the focus to the original root.
def right_associate_plus(t: Tree):
match t:
case Op('+', Op('+', l, m), r):
return Op('+', l, ('+', m, r))
from typing import LiteralString
type Step = LiteralString['l'] | LiteralString['r']
def pure_update(t: Term, path: list[Step], rewrite):
step_log = []
for step in path:
match step:
case 'l':
step_log.append((step, t.op, t.r))
t = t.l
case 'r':
step_log.append((step, t.op, t.l))
t = t.r
new_tree = rewrite(t)
for (step, op, other_tree) in step_log[::-1]:
match step:
case 'l':
new_tree = Op(op, new_tree, other_tree)
case 'r':
new_tree = Op(op, other_tree, new_tree)
return new_tree
term = Op('+', Const(50), Op('+', Op('+', Op('+', Op('+',
Const(10), Const(20)), Const(30)), Const(-30)), Const(-10)))
term2 = pure_update(term, ['r', 'l'], right_associate_plus)
term3 = pure_update(term2, ['r', 'l'], right_associate_plus)
This is great! This algorithm allows us to rewrite immutable Terms, although each time we call pure_update we rebuild the entire term.
As such this algorithm can be unnecessarily costly for deeper, repeated rewrites.
One-Hole Contexts
For the cases where we want to repeatedly rewrite at the same location, we can factor out this rebuilding process by pre-calculating the step_log at a particular focus.
@dataclass
class OneHoleContext:
step_log: list[tuple[Step, str, Tree]]
@classmethod
def create(cls, t: Tree, path: list[Step]):
step_log = []
for step in path:
match step:
case 'l':
step_log.append((step, t.op, t.r))
case 'r':
step_log.append((step, t.op, t.l))
t = t.r
return OneHoleContext(step_log), t
def fill(self, new_tree):
for step, a, other_tree in self.step_log[::-1]:
match step:
case 'l':
new_tree = Op(a, new_tree, other_tree)
case 'r':
new_tree = Op(a, other_tree, new_tree)
return new_tree
term = Op('+', Const(50), Op('+', Op('+', Op('+', Op('+',
Const(10), Const(20)), Const(30)), Const(-30)), Const(-10)))
context, subterm = OneHoleContext.create(term, ['r', 'l'])
subterm2 = right_associate_plus(subterm)
subterm3 = right_associate_plus(subterm2)
new_term = context.fill(subterm3)
We have factored out the step_log into a class called OneHoleContext, which contains operations for creating the log from a path to a subterm of interest, and reconstructing the term given a new subterm at the end of that path.
Intuitively a OneHoleContext can be thought of as a Term with a hole:
Concretely, this particular one-hole context is encoded as:
OneHoleContext([
('r', '+', Const(50)),
('l', '+', Const(-10)),
])
To summarize, we decompose the initial term into a one-hole context and a subterm:
Then we rewrite the subterm until we get a final rewritten subterm:
And then we fill in hole in the context with the new subterm, at which point we reconstruct the tree:
So this is great! Using a OneHoleContext seems like a good solution for rewriting our Ops and Leafs.
One question remains: our implementation of OneHoleContext is a little specialized to our particular Term grammar — is it possible automatically generate a OneHoleContext from an arbitrary Term grammar?
For that matter, our step_log is of type list[tuple[Step, int, Tree]] for our particular Terms.
But what is the type of step_log in general? Can we express it in terms of the grammar of Terms?
Deriving one-hole contexts
Let’s see if we can schematically derive the type of step_log from Terms, so that we might generalize it.
Our Term type inductively defines a binary tree with \(\text{int}\)-type data at the leaves and \(\text{str}\)-type data at the internal nodes.
So algebraically, Term can be defined by \(T := \text{int} + \text{str} \times T^2\),
that is, a Term type is either (\(+\)) a terminal int type representing a Const, or a tuple (\(\times\)) of type (str, Term, Term) representing an Op.
Our specific one-hole context is a Term with a terminal hole.
To put a hole in a Term tree, we replace the tree with a hole if the tree is a terminal leaf, or else we put a hole in either the left or right subtree if the tree is an Op.
A one-hole context \(C[T]\) is then either: (1) a terminal hole; (2) an Op where the left subtree has a hole; or (3) an Op where the right subtree has a hole.
In symbols, \(C[T] := 1 + \text{str} \times C[T] \times T + \text{str} \times T \times C[T]\), where \(1\) is a unit type with one inhabitant.
The position of the hole is fully determined by the sequence of “left” choices and “right” choices that we take until we hit the terminal hole, so we can equivalently characterize this as a list of either: (1) a “left” choice with associated metadata; or (2) a “right” choice with associated metadata. Symbolically, \(C[T] = \text{list}\left[1 \times \text{str} \times T + 1 \times \text{str} \times T\right]\).
The type of associated metadata is the same regardless of choice, so we can write our one-hole contexts as a list of “left” or “right” choice, string data, and tree data.
That is, \(C[T] = \text{list}\left[(1 + 1) \times \text{str} \times T\right]\),
and so this derivation for the type of Term contexts recovers our step_log type: list[tuple[Step, int, Tree]].
We can try to go through the same exercise with a different data type.
Let an NTree be either a Const(val: int) or an N_Op(op: str, t: list[NTree]).
We can define an algebraically NTree as \(N := \text{int} + \text{str} \times \text{list}[N]\).
An NTree with a hole is the either: (1) a terminal hole; or (2) an N_Op where one of the subtrees has a hole.
That is, a one-hole context for NTrees \(C[N]\) is defined by \(C[N] := 1 + \text{str} \times \text{list}[N] \times C[N]\times \text{list}[N]\).
The metadata for the N_Op case is then a op string, and a list of NTrees before the one with the hole, and a list of NTrees after the one with the hole: \(C[N] = \text{list}[\text{str} \times \text{list}[N] \times \text{list}[N]]\).
Therefore the type of one-hole contexts for NTrees is list[tuple[str, list[NTree], list[NTree]]].
The type of one-hole contexts
Formally, how might \(T := \text{int} + \text{str} \times T^2\) relate to \(C[T] = \text{list}\left[(1 + 1) \times \text{str} \times T\right]\)? Or \(N := \text{int} + \text{str} \times \text{list}[N]\) vs \(C[N] = \text{list}[\text{str} \times \text{list}[N] \times \text{list}[N]]\)?
From the classic paper The Derivative of a Regular Type is its Type of One-Hole Contexts, for certain classes of inductive algebraic types, the type of its one-hole contexts follow the same rules as the rules for continuous differentiation that is taught in elementary calculus.
For example, in \(T := \text{int} + \text{str} \times T^2\), we can view the right hand side as an algebraic function and take it’s derivative with respect to \(T\). This derivative is \(2 \times \text{str} \times T = (1 + 1)\times \text{str} \times T\), which is exactly the type for an item inside of a \(C[T]\) list. This is incredible!
We can even motivate the lists that appear by considering the derivative of the function \(T(x, y) := F(x, y, T(x, y))\) for \(F(x, y, T) := y + xT^2\).
By the multivariate chain rule, the partial derivative of \(T\) with respect to \(y\) is \(T_y = F_y + F_T T_y\),
and so solving for \(T_y\), we find that:
In particular for this particular \(F\), we have \(F_y = 1\) and \(F_T = 2xT\), so that:
\[T_y = \frac{1}{1 - 2xT} = 1 + (2xT) + (2xT)^2 + \dots,\]where we suggestively write the function as a geometric series.
By interpreting \(x\) as \(\text{str}\) and \(y\) as \(\text{int}\), we could interpret this as saying:
the derivative of the tree Term at an \(\text{int}\)-type Const location is equal to either an empty list, or a single tuple of type \(2 \times \text{str} \times T\), or a pair of those tuples, or a triple, and so forth.
That is, the one-hole context of \(T\), where we replace a leaf of \(T\) with a hole to be filled in later, is equal in information to a list of elements of type \(F_T = 2 \times \text{str} \times T\).
Applying these rules to \(N(x, y) := G(x, y, N(x, y))\) for \(G(x, y, N) := y + x(1 + N + N^2 + \dots) = y + x\frac{1}{1-N}\), we get \(G_y = 1\) and \(G_N = x\left(\frac{1}{1-N}\right)^2= x\left(1 + N + N^2 + \dots\right)^2\), and so:
\[N_y = 1 + \left(x\left(1 + N + N^2 + \dots\right)^2\right) + \left( x\left(1 + N + N^2 + \dots\right)^2 \right)^2 + \dots,\]thereby recovering our \(C[N]\) type.
We can view our prior schematic derivation as visually performing differentiation: the binary Op case of \(C[T]\) is derived, formally, by a “product rule”-like transformation from \(\text{str} \times T^2 = \text{str} \times T \times T\) to \(\text{str} \times C[T] \times T + \text{str} \times T \times C[T]\).
This reasoning is only suggestive for why these rules should work out (since what does it even mean to subtract or divide by a type?), and the paper itself shows the soundness of these rules, but the idea of taking a derivative at all as being related to term substitution is pretty magical!
This is the end of the post, but for those with a mathematical background, we go over a connection in math that formally connects tree manipulation with continuous derivatives.
Postscript: generating functions
There is one place in math where taking derivatives is directly related to tree manipulation: that of generating functions. We provide a (very!) abbreviated overview of the topic as relates to one-hole contexts, and encourage the interested reader to dig into this further.
In combinatorics, we often have some family of structures indexed by \(n\), and we want to count the number of those structures of size \(n\). For example, if we are studying the family of unlabeled, rooted, and ordered binary trees \(\mathcal{T}_n\), we might want to count the number of trees with \(n\) nodes \(|\mathcal{T}_n| = t_n\).
It’s useful to compactly represent \(t_n\) using the formal power series \(T(x) := \sum_{n=0}^{\infty} {t_n}x^n\). \(T\) is called the (ordinary) generating function of \(t_n\), and as a formal sum, \(T\) just acts as a bookkeeping device for the individual \(t_n\)s: the \(x^n\) coefficient of \(T\) is \(t_n\). But this can be useful for calculation since basic operations on \(T\) correspond to natural combinatorial operations on \(\mathcal{T}_n\).
For example, a tree in \(\mathcal{T}_n\) is either the empty tree or a tree with a root and two subtrees of size \(k\) and \((n-1)-k\) for some \(0 \leq k \leq n-1\), and this decomposition is a bijection. We can therefore use this correspondence to recursively count the number of trees of size \(n\):
\[t_n = \begin{cases} 1 & n=0 \\\\ \sum_{k=0}^{n-1} t_k t_{n-1-k} & n > 0\\\\ \end{cases},\]or more succinctly,
\[\begin{align*} T(x) &= \sum_{n=0}^\infty t_n x^n \\ &= 1 + \sum_{n=1}^\infty \sum_{k=0}^{n-1}t_k t_{n-1-k}x^n \\ &= 1 + x\sum_{m=0}^\infty \sum_{k=0}^{m}t_k t_{m-k}x^m \\ &= 1 + x\sum_{m=0}^\infty \sum_{j+k=m} t_k x^k t_{j} x^{j} \\ &= 1 + xT(x)^2. \end{align*}\]Formally this equation is very similar to our Term definition from earlier, but we can interpret the equation combinatorially as saying a tree is either empty, or a root of size \(1\) along with a pair of two subtrees (categorized by total size).
As it turns out, this functional equation is the defining equation for the Catalan numbers, which we can solve for in closed form as \(T(x) = \frac{1 - \sqrt{1 - 4x}}{2x}\).
To handle trees with a hole, we can look at other generating functions, such as exponential generating functions \(\sum \frac{\widehat{t_n}}{n!}x^n\) and multivariate generating functions \(\widehat{T}(x, y) := \sum \frac{\widehat{t_{n, k}}}{n! k!}x^n y^k\). These are good for counting labeled structures, such as the set of labeled trees \(\widehat{\mathcal{T}_{n, k}}\) with \(n\) internal nodes and \(k\) leaves. In this case, a labeled tree is a tree where each node and leaf has a separate ID, analogous to a pointer address or register, and we are interested in counting the number \(\widehat{t_{n, k}}\) of trees, with the \(n\) internal nodes bijectively pointing to a fixed set of \(n\) registers/labels (traditionally the labels are the set \(\{1, 2, \dots, n\}\)), and the \(k\) leaves pointing to a separate fixed set of \(k\) labels.
The formal derivative \(\frac{\partial}{\partial y} \widehat{T}(x, y) = \sum \frac{\widehat{t_{n, k+1}}}{n!k!}x^ny^k\) shifts the \(y\)-coefficients over by one, and corresponds to counting how many structures there are if we have an extra leaf label to work with. Combinatorially, we can think of this extra label as a “hole”; that is, \(\frac{\partial}{\partial y} \widehat{T}\) counts how many labeled trees there are with one of the leaves designated as a “hole”.
We can set up another recursive equation to count trees: \(\widehat{T}(x, y) := F(x, y, \widehat{T}(x, y))\), for \(F(x, y, T) = y + xT^2\), which says that a labeled tree \(\widehat{\mathcal{T}_{n, k}}\) is either a \(y\)-leaf or an \(x\)-internal node along with two subtrees. Then by the same reasoning as before, \(\widehat{T}_y = F_y + F_T \widehat{T}_y\) and so:
\[\widehat{T}_y = \frac{F_y}{1 - F_T} = \frac{1}{1 - 2x\widehat{T}} = 1 + (2x\widehat{T}) + (2x\widehat{T})^2 + \dots,\]directly corresponding to our previous type differentiation. Under the combinatorial interpretation, the number of labeled trees with an extra “hole” label for a leaf is equal to the sum of: the number of empty lists; plus the number of structures made up of a boolean choice, a choice of root label, and a subtree; plus the number of structures made up of pairs of those choices; plus triples; and so forth (categorized by the number of nodes and leaves).
Similarly, the number of labeled multi-ary trees is counted by \(\widehat{N}(x, y) := G(x, y, \widehat{N}(x, y))\) for \(G(x, y, N) := y + x\frac{1}{1-N}\). Like before, \(\widehat{N}_y = G_y + G_N \widehat{N}_y\) is solved by:
\[\widehat{N}_y = 1 + \left(x\left(1 + \widehat{N} + \widehat{N}^2 + \dots\right)^2\right) + \left( x\left(1 + \widehat{N} + \widehat{N}^2 + \dots\right)^2 \right)^2 + \dots.\]Despite being used for a seemingly unrelated task of counting, these generating functions directly give a combinatorial analogue for calculating the type of one-hole contexts via differentiation. Wowza!
Further reading
If any of the topics in this post piques your interest, you can dig into them further by lookup up:
- Zippers, another functional data structure for unpacking container types.
- The symbolic method in combinatorics, and the theory of combinatorial species, which provide a formal language and categorical foundations for manipulating schematic diagrams via generating functions.
- “Five and a half derivatives”, an overview of other derivatives that show up in programming languages.
- Brent Yorgey’s PhD thesis, which directly connects combinatorial species with algebraic data types using category theory.