Parameterized Complexity of Running an E-Graph
E-graphs are a data structure that allows us to represent many equivalent programs at once. This post won’t get into background on e-graphs, so check out previous blog posts or the original egg talk. A variety of research projects at PLSE use e-graphs because they are a powerful and flexible tool for performing program optimization. Unfortunately, e-graphs suffer from a serious problem: their run-time performance is unpredictable.
In practice, e-graph performance issues are often solved by using resource limits, such as timeouts or size limits. Additionally, some projects use dynamic schedulers to prevent some rewrite rules from eating up all of these resources. While these simple and direct solutions make the performance predictable, they also make the e-graph’s results unpredictable. Resource limits make it difficult to guarantee that any particular program optimizations will be applied. Worse, adding new program optimizations results in the resource limits being triggered earlier, preventing optimizations that were previously applied.
This blog post is motivated by the desire to have predictable run-time performance and predictable optimization results from e-graphs. These two properties are especially important in the context of compilers, which perform program optimization before lowering the program to machine code. We’ve been thinking about this problem in the context of eggcc, an e-graph based compiler for Bril programs.
Stick around for a wild ride, and we’ll conclude that eggcc’s e-graph grows quadratically in the number of rules. Given eggcc’s particular set of optimization rules, we can bound the size of the e-graph by roughly \(1000\) times the size of the original program.
Key Insight
At first, it may seem that resource limits are inevitable. A single application of a single rewrite rule may perform an exponential number of matches over the e-graph, resulting in an exponential number of new e-nodes. Running the rewrite rule again can result in an exponential number of matches. The run-time complexity of running an e-graph (even with a single rule) seems to be super-exponential in the number of iterations.
In practice, however, many useful rewrites do not perform an exponential number of matches. For example, eggcc’s loop unrolling rule matches each loop e-node in the e-graph at most once. I call these rules linear-matching, since they match a constant number of times per e-node. In fact, it’s often a performance bug rather than an intended feature when a rewrite rule performs exponential matching. On the other hand, some common rewrite rules such as associativity and commutativity do result in exponential matching (see the Herbie tool’s rewrite rules).
The key insight of this blog post is that using linear-matching rules results in a much better complexity bound compared to the super-exponential bound of the general case. It’s hard to ensure rewrite rules are linear-matching by construction, but schedulers can enforce the property directly. In the eggcc tool, our rules (besides program analysis) perform just as well when limited to one match per e-node. Since rewrite rules are small, we also assume that each rule adds a constant number of new e-nodes.
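To make this restriction concrete, here is a minimal sketch of such a scheduler: it deduplicates matches on their root e-class, keeping at most one per rule. The tuple representation of matches is a hypothetical stand-in, not egglog’s actual API.

```python
from collections import defaultdict

def schedule_linear(matches):
    """Keep at most one match per (rule, root e-class) pair.

    `matches` is a list of (rule_name, root_eclass_id, bindings)
    tuples, a hypothetical stand-in for whatever the matcher
    actually produces.
    """
    seen = defaultdict(set)  # rule name -> root e-classes already used
    kept = []
    for rule, root, bindings in matches:
        if root not in seen[rule]:
            seen[rule].add(root)
            kept.append((rule, root, bindings))
    return kept

# Associativity found three matches rooted at e-class 7; the
# scheduler keeps only the first, making the rule linear-matching.
matches = [("assoc", 7, "m1"), ("assoc", 7, "m2"), ("assoc", 7, "m3"),
           ("unroll", 3, "m4")]
print(schedule_linear(matches))  # [('assoc', 7, 'm1'), ('unroll', 3, 'm4')]
```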
Later, we’ll show one more trick that further reduces the complexity: a new parameter describing the overlap between matches for different rewrite rules. We’ll call this the disjointness parameter. For now, let’s find a complexity bound for running an e-graph with linear-matching rules.
Complexity Bound
The matching phase is the most expensive part of running an e-graph. The egglog language is backed by a database of e-nodes, performing database queries in order to find matches. It uses a worst-case optimal join algorithm, which has a cost proportional to the AGM bound on the query. The AGM bound is an upper bound on the number of matches that a query can produce, based on the query and the database (for example, the triangle query over three binary relations of size \(N\) has an AGM bound of \(N^{3/2}\)). In this blog post, we’ll assume an implementation that uses egglog or something similar. We’ll write the AGM bound as \(a(x)\), where \(x\) is the number of e-nodes in the e-graph. We assume \(a\) is computed with respect to the most complex query in the optimization ruleset. In practice, our queries run much faster than the theoretical AGM bound.
The overall complexity of running an e-graph with an initial size of \(I\), \(r\) rewrite rules, and \(n\) iterations is \(O(a(I*r^{n-1})*r)\). The first iteration takes \(O(a(I)*r)\) time to match, and the resulting e-graph has size \(O(I*r)\). The second iteration takes \(O(a(I*r)*r)\) time to match, and the resulting e-graph has size \(O(I*r^2)\) after actions are applied. The last iteration’s matching phase dominates the complexity, resulting in the \(O(a(I*r^{n-1})*r)\) bound.
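As a sanity check, here is a small sketch that unrolls this recurrence, with a made-up placeholder for the AGM bound \(a\) (the real bound depends on the ruleset’s queries) and constant factors dropped:

```python
def run_cost(initial_size, r, n, agm):
    """Unroll the recurrence: each of the r linear-matching rules adds
    O(size) e-nodes per iteration, and matching costs a(size) per rule."""
    size, match_cost = initial_size, 0
    for _ in range(n):
        match_cost += agm(size) * r  # match every rule separately
        size *= r                    # r rules each grow the e-graph by O(size)
    return size, match_cost

# Example: I = 1000 e-nodes, r = 30 rules, n = 2 iterations, and a
# made-up AGM bound a(x) = x**1.5, purely for illustration.
print(run_cost(1000, 30, 2, lambda x: x ** 1.5))
```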
In eggcc, we typically use a number of iterations \(n = 2\), resulting in a complexity bound of \(O(a(I*r)*r)\). Avoiding resource limits or schedulers allows us to confidently say that all combinations of two rewrite rule steps are explored.
Disjointness Parameter
The above bound is a bit pessimistic, mostly because rules don’t usually match against all the e-nodes in the e-graph. They usually have a top-level operator, such as a loop or an addition. For example, consider a rule that matches on a loop, `loop(inputs, body) = ...`, or a rule that performs constant folding for addition, `add(const(x), const(y)) = ...`. This means that for each rule, the number of matches in an e-graph of size \(I\) is usually much smaller than \(I\).
In fact, we can group multiple rules together into a single ruleset matching match-disjoint sets of root e-nodes. Our loop rule and addition rule are match-disjoint, and can be grouped together. For an entire match-disjoint ruleset, we only get \(O(I)\) matches for an e-graph with \(I\) e-nodes.
Here’s where my favorite trick comes in: parameterized complexity, a technique where you describe an aspect of the problem and give a more precise complexity bound based on the description. In this case, we define the disjointness parameter \(d\) for a ruleset as the minimal size of a disjoint partition of the rules. A disjoint partition of rules is a partition of the rules such that each set of rules in the partition is match-disjoint.
What this means in practice is that we can group our rules into rulesets, proving by hand that the rules in each ruleset match different e-nodes. Then, we get a much better complexity bound: \(O(a(I*d^{n-1})*r)\). Note that we substituted \(d\) for \(r\) in the e-graph’s growth, but retained \(r\) in the final matching phase since each rule is still matched separately.
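One simple way to bound \(d\) by hand is to group rules by the root operator of their left-hand side: two rules with different root operators can never match the same e-node. Under that (sufficient, but not necessary) condition, the minimal partition size is the largest number of rules sharing a root operator. Here is a sketch, with a made-up rule representation:

```python
from collections import Counter

def disjointness(rules):
    """Upper-bound d for a ruleset given as (name, root_operator) pairs.

    Rules with different root operators are match-disjoint, so only
    rules sharing an operator must land in different partition sets.
    """
    counts = Counter(root for _name, root in rules)
    return max(counts.values())

rules = [
    ("unroll", "loop"), ("loop-peel", "loop"),
    ("const-fold-add", "add"), ("assoc-add", "add"), ("comm-add", "add"),
]
print(disjointness(rules))  # 3: the three add-rooted rules need 3 rulesets
```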
Eggcc-Specific Concerns
Unfortunately, the above analysis isn’t quite right for eggcc. Specifically, we made a few assumptions that don’t hold in practice:
- We ignored analyses: rulesets that run for many iterations to learn facts about programs in the e-graph.
- We ignored helpers like substitution, which allow a single rule application to add many new e-nodes.
We’ll continue to ignore analyses, which can also be implemented as separate algorithms instead of as rules over the e-graph. The main problem is substitution.
Eggcc’s representation of programs uses regions, adapted from RVSDGs. Some rules, like loop unrolling, substitute an e-class for the argument of another e-class, essentially copying an entire sub-e-graph. Other rules, like constant folding for addition, add equalities between two terms in the same region without substituting.
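As a toy model of the difference (these are not eggcc’s actual data structures), a substituting rule copies an entire region, while a non-substituting rule only adds an e-node inside one:

```python
# Each region is modeled as a set of e-node ids.
regions = {"main": {"loop0", "add0", "const0"}}

def unroll(regions, src, dst):
    """Substitution: copy the whole sub-e-graph into a fresh region."""
    regions[dst] = set(regions[src])

def const_fold(regions, region, folded):
    """No substitution: add one new e-node within the same region."""
    regions[region].add(folded)

unroll(regions, "main", "main-unrolled")  # grows the number of regions
const_fold(regions, "main", "const1")     # grows one region's size
print({name: len(nodes) for name, nodes in regions.items()})
# {'main': 4, 'main-unrolled': 3}
```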
To analyze eggcc specifically, we break our rules into two rulesets with sizes \(s\) and \(p\), containing the substitution-performing and non-substitution-performing rules, respectively. We also give a new parameter \(d_s\) for the disjointness of the substitution ruleset and a new parameter \(d_p\) for the disjointness of the non-substitution ruleset.
We also break our e-graph into one e-graph per region, using \(M\) as the size of the largest region in the initial program and \(R\) as the number of regions in the e-graph.
- During the first iteration, we get \(O(a(I)*(s + p))\) matches, resulting in an e-graph with \(O(R * d_s)\) regions and \(O(M * d_p)\) e-nodes per region.
- The resulting e-graph has a size \(O(R * M * d_s * d_p)\).
- During the second iteration, we get \(O(a(R * M * d_s * d_p) * (s + p))\) matches.
- The resulting e-graph has a size \(O(R * M * d_s^2 * d_p^2)\).
For two iterations, eggcc’s final complexity is the sum of these two:
\[O(a(R * M * d_s * d_p) * (s + p) + R * M * d_s^2 * d_p^2)\]

In practice, \(R * M\) is roughly the size of the original program.
Using really rough estimates, we have values of \(s = 11\), \(p = 20\), \(d_s = 5\), and \(d_p = 6\) for the current implementation of eggcc. Plugging these in, we expect eggcc to take time \(a(R * M * 30) * 31 + R * M * 900\). This sounds about right for eggcc, which is much slower than traditional compilers like LLVM but can still handle a 2.5k line raytracer.
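For reference, here is the same arithmetic as a quick script, using the rough estimates above:

```python
# Plug eggcc's rough estimates into the two-iteration bound
#   a(R*M*d_s*d_p) * (s + p) + R*M*d_s^2*d_p^2.
s, p, d_s, d_p = 11, 20, 5, 6

growth = d_s * d_p               # per-iteration growth factor: 30
queries = s + p                  # rules matched per iteration: 31
final_blowup = d_s**2 * d_p**2   # final size factor over R*M: 900
print(growth, queries, final_blowup)  # 30 31 900
```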
Reflection
This post shows one path to a practical runtime, even when using an e-graph. I hope it demonstrates that careful co-design of implementation and complexity analysis can lead to more reliable systems in formal methods. In our case, we leveraged domain knowledge about our rules (regions, matching only once per e-node, etc.) to achieve a practical runtime.
It also highlights the importance of fast queries over the database. The dominant term in the complexity bound is certainly the \(a(R * M * d_s * d_p) * (s + p)\) matching time. In practice, we’ve seen a few complex queries taking over 20 seconds to run. In these cases, we’ve rewritten the query (basically randomly) until our e-graph implementation picked a different query plan that was much faster. In the future, we hope to improve our query planning and facilities for debugging performance problems.
Future directions could include automatic disjointness detection for different rules, or more interesting schedulers that enforce one match per e-node. We could also prevent regions from getting too large, since huge regions are probably not useful anyway.