Simplifying Addition and Multiplication in Polynomials
Post Metadata
In my own research on Computer Algebra Systems (CAS), I often work some pretty cool data structures š. In this post, I want to show off one thatās responsible for simplifying polynomials. But first, I have a little secretā¦In high-school, I thought polynomials were boring! The hours I spent finding roots dragged on and on. They didnāt make pretty graphs like the trig functions. Itās only at this point in my life that I more fully appreciate them.
Simply put, polynomials make the world go round! Computing a transcendental function? Do a Taylor (or Chebyshev) approximation. Rendering letters on a screen? Interpolate points with Bezier Curves (with fun videos here and here). Integrating a function? Do quadrature with Legendre Polynomials. They even show up in places you, at first, might not expect. Alright, alright, letās simplify some polynomials!!
A Basic Tree
Our goal is to simplify the polynomial expression:
\[(-3x) + 2z + x + z + 2x + y^2\]into
\[y^2 + 3z\]Doing this by hand is natural. Because addition is associative and commutative we can scan the list of monomials and āgroup the like termsā, reorganizing the list of additions as needed. From there we realize that \((-3x) + x + 2x\) becomes \((-3 + 1 + 2)x\) which is zero and \(2z + z\) becomes \((2 + 1)z = 3z\).
This seems straight forward, but on a computer, computation is often represented as a tree. In this case:
Yikes! All of a sudden, that āsimple scanā becomes non-obvious. Lots of great research goes into simplifying these general expressions. To do so, they use sophisticated data structures and algorithms. Fortunately, we have a narrower focus of only polynomials. Letās try to follow our human intuition and really optimize for this specific use case.
Our starting point is this recursive data structure:
class Node:
pass
class Num(Node):
value: int
class Var(Node):
name: str
# A Binary Node!
class Add(Node):
lhs: Node
rhs: Node
class Mul(Node):
lhs: Node
rhs: Node
With the polynomial in question being constructed like
x = Var("x")
y = Var("y")
z = Var("z")
two = Num(2)
neg_three = Num(-3)
p = Add(
Add(
Mul(neg_three, x),
Mul(two, z)),
Add(
Add(x, z),
Add(
Mul(two, x),
Mul(y, y))))
Design guided by human intuition
The ātrickā of our human algorithm is treating all the monomials on the same ālevelā of a tree and scanning across that level. We then reordered monomials to āgroup like termsā. Finally, we simplified those āgroupsā. At each step, we used our intuition to apply associativity, commutativity, and distributivity (ACD) rules to do this. Our goal is to turn this intuition into an implementation.
Letās clarify our plan:
- flatten tree
- scan tree and group ālike termsā
- simplify coefficients of ālike termsā
Next, we translate the notion of āflatā to code. This isnāt obvious, but to get started, letās keep things simple and use a list
:
class Add(Node):
- lhs: Node
- rhs: Node
+ terms: list[Node]
The insight is that the additions can be performed in any order. The original tree structure nests the additions, prescribing a specific evaluation order and obscuring the polynomial structure. Instead, we keep a list of all the terms added together, flattening the tree. We can decide the order of evaluation later. To be clear, there is a natural order of processing from left to right, but we could choose something else like a parallel reduction. Any order is valid! Regardless, the resulting data structure with all the Add
s flattened looks like:
Now weāre ready for step 2 of our simplification procedure (āgroupā like terms). In my mind, this is putting monomials which have the same variable and power (i.e. \(x^2\)), into a bucket.
def collect_like_terms(add: Add):
like_terms: dict[Node, list[Coeff]]
like_terms = {}
for monomial in add:
# This helper does:
# y * 2 => (y, 2)
# 3 * x**2 => (x**2, 3)
(term, coeff) = split_coeff(monomial)
# put the coefficients of "like terms"
# into the same list
like_terms[term].append(coeff)
return like_terms
Finally, we can simplify the coefficient (transform \((-3x) + x + 2x\) to \((-3 + 1 + 2)x\)):
def simplify(add: Add):
new_add = []
like_terms = collect_like_terms(add)
# for each group
for (term, coeffs) in like_terms.items():
# simplify coeffs
new_coeff = sum(coeffs)
new_monomial = Mul(new_coeff, term)
# add it to our new polynomial
new_add.append(new_monomial)
return Add(new_add)
The asymptotic time complexity is \(O(2n)\) with \(n\) being the number of terms in our polynomial. This is pretty cool! and certainly better than the naive tree. But thereās a problem: in practice, polynomials can become large! To avoid using a lot of memory (and waste computation on redundant expressions), we want to simplify on every time we construct an Add
node. This means each incremental addition does a full pass over all terms, which is too costly. Fortunately, we can do better, \(O(2)\) better!!
Leaning into the polynomial
Polynomials have a very specific structure we can take advantage of: All the terms follow the pattern \(c_0x^0 + c_1x^1 + c_2x^2 + \dots + c_nx^n\). We want to group ālike termsā (things that have the same power of \(x)\). Ideally, we want \(ax + bx\) factored into \((a+b)x\). Letās unpack the multiplication and sum into two arrays:
class Add(Node):
terms: list[Node]
+ coeffs: list[int]
With our working example being encoded like:
terms = [ x, z, x, z, x, y**2]
coeffs = [-3, 2, 1, 1, 2, 1]
Where the term and coefficient that share the same index in the array are multiplied together. Weāve effectively āinlinedā the monomial into the Add
as well.
Better yet, we can more easily compute the terms that can be āgroupedā together, ideally like:
terms = [x, z, y**2]
coeffs = [0, 3, 1]
Where the coefficient 0 should be dropped.
Our task now is to design an algorithm that takes the expanded term list and generates the condensed term list. How about this:
def fast_simplify(terms, coeffs):
new_coeffs: dict[Term, Coeff] = {}
for (term, coeff) in zip(terms, coeffs):
like_terms[term] += coeff
return Add(
list(new_coeffs.key()),
list(new_coeff.values()))
Not bad, we only have a single scan over the Add
instead of two, but we still have a linear simplification algorithm thatās run on every single addition š. Interestingly, weāve used a dict
to construct a set of terms and remember how many of that term itās seen. Such a data structure is called a multiset. Seems pretty useful for simplifying polynomialsā¦
Hereās the core insight: instead of computing the multiset when optimizing, always store it. In code, this looks like:
class Add(Node):
- terms: list[Node]
- coeffs: list[int]
+ terms: dict[Node, int]
Where each monomial is encoded as a key-value pair. Pretty slick! So slick, in fact, that we donāt even need a separate simplify function. The addition function is enough:
def add_monomial(lhs: Add, rhs: Node):
# if rhs is in lhs
# return the coefficient
# otherwise return 0
count = 1 + lhs.terms.get(rhs, 0)
lhs.terms.insert(rhs, count)
Which only requires two hashes of rhs
, \(O(2)\) achieved! Itās always simplified! Something else I want to highlight is the .get(rhs, 0)
call. This creates the illusion that all possible monomials are contained in the set, but some have a count/coefficient of zero. Very smooth indeed!
Representing Addition with Multiset Operations
Stopping to reflect for a moment, weāve mapped our problem onto an existing data structure. Thatās super useful because we can likely import an existing implementation. To really showcase this, letās think of summing two Add
nodes together.
Letās start by writing out our expected inputs and outputs:
x = Var("x")
y = Var("y")
z = Var("z")
# x + 2*y
lhs = {
x : 1,
y : 2,
}
# z + 3*x
rhs = {
z : 1,
x : 3,
}
# (lhs + rhs) becomes:
# 4*x + 2*y + z
result = {
x : 4,
y : 2,
z: 1,
}
Our observations are:
- if both
lhs
andrhs
have a term, include the term by adding the respective coefficients, - if a term exists in only one side, include it unchanged.
This is a multiset union! Not only can we represent simplification as building a multiset, we can represent addition with multiset operations as well!
Customizing the multiset for Algebra
Now that we have some operations defined, letās test this structure out. What happens when we add a sequence of integers?
accumulator = Add()
for data in [1,2,3,4]:
add_monomial(accumulator, data)
resulting in:
term | coefficient |
---|---|
1 | 1 |
2 | 1 |
3 | 1 |
4 | 1 |
Considering we know how to add integers, it feels wasteful to use eight values to store a single number. Worse yet, what happen when we add a sequence of zeros (0+0+0
)?
expression | coefficient |
---|---|
0 | 3 |
Weāre using two numbers to encode no information. Now, in the abstract case of a multiset, both cases are perfectly fine. Weāve recorded that the set contains three zeros. But letās remind ourselves of our goals: weāre modeling algebra! The items in the multiset have a specific interpretation. In this case, itās the addition of all the keys multiplied by their corresponding value. Because we have this domain-specific knowledge, we can optimize away all that redundant information. The trick is to have a dedicated accumulator outside the dict
class Add:
+ accumulator: int
terms: dict[Node, int]
def add_monomial(lhs: Add, rhs: Node):Ā©
+ if isinstance(rhs, Num):
+ lhs.accumulator += rhs
+ return
count = 1 + lhs.terms.get(rhs, 0)
lhs.terms.insert(rhs, count)
Now we always accumulate literal values into this special field, preventing them from being entered into the data structure. I picture this structure as a generalized fused multiply-add.
So far so good, but the typing of terms: dict[Node, int]
feels limitedā¦ What if I need to compute 3/2 + (x / 2)
?
Hereās our next insight: instead of storing an integer values, store fractions!
+ from fractions import Fraction
class Add:
- accumulator: int
+ accumulator: Fraction
- terms: dict[Node, int]
+ terms: dict[Node, Fraction]
Our Add
node is quite expressive!
Multiplication
The multiset design relies on the AC rules of addition for real numbers. Multiplication has these properties as well! Amazingly, we can immediately repurpose our data structure!
- class Add:
+ class Mul:
constant: Fraction
terms: dict[Node, Fraction]
- def add_monomial(lhs: Add, rhs: Node):
+ def mul_expr(lhs: Mul, rhs: Node):
But whatās the interpretation of this data structure? For addition, the keys were unique terms and the values were the coefficients. Now, we have repeated multiplication, which sounds like exponentiation: (x * x * x
to x ** 3
). So the coefficients become powers! Carrying on this thread, negative multiplicity should encode division (1 / y
to y ** -1
) and fractional values would mean roots (sqrt(x)
to x ** 1/2
)! A more complicated example would be:
expression | multiplicity |
---|---|
y | -1/2 |
x+y | 1 |
Whatās still missing?
With just this design, we can model, Addition, Subtraction, Multiplication, Division, Exponentiation, Roots. Everything we need to compactly represent polynomials!! Unfortunately, there are a few gotchaās
Function Inverses
CASs are huge, sophisticated systems, but they are also very subtle. For instance, sqrt(x ** 2)
maps to x ** 2 ** 1/2
which feels like it should be x
ā¦ but this is wrong, itās abs(x)
. How should a system detect such an edge case?
Normalization
Strange redundancies creep into the system when composing addition and multiplication. For example, the special constant in a Mul
node corresponds to an Add
node dict value. Working this out more explicitly:
# x * (y**2)
mul_terms = {Var("x"): 1, Var("y"): 2}
prod0 = Mul(1, mul_terms.copy())
# (2/3) * mul_terms
prod1 = Mul(2/3, mul_terms.copy())
# 1 + prod0 + prod1
acc = Add(1)
add_monomial(acc, prod0)
add_monomial(acc, prod1)
We want to get \(1 + \frac{5}{3} xy^2\)
But instead we get the expression \(1 + \frac{2}{3} xy^2 + xy^2\)
The constant term in prod1
(2/3
) obscures the shared components of prod1
and prod0
. Unfortunately, we need careful crafted rules to prevent such cases from ever occurring. In this case, we can āunpackā the Mul.constant
term and move it to Add.terms
. Optimizers are filled with these sorts of normalization rules.
Conclusion
Reflecting on our journey, weāve started with a problem and iteratively developed a sophisticated strategy for optimizing it!
Our human intuition served us well, but ultimately, we encoded our problem into an existing data structure (i.e. realized our problem can be mapped to a multiset).
Lastly, we tweaked that data structure to suit our applicationās needs (extended the dict
values to Fractions). Encoding domain-specific knowledge is a powerful tool for optimization indeed.
I hope you enjoyed the ride! Make sure to check out how data structures like this are used in real systems (SymEngine) for addition and multiplication