2Sum Algorithms

Time for yet another floating-point blog post! Since this is my third blog post of the year, I’d like to keep this one brief. This time, I’ll be discussing “2Sum” algorithms, an obsession of mine for a number of years now that consumed a large part of an internship I did at Intel during the summer of 2023. The classic 2Sum algorithm¹, attributed to Knuth and Møller, computes the sum of two floating-point numbers together with the exact error of that floating-point addition.

import fpy2 as fp

@fp.fpy
def two_sum(x: fp.Real, y: fp.Real):
    s = x + y    # rounded floating-point sum
    xx = s - y   # approximately x
    yy = s - xx  # approximately y
    ex = x - xx  # rounding error in the x component
    ey = y - yy  # rounding error in the y component
    e = ex + ey  # total rounding error of s
    return s, e

Above is the 2Sum algorithm implemented using the FPy (fpy2) package, my new domain-specific language/library for simulating numerical algorithms under different number systems. Incredibly, this algorithm relies solely on six floating-point operations, rather than the hacky bitwise operations one might expect. As a concrete example,

s, e = two_sum(1e+16, 1.0)
print(s, e) # Float('1e+16') Float('1.0')

the addition 1e+16 + 1.0 would return 1e+16 since 1.0 is too small and is thus rounded off. In contrast, the 2Sum algorithm recovers this error term exactly. The fundamental property of the 2Sum algorithm is as follows:

Property. Let s, e = two_sum(x, y). Assuming that x and y are finite floating-point numbers,

  1. s is the floating-point addition of x and y;
  2. e is a floating-point number, provided that no overflow occurs;
  3. s + e = x + y where + is exact addition over real numbers, provided that no overflow occurs.
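
As a quick sanity check of the third property, we can step outside of FPy and redo the arithmetic exactly with rational numbers. The sketch below is mine: two_sum_py is just the same six operations on native Python floats, and fractions.Fraction verifies that no information was lost.

from fractions import Fraction

def two_sum_py(x: float, y: float):
    # the same six operations, on native doubles
    s = x + y
    xx = s - y
    yy = s - xx
    return s, (x - xx) + (y - yy)

s, e = two_sum_py(1e+16, 1.0)
# exact rational arithmetic confirms s + e == x + y
assert Fraction(s) + Fraction(e) == Fraction(1e+16) + Fraction(1.0)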

Techniques to extend the precision of floating-point computations, such as double-double arithmetic, rely on 2Sum’s “no information loss” property (s + e = x + y). As a side note, PLSE alumnus Pavel Panchekha has an excellent series of blog posts on optimizing the 2Sum algorithm for x86 and ARM chips.
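
To sketch why this matters for extended precision, here is a minimal, simplified double-double-style accumulator built on two_sum_py from the check above; real double-double addition also renormalizes the pair with a second error-free transformation, which I omit here.

def dd_add(hi: float, lo: float, y: float):
    # add y to the unevaluated sum hi + lo: 2Sum captures the rounding
    # error of the high-order add, which is folded into the low-order
    # word (this last add may itself round, unlike 2Sum)
    s, err = two_sum_py(hi, y)
    return s, lo + err

hi, lo = 0.0, 0.0
for t in [1e+16, 1.0, 1.0]:
    hi, lo = dd_add(hi, lo, t)
print(hi, lo)  # 1e+16 2.0, recovering what naive summation loses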

Problems with 2Sum

The 2Sum algorithm experiences overflow whenever the underlying floating-point addition x + y overflows.

LARGE = fp.FP64.maxval() # largest double-precision float
EPS = fp.Float(c=1, exp=fp.FP64.expmax) # 2 ** 971, one ulp of LARGE
s, e = two_sum(LARGE, EPS)
print(s, e) # Float('+inf') Float('+nan')

What is less obvious is that the 2Sum algorithm can overflow even when the addition does not overflow!

LARGE = fp.FP64.maxval() # largest double-precision float
EPS2 = fp.Float(s=True, c=3, exp=fp.FP64.expmax-1) # -3/2 * (2 ** 971), i.e., -1.5 ulp of LARGE
s, e = two_sum(LARGE, EPS2)
print(s, e) # Float('1.7976931348623155e+308') Float('+nan')

For the example above, it’s the subtraction xx = s - y that experiences overflow instead. Propagating the infinities through the rest of the algorithm, the final error term e = ex + ey evaluates to Float('-inf') + Float('+inf'), which is NaN by the rules of IEEE 754 floating-point arithmetic.
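
The same failure reproduces with native IEEE 754 doubles, which makes the intermediate values easy to trace. The constants below are my plain-Python stand-ins for LARGE and EPS2.

import sys

x = sys.float_info.max   # LARGE
y = -3.0 * 2.0 ** 970    # EPS2 = -3/2 * (2 ** 971)
s = x + y                # 1.7976931348623155e+308, finite
xx = s - y               # +inf: the exact result is maxval + 0.5 ulp
yy = s - xx              # -inf
ex = x - xx              # -inf
ey = y - yy              # +inf
print(ex + ey)           # nan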

This is an uncommon pitfall of the 2Sum algorithm, but it’s an interesting case where the implementation diverges from the behavior one might intend. With FPy, we can express the specification of 2Sum directly and recover the error term:

@fp.fpy
def two_sum_ideal(x: fp.Real, y: fp.Real):
    s = x + y  # rounded floating-point sum
    with fp.REAL:
        e = (x + y) - s  # exact error, so that s + e == x + y
    return s, e

In the program above, the with fp.REAL statement means the nested block is executed without any rounding:

s, e = two_sum_ideal(LARGE, EPS2)
print(s, e) # Float('1.7976931348623155e+308') Float('-9.9792015476735991e+291')

The two_sum_ideal function produces the same result as two_sum whenever the arguments are finite and two_sum does not overflow internally. Unlike two_sum, however, two_sum_ideal computes the error term e correctly as long as s is finite. In fact, e is always representable in these cases, suggesting that a hardware implementation of the (ideal) 2Sum algorithm is possible.
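
The ideal specification is also easy to emulate outside of FPy: compute the exact error with rational arithmetic and check that it survives a round trip through a double. The test below, including the sampling strategy, is my own.

import random
from fractions import Fraction

def two_sum_exact(x: float, y: float):
    s = x + y  # double addition, round-to-nearest
    return s, Fraction(x) + Fraction(y) - Fraction(s)  # exact error

random.seed(0)
for _ in range(10_000):
    x = random.uniform(-1.0, 1.0) * 2.0 ** random.randint(-60, 60)
    y = random.uniform(-1.0, 1.0) * 2.0 ** random.randint(-60, 60)
    s, e = two_sum_exact(x, y)
    # the error of a round-to-nearest addition is itself representable
    # whenever s is finite
    assert Fraction(float(e)) == e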

However, no 2Sum algorithm exists for directed rounding modes, such as round toward zero, round away from zero, round toward positive infinity, and round toward negative infinity: under these modes, the error term need not be representable at all.

s, e = two_sum_ideal(1, -1e-200, ctx=fp.FP64.with_params(rm=fp.RM.RTZ))
print(s, e) # Float('0.99999999999999989') Float('1.1102230246251565404236316680908203124999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999e-16')

The error term e is not representable in double-precision floating-point since its precision p = 664 is far above the limit of p = 53 bits.
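
The quoted precision is easy to check: write the exact error as a reduced fraction and count the bits of its numerator. This computation (with Fraction) is mine.

from fractions import Fraction

x, y = 1.0, -1e-200
s = 1.0 - 2.0 ** -53  # x + y rounded toward zero
e = Fraction(x) + Fraction(y) - Fraction(s)

# Fraction is stored in lowest terms, so the numerator is odd and its
# bit length is exactly the precision of e
print(abs(e.numerator).bit_length())  # 664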

Faithful 2Sum

However, if we drop the first property of the 2Sum algorithm, that the sum s must be the rounded floating-point addition, then we can satisfy the second property again. For example, we could require only that the sum s be faithfully rounded, that is, the exact sum x + y rounded either down or up to an adjacent floating-point number.

@fp.fpy
def two_sum_faithful(x: fp.Real, y: fp.Real):
    with fp.FP64.with_params(rm=fp.RM.RTZ):
        rtz = x + y  # sum rounded toward zero
        with fp.REAL:
            e_rtz = (x + y) - rtz  # exact error of the RTZ sum
    with fp.FP64.with_params(rm=fp.RM.RAZ):
        raz = x + y  # sum rounded away from zero
        with fp.REAL:
            e_raz = (x + y) - raz  # exact error of the RAZ sum
    if fp.round(e_rtz) == e_rtz:
        # the RTZ error term is representable
        t = rtz, e_rtz
    else:
        # otherwise, the RAZ error term must be representable
        t = raz, e_raz
    return t

Above is an implementation that prefers the round-to-zero result unless the round-to-zero error term is not representable. At least one of the error terms must be representable—a property that I won’t prove here—which makes a “faithful” 2Sum algorithm possible in practice.
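
That claim can at least be tested empirically with exact rational arithmetic. The sketch below is mine and independent of FPy: for each pair of doubles, it builds the two faithful roundings of the exact sum (which cover the RTZ/RAZ results) and checks that at least one error term round-trips through a double.

import math
import random
from fractions import Fraction

def representable(e: Fraction) -> bool:
    # does e survive a round trip through a double?
    return Fraction(float(e)) == e

random.seed(0)
for _ in range(10_000):
    x = random.uniform(-1.0, 1.0) * 2.0 ** random.randint(-40, 40)
    y = random.uniform(-1.0, 1.0) * 2.0 ** random.randint(-40, 40)
    exact = Fraction(x) + Fraction(y)
    near = x + y  # one faithful rounding of the exact sum
    far = math.nextafter(near, math.inf if exact > near else -math.inf)
    assert representable(exact - Fraction(near)) or \
           representable(exact - Fraction(far))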

Conclusion

Hopefully, this post points out some interesting corner cases of the classic 2Sum algorithm and gives a small taste of similar algorithms with different properties. Unfortunately, much of my work on 2Sum algorithms is closed source, so I couldn’t write as much as I wanted to. However, I’m always happy to discuss the topic and look forward to more research in this area.

Footnotes

  1. Attribution comes from the “Handbook of Floating-Point Arithmetic” by Jean-Michel Muller et al.