Exploring In-Synthesis Verification of Hardware
Getting hardware right is crucial – without correct hardware, there’s no way to guarantee that software will work as intended. To raise the stakes even higher, hardware designers really only have one shot to get it right – once the hardware’s made, you can’t patch it! Given these stakes, many tools use formal methods to give strong guarantees that hardware is correct. In practice, this often entails writing equivalence proofs showing that hardware behaves the same as some high-level model. In this post, we’ll take a look at the challenges I ran into while making Gator, a project I worked on that explored how to make these proofs easier!
Post-Synthesis Verification
As stated above, correct hardware is really important. To gain stronger guarantees of correctness, engineers often use external tools to formally verify their hardware. But what does it actually look like to formally verify the correctness of hardware? I’ll give a high-level example that roughly mirrors what I did during my time as a verification intern at Intel.
To start, we need some hardware. Let’s say we have some accelerator that computes the sum of four numbers: a, b, c, and d. In my experience, this would be implemented in low-level Verilog.
Once the hardware team has a design for this accelerator, we’ll
also need a high-level model of what the accelerator should do.
This can be written in any language, so let’s choose Python.
A high-level model for this accelerator might look like:
# Returns the sum of four numbers at time `t`.
def model(a, b, c, d, t):
return a + b + c + d
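One thing worth noting: the model ignores t entirely. That’s because an adder is purely combinational – its output at any moment is just a function of its current inputs. For sequential designs, like the pipelined ones we’ll touch on below, the model genuinely depends on t.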
Great! Now we have a hardware design and a high-level model. The next step is to write a proof that the hardware design and the model are functionally equivalent. This proof is a formal verification of the hardware design. There are many different ways to write this proof, but here’s roughly what I wrote when I was at Intel:
forall a, b, c, d, t > 0:
assert(sim_accelerator(a, b, c, d, t) == model(a, b, c, d, t))
This proof is a simple assertion that the output of the hardware design is equal to the output of the high-level model. If the assertion passes, then we can be confident that the hardware design is correct.
Notice the funky forall I tacked on at the beginning. This represents the fact that these proofs need to make sure that the hardware design is correct for all possible inputs and at all possible times. The time aspect is particularly important, because it’s a key way hardware verification differs from software verification.
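To make the time aspect concrete, here’s a hypothetical variant of the model for a version of the accelerator that registers its output, so the sum only appears one cycle after the inputs arrive (the one-cycle latency is an assumption for illustration, not part of the design above):

def model_pipelined(a, b, c, d, t):
    # Hypothetical: at time 0, the output register hasn't
    # latched anything yet.
    if t == 0:
        return 0
    # From cycle 1 onward, the registered sum is visible.
    return a + b + c + d

Any equivalence proof for a design like this has to quantify over t, since the right answer at t = 0 differs from the answer at every later cycle.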
From here, we can send this proof to tools like VossII, which use techniques like symbolic trajectory evaluation to verify it. This style of verification is called “post-synthesis verification” because it happens after the hardware has already been designed and synthesized.
So, what’s the difficulty in doing this? The reality is that creating these models and writing these proofs is hard – far harder than the simple example above! Working with these tools not only requires intricate knowledge of hardware design, but also a deep understanding of formal methods. This is a high barrier to entry if you want a guarantee that your hardware is correct!
Gator: Exploring In-Synthesis Verification
How do we make it easier to get correct hardware? One way is to move verification earlier in the design process. In both software and hardware, this has been the motivation behind creating compilers that are correct by construction. That is, the compiler itself is verified using formal methods to ensure that its output is correct. In a hardware context, this means that the output of the compiler (i.e., the hardware design) is functionally equivalent to the high-level model – that is, the source code fed into the compiler.
This idea isn’t new – verified compilation has been around for a while. But, mainstream FPGA tools aren’t formally verified, and surveys exist which show that FPGA tools are buggy!
At UW, I was very lucky to work on an FPGA compilation tool called Lakeroad, which touches on this problem. I’ll defer to our paper for the full details, but one of the benefits of using Lakeroad compared to other tools is that it provides equivalence guarantees for the first few cycles. That is, there’s an implicit bounded model check each time you compile your hardware design.
But what if we could go further? Can we make it so that we have a guarantee that the hardware design is correct for all cycles? This is the question that I explored for my master’s thesis.
Before we can chat about the challenges of this problem, let’s first understand a bit more about the tools I used to explore this problem. Gator is built on top of Rosette, a solver-aided programming language designed for the type of problems Gator is trying to solve (in addition to many others!). I’ll be eliding a lot of details, but the basic idea is that Gator (and Lakeroad) use Rosette’s synthesize function, which for our purposes is a black box that essentially means “find me a program that satisfies these constraints.”
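If you haven’t seen solver-aided tools before, here’s a minimal sketch of the same “find me a program satisfying these constraints” idea, written against Z3’s Python API rather than Rosette (purely for illustration – Gator and Lakeroad use Rosette itself). We ask the solver to find a constant c that makes x + c behave like x + 1 for every input x:

from z3 import BitVec, ForAll, Solver, sat

x = BitVec("x", 8)
c = BitVec("c", 8)  # the "program" we're synthesizing: a single constant

s = Solver()
# The constraint: for all inputs, the sketch equals the spec.
s.add(ForAll([x], x + c == x + 1))
if s.check() == sat:
    print(s.model()[c])  # the solver finds c = 1

Rosette’s synthesize generalizes this shape of query: instead of a single constant, the unknown is a whole program sketch.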
So, how do we use synthesize in the context of hardware compilation? In Lakeroad, the call to synthesize relied on an interpreter for a specialized IR we were using. The overall constraint we needed is that the hardware must be equivalent to the high-level model for the first k cycles. In effect, the call that we made to synthesize looked something like this:
def interp(prog, t):
    # Interpret the design `prog` at time `t` (details elided).
    ...

constraints = [
    assert(interp(hardware, 0) == interp(model, 0)),
    assert(interp(hardware, 1) == interp(model, 1)),
    ...
    assert(interp(hardware, k - 1) == interp(model, k - 1))
]
synthesize(constraints)
The key idea here is embedded in the constraints we’re passing to synthesize. synthesize must find a program that satisfies all of these constraints, which means that the low-level hardware design it’s synthesizing must be equivalent to the high-level model for the first k cycles.
Gator is about extending this idea to all cycles. In effect, it would be nice if we could say, for some arbitrary t >= 0, that interpreting the hardware design and interpreting the high-level model give equivalent results.
How do we do this? In a proof assistant, or on paper, this is an inductive proof. That is, after proving the base case (t = 0), we assume that the equivalence holds for some arbitrary t = n, and then show that it holds for t = n + 1. With our current machinery, we might use Rosette’s assume function to do this as follows:
constraints = [
    # Our base case: the designs agree at time 0.
    assert(interp(hardware, 0) == interp(model, 0)),
    # Our inductive step: assume the designs agree at an
    # arbitrary time `n`...
    assume(interp(hardware, n) == interp(model, n)),
    # ...and show that they agree at time `n + 1`.
    assert(interp(hardware, n + 1) == interp(model, n + 1))
]
synthesize(constraints)
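If you’ve seen model checking before, you might recognize this as the shape of an induction query (k-induction with k = 1): a base case, plus a step that assumes agreement at n and asserts it at n + 1.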
This is where we run into some challenges. Let’s talk about them!
Challenge 1: Loops!
The first challenge in this approach is that hardware designs often have loops. Registers, for example, are hardware components that store values across cycles. A register might be wired into a loop. One particularly simple example would be a register wired into itself. When trying to reason about interpreting the hardware design, we might have code like this:
def interp_register(register, t):
    # At time 0, the register holds its initial value.
    if t == 0:
        return 0
    # Otherwise, a self-looped register holds whatever
    # it held last cycle.
    return interp_register(register, t - 1)
This type of code is typically not a problem when given a concrete value for t, because the recursion will eventually bottom out at t = 0. However, in the constraints we’re passing above, the inductive step requires us to reason about arbitrary (i.e., symbolic) values of t. In effect, what this means when we actually run the code above is that we infinitely recurse and never hit the base case. This is a big problem.
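To see the difference concretely, here’s a trace of both cases (the symbolic behavior is sketched in comments, since actually running it would never terminate):

# With a concrete time, the recursion bottoms out:
interp_register(reg, 3)
# -> interp_register(reg, 2)
# -> interp_register(reg, 1)
# -> interp_register(reg, 0)  -> returns 0

# With a symbolic time `t`, the test `t == 0` isn't a concrete
# True or False -- it's itself a symbolic expression. Evaluation
# must explore the `t != 0` branch, yielding
# interp_register(reg, t - 1), then interp_register(reg, t - 2),
# and so on, with no base case in sight.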
One way of solving this problem is to create a “cache” of sorts which maps specific symbolic values of t to the values we’re assuming the hardware will have. In other words, if we wanted to encode that at time t the value of a register is x, instead of writing:
assume(interp_register(register, t) == x)
we might modify our interpreter to take a cache:
cache = {
    # Notice that both the hardware and the model
    # are equal to `x` at time `t`!
    (hardware_reg, t): x,
    (model_reg, t): x
}

def interp(prog, t, cache):
    # If we've assumed a value for this node at time `t`,
    # use it instead of recursing further.
    if (prog, t) in cache:
        return cache[(prog, t)]
    ...
This mechanism allows us to manually encode inductive hypotheses about the hardware design. In effect, we’re manually encoding the inductive proof that we would write on paper.
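Putting the pieces together, the inductive step from before might look something like this – a sketch in the same pseudocode as above, where fresh_symbolic_bitvector is a hypothetical helper for creating an unconstrained symbolic value:

# The inductive hypothesis: at some arbitrary time `t`, both the
# hardware register and the model register hold the same value `x`.
x = fresh_symbolic_bitvector()  # hypothetical helper
cache = {
    (hardware_reg, t): x,
    (model_reg, t): x
}
constraints = [
    # The inductive step: given the cache, the designs must
    # also agree one cycle later.
    assert(interp(hardware, t + 1, cache) == interp(model, t + 1, cache))
]
synthesize(constraints)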
Challenge 2: Inductive Hypotheses
The solution above works in my head, but in practice it’s a bit more complicated. I see two main problems with having a cache:
- The cache is hard to create. When using a verified compiler, the user shouldn’t have to manually specify inductive hypotheses. Specifying these hypotheses is a non-trivial task, and is similar to a lot of the verification work Gator is trying to automate!
- Getting the cache right is hard. In practice, many hardware designs are pipelined, meaning that it’s not always enough to specify the value of a single component at time t; we might need to specify the values of multiple components, possibly at multiple times, depending on the hardware design – see the sketch below.
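For instance, here’s a hypothetical cache for a two-stage pipeline, where the inductive hypothesis has to pin down every stateful element (the stage names and values are made up for illustration):

cache = {
    # Stage 1's register at time `t`...
    (stage1_reg, t): x1,
    # ...and stage 2's register, which holds the value stage 1
    # computed a cycle earlier.
    (stage2_reg, t): x2
}

Miss one entry, and the induction doesn’t go through: the interpreter falls into the same unbounded recursion as before.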
We’re reaching the limit of what I explored during my time at UW, but one possible solution to this problem is to encode hardware designs using uninterpreted functions.
The idea is: for each hardware component, create an uninterpreted function from nat -> bitvector, where nat is the time at which we’re interpreting the hardware design. From here, the hope is that by encoding that interp(node, t) == x, Rosette will be able to “push” the inductive hypotheses through the design, without manually specifying the cache.
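Again in Z3’s Python API – purely as a stand-in for Rosette, with made-up node names – the encoding might look something like this (using Int as an approximation of nat):

from z3 import Function, IntSort, BitVecSort, Int, BitVec, Solver

# Each stateful node becomes an uninterpreted function from time
# to the node's value at that time.
hw_reg = Function("hw_reg", IntSort(), BitVecSort(8))
model_reg = Function("model_reg", IntSort(), BitVecSort(8))

t = Int("t")
x = BitVec("x", 8)

s = Solver()
# The inductive hypothesis is now just a constraint -- no cache.
# The solver "pushes" it anywhere `hw_reg(t)` or `model_reg(t)` appears.
s.add(hw_reg(t) == x, model_reg(t) == x)
# A self-looped register: its value at `t + 1` equals its value at `t`.
s.add(hw_reg(t + 1) == hw_reg(t))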
I’ve just scratched the surface of the challenges that Gator faces. Come September, I’ll be starting my PhD at UC San Diego, where I probably won’t be working on Gator. But I’d still love to chat about it! If you’re interested, feel free to reach out to me at a7cheung@ucsd.edu. Ideally, we’ll meet somewhere in La Jolla and chat about it over tacos.