Compilers do reachability analysis (also known as dead code elimination)
as an optimization that cuts down the size of the produced
(machine/byte)-code. Mikoláš had the idea that *dead-code should
result in a warning because it is likely a bug and, moreover, if
we use code annotations we can do a more precise analysis of
what is reached and what is not*. So we implemented this in the
context of JML-annotated Java.

//@ requires x >= 0; int f(int x) { if (x < 0) return 1 / 0; return 1; }

The bytecode produced by the Sun Java compiler is the following.

int f(int); Code: 0: iload_1 1: ifge 8 4: iconst_1 5: iconst_0 6: idiv 7: ireturn 8: iconst_1 9: ireturn

As you can see, the bad division ends up in the bytecode and no warning
is produced. But from the comments we see that the intention of the
programmer is to push the obligation of providing a non-negative *x*
to the clients of *f*. What a smart compiler should do is check at all
call-sites that this obligation is fulfilled, detect that the condition
in the **if** statement will always be false, and provide a warning
saying so. That's basically what we do. (Note that we don't generate
bytecode; that's left to the Java compiler.)

The first (boring) phase is to reduce the Java code to an intermediate
representation that is much simpler. I'll handwave over this part. Suffice
to say that there are many smart people working on how to do it and that
current implementations (in ESC/Java and Spec^{#}, for example) are
satisfactory.

The intermediate representation we get to work on is an acyclic control
flow graph labeled with first order logic formulas. A node in this
graph represents a statement in the program (roughly). The state of
a program that executed a certain path in the flow graph is described by
the conjunction of all formulas on that path. If we don't care about how
a program gets to statement *s* we can take the disjunction (logical or)
of all execution paths that lead to *s*. The result describes the state
of a program about to execute *s*, irrespective of how the
control flow got there. Such a formula is called the *precondition*
of *s*. So if the graph has the
edges *a*→ *b*, *a*→ *c*, *b*→ *d*, and *c*→ *d*, then we know that the
statement (*a*∧ *b*)∨(*a*∧ *c*) is true about a program that is
about to “execute” the statement in the node labelled *d*. (Of course,
there are more efficient ways than looking at all paths.)

We then use a theorem prover to see if some statement precondition can
never be true. (Alternatively, the set of states described by it is empty.)
Theorem provers are reasonably good at answering this kind of questions.
Still, they are quite slow, and asking the question for each statement
results in an unusable extended static checker. We should be able to do
much better. For example, if the flow graph is a chain
*a*_{1}→ *a*_{2}→⋯→ *a*_{n} and we ask about *a*_{n} and the theorem
prover says it is reachable, then we might as well conclude that
everything is fine, solving the problem with 1 query instead of *n*.
More generally, the answers given by the prover can be propagated in
the flow graph using some simple rules. It should then be possible
to use these rules to decide about which statements to ask the
prover in order to minimize the expected total number of prover
queries. I'll describe this in a subsequent post.

## No comments:

## Post a Comment

Note: (1) You need to have third-party cookies enabled in order to comment on Blogger. (2) Better to copy your comment before hitting publish/preview. Blogger sometimes eats comments on the first try, but the second works. Crazy Blogger.