The last week I worked on
FreeBoogie with Mikoláš. It's so much better to
work with someone you can communicate with than alone! Alone I
eventually get bored. Anyway, back to the story. This FreeBoogie
thing is supposed to be an open source version of
Boogie,
the static verifier in Microsoft's Spec#. Like any compiler,
Boogie has a front-end that takes high-level code and produces
an intermediate representation and a back-end that takes over
from there. The intermediate representation is a small programming
language, with syntax, semantics, and so on, that is called
BoogiePL. This language is documented in a technical report
that used to be on Spec#'s web-page.
Because BoogiePL is a language, the coupling between front-end
and back-end is loose and it is easy to support multiple high-level
languages and multiple (static) analyses at the same time.
BoogiePL is a very simple language, with no heap, but with
specifications that can be written in
first-order logic. Here is how one may implement and specify
Euclid's algorithm in this small language:
procedure gcd(a : int, b : int) returns ($ : int);
requires a >= 0 && b >= 0;
ensures (forall x : int:: a % x == 0 && b % x == 0 ==> $ % x == 0);
implementation gcd(a : int, b : int) returns ($ : int) {
var tmp : int;
entry:
goto l1, l2;
l1:
assume a == 0;
$ := b;
return;
l2:
assume a != 0;
tmp := a;
a := b % a;
b := tmp;
goto entry;
}
Notice that:
- Identifiers can contain funny characters (such as $).
- Instead of structured flow you get to use
goto.
- Instead of if you use a
nondeterministic goto and assume.
Hopefully you now have some idea of how this BoogiePL looks like
and we can get to the problem. The problem is that the
only available reference is out of sync with the Microsoft Research (MSR)
implementation. They added while, a "generic"
type hack, and then some. This post is specifically about how to
do the generic types better than they do it now, with minimal
modifications to the language.
We don't even have a heap and we want generic types? Well,
that's why we want them. Let me explain. BoogiePL is
supposed to express two things: the user program and the semantics
of the high-level language. So, for example, if you write
a Java program then the front-end should generate BoogiePL
that translates your Java code but also explains how
types work in Java, how the heap works in Java, and all the
other goodies that are lacking in BoogiePL. Let's see how
MSR describes the heap for C#.
var heap : [ref, <x>name]x;
procedure swap_int_field(obj : ref, f1 : <int>name, f2 : <int>name);
ensures heap[obj,f1] == old(heap[obj,f2]);
ensures heap[obj,f2] == old(heap[obj,f1]);
implementation swap_int_field(obj: ref, f1 : <int>name, f2 : <int>name) {
var tmp : int;
entry:
tmp := heap[obj,f1];
heap[obj,f1] := heap[obj,f2];
heap[obj,f2] := tmp;
return;
}
The heap is a bidimensional "array" that takes two indices,
a reference and the name of a field of type x, and returns
a value of type x. Here x is a
type variable whose
scope is not really clear. The expression <a>b is a
type constructor, which makes a new type out of two existing ones.
The reason they introduced these funny constructions is so that you
won't need to cast things when you work with the heap. That would be
pretty nasty since it's easy to make mistakes and miscast.
So, how to fix this mess? (If you don't think it's a mess you can
stop reading.)
After discussing some alternatives we (Mikoláš and me)
decided to introduce type variables explicitely and define their scope
as being one declaration (of a variable, of a procedure, ...).
var heap<x> : [ref, <x>name]x;
procedure swap_field<x>(obj : ref, f1 : <x>name, f2 : <x>name);
ensures heap[obj,f1] == old(heap[obj,f2]);
ensures heap[obj,f2] == old(heap[obj,f1]);
implementation swap_field<x>(obj: ref, f1 : <x>name, f2 : <x>name) {
var tmp : x;
entry:
tmp := heap[obj,f1];
heap[obj,f1] := heap[obj,f2];
heap[obj,f2] := tmp;
return;
}
As you can see the changes are small and we can do things that
are impossible with MSR's implementation. In fact, we can deal with
BoogiePL produced by Boogie by isolating the messiness into one
phase that makes type variable introductions explicit (by guessing where
they should be). This phase should go away eventually.
Parsing and name resolution for this new design is done. Typechecking
is being modified. Can you guess what's the interesting problem in
typechecking this?