Theory and Practice: BoogiePL generics

The last week I worked on FreeBoogie with Mikoláš. It's so much better to work with someone you can communicate with than alone! Alone I eventually get bored. Anyway, back to the story. This FreeBoogie thing is supposed to be an open source version of Boogie, the static verifier in Microsoft's Spec#. Like any compiler, Boogie has a front-end that takes high-level code and produces an intermediate representation and a back-end that takes over from there. The intermediate representation is a small programming language, with syntax, semantics, and so on, that is called BoogiePL. This language is documented in a technical report that used to be on Spec#'s web-page.

Because BoogiePL is a language, the coupling between front-end and back-end is loose and it is easy to support multiple high-level languages and multiple (static) analyses at the same time.

BoogiePL is a very simple language, with no heap, but with specifications that can be written in first-order logic. Here is how one may implement and specify Euclid's algorithm in this small language:

procedure gcd(a : int, b : int) returns ($ : int);
  requires a >= 0 && b >= 0;
  ensures (forall x : int:: a % x == 0 && b % x == 0 ==> $ % x == 0);

implementation gcd(a : int, b : int) returns ($ : int) {
  var tmp : int;

  entry:
    goto l1, l2;

  l1:
    assume a == 0;
    $ := b;
    return;

  l2:
    assume a != 0;
    tmp := a;
    a := b % a;
    b := tmp;
    goto entry;
}

Notice that:

Identifiers can contain funny characters (such as $).
Instead of structured flow you get to use goto.
Instead of if you use a nondeterministic goto and assume.

Hopefully you now have some idea of how this BoogiePL looks like and we can get to the problem. The problem is that the only available reference is out of sync with the Microsoft Research (MSR) implementation. They added while, a "generic" type hack, and then some. This post is specifically about how to do the generic types better than they do it now, with minimal modifications to the language.

We don't even have a heap and we want generic types? Well, that's why we want them. Let me explain. BoogiePL is supposed to express two things: the user program and the semantics of the high-level language. So, for example, if you write a Java program then the front-end should generate BoogiePL that translates your Java code but also explains how types work in Java, how the heap works in Java, and all the other goodies that are lacking in BoogiePL. Let's see how MSR describes the heap for C#.

var heap : [ref, <x>name]x;

procedure swap_int_field(obj : ref, f1 : <int>name, f2 : <int>name);
  ensures heap[obj,f1] == old(heap[obj,f2]);
  ensures heap[obj,f2] == old(heap[obj,f1]);

implementation swap_int_field(obj: ref, f1 : <int>name, f2 : <int>name) {
  var tmp : int;
  entry:
    tmp := heap[obj,f1];
    heap[obj,f1] := heap[obj,f2];
    heap[obj,f2] := tmp;
    return;
}

The heap is a bidimensional "array" that takes two indices, a reference and the name of a field of type x, and returns a value of type x. Here x is a type variable whose scope is not really clear. The expression <a>b is a type constructor, which makes a new type out of two existing ones. The reason they introduced these funny constructions is so that you won't need to cast things when you work with the heap. That would be pretty nasty since it's easy to make mistakes and miscast.

So, how to fix this mess? (If you don't think it's a mess you can stop reading.)

After discussing some alternatives we (Mikoláš and me) decided to introduce type variables explicitely and define their scope as being one declaration (of a variable, of a procedure, ...).

var heap<x> : [ref, <x>name]x;

procedure swap_field<x>(obj : ref, f1 : <x>name, f2 : <x>name);
  ensures heap[obj,f1] == old(heap[obj,f2]);
  ensures heap[obj,f2] == old(heap[obj,f1]);

implementation swap_field<x>(obj: ref, f1 : <x>name, f2 : <x>name) {
  var tmp : x;
  entry:
    tmp := heap[obj,f1];
    heap[obj,f1] := heap[obj,f2];
    heap[obj,f2] := tmp;
    return;
}

As you can see the changes are small and we can do things that are impossible with MSR's implementation. In fact, we can deal with BoogiePL produced by Boogie by isolating the messiness into one phase that makes type variable introductions explicit (by guessing where they should be). This phase should go away eventually.

Parsing and name resolution for this new design is done. Typechecking is being modified. Can you guess what's the interesting problem in typechecking this?

Theory and Practice

15 March 2008

BoogiePL generics

No comments:

Post a Comment