Theory and Practice: March 2008

30 March 2008

Experiment

Labels: opinion

Tomorrow I'm heading to ETAPS in Budapest, a beautiful city.

The other day I found this while searching for my last name on YouTube:

A beautiful traditional Romanian song. Too bad that the YouTube comments are a playground for Romanian nationalists. Also, too bad that some of the images in the clip are inflammatory. Why can't we listen to a song for the music?

I urge all those whose blood pressure raised by this time to read about an improbable experiment and calm down.

29 March 2008

Debian packages

Labels: research

One of the great things about GNU/Linux is that you can install almost any program with two or three commands. Here is how it works on Ubuntu:

apt-cache search burn cd (search for applications to burn CDs)
apt-cache show nautilus-cd-burner burn k3b (get details on a few applications that look promising)
apt-get install k3b (install the one you like most)

That's it: No need to browse malware-ridden download sites. A typical distribution comes with almost 30000 packaged applications. Let's see a few details of how it works.

Ubuntu uses the Debian package system. Packages are described by meta-data (located in /var/lib/apt/lists on Ubuntu). A package is identified by a name and a version:

Package: bash
Version: 3.2-0ubuntu11

(Versions are totally ordered and the comparison algorithm is a pain.)

A package can be installed if and only if it is available and certain other packages are installed.

Pre-Depends: libc6 (>= 2.6-1), libncurses5 (>= 5.6)

The notation above is CNF in disguise: Any of the mentioned versions would do. In fact, one can use a pipe (|) to explicitly say `or'. (A dumb feature of this DSL is that both < and <= mean `at most'. To say `less than' you have to use <<.)

A package is non-broken only if it is installed and certain other packages are non-broken.

Depends: base-files (>= 2.1.12), debianutils (>= 2.15)

A package is non-broken only if certain other packages are not installed.

Conflicts: bash-completion

(A missing version means `any version'.) If the last two conditions are both met then the package is non-broken.

Finally, virtual packages are not explicitly listed and only have a name. In fact, it's easier to think of it as having a special version, say ::VIRTUAL. They are provided by (normal) packages.

Provides: c-compiler

And, by the way, there is an exception to the semantics I presented. If a package x declares that it conflicts with package y (given only by name), which is a package that x also provides or is xitself, then it really means that x conflicts with all the other packages that provide y. Sneaky.

A distribution is a set of available packages. If there is no sequence of package installation that results in package x being non-broken then package x does not fit in the distribution. Such a situation is a bug (probably in the meta-data) and we want to detect it. How? By making things simpler. The stuff above makes my head spin.

Exercise 1. Are there packages that don't fit in the following distribution?

Package: a
Version: 1
Conflicts: a

Package: b
Version: 1
Depends: a (= 1)
Provides: a

Exercise 2. Same question.

Package: a
Version: 1
Conflicts: a

Package: b
Version: 1
Depends: a
Provides: a

Exercise 3. Same question.

Package: a
Version: 2
Conflicts: a (<= 2)

Package: b
Version: 1
Depends: a (= 2)
Provides: a

Do we need to look at sequences of installations or just at subsets of installed packages? Well, packages that are part of a pre-depends cycle cannot be installed, while packages part of a depends cycle may be installable. If there is no pre-depends cycle then it's safe to look only at subsets of packages, consider `installable' and `non-breakable' as equivalent, and consider `pre-depends' and `depends' as equivalent. That's a big simplification and I'm not even 100% convinced that it's correct.

The information in meta-data can be expressed as boolean propositions. For a package x we abuse notation and consider x to be a boolean variable, true if and only if the package x is non-broken/installed.

x → φ ∧ ψ if φ is the CNF after pre-depends and ψ is the CNF after depends
x → y₁ ∨ ⋯ ∨ y_n if y₁,…,y_n are all the packages that provide x
x → ¬ y if packages x and y have the same name but different versions
x → ¬ y if package x conflicts with package y, which (has a version specification) or (has a different name than x and its provided packages)
x → ¬ z if package x conflicts with package y, which has no version specified and has the same name as package x or one of the packages provided by package x; the package z is a package that provides y and is distinct from package x

Please let me know if you think the semantics given here conflict with (the intended meaning of) the Debian documentation.

Once everything is expressed in boolean form we can simply ask a SAT solver whether a package doesn't fit in a distribution.

Further info. A lot of this stuff is covered in an article on maintaining package dependencies. That article is accompanied by many interesting tools, such as edos-debcheck and anla. In case you prefer Python to English here is a hack that reads in package meta-data and outputs CNF boolean formulas (almost) in the DIMACS format. The result is unambiguous; the translation is not perfect.

25 March 2008

Research and Internet

Labels: opinion

For research,

Internet is great because when you need some information you can find it in minutes.
Internet sucks because every time you have an idea you find out that someone else had it before.

17 March 2008

Design patterns

Labels: coding

Design patterns are names given to object-oriented coding idioms. They are great because they facilitate communication. Unfortunately, most descriptions use UML pictures surrounded by pompous text and big examples.

When I say ..., I mean code that looks like ..., sort of.

Abstract factory:

interface Thing {}

interface Factory { Thing mk(/*...*/); }

Factory method:

interface Thing {}

abstract class Base {

  abstract Thing mk(/*...*/);

  /* do stuff */

}

Builder:

class Thing {/*...*/}

interface Builder {

  void mkNew();

  void buildPartA(/*...*/);

  /*...*/

  void buidPartZ(/*...*/);

  Thing getResult();

}

Lazy factory / Flyweight:

class Thing {

  private static final Map<String,Thing> inst = new HashMap<String,Thing>();

  private Thing(String args) {/*initialize from args*/}

  public static get(String args) {

    Thing t = inst.get(args);

    if (t == null) {

      t = new Thing(args);

      inst.put(args, t);

    }

    return t;

  }

}

Prototype:

interface Prototype<T> { T clone(); }

class Thing implements Prototype<Thing> {/*...*/}

Singleton:

class Thing {

  private static inst = null;

  private Thing() {/*...*/}

  public Thing inst() {

    if (inst == null) inst = new Thing();

    return inst;

  }

}

Adapter:

interface Interface { void doA(); /*...*/ }

class Adaptee {/*...*/}

class Adapter extends Interface {

  private Adaptee adaptee;

  @Override public void doA() { 

    /* do stuff, most of the work being delegated to adaptee */

  }

  /*...*/

}

Bridge:

interface LowLevel  { /*...*/ }

interface HighLevel { void doit(); /*...*/ }

class HlImpl implements HighLevel {

  private LowLevel ll;

  public HlImpl(LowLevel ll, /*...*/) {

    this.ll = ll;

    /*...*/

  }

  @Override void doit() { /* do work, using a lot methods in ll */ }

  /*...*/

}

Composite:

interface Composite { Composite get(int index); }

Decorator:

interface Thing { void doit(); /*...*/ }

abstract class Decorator implements Thing {

  protected Decorator wrapped;

}

class DecoratorA extends Decorator {

  public DecoratorA(Decorator d) { this.wrapped = wrapped; }

  @Override void doit() {

    /* do extra stuff */

    if (wrapped != null) wrapped.doit();

  }

}

Facade:

package p;

/* The operations doX use various classes in the package p,

 * are fairly high-level, and easy to understand by someone

 * not very familiar with p.

 */

class Facade {

  public void doA() {/*...*/}

  public void doB() {/*...*/}

  /*...*/

}

Proxy:

interface Thing { void doit(); }

/* Sometimes runing on a different computer */

class RealThing implements Thing {

  @Override public void doit() { /*...*/ }

}

class ProxyThing implements Thing {

  @Override public void doit() { /* use RealThing to do most of the work */ }

}

Command:

interface Closure { void go(); }

Chain of responsibility:

abstract class Chain {

  public Chain next;

  public setNext(Chain next) {

    this.next = next;

    return this;

  }

  public void doit(int mask, String arg) {

    if (handle(mask)) reallyDoit(arg);

    if (next != null) next.doit(mask, arg);

  }

  abstract protected boolean handle(int mask);

  abstract protected void reallyDoit(String arg);

}

class ChainA extends Chain { /*...*/ }

class ChainB extends Chain { /*...*/ }

class ChainC extends Chain { /*...*/ }

class User {

  void f() {

    Chain c = new ChainA().setNext(new ChainB().setNext(new ChainC()));

    /*...*/ c.doit(mask, arg); /*...*/

  }

}

Interpreter:

interface IntExpr  { int eval(); }

interface BoolExpr { boolean eval(); }

class CompareIntExpr implements BoolExpr {

  static enum CompareIntOp { LT, EQ, GT };

  CompareIntOp op;

  IntExpr left, right;

  @Override public boolean eval() {

    switch (op) {

      case LT:  return left.eval() < right.eval();

      case EQ:  return left.eval() == right.eval();

      case GT:  return left.eval() > right.eval();

    }

    assert false; return false;

  }

}

/* and many other expressions... */

Iterator:

interface Element {}

interface Iterator {

  Element value();

  void advance();

}

interface Collection {

  Iterator first();

}

Memento:

class Thing {

  public Object save() { /*...*/ }

  public void restore(Object obj) {

    Memento m = (Memento)obj;

    /* use m to populate fields */

  }

  private static class Memento { /*...*/ }

}

Observer:

interface Observer { void update(); }

interface Subject {

  void reg(Observer o);

  void unreg(Observer o);

  void notifyObservers();

}

Strategy:

interface Strategy { void execute(/*...*/); }

class Context {

  Strategy s;

  public void execute(/*...*/) { /* delegates much to s.execute() */ }

}

Template method:

abstract class Class {

  public void doit() { /* calls smallStepX */ }

  abstract void smallStepA();

  /*...*/

  abstract void smallStepZ();

}

Visitor:

interface Ast { void accept(Visitor v); }

class AstA implements Ast { @Override void accept(Visitor v) { v.visit(this); } /*...*/ }

class AstB implements Ast { @Override void accept(Visitor v) { v.visit(this); } /*...*/ }

interface Visitor {

  void visit(AstA a);

  void visit(AstB b);

}

15 March 2008

BoogiePL generics

Labels: research

The last week I worked on FreeBoogie with Mikoláš. It's so much better to work with someone you can communicate with than alone! Alone I eventually get bored. Anyway, back to the story. This FreeBoogie thing is supposed to be an open source version of Boogie, the static verifier in Microsoft's Spec#. Like any compiler, Boogie has a front-end that takes high-level code and produces an intermediate representation and a back-end that takes over from there. The intermediate representation is a small programming language, with syntax, semantics, and so on, that is called BoogiePL. This language is documented in a technical report that used to be on Spec#'s web-page.

Because BoogiePL is a language, the coupling between front-end and back-end is loose and it is easy to support multiple high-level languages and multiple (static) analyses at the same time.

BoogiePL is a very simple language, with no heap, but with specifications that can be written in first-order logic. Here is how one may implement and specify Euclid's algorithm in this small language:

procedure gcd(a : int, b : int) returns ($ : int);
  requires a >= 0 && b >= 0;
  ensures (forall x : int:: a % x == 0 && b % x == 0 ==> $ % x == 0);

implementation gcd(a : int, b : int) returns ($ : int) {
  var tmp : int;

  entry:
    goto l1, l2;

  l1:
    assume a == 0;
    $ := b;
    return;

  l2:
    assume a != 0;
    tmp := a;
    a := b % a;
    b := tmp;
    goto entry;
}

Notice that:

Identifiers can contain funny characters (such as $).
Instead of structured flow you get to use goto.
Instead of if you use a nondeterministic goto and assume.

Hopefully you now have some idea of how this BoogiePL looks like and we can get to the problem. The problem is that the only available reference is out of sync with the Microsoft Research (MSR) implementation. They added while, a "generic" type hack, and then some. This post is specifically about how to do the generic types better than they do it now, with minimal modifications to the language.

We don't even have a heap and we want generic types? Well, that's why we want them. Let me explain. BoogiePL is supposed to express two things: the user program and the semantics of the high-level language. So, for example, if you write a Java program then the front-end should generate BoogiePL that translates your Java code but also explains how types work in Java, how the heap works in Java, and all the other goodies that are lacking in BoogiePL. Let's see how MSR describes the heap for C#.

var heap : [ref, <x>name]x;

procedure swap_int_field(obj : ref, f1 : <int>name, f2 : <int>name);
  ensures heap[obj,f1] == old(heap[obj,f2]);
  ensures heap[obj,f2] == old(heap[obj,f1]);

implementation swap_int_field(obj: ref, f1 : <int>name, f2 : <int>name) {
  var tmp : int;
  entry:
    tmp := heap[obj,f1];
    heap[obj,f1] := heap[obj,f2];
    heap[obj,f2] := tmp;
    return;
}

The heap is a bidimensional "array" that takes two indices, a reference and the name of a field of type x, and returns a value of type x. Here x is a type variable whose scope is not really clear. The expression <a>b is a type constructor, which makes a new type out of two existing ones. The reason they introduced these funny constructions is so that you won't need to cast things when you work with the heap. That would be pretty nasty since it's easy to make mistakes and miscast.

So, how to fix this mess? (If you don't think it's a mess you can stop reading.)

After discussing some alternatives we (Mikoláš and me) decided to introduce type variables explicitely and define their scope as being one declaration (of a variable, of a procedure, ...).

var heap<x> : [ref, <x>name]x;

procedure swap_field<x>(obj : ref, f1 : <x>name, f2 : <x>name);
  ensures heap[obj,f1] == old(heap[obj,f2]);
  ensures heap[obj,f2] == old(heap[obj,f1]);

implementation swap_field<x>(obj: ref, f1 : <x>name, f2 : <x>name) {
  var tmp : x;
  entry:
    tmp := heap[obj,f1];
    heap[obj,f1] := heap[obj,f2];
    heap[obj,f2] := tmp;
    return;
}

As you can see the changes are small and we can do things that are impossible with MSR's implementation. In fact, we can deal with BoogiePL produced by Boogie by isolating the messiness into one phase that makes type variable introductions explicit (by guessing where they should be). This phase should go away eventually.

Parsing and name resolution for this new design is done. Typechecking is being modified. Can you guess what's the interesting problem in typechecking this?

01 March 2008

Index

Labels: opinion, presentation

A few years ago I never used the index of a book. My method was to look at the table of content, identify the possibly interesting sections, and then skim. It works pretty well. Now I know that a good index can make searching faster, especially for high volumes of information. I think of a couple of keywords and look them up in the index. If they are there then that's it, I know the exact pages with the information I need. Otherwise I backtrack to my old approach.

This is true for computers too. If you need to search something often then it pays off to preprocess the data and produce an index.

Now I just (re)realized another advantage of indexing. It is good not only for readers but also for authors because it encourages consistent terminology. All too often I write programs and refer to things (in comments and code) by different names, eventually becoming confused myself.