10 August 2006

Trip report: the Oregon Summer School 2006

From 11 July to 22 July I had fun at a summer school. The focus was on language-based techniques for concurrency, so it was full of programming-languages people. They are a different species. They do not believe in specifications. They dream about type rules. They like abstract math. They feel physical pain when they see goto or pointer arithmetic.

This was the first summer school I attended: if you can, go to one. You get to meet a bunch of cool people. You go to lectures but, unlike undergraduate lectures, they don't bore you; the pace was in fact too fast for me at times. If you are lucky you might even meet someone who wrote one of the best books you have ever read.

Let's review some of the presentations. Cormac Flanagan gave three talks, each on a slightly different subject.

The first one was about a type system that ensures Java programs are free of data races. It does so by checking that a locking discipline is enforced: each reference (to a potentially shared object) has an associated lock that must be held whenever the reference is accessed. Anyone who can name a reference must also be able to name the associated lock; the lock is part of the reference's type. What happens when you have two references pointing to the same object? We want locks to be associated with objects, not references! Well, because they point to the same object, they have the same type and therefore the same lock. The following is an annotated Java program.

public class Account {
    // Both fields may only be accessed while holding the lock on this.
    private int amount /*# guarded_by this */;
    private int reserve /*# guarded_by this */;

    synchronized boolean stash(int delta) {
        if (delta < 0) return false;
        if (amount < delta) return false;
        amount -= delta;
        reserve += delta;
        return true;
    }
}

Now you have a flavor of what this type system might do for you. After seeing the above code you might scream (internally) ``but what about external locks? being forced to name a lock inside the class is way too restrictive!''. They have a solution to this problem: types parametrized by values. Instead of an explanation, here is an example.

public class Node /*# { ghost Object lock } */ {
    // Each instantiation of Node supplies its own guarding lock object.
    public Data data /*# guarded_by lock */;
    public Node next;
}
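
To see how a client might supply the external lock, here is a sketch of an instantiation. It is my own example, and the ghost-parameter instantiation syntax in the comments is my guess at the rccjava-style annotations, so treat it as illustrative only.

public class DataList {
    private final Object lock = new Object(); // lock expressions must be final
    private Node /*# {lock} */ head;          // all nodes guarded by lock

    public void prepend(Data d) {
        synchronized (lock) {                 // hold the guarding lock
            Node /*# {lock} */ n = new Node /*# {lock} */();
            n.data = d;
            n.next = head;
            head = n;
        }
    }
}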

One rather important detail I didn't mention is that lock expressions must be final, so that they always denote the same object (note the final in the sketch above). For further details see their TOPLAS article. Trivia: Cormac was an undergraduate at UCD, where I am now, and I am working on reviving the implementation of this type system and incorporating it into a verification environment. It is also worth noticing Boyapati's work: he designed a more expressive type system for detecting concurrency bugs by allowing more protection mechanisms and deferring their specification from class definition to object instantiation. Unfortunately, his SafeJava implementation is unavailable.

Jeff Foster did something similar for C. The core of his presentation was on how to do ML-style type inference to deduce may-alias sets for pointers, must-alias sets for locks, and locksets for pointers. Since they analyze C, they have to wave their hands when the programmer does pointer arithmetic (which is what happens on every array access). Still, the result seems to be usable (although I have not tried it). I highly recommend listening to Jeff talk (about type inference, in particular): I felt the pleasure of understanding advanced material.

Cormac's second presentation was about a type system that checks whether declared atomicity is enforced by the acquired locks. Atomicity is a much more powerful concept than race-freedom because it directly relates to proving a program correct, to understanding a program, or simply to thinking about a program. Most of the time, even in a concurrent setting, when I write a piece of code I'd like to be able to understand it in isolation as much as possible; I do not want to think about concurrent threads all the time. The idea behind atomicity is to split the proof in two parts: first you make sure a piece of code is atomic (that is, it runs as if there were no concurrent threads), and then you can reason about it in isolation. The nice part is that ensuring atomicity can be automated: atomicity can be built into the machine you are running on (for example, a virtual machine), or it can be checked statically that some other mechanism, such as locking, ensures it. The latter is what Cormac's type system does. You say ``I'd like this to be atomic, can you please verify that it really is by looking at how I do locking?'' and his tool says ``you rock!'' or ``you are a moron! here is why I think so''. Cormac also gave examples showing that atomicity and race-freedom are independent concepts (one direction is illustrated below). Their implementation for Java is unsound because it assumes sequential consistency, while Java has a brain-dead memory model in the name of performance: if there are data races but atomicity would be enforced assuming sequential consistency, then Cormac's tool does not signal an error. However, if there are data races, then all hell breaks loose in Java and you can observe things that do not correspond to any interleaving (although you cannot observe causal loops).
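
Here is the illustration I promised; it is a standard example rather than one of Cormac's slides. The class below is race-free, since every field access holds a lock, yet increment is not atomic, because another thread can slip in between the two synchronized calls.

class Counter {
    private int x;

    synchronized int get() { return x; }
    synchronized void set(int v) { x = v; }

    // Race-free: x is only touched while holding the lock on this.
    // Not atomic: a concurrent increment can run between get() and set(),
    // so one of two parallel increments can be lost.
    void increment() { set(get() + 1); }
}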

Starting from Lipton's theory of reduction, Cormac shows that a piece of Java code is atomic if it consists of nested synchronized blocks of which only the innermost one contains other code, and that code performs at most one racy access. In Lipton's terms, lock acquires are right-movers, lock releases are left-movers, and race-free accesses are both-movers, so a sequence of acquires, then at most one non-mover, then releases reduces to a single atomic step.
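
A sketch of the accepted shape, with fields of my own invention:

class Movers {
    private final Object a = new Object(), b = new Object();
    private int balance; // only accessed with both locks held

    void deposit() {
        synchronized (a) {                 // acquire: right-mover
            synchronized (b) {             // acquire: right-mover
                balance = balance + 1;     // race-free access: both-mover
            }                              // release: left-mover
        }                                  // release: left-mover
    }
}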

Let's digress a little to talk about memory models. Steven made it clear to me why memory models get relaxed: shared memory is far away, and physics dictates that communicating with it will be slower than anything done inside the processor. If hardware designers are not forced to offer sequential consistency, they can hide some of these costs. This is bad, in my opinion. It trades correctness for performance, which is almost always a bad idea. It hides performance issues from the higher levels (compilers, programmers), which could deal with them in smarter ways: if a programmer knows that shared-memory accesses take a long time, he can design the application to minimize that kind of communication, which is good. Finally, speculative execution can hide much of the communication cost without giving up sequential consistency. The only thing worse is not having sequential consistency even in a virtual machine! If hardware designers want to give programmers something we can use, they'd better spend their energy on giving us efficient transactions; then it will be easier to use multi-core architectures, and the performance problem will be attacked the way it should be.
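
The classic store-buffering litmus test shows what is at stake. In this made-up class, if thread1 and thread2 run concurrently, sequential consistency guarantees that at least one of r1 and r2 ends up 1; real hardware and the Java memory model both allow r1 == 0 and r2 == 0, an outcome that corresponds to no interleaving at all.

class StoreBuffering {
    int x = 0, y = 0; // deliberately not volatile
    int r1, r2;

    void thread1() { x = 1; r1 = y; }
    void thread2() { y = 1; r2 = x; }
    // Under sequential consistency at least one read must follow the other
    // thread's write, so r1 == 0 && r2 == 0 is impossible; with store
    // buffers (or the Java memory model) it is allowed.
}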

Cormac's final presentation was about Atomizer, a run-time checker for atomicity violations. Basically, they instrument the program with calls into a library whenever a lock is acquired or released, whenever a reference is accessed, and whenever an atomic section is entered or exited; the library reports any violation it detects. The check is not sound, but it catches many more violations than those that actually occur in the observed execution. Many run-time checkers, including Atomizer, and even some static checkers for concurrency bugs are based on the lockset algorithm originally used in Eraser.
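
To give a flavor of the lockset idea, here is a minimal sketch of the Eraser-style refinement (my code, not Atomizer's actual API): a variable's candidate set starts as the locks held at its first access, each later access intersects the candidates with the locks the accessing thread holds, and an empty candidate set means no single lock protects the variable consistently.

import java.util.HashSet;
import java.util.Set;

class LocksetChecker {
    // Candidate locks that might protect the variable; null = not yet accessed.
    private Set<Object> candidates;

    void onAccess(Set<Object> locksHeld, String variable) {
        if (candidates == null) {
            candidates = new HashSet<Object>(locksHeld); // first access
        } else {
            candidates.retainAll(locksHeld); // intersect with held locks
        }
        if (candidates.isEmpty()) {
            System.err.println("potential race on " + variable);
        }
    }
}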

Dan Grossman continued the story of atomicity. He is the main scientific organizer of the summer school (great job!) and a very good speaker. Because of some flight connection problems I missed his first lecture. The second one was about the semantics of atomicity. I said earlier that atomicity means that a piece of code runs as if no other threads were running in parallel. Apparently some people need convincing that this is the right semantics: for example, Dan spent quite some time arguing that the effects of an atomic section should not be rolled back if an exception is thrown inside it (see the sketch below). His best argument was: ``if we add atomic annotations to a single-threaded program we don't expect its semantics to change''. The next presentation was about an implementation of atomicity at the language level (AtomCaml); the task was made easier by the fact that OCaml is inherently kind of single-threaded, and the main achievement was a very efficient implementation.
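
Here is that argument in code (my example; synchronized stands in for an atomic section, since Java has no atomic keyword): in a single-threaded program, rollback-on-exception semantics would visibly change behavior.

public class NoRollback {
    static int x = 0;

    public static void main(String[] args) {
        try {
            synchronized (NoRollback.class) { // stand-in for an atomic block
                x = 1;
                throw new RuntimeException();
            }
        } catch (RuntimeException e) {
            // ignore
        }
        // Prints 1: the write is not undone when the exception escapes.
        // If atomic sections rolled back on exceptions, wrapping this code
        // in one would change the output of a single-threaded program.
        System.out.println(x);
    }
}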

The last lecture of Matthew Flatt was about thread termination. As you can see from his web page, he is a Scheme guy. As such, he likes interactive development and mechanisms for recovering from bugs, because bugs do exist. One instance of this problem: how do you kill a thread that went wrong without killing the whole application? That's not an easy problem. When you commit murder you also need to clean up, which means you need to keep track of the resources `managed' by each thread. They do this in Scheme with a `custodian'.

Matthew's presentations are way cool because he writes and/or runs code during the presentation, so you can really see what he is talking about. The drawback is that you get lost quickly if you miss part of the presentation. I missed the first lecture because I wasn't feeling well; as a consequence I didn't really understand his implementation of Concurrent ML features in Scheme. It's some kind of framework for doing event-based concurrent programming. It would be interesting at some point to explore how it compares to OCaml's Event module. One nifty thing he did was to show that, using channels and messages, you can cleanly implement a set of regular expressions that run in parallel on an input channel while storing only the necessary incoming data.

Michael Hicks talked about dynamic software updating and about futures. I'm going to skip the dynamic updating part. Futures are an old idea that is now making its way into mainstream programming languages. Many concurrent applications can be structured as a rooted tree of threads in which only connected nodes communicate; futures fit that structure directly, as in the snippet below.

Object result = future func(x, y, z); // may run in another thread
// ... other work, overlapped with the call ...
result.doit(); // blocks here if the result is not ready yet

The function func in the code above is (possibly) executed in another thread, and the current thread waits for its result only when that result is actually needed. If you make sure that threads communicate only through arguments and return values, it's pretty hard to get data races and impossible to get deadlocks, and you can do a lot within this model. But you are not forced to: you can still communicate through shared memory if you want, and then locking is your responsibility.
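
For comparison, here is the same pattern written with the plain Java 5 concurrency library, where the future is named explicitly instead of hiding behind a proxy (my sketch; expensiveComputation and doOtherWork are placeholders):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Integer> result = pool.submit(new Callable<Integer>() {
            public Integer call() {
                return expensiveComputation(40); // runs in another thread
            }
        });
        doOtherWork();                     // overlapped with the call
        System.out.println(result.get()); // blocks only when needed
        pool.shutdown();
    }

    static int expensiveComputation(int n) { return n + 2; }
    static void doOtherWork() { }
}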

What Mike's team built is a framework that generalizes this proxy approach by doing bytecode transformations. You can, for example, implement lazy evaluation the same way: wherever the keyword lazy is seen (note: it is really a do-nothing method call that acts as a tag), a closure is created, and it is executed only when the result is needed. For each future, a thread is created that starts executing immediately, and when the result is needed the parent synchronizes with the child thread to get the answer. The processing is basically the same.
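
A hand-written version of what that lazy transformation amounts to might look like this (my sketch, not their generated bytecode): the thunk is evaluated at most once, on first use.

import java.util.concurrent.Callable;

class Lazy<T> {
    private Callable<T> thunk; // null once the value has been computed
    private T value;

    Lazy(Callable<T> thunk) { this.thunk = thunk; }

    T force() throws Exception {
        if (thunk != null) {
            value = thunk.call(); // evaluate on first use only
            thunk = null;         // drop the closure so it can be collected
        }
        return value;
    }
}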

I'm pretty unhappy, though, that they did not do a clean language design for futures: the interaction with exceptions can cause pretty bad things to happen. Exceptions thrown from futures are simply not propagated. To be fair, it's pretty damn hard to propagate them once you are allowed to exit the method that created the future without waiting for the result.

The Cilk and JCilk approach is less general, but its designers are concerned with having a clean interaction with exceptions, and with efficiency.

Let's see what parallelism is. We'll use a simplified model of concurrent programs; most notably, it ignores communication costs. Imagine a DAG whose nodes represent one unit of work each and whose arcs represent time dependencies: if there is an arc from A to B, then A must run before B. We have P processors. What is the best running time we can obtain by allocating units of work to processors (that is, by scheduling threads) in a smart way? If we denote the optimal time by T_P(N), then both T_1(N)/P and T_inf(N) are lower bounds for it: you can't get a super-linear speedup, and you can't do better than with an unlimited supply of processors. Now we can prove that a greedy schedule is within a factor of two of the optimum, because it runs in time at most T_1(N)/P + T_inf(N): at each time-step a greedy schedule either runs P units of work or shortens by one the longest remaining path (also called the remaining span), and the first kind of step can happen at most T_1(N)/P times while the second can happen at most T_inf(N) times.
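
To make the bound concrete with some numbers of my own: if T_1 = 100, T_inf = 10, and P = 4, then a greedy schedule finishes within 100/4 + 10 = 35 steps, while no schedule can beat max(100/4, 10) = 25 steps; greedy is indeed within a factor of two of the optimum.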

This upper bound keeps decreasing as you add processors for as long as the number of processors is (much) smaller than T_1/T_inf. So parallelism is work over span: T_1/T_inf. A parallelism that is logarithmic in the size of the input is said to be puny. This analysis gives a quick way to estimate the asymptotic efficiency of an algorithm as you add more and more processors.

On Saturday we had a lab session with a small contest coordinated by Bradley Kuszmaul: implement matrix multiplication as fast as possible using Cilk. Nels, Alan, and I were on the same team. Let's say we didn't do well. (Stupid me: I spent about an hour hunting for a bug in my implementation while the problem was in the test-suite they provided, which compared doubles for equality. Why didn't I think earlier of printing our values and the reference values and comparing them myself?) It was fun nonetheless.
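
The moral in code, as a sketch of my own rather than what the contest's test-suite contained: compare floating-point results with a tolerance, never with ==.

class Approx {
    // True when got and expected agree up to a relative tolerance tol;
    // the max with 1.0 keeps the test sensible for values near zero.
    static boolean close(double got, double expected, double tol) {
        double scale = Math.max(Math.abs(got), Math.abs(expected));
        return Math.abs(got - expected) <= tol * Math.max(scale, 1.0);
    }
}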

On Sunday we went to Crater Lake. It's a very nice place. It was fun and relaxing; but I had to take a nap on the bus because it was also tiresome.

Here is a list of some people with whom I had fairly regular discussions during the summer school and who are not mentioned above. Ezra works on Links and was at Amazon before. Kiran works on automatically parallelizing event-based C applications; he worked for a looong time developing software for network equipment, and he studied business as an undergraduate! John has done some work on type-safe marshaling for OCaml and is interested in formalizing (and developing) routing protocols. Avi (Shinnar) did work on automatic cleanup and re-throwing of exceptions and on related compiler optimizations. Tom is a fun guy to talk to because he poses interesting questions, but I'm not really sure what he does :) David is a hardcore theory guy: category theory, pi calculus, stuff like that. Everyone was great. I just didn't have time to talk with all of them, especially since I'm a slow starter. (Oh, and Umar thought I was a girl from my topcoder profile. Go figure.)

For a different perspective see Ezra's blog.

Speaking of blogs, this is the latest interesting one I've found.
