16 August 2008

Oh, Theta!

[Edit: Rewrote to more accurately reflect the story that prompted this post.]

I read O(f(n)) as (degree) at most f(n), Ω(f(n)) as (degree) at least f(n), and Θ(f(n)) as (degree) f(n). (I omit ‘degree’ when I’m lazy. That is, most of the time.)

If I describe amortized complexity I might say that a sequence of n operations takes at most degree n time, and each individual operation may take degree n time. I would write this as: A sequence of n operations takes O(n) time, and

  • one operation may take Θ(n) time.
  • one operation may take Ω(n) time.
  • each operation also takes O(n) time.

but not:

  • one operation may take O(n) time.

Language design

This post gives a subjective and high-level view of how languages, tools, and programmers interact.

We start with a small set of primitives that has pleasant aesthetics and provable power. Beauty implies symmetry, regularity, lack of special cases, generality. Power implies that a wide set of problems can be solved (efficiently) by combining primitives. Think of a nice assembly language with 10 instructions or so, or perhaps even something like MMIX.

There is no guarantee that solutions will be easy to find or that they will be small and elegant: We only know they exist. If the language is elegant, then programs written in that language are not automatically elegant. Nevertheless, this is the first step. We then solve problems we care about and stare at the ugly solutions we produce. No one solution is particularly ugly, but as set they are. We build libraries that encapsulate recurring patterns in the ugly solutions and rewrite those solutions into something less ugly. Adding a layer of abstraction helps. Some people say: "Let's step back. If abstraction helps then let's see what can be abstracted. How can we eliminate all redundancy? Can we abstract the process of abstracting?" I don't like this. I don't like one language to rule them all. I like variety. I like a small amount of redundancy. I like to solve existing problems, not potential future problems. The initial set of primitives addressed a problem domain: We took time to prove sufficiency. Libraries reduced redundancy that became apparent after writing solutions to a set of problems we were most interested in.

It's nice to have beautiful programs, but we should also think how we get there.

The set of primitives is nice if almost all possible combinations are potentially useful. Alas, it appears that this property is hard to maintain while going up the abstraction ladder: There are often combinations that do not make sense. If integers and integer division are primitives then x/y makes sense almost always but there is no sensible definition for x/0. The problem exacerbates for intrinsicaly longer combinations, such as those that use library abstractions. We are bound to combine constructs in nonsensical ways a few times while developing a program; more often than not, because our skull has limited volume. We can't pay attention to many details at the same time. It sounds like a job for a computer!

Tools can help essentially in three ways:

  1. A tool can try to find problem spots by looking at the program text. Type checkers do this. Some problems are really hard to spot this way, even if extra information is given by the programmer as annotations.
  2. A tool can introduce sanity checks in the code. That's how NullPointerException gets thrown in Java. (By the way, if you happen to believe that Java has no pointers, then can you please explain that name?) For both static analysis and runtime checks, combinations that are legal according to the language may still be meaningless for the programmer. Sure you can do x/2 any time, but sometimes you expect to have no reminder when you do it. You can either add in the language a `no reminder division' and use it, or keep saying x/2 and precede it by the annotation assert even x.
  3. The programmer introduces sanity checks and gets support for doing so, in the form of libraries, or code navigation utilities that point to good places to insert sanity checks. People usually call this testing.

08 August 2008

Sorting out stuff

Comparison based sorting needs Ω(lgn!)=Ω(n lgn) comparisons because each comparison yields one bit of information and there are n! permutations. If you are sorting integers then you can do better, and not only asymptotically. Radix sort works as follows:

deque<int> bucket[2][256];

void rsort(vector<int>& v) {
  int i, j, k;
  for (i = 0; i < v.size(); ++i)
    bucket[0][v[i]&0x000000ff].push_back(v[i]);
  for (i = 0; i < 256; ++i) {
    while (!bucket[0][i].empty()) {
      int &e = bucket[0][i].front();
      bucket[1][(e&0x0000ff00)>>8].push_back(e);
      bucket[0][i].pop_front();
    }
  }
  for (i = 0; i < 256; ++i) {
    while (!bucket[1][i].empty()) {
      int &e = bucket[1][i].front();
      bucket[0][(e&0x00ff0000)>>16].push_back(e);
      bucket[1][i].pop_front();
    }
  }
  for (i = 0; i < 256; ++i) {
    while (!bucket[0][i].empty()) {
      int &e = bucket[0][i].front();
      bucket[1][(e&0xff000000)>>8].push_back(e);
      bucket[0][i].pop_front();
    }
  }
  k = 0;
  for (i = 0; i < 256; ++i) {
    while (!bucket[1][i].empty()) {
      v[k++] = bucket[1][i].front();
      bucket[1][i].pop_front();
    }
  }
}

This straightforward (and somewhat repetitive) code is faster than std::sort if you sort more than one thousand integers.

Whenever someone says that radix sort works in linear time there is someone that feels compelled to point out that saying “linear time” is cheating. I really don’t care whether it “works in linear time” or not, but I can tell you that it is about two to three times faster than std::sort. We can still analyze the runtime and get a better idea of just how fast it is, but we need to be a bit more precise.

Let’s say we work on a computer with 2k bit words. Typical computers nowadays have k=5 or k=6, and the code above assumes k=5. We group the bits of a word in digits, each with 2w bits. In the example above w=3. We have b=22w buckets and d=2kw digits. The number of operations of the code above is proportional to n+(b+n)d=n+(22w+n)2kw. We can’t change how many numbers we must sort (n) or the machine on which the program runs (k). But we can change w. What is the optimum value? If we let the derivative with respect to w equal 0 we get 22w+wln2=2w+wn. That looks horrible. But it has one interesting feature: It doesn’t contain k, which means that the optimum number of bits per digit does not depend on the machine on which we do the sorting. It can be a 8 bit machine, a 32 bit machine, a 64 bit machine, or even a 8192 bit machine, we should still have the same number of bits per digit for a certain size of the list that we are sorting.

We now turn to approximations. It’s probably safe to assume 2ww and wn≫2w. We are left with 22wln2=wn, which is still horrible. Since the left side varies much faster than the right one we can replace the w on the right with a (yet unknown) suitable constant w′. We get that 2w varies as O(lgn). (Note that this is an upper bound.) If we plug this back into the running time formula we finally get O(n 2k/lgn). Here 2k is how many bits the machine has per word and lgn is how many bits we need to write down the length of the list we are sorting. So radix sort should be quite good when we have to sort lots of numbers. And indeed, as I said, it clearly beats std:sort.

Now, if you have to sort really lots of numbers, then you should probably look at funnelsort.