27 January 2006

Comparing incomparable objects

You are a Java programmer. You have a class key whose objects are uniquely identified by their reference. In other words a good implementation for equals is:

public boolean equals(Key other) { return this == other; }

Some legacy code does have that assumption. The reason behind it is not apparent. It might be a good one or a bad one. Most of the reasons that come to mind are bad ones. The only acceptable one is that objects are immutable and are guaranteed to refer to distinct entities (e.g. different lines in a file, different variables in a problem) by the way they are constructed. We don't want to investigate the reasons right now because the assumption underlies a big portion of the code. There is simply not enough physical time to rewrite everything.

Instead we concentrate on how to do well a particular task: use keys in a persistent map. A map is the usual Java map: a set of bindings from keys to values. Persistent means that whenever we modify a map by adding or removing a binding we want to construct a new one and keep the old one. Again, we don't know for sure why keeping multiple versions is necessary but that is the interface offered by the part of the program we want to refactor.

It is time to digress a little bit. You may ask yourself why should we let an interface boundary stop us? Well, there are different interface boundaries I can think of: we may try to refactor a method, a class, a package, a set of packages, or an entire program (which amounts almost to rewriting it from scratch). Whatever boundary we set for ourselves it is likely that if we move the boundary up one level and allow ourselves to change the "internal" interfaces we could do a more aggressive refactoring. However, pushing the boundary up one level has one big disadvantage: it multiplies the amount of information we need to keep track. Above a certain threshold it is just too hard for a human being to do it. So we should place the lower and upper thresholds for refactoring considering the amount of information one needs to keep track in his head. And then do the refactoring. Of course, we need not be fascists about it: if on occasion it seems profitable to peek above or below then we shouldn't be shy to do it.

In this particular case, the code that uses the map is too tangled for me to understand in a reasonable time. So it does not seem to be profitable to understand why we need multiple versions. Furthermore, having a persistent data structure is a cool thing to implement, and since the objects are going to be truly immutable less error prone (there are currently some mysterious exceptions bubbling up from the same package).

So let's get back to the problem: how to implement a finite map that has as keys objects for which the only thing you know is that they are identified by their references.

Well we can start from either end. We could ask ourselves how can we combine the operations on keys we have available, or we could start with what operations would we need in order to implement a persistent map. The second option looks more promising.

The only way I know of of implementing a persistent map is to use (possibly balanced or randomized) search trees. For that however we need the keys to be comparable. We are left with the problem that gave the title to this post: how to compare incomparable objects?

One solution that springs to mind is to use an immutable integer to label each created object and use those, instead of the references, for comparison. (NOTE: In Java we can't actually use the references for comparison because the compaction part of the garbage collector makes it meaningless. My previous statement applies directly to languages like C/C++; for Java it is only intended to convey an intuition.)

Finally, the code.

public class Key implements Comparable {
  private final static int ID_INCREMENT = 1610612741; // prime
  private static int idLast = 0;
  private final int idMine;

  /** Put this in every constructor */
  public Key() {
    idLast += ID_INCREMENT;
    idMine = idLast;
  }

  public int compareTo(Object o) {
    Key k = (Key)o;
    if (idMine < k.idMine) return -1;
    if (idMine > k.idMine) return +1;
    //@ assert this == k;
    return 0;
  }
}

Now you can use your favorite functional map implementation. If anyone is curious (or if I feel like it) I will post the code for that too.

You may wonder what is that funny assert doing there. Well, that is a piece of JML. It is also kind of useless since no tool will be able to reason without some extra help about that assert. It is more a subtle advertisement: that is what I work on now; and I intend to give you details soon.

19 January 2006

Kilometers and miles

I have just finished solving an exercise from TAOCP and when I checked the answer I was surprised to see that it was correct. I was also extra-surprised by an application of the result suggested by Knuth.

Here are the first few Fibonacci numbers, that any programmer should know: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377.

We can use these to mentally transform miles to kilometers and viceversa. How many miles are in 120km? Well 120 = 89 + 21 + 8 + 2. The equivalent in miles is 55 + 13 + 5 + 1 = 74. How to keep track of so many numbers in your head at once? Well, you really need to keep track of four numbers at any point in time. I'll illustrate for the conversion into miles of 350km. What is the biggest Fibonacci number not exceeding 350? Well, the first 277km are about 144miles. We are left with 73km and we repeat. The next 55km are about 34miles for a total of 178miles. We are left with 22km. The next 21km are about 13miles for a total of 191miles. And we are left with 1km which is about 1mile. The answer is 192miles.

Of course the transformation is not exact because the ratio mile/km is not exactly the golden ratio. It also works best for distances lower than 200.