## 27 January 2006

### Comparing incomparable objects

You are a Java programmer. You have a class key whose objects are uniquely identified by their reference. In other words a good implementation for equals is:

public boolean equals(Key other) { return this == other; }


Some legacy code does have that assumption. The reason behind it is not apparent. It might be a good one or a bad one. Most of the reasons that come to mind are bad ones. The only acceptable one is that objects are immutable and are guaranteed to refer to distinct entities (e.g. different lines in a file, different variables in a problem) by the way they are constructed. We don't want to investigate the reasons right now because the assumption underlies a big portion of the code. There is simply not enough physical time to rewrite everything.

Instead we concentrate on how to do well a particular task: use keys in a persistent map. A map is the usual Java map: a set of bindings from keys to values. Persistent means that whenever we modify a map by adding or removing a binding we want to construct a new one and keep the old one. Again, we don't know for sure why keeping multiple versions is necessary but that is the interface offered by the part of the program we want to refactor.

It is time to digress a little bit. You may ask yourself why should we let an interface boundary stop us? Well, there are different interface boundaries I can think of: we may try to refactor a method, a class, a package, a set of packages, or an entire program (which amounts almost to rewriting it from scratch). Whatever boundary we set for ourselves it is likely that if we move the boundary up one level and allow ourselves to change the "internal" interfaces we could do a more aggressive refactoring. However, pushing the boundary up one level has one big disadvantage: it multiplies the amount of information we need to keep track. Above a certain threshold it is just too hard for a human being to do it. So we should place the lower and upper thresholds for refactoring considering the amount of information one needs to keep track in his head. And then do the refactoring. Of course, we need not be fascists about it: if on occasion it seems profitable to peek above or below then we shouldn't be shy to do it.

In this particular case, the code that uses the map is too tangled for me to understand in a reasonable time. So it does not seem to be profitable to understand why we need multiple versions. Furthermore, having a persistent data structure is a cool thing to implement, and since the objects are going to be truly immutable less error prone (there are currently some mysterious exceptions bubbling up from the same package).

So let's get back to the problem: how to implement a finite map that has as keys objects for which the only thing you know is that they are identified by their references.

Well we can start from either end. We could ask ourselves how can we combine the operations on keys we have available, or we could start with what operations would we need in order to implement a persistent map. The second option looks more promising.

The only way I know of of implementing a persistent map is to use (possibly balanced or randomized) search trees. For that however we need the keys to be comparable. We are left with the problem that gave the title to this post: how to compare incomparable objects?

One solution that springs to mind is to use an immutable integer to label each created object and use those, instead of the references, for comparison. (NOTE: In Java we can't actually use the references for comparison because the compaction part of the garbage collector makes it meaningless. My previous statement applies directly to languages like C/C++; for Java it is only intended to convey an intuition.)

Finally, the code.

public class Key implements Comparable {
private final static int ID_INCREMENT = 1610612741; // prime
private static int idLast = 0;
private final int idMine;

/** Put this in every constructor */
public Key() {
idLast += ID_INCREMENT;
idMine = idLast;
}

public int compareTo(Object o) {
Key k = (Key)o;
if (idMine < k.idMine) return -1;
if (idMine > k.idMine) return +1;
//@ assert this == k;
return 0;
}
}


Now you can use your favorite functional map implementation. If anyone is curious (or if I feel like it) I will post the code for that too.

You may wonder what is that funny assert doing there. Well, that is a piece of JML. It is also kind of useless since no tool will be able to reason without some extra help about that assert. It is more a subtle advertisement: that is what I work on now; and I intend to give you details soon.

You might wanna add a null pointer check before atempting to access your Key casted object's members. After all, comparing to null should always return false, shouldn't it?

rgrig said...

Quote from http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Comparable.html: "The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y. (This implies that x.compareTo(y) must throw an exception iff y.compareTo(x) throws an exception.)".

So one should throw an exception when the parameter of "compareTo" is null.

Sorry, my bad! I wasn't paying enough attention and was thinking about equals.

rgrig said...

On the contrary: thanks for the comment. I actually got it right by chance the first time and went on to search the docs only after reading your note.

In fact, I'm not sure how that works when you have inheritance. It seems to imply that you should also throw an exception if the argument is a subclass. Consider this:

class B extends A;
A a; B b;
...
b.compareTo(a); a.compareTo(b);

The first call would throw a cast exception so the quote above suggests the second should to. It follows that my code is not correct after all.

Anyway, a contradictory guideline that I didn't find might be present somewhere in the Sun docs.

traugust@topcoder said...

Why not simply use System.identityHashCode(key)?

/**
* Returns the same hashcode for the given object as
* would be returned by the default method hashCode(),
* whether or not the given object's class overrides
* hashCode().
* The hashcode for the null reference is zero.
*
* @param x object for which the hashCode is to be calculated
* @return the hashCode
* @since JDK1.1
*/
public static native int identityHashCode(Object x);

rgrig said...

Because it's not guaranteed to give different values for different objecs as far as I know. I will double check. But not right now because I'm very tired.

traugust said...

It is guranteed.

If you are using 1.4 or higher, you would also be able to use the java.util.IdentityHashMap intruduced in 1.4 as your map implementation and would be able to use the behaviour you want without writing an extra line of code.

rgrig said...

From http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Object.html:
" As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)"
So, even if I'm almost certain that all JVM's will return some sort of memory address that will be unique, it is not guaranteed by the specs. Unless I'm missing something...

The class IdentityHashMap is interesting. I did not know about it. But, unfortunately, it appears not to be persistent (i.e. it is mutable); and that is one of my requirements because the algorithm needs different versions of the map. Cloning the whole thing is obscenely inefficient.

rgrig said...

Oh, yes.. Another reason is that by implementing your own "hash" sequence you can make shorter trees much more probable even without implementing any balancing. Tweaking that prime number an whether you use addition or multiplication is much easier than implementing AVL, RB, HB or other balancing scheme.

Robin said...

You probably want to add some synchronization in the constructor, otherwise you could end up with two distinct objects comparing as equal in a multithreaded program.

Robin said...

Regarding the inheritance problem, the solution is simple, and in fact this is the only compliant solution:

If all instances of A are comparable with all instances of A, and B is a subtype of A, then (by definition) all instances of B must be comparable with all instances of A.

Do you follow?

rgrig said...

Robin, good point about synchronization. Lucky me the eapplication has only one thread.

rgrig said...

Yes, that makes sense for subtyping. We know though that subclassing in Java is not quite the same thing. Even worse, by your reasoning subclasses can't implement finer grained compareTo methods. IOW, if you have (as above)

class A extends B ...

then in A.compareTo you cannot refer to fields added by A (see above) so the order cannot be finer grained. Now... all classes inherit Object. That kind of stinks :) (not being able to compare anything)

Robin said...

Yes, that makes sense for subtyping. We know though that subclassing in Java is not quite the same thing.

Yes, but I am a fan of the idea that subclassing should follow the Liskov Substitutability Principle.

Even worse, by your reasoning subclasses can't implement finer grained compareTo methods.

Yes, that's my point!

IOW, if you have (as above)

class A extends B ...

then in A.compareTo you cannot refer to fields added by A (see above) so the order cannot be finer grained. Now... all classes inherit Object. That kind of stinks :) (not being able to compare anything)

No, you can compare things, because Object doesn't implement Comparable.

Even equals() is the same: you can't add more discrimination. But that's not a problem with Object either, because Object.equals() is the most fine-grained comparison possible: the only thing that is equal to an object by Object.equals(), is itself! That's a really precise notion of equality!

However, it does mean that you can't say:

(1) two java.util.Collections are equal iff they have the same objects, but...

(2) two java.util.Lists are equal iff they have the same objects in the same order

- because those statements are logically inconsistent.

I stand by the principle I gave, even if some people sometimes break it in practice. Not everyone is as pure as me! ;)