I wonder why are (time and space asymptotic) complexity guarantees so sparse in the Java API.

Today I was trying to write some code that preprocesses an AST and then can quickly answer the query: "Give me the set of variables that are read/written by the piece of code represented by this AST node." There are many ways to implement this thing but one natural one is to associate a persistent set of variables to each AST node during preprocessing. [**Edit:** This sentence is wrong. See comments.] Combining two sets takes logarithmic time so preprocessing is done in linear time. There is also going to be a lot of memory sharing. Now, how to do that in Java? The first approximation that came to mind was to use `TreeSet`

and `TreeSet.addAll`

. That will obviously be much worse but... how much worse? The API documentation gives no clue about the complexity of `TreeSet.addAll`

so I had to look at the SDK code to see that it is `O(n lg n)`. I proceeded to implement a custom data structure.

In what API are they not sparse?

ReplyDeleteSTL: http://www.sgi.com/tech/stl/table_of_contents.html

ReplyDeleteIn particular, for the equivalent operation one finds: "inserting an already-sorted range into a Sorted Associative Container is an O(N) operation" (on page http://www.sgi.com/tech/stl/insert_iterator.html)

I noticed a traffic spike caused by this appearing on reddit. Yes, the "logarithmic time union of persistent sets" is wrong. Here is a discussion where I help elucidate the complexity of the implementation in the OCaml standard library. The discussion is more than three years old and I seemed to know better then. Senility is to blame :)

ReplyDeleteNote that even if you use O(n) union or O(n lg n) union the running time for the whole preprocessing is still O(n lg n). So using Java's TreeSet as opposed to persistent sets isn't that bad.

The solution I ended up using is the uninteresting one given by 808140 on reddit. I eliminate duplicates in a later phase, if needed. It seems to work fairly well.