Theory and Practice: Why Interfaces Should Be Fat and Should Not Contain Efficiency Guarantees

Joshua Bloch, the designer of a big part of the Java API, says that the fundamental law of API design is "to leave [something] out whenever you are in doubt." In a recent edition of the ACM Queue magazine Michi Henning gives some rules of thumb for API design, including: "An API should be minimal, without imposing undue inconvenience on the caller." We learn from Eemonn McManus that it's good to "be minimalist" and, in particular, "if your API has more methods than it needs, then it's taking up more space than it needs." That obviously sounds like something bad. It also sounds a bit like a tautology but that's beside the point. Bill Venners says that an API designer should "strive for interfaces that are minimal and complete." The first guideline in a compilation done by a fellow blogger says that a good API "has as few public members per class and as few classes as possible." These things are not hard to find: They are all top hits for a google on API design. I also remember myself arguing for "leaving things out" on more than one occasion and even in situations where the subject was life, universe, and everything, not just APIs.

So what's with the crazy title? Who would prefer slim to fat, and not knowing to knowing? The answer has three letters: STL—possibly the best API in the world. Actually, no, the answer is slightly longer.

I hate guidelines. Whatever you do please don't read the rest as a plead for some competing guidelines. No. It is simply a plead against a particular and also a plead for thought. Maybe a strangely worded one but take my word—that's what it is. Don't do something because some guideline, rule of thumb, or law tells you to do. Do it because you thought about it and you found good reasons to believe that it, for various values of it, is the right thing to do.

I'm debating the rule of slim interfaces because it is an example of a well-established rule. The trouble with well-established rules is that they are admired, applied, spread like plague, and are almost never questioned. When you question something the purpose is not to demolish it. The purpose is to understand why it stands, when it stands, and whether it stands at all. To do a good job the mindset must be "I must find a situation where this doesn't hold, a counterexample." If you fail, then you would have had first hand experience that make you see why the rule is good.

Enough digression.

The 2007 edition of the ICFP Contest was quite fun. To get going, teams had to write an interpreter for a very low level functional language. To do so efficiently it turned out that one needs fast concatenation and subrange extraction from sequences of characters. Both must be done is sublinear time. Otherwise it takes a few hours to run a program. Not fun. If you have access to TAoCP then you can find in Section 6.2.3 algorithms that tell you how to use arrays and simple logic tests to represent sequences as AVL trees. That gives you logarithmic time for both the needed operations. But it's a pain to implement. The memory needed by AVL trees is linear in the length of the sequence. It sounds good. Except it isn't. The constant factor is about 15, which means that for processing a program that takes 25MB in a file one would need about 375MB of memory. That sucks because memory is time. And virtual memory is an eternity.

C++ programmers were lucky. You can simply implement a program that uses strings to represent sequences of characters, try it, see that it is terribly slow, then replace string by crope in one place, and watch it fly. Most compilers already support ropes, a STL data structure that makes its way towards the next version of the standard. (The man pulling the ropes is Hans Boehm.) There is no subtype relation between string and crope. Why was this possible? Because both implement (almost) the same methods!

If you follow the slim rule blindly the you would not implement concatenation for strings. It is a very slow operation so the stupid user of the API should be forbidden to shoot himself by writing terribly slow code. If concatenation is what he wants then let him use ropes. Strings won't give him the means to hang himself. Besides, if he wants to use strings and only occasionally do a concatenation then he can write a simple loop that appends characters one by one. It's quite easy to do and it will make it obvious how slow that part of the code is. One should avoid expensive functions in an API. So concatenation for strings is out.

But that would prevent programmers from writing the interpreter using strings and then switching to ropes by one local change! The programmers should have used ropes in the first place, you say? But that was a hard problem. The programmers had to experiment a bit to see how fast things are. Besides, blaming the users (in this case the programmers) sounds like a lame excuse for the API designer.

I think that string should implement concatenation because (1) it can, (2) concatenation is a well defined operation (3) that is implemented efficiently by ropes, which are a similar data structure, and (4) it is sometimes needed in practice. It should also have the same name/syntax as in the case of ropes. This clearly goes against the minimalist point of view. Nevertheless, you saw an example where such a strategy leads to good results.

Now, how about not having efficiency guarantees?

My point is that the interface is the sequence and its operations. Both string and crope should be seen merely as implementations. And they should have efficiency guarantees.

So, write fat interfaces with no efficiency guarantees and have lots of implementations, all faithfully adhering to the interface. Explain the efficiency of the implementations as well as possible. Not only time. Not only asymptotics. Explain how it works, give exact formulas if possible, give the results of real benchmarks. (And tell me how it worked out ☺.)

There is one argument in the slim rule that I don't see how to crack. (Your help is welcome.) An operation that does not have a clear spec should not go in the interface, because there is a nonnegligible chance that it will change, and changes in the API are usually accompanied by havoc, some even warn about the possible extinction of life on Earth.

Bleah!

Theory and Practice

03 August 2007

Why Interfaces Should Be Fat and Should Not Contain Efficiency Guarantees

No comments:

Post a Comment