July 3, 2009

Type Inference in C++

A short note to let you know about a long-missed C++ feature you can now use.

Since gcc 4.4 (21 April 2009) you can now skip giving the types of variables that you initialize. Do you remember the old macro

#define foreach(i, c) for (typeof((c).begin()) i = (c).begin(); i != (c).end(); ++i)

? Well, you can now write

#define foreach(i, c) for (auto i = (c).begin(); i != (c).end(); ++i)

In fact, since the definition is shorter now, you might even consider ditching the macro altogether. After all, macros cause bugs that take much time to track down.

Edit: Oh, yeah. I forgot to mention that auto will be a standard C++ feature, while typeof is a GNU extension.

July 1, 2009

Type Inference in Java, for Generics

An unexpected failure of the Java compiler. Or is it a failure of the Java type system?

Java programmers write repetitive code like

HashMap<String, Integer> foo = new HashMap<String, Integer>();

all the time. Those who seek beauty in their code may choose to use the Google collections library and write instead

HashMap<String, Integer> foo = Maps.newHashMap();

Constructors and static methods interact in very different ways with generics. If you say new Foo() you instantiate a raw type, which is quite different from a what you get by saying new Foo<Bar>(). On the other hand, static methods are the beneficiaries of a little bit of type inference, as specified by sections 15.12.2.7 and 15.12.2.8 of the Java Language Specification (Third Edition). The actual rules are complicated (and I haven't read them yet) but the idea seems to be to use two sources of information:

  • the types of the actual arguments
  • if the result is "assignment convertible" to some type T then this type T is used too.

The reason why the Google collections library works is the second bullet: The compiler uses the type of foo to figure out how to instantiate the type arguments of Maps.newHashMap.

OK, that was the background. Now let's see some examples.

Would you expect the following code to compile?

interface A {
  static enum Foo { BAR; }
  static enum Bar { FOO; }
}

class B<T extends Enum<T>> {
  private B() {}
  public static <T extends Enum<T>> B<T> get() {
    return new B<T>();
  }
}

public class C {
  public static void main(String[] args) {
    B<A.Foo> b = B.get();
  }
}

Yes, it works. The Sun Java compiler 1.6.0_13 has no problem in handling the sort-of recursive bound put on the type.

Now, would you expect the following to work?

interface A {
  static enum Foo { BAR; }
  static enum Bar { FOO; }
}

class B<T, S> {
  private B() {}
  public static <T, S> B<T, S> get() {
    return new B<T, S>();
  }
}

public class C {
  public static void main(String[] args) {
    B<A.Foo, A.Bar> b = B.get();
  }
}

Again, it works flawlessly. After all, why would multiple type arguments confuse the compiler?

So, you'd expect the following to work too, won't you?

interface A {
  static enum Foo { BAR; }
  static enum Bar { FOO; }
}

class B<T extends Enum<T>, S extends Enum<S>> {
  private B() {}
  public static <T extends Enum<T>, S extends Enum<S>> B<T, S> get() {
    return new B<T, S>();
  }
}

public class C {
  public static void main(String[] args) {
    B<A.Foo, A.Bar> b = B.get();
  }
}

Tough luck, though: It doesn't work. Apparently "no instance(s) of type variable(s) T,S exist so that B<T,S> conforms to B<A.Foo,A.Bar>". I would have thought that T=A.Foo and S=A.Bar would do the job.

As I said, I haven't read yet the relevant sections in the Java specification to see if this a problem with the language or a problem with the compiler. The thing is that the two sections take 10 and a half screens on my monitor, which signals that I should put aside maybe one whole day for it, and I don't have one whole day to spend. My hope is that it is a bug in the compiler, because otherwise I'd loose some confidence in the language itself. After all, there's something fishy about a 10-pages algorithm that doesn't end up trying the obvious solution.

May 30, 2009

BDD sucks, BDD rocks

Whining mode.

BDD (Behavioral Driven Development) is the most idiotic thing I've ever heard of. It is a buzzword that stands for something I completely don't understand. So, how can I say in such strong terms that it is idiotic if I don't even know what it is about? Simple: It breaks Google. Whenever I search for information about BDDs (Binary Decision Diagrams) I find complete garbage. OK, it's not that bad: Scholar gives sensible results.

May 27, 2009

Related to Transitive Closures

A problem that Mikolas bumped into.

Alice has a graph G that is closed under transitivity. Bob wants to reconstruct the graph G but Alice would only answer with YES or NO to questions of the form "is there an edge from node m to node n?". Help Bob achieve his task with as few questions as possible.

There are two possible cases. One is that Bob must aim for a minimum number of questions for the worst possible graph G. (Equivalently, Alice is evil and she keeps changing the portions of the graph G she didn't "commit" to yet, just so that Bob needs to ask more questions.) The other is that Bob aims for an expected minimum, thinking that Alice drew the graph randomly from some bag of graphs. (In the simplest case the bag is a set of all possible graphs.)

The number of nodes N is known in advance to Bob.

Any pointers?

May 25, 2009

Puzzle Answer: Postfix Expressions

A solution to a two-month old puzzle.

A while back I posted a puzzle asking for a recursive and stackless postfix expression evaluator. A bit over a week ago I received a solution from Ashutosh Mehra. (Go on, have a look at his blog. It's quite nice.)

At first sight this seems like a lowly coding exercise. And it is, but with a catch: It is designed to illustrate two fundamental ideas that should be part of every programmer's ethos.

  1. Recursion is equivalent to loops.
  2. Recursion is implemented with stacks.

If one of these two fairly obvious statements is not built into you, then you'll have trouble finding a solution. The first statement implies that any program can be written without using loops. How? First, write all the loops using only while. Second, replace each loop while (C) B; with a call to a new function loop defined as follows:

void loop(ARGS) {
  if (!C) return;
  B; loop();
}

The condition C and the body B will probably refer to some local variables. In order to have them accessible for reading and for writing from loop we send as arguments ARGS pointers to them, and modify C and B accordingly.

Exercise: What to do if B contains break? (Note: The solution to this is supposed to be easy if the previous two paragraphs were understood.)

What about the stack? We also want to get rid of the stack.

The solution is again easy if you remember that local variables (and function arguments) come from a "stack". So the key idea is to use the call stack as the operand stack. This said, enjoy the solution.

#include <stdio.h>
#include <ctype.h>

int add(int x, int y) { return x + y; }
int sub(int x, int y) { return x - y; }
int mul(int x, int y) { return x * y; }
typedef int (*op_t)(int, int);
op_t op[256];

int c; /* last read character */

void read_digits(int* y) {
  if (!isdigit(c = getchar())) return;
  *y = eval(*y, c - '0');
  read_digits(y);
}

int eval(int x, int y) {
  read_digits(&y);
  return op[c](x, y);
}

int main() {
  op['+'] = add, op['-'] = sub, op['*'] = mul, op['|'] = add;
  printf("%d\n", eval(0, getchar() - '0'));
}

When the function eval(x, y) is called, the top value in the operand stack is y and the value just before is x. While the next token is a digit it calls itself recursively eval(y, new_digit). The "while" part is implemented by the recursive function read_digits. When the token that says what operation should be performed between x and y is read, the operation is performed and the result returned.

This variant is somewhat wasteful of memory.