28 September 2009

Ambiguity in CLOPS

A problem we hit while working on CLOPS. We know a good solution but it makes a nice puzzle.

Problem. A digraph has letters on edges, so there is a word corresponding to each walk. Given two nodes m and n, are there two distinct walks mn that have the same corresponding word?

Examples. We are looking at walks from blue to red.
In the graph

there are no two distinct walks with the same word.

If the graph


has the letter x on all edges then there are two distinct walks corresponding to the word xxxxxxxxxxxxxxxxx (length 17).

Comment. This problem amounts to finding whether a CLOPS input file, which is a regular grammar for what can go on the command line of a program, is ambiguous.

27 September 2009

My most beautiful result

Celebrity gossip and physics mix well

A short review of Gina.

I have recently finished reading Gina says a (free) book by Gil Kalai. My opinion: not light bedtime reading! Indeed, I started reading at 11pm and finished around 3am... because I couldn't stop. But I was joking: It is fairly light reading. It is also quite depressing and should be hidden from high-school students trying to choose a career path. (Yes, I'm joking again: I don't really think hiding data is a good way of attracting kids to science.) On the other hand, I think it has great potential in popularizing some nice mathematical results.

If you like gossip about celebrities you will certainly love this book!

There is no segfault: sort of an answer

[edit: updated link to dead cat]

My attempt at answering my own puzzle.

But first I should acknowledge that the title comes from the song There is no dead cat. Second, I should note that my presentation of what induction means is unnecessarily redundant: The base case is included in the induction step given that there is a smallest element. (And, yes, that’s my strange sense of humor.)

OK, now the challenge: Under which conditions can we conclude P(n) for all nS given that ∧kn P(k) implies P(n) for all nS?

Read literally the question is not interesting: “Under which conditions p implies q?” has the obvious and useless answer “q, since pq implies q.” So, why is this an interesting question? Because we reduce the former to the later whenever P(n) has the form “the piece of code n is OK,” for some value of OK, one example being “does not dereference a null pointer.” Another reason that makes the question interesting to me is that I feel we are justified in looking at code one function at a time to assess whether it “goes wrong” but I don’t know it, because I never read or gave a coherent explanation: The matter is always treated as “obvious” as far as I can tell. Also, I formulated the problem in more general terms so that the answer has a chance of being useful in a different setting than program verification. Finally, I vaguely remembered reading that modular reasoning about a system is a form of coinduction: After reading a little about coinduction I’m not so sure anymore. (I looked at slides introducing coinduction in Coq by Yves Bertot and at blogs like n-category and a neighborhood of infinity. Reading through this tutorial is slow.)

The intuitive reason why we can analyze program pieces one at a time is that any counterexample is an execution that has one place where null is dereferenced. A bit more precise: An execution is a sequence of states. A state is a pointer to the next program statement to be executed plus the content of the memory (that is, a map from variables to values). The wrong state is special. The assumption ∧kn P(k) amounts to “ignore executions that go to the wrong state after executing a statement of a module kn.”

Problem. Can we use modular reasoning to put upper bounds (other than zero) on the number of errors that can occur in an execution? For example, is it possible even in principle to conclude that “this program may break its contract at most three times during its execution” without doing a global analysis? I suspect not.

Is there a useful way to generalize the answer with “executions”? Since I’ve been reading about model theory I’m inclined to view the program as a theory whose models are its executions. In this language the trick is that we can systematically explore parts of a model where a formula may not hold, assuming that in all other parts it does hold.

For the record, I’m not satisfied with my answer.

21 September 2009

There is no segfault

A short puzzle related to induction.

To prove that all elements of a set S have a property P we can use induction:

  • [induction step] we show that the elements of size s have the property P, assuming that all elements of size <s have it, and
  • [base case] we show that the smallest elements have the property P

For example, if S is the set of natural numbers and the size of a natural number is itself, then we can prove that 2Σknk=n(n+1) by showing that:

  • [base case] 2Σk0k=0, and
  • [induction step] 2Σk<nk=n(n-1) ⇒ 2Σknk=n(n+1)

As another example, if S is the set of (non-empty) binary trees in which each node either is a leaf or has exactly two children then we can show that the number of leafs is one more than the number of non-leafs by induction. In this case the size is the number of nodes in the tree. If we denote by l(t) the number of leafs in the tree t and by k(t) the number of non-leafs, then the statement we want to prove is l(t)-k(t)=1 for all trees t. Induction says that it suffices to show:

  • [base case] l(•)-k(•)=1, and
  • [induction step] (l(t1)-k(t1)=1 ∧ l(t2)-k(t2)=1) ⇒ l((t1,•,t2))-k((t1,•,t2))=1

You may want to take a moment to convince yourself (if you haven't already) that the base case and the induction step are fairly easy to check for both examples above.

But there is another type of induction! Suppose you have a program Q that is a set S of functions and you want to prove the following theorem: Program Q will not dereference a null pointer. One way to approach this task is to prove for each function that it will not dereference a null pointer ("throw NullPointerException" in Java, segfault in C, etc.) assuming that all the other functions in the program do not dereference a null pointer.

However, there is something troublesome about this. Until now we had an order: We think about the first proof by induction as first showing that an identity holds for n=0 then for n=1 then for n=2,... and whenever we get to proving for some value we rely on what was proven "before". But the last proof enjoys no such property: We can happily have a program made out of n functions, each calling all the n functions. Let's put it differently. Say we have four functions: a, b, c, and d. Is it possible for a, b, and c to be OK but d not? No, it isn't, because one of the proof obligations was to show that d is OK given that the others are. But... what's stopping c and d to be both not OK?

Still, I claim the proof is fine. Can you see why it works? What are the general conditions under which such proofs work?

14 September 2009

A Good Companion

A short "review" for the The Princeton Companion to Mathematics.

The Princeton Companion
to Mathematics
Sample Entry: Fermat's
Last Theorem
Podcast interview with
editor Timothy Gowers

Saturday I found a bookshop that I like. It has four floors: basement (bargains), ground (fiction), first (humanistic subjects), second (scientific subjects). This is contrast to the usual fiction and fiction bargains, plus some nude pictures counting as "art".

I stayed on the second floor for five minutes before deciding to buy The Companion. It is remarkable that it is expensive: 80 euro. It is also remarkable that within those five minutes two other people decided to buy it.

Compressed preface. The book leans towards examples rather than formalism. The core is a set of chapters on branches of mathematics that are being developed actively by today's researchers, such as, arithmetic geometry, numerical analysis, and theoretical computer science. Other parts worth mentioning are the concept index (useful for when you are too ashamed to admit you are completely lost in someone's argument) and the set of biographies of mathematicians. I'll let the other goodies surprise you.

My opinion after one hour. Contributors were chosen by expertise and expository skill so I was surprised to not see Knuth. I guess he's getting old and needs to finish his book. The typesetting uses Palatino fonts and is really well done. (Even the annoyingly small parentheses that Terence Tao uses have been fixed and now have the proper size. :p) OK, now let's get to less trivial things! The book is awesome! It's something I would have absolutely loved to have when I was in high-school and it's something that I will use as a bed-time reading for a while. It is very different in flavour from Knuth's books, though. It covers a lot of breadth and intuition but does not go into depth. That is intended, since in-depth study is fairly easy to carry on now that most scientific articles are available online. The big picture, however, can be acquired only by attending conferences, and few are lucky enough to be able to do it.

PS: It's a "review" and not a review because a proper review is done after you read the whole book. That won't happen any time soon.

Facebook Politics

It's interesting how otherwise smart people get worked up by irrational arguments, usually involving politics or religion. A colleague of mine was flooding my Facebook recently with basically this argument:

Treaty T is bad because:
  • It is written by crooks in city F (F stands for "far away")
  • It gives more power to people in city F and takes it away from people in city C (C stands for "close by" or "capital", your choice). Voters know more about the people in city C.

Now, if you didn't spot yet why this is insanely illogic, let me help.

First, let us make clear the assumptions:

  • People in city F are crooks.
  • We (voters) don't know people in city F well enough for an informed vote but we do know people in city C well enough for an informed vote.

This begs the question:

How do you know some people are "crooks" if you don't know enough about them?

The assumptions are inconsistent. That should be enough to make any logician puke. But let's assume only one of them is false.

So, first we assume that people in city F are crooks. So what? It doesn't follow in any way that treaty T is bad. A treaty is good or bad depending on what it contains, not depending on who wrote it.

Now let's assume they may be crooks but we don't know for sure because we don't know enough about them. How is that an argument against the treaty T?? Since people get their information through television, radio, and Internet the distance is completely irrelevant. Of course, it may be that media (who pushes information onto the sheepish voters) makes a bad choice in selecting what is important. That may very well be, but that's an argument against the media, not against the treaty. If the media would suddenly decide to only talk about the private life of Paris Hilton, does that mean that we can't vote for anyone else because we don't know enough about them? No, it means that the media is fucked up.

To make it clear, I am not arguing for the treaty for the simple reason that I did not read it. Before I read it I do not feel competent enough to preach to others how they should vote. However, I would prefer to see a little less fanaticism and a little more reason in my "inbox".

PS: Since I spent enough time on this spamming issue, be advised I will not read any comment on this post. Feel free to comment if you need to vent.

09 September 2009

carnival 56

It seems the Carnival of Mathematics is trying to get back to life. Here's my modest help: a link.