January 4, 2012

Seven Books that Everyone Should Read

If you want a solid background in the theoretical parts of computing, then you should read the following books. The best time to do so is in high-school and as an undergraduate. But never is too late.

Types and Programming Languages. You will enjoy this book if you like to manipulate symbols. Once you understand ordered sets and induction, you are ready to begin reading. You will learn at least the following:

  • Lambda calculus is the computational model of functional languages.
  • Well-typed programs don't fault (except when they do).
  • You construct more complicated types, such as variants, records, and function types, out of simpler ones.
  • Types are much like propositions in logic, and polymorphic types are much like propositions in second-order logic.
  • Mutable state is tricky.

Algorithms. You will enjoy this book if you like the problems used in Google Code Jam. Once you understand the notation $O(f(n))$ and you can solve the recurrence $x_{n+1}=2x_n+1$, you are ready to begin reading. You will learn at least the following:

  • Reductions are the algorithm design technique.
  • When you reduce an instance of a problem to smaller instances of the same problem, you say that you use recursion (or, if you want a fancier phrase, say ‘divide and conquer’).
  • If you cache the results of recursive calls, then you say that you use memoization.
  • If you use at most one recursive call, then you are greedy.
  • If you are stuck, then try to reduce your problem to linear programming, or some other problem that is famous and has good implementations.

Purely Functional Data Structures. You will enjoy this book if you puzzled over how to implement breadth-first search in a programming language without mutation. Once you know the basics of a functional language (such as OCaml or Haskell), you are ready to begin reading. You will learn at least the following:

  • Persistent implementations of heaps and of search trees are elegant.
  • Amortization is a technique for data structure design, although it is usually presented as a technique for algorithm analysis.
  • The requirements of persistence and amortization imply the need for lazy evaluation.
  • Number representations correspond to collection implementations.
  • Polymorphic recursion is useful.

Information Theory, Inference, and Learning Algorithms. You will enjoy this book if you like probabilities (or gambling). Once you can answer ‘What is the probability that the sum of a pair of dice is $\ge3$?’, you are ready to begin reading. You will learn at least the following:

  • Information equals entropy; it is measured in bits.
  • Communication systems are reliable because they use codes, which rely on redundancy.
  • Out of functions whose spectrum has finite support, those of type $\mathbb N\to\mathbb R$ carry as much information as those of type $\mathbb R\to\mathbb R$.
  • The bandwidth and the signal-to-noise ratio of a channel determine a fundamental upper limit on its capacity, known as the Shannon limit.
  • The codes that best approach the Shannon limit are based on sparse graphs.

The Art of Computer Programming. You will enjoy this book if you like detailed precision. Once you wrote a program in a low-level language, you are ready to begin reading. You will learn at least the following:

  • You must prove that your programs are correct and fast.
  • There are ${2n \choose n}/(n+1)$ binary trees with $n$ nodes; that is, $\sim 4^n/n$.
  • One of the best hashes of $x$ takes the high bits of $\lfloor 2^wa\rfloor x$, where $w$ is the word width, and $a$ is some irrational number from $(0,1)$.
  • BDDs represent boolean functions; ZDDs represent families of sets.
  • You can generate the permutations of $n$ objects using only a constant number of operations for each step.

Approximation Algorithms. You will enjoy this book if you like good-enough, fast code more than you like perfect, but nonexistent code. Once you finished the Algorithms book from above, you are ready to begin reading. You will learn at least the following:

  • You need not despair if you have to solve an NP-hard optimization problem.
  • Greedy algorithms are surprisingly good, but you should know examples on which they behave worst.
  • The approximation ratio compares the solutions produced by an algorithm to optimal solutions.
  • The greedy algorithm for Set-Cover has approximation ratio $O(\lg n)$, where $n$ is the size of the universe.
  • To design a minimization algorithm, search for a certificate that provides a lower bound.

Computational Complexity. You will enjoy this book if you like to know what computers can do in principle more than you like to know what computers really do. Once you understand why the halting of a program cannot be decided, you are ready to begin reading. You will learn at least the following:

  • ‘Can we solve a problem efficiently if we can check its answers efficiently?’ is the most famous open question in theoretical computer science.
  • We can efficiently find models for 3CNF formulas only if the answer to the famous question is ‘yes’.
  • Every problem that can be solved with reasonable amounts of memory can be solved (not much slower) by playing a game.
  • Some proofs don't teach you anything.
  • If you can quickly count models of CNF formulas, then you are good at playing games.

Disclaimer. I read only parts of these books.

Other lists. I wrote this post after I saw two other book lists: one for programmers (on a Romanian site popular among high-school students) and one for theoretical computer scientists (on CSTheory).


What other books would you put on this list? Which ones would you take off this list?

November 16, 2011

Squaregraphs

On Tuesday, Donald Knuth gave a talk about squaregraphs. I'm summarizing here what I learned, and I'm including the open problems that Knuth mentioned.

Squaregraphs are plannar graphs. When you draw a graph in a plane, each of its cycles partition the plane into two regions, one bounded and one unbounded. When the bounded region is empty (that is, no edge and no vertex is drawn in it), the cycle is called a bounded face; when the unbounded region is empty, the cycle is called an unbounded face. By definition, all bounded faces of a squaregraph are 4-cycles. There is one more condition. A vertex is interior if it does not belong to the unbounded face. All interior vertices must have degree $\ge4$. Here is an example, a gear graph.

Knuth told us about many properties that are implied by this definition, usually without proof. I enjoy such talks. First, they give you a hawk's eye view of a subject; second, they make you think about the proofs. Some proofs are easy and some escape you during the talk and even days afterwards, but at least you get an idea of what is easy and what is not. What bothers me is that at no point did I feel the need to invoke the last property, that interior vertices have at least 4 neighbours.

But let's focus on what I did learn, rather than what I did not understand.

While on the bus, before the talk, Rasmus and I looked at the Wikipedia article on squaregraphs, which is very dense, and tried to make sense of it. I think it helped both of us to follow the presentation better. One of the last things Rasmus said was: "Squaregraphs ought to have an inductive definition." And, lo and behold, we found out what that definition is during the talk. Every squaregraph can be built by starting with a 4-cycle and repeatedly adding new faces, subject to the degree constraint: take a simple path of length 1, 2, or 3 that is part of the unbounded face and make a new 4-cycle by adding, respectively, 3, 2, or 1 new edges (drawn in the former unbounded face, that is, towards the outside).

Open problem: Is it possible, using this inductive definition, to construct squaregraphs that look like Mona Lisa when viewed from afar?

By the way, a frog can jump from an edge to an opposite edge across a bounded face, obviously. Here is a picture with all possible frog expeditions on an 8-gear.

Each expedition, depicted above with a curved green line, is a set of edges that cut the graph into two disconnected parts. We give vertices on one side a bit $0$ and vertices on the other side a bit $1$. If we do this for each expedition then each vertex gets as many bits as there are expeditions. To put some order on these bits we name the expeditions $0$, $1$, $\ldots\,$, $n-1$, so each vertex gets an $n$-bit string. In our example, there are 8 green lines so each vertex will get an 8-bit string.

These bitstrings are interesting because the shortest distance between two vertices equals the Hamming distance between the bitstrings of the vertices. We say that we found an isometric embedding of squaregraphs into an $n$-dimensional hypercube. In $n$ dimensions, squaregraphs look squarish.

Another interesting property of squaregraphs is that they are median graphs. A median graph is one in which all three vertices $x$, $y$, and $z$ have a median vertex $\langle xyz\rangle$ that is on some shortest path between $x$ and $y$, on some shortest path between $y$ and $z$, and on some shortest path between $z$ and $x$. An example is the integer grid, with vertices $(m_x,m_y)$ for all integers $m_x$ and $m_y$, and edges between vertices $m$ and $n$ when $|m_x-n_x|+|m_y-n_y|=1$. The $x$-coordinate of $\langle mnp\rangle$ is the median of the array $[m_x,n_x,p_x]$, hence the name (I guess). It turns out that you can also isometrically embed median graphs in $n$-dimensional hypercubes. That is, you can label vertices with bitstrings such that travelling an edge flips exactly one bit.

A subset of the $n$-dimensional hypercube is a median graph exactly when the selected coordinates are the models of a 2-CNF formula. Knuth said that an open problem would be to find a nice condition to impose on 2-CNF formulas such that their models correspond to squaregraphs.

Notice that on squaregraphs we have a notion of a line (the green equivalence classes above) and also of turning left or right. This means that we can define a knight's move on the bounded faces of a squaregraph. How can we generalize various problems that are usually done on chessboards to squaregraphs? For example, can one define something like rook polynomials for squaregraphs?

Finally, is there a 3D generalization of squaregraphs?

For more information, Knuth recommended a paper by Bandelt, Chepoi, and Eppstein, which is also recommended by Wikipedia. It's quite long, though. Knuth also showed us some nice pictures of squaregraphs on Eppstein's blog. Of this blog he said: "It is one of the best blogs in computer science. I do not read it, for otherwise I would not do anything else."

[Thanks to Rasmus for finding several mistakes in a previous version.]

November 10, 2011

Automata on Infinite Alphabets

There are several ways of extending the standard definitions of automata to infinite alphabets. Here I try to explain why the trivial extension (just use the same definitions) is not satisfactory and present a couple of alternatives.

The usual definitions go like this. Given is a finite set $\Sigma$. We say that $\Sigma$ is an alphabet and its elements are letters. Words are strings of letters, languages are sets of words. A finite automaton is a finite digraph whose arcs are labelled by letters, has a distinguished initial vertex, and has some vertices marked as being accepting. Each (finite) automaton determines a language, namely the set of those words obtained by concatenating the labels of a walk from the initial vertex to some accepting vertex. A language that can be described by a finite automaton is said to be regular.

Why restrict $\Sigma$ to be finite? What could possibly cause problems if we allow it to be bigger? Isn't the generalization straightforward?

Yes and no.

Yes, we could simply remove the restriction on $\Sigma$ and keep the same definitions. However, that would lead to situations that intuitively just don't feel right. For example, is the language $L=\{\,abc \mid c\in\{a,b\}\,\}$ regular? With the definitions above $L$ is regular exactly when $\Sigma$ is finite. There is something unpleasant about a language suddenly becoming non-regular, although its (declarative) definition does not change.

Recall the standard model for (finite) automata: a transition function $\delta:X\times\Sigma\to X$, where $X=V$ is the (finite) set of vertices, an accepting function $\alpha:V\to2$, and the initial function $\iota:1\to X$. There are at least three ways of extending the standard model: finite-memory automata aka register automata (Francez and Kaminski, 1994), pebble automata, and data automata. All three are surveyed by Segoufin (2006). Thinking back of the language $L$, the intuitive reason why it cannot be recognized by a finite automaton is that we cannot easily remember the letters we saw. So the natural thing to try is to add a (finite) memory to the automaton. This suggests distinguishing between the (not necessarily finite) set of states $X$ and the finite set of vertices $V$. For example, we could take $X=V\times\Sigma^k$ for some $k$. The automaton has $k$ registers. However, now the automaton is too powerful. For example, take $V$ to be the singleton set $1$, and $\Sigma$ to be the set of natural numbers $\mathbb{N}$, and $k$ to be $1$. Then $\delta$ could be essentially any function $\mathbb{N}\times\mathbb{N}\to\mathbb{N}$. We did not even require it to be computable! The class of languages recognized by such a device is clearly too big to be called "regular". This is why extra restrictions on $\delta$ are needed. Francez and Kaminski (1994) essentially required that all that $\delta$ can do with letters is (1) store them in registers and (2) compare them for equality.

Finally, Bojanczyk et al (2011) give what I think is a very cute extension of the standard model of (finite) automata.


Their model is still a triple $(\delta, \alpha, \iota)$ of functions with the types as given above. However, the extra restrictions are much more principled. Now $X$ and $\Sigma$ are required to be $G$-Sets (for some group $G$), and all the three functions are required to be invariant. (These concepts are explained in my previous post on group actions.) A language is now $G$-regular when it is recognized by an automaton for which both $X$ and $\Sigma$ have a finite number of orbits.

November 7, 2011

Group Actions

The theory of finite automata works with finite alphabets. The article Automata with Group Actions by Bojanczyk et al. is one recent attempt to extend the theory to infinite alphabets. To do so, the authors use a basic notion from group theory, namely group action. Here I review the facts about group actions that are necessary to understand Bojanczyk's article.

Recall that a group is a set $G$ together with a binary operation that has an identity, is associative, and each element has an inverse:

  • $1 \pi=\pi 1$, for all $\pi \in G$
  • $(\pi\sigma)\tau=\pi(\sigma\tau)$, for all $\pi,\sigma,\tau\in G$, and
  • there exist a $\pi^{-1}\in G$ such that $\pi\pi^{-1}=\pi^{-1}\pi=1$, for all $\pi\in G$

The typical example of a group is the set of permutations of some set $X$. Its elements have the type $X \to X$. The identity is the identity function. The binary operation is function composition, which is always associative. Each permutation has an inverse almost by definition. (Permutations are those functions that have inverses.) This group is sometimes denoted $\mathrm{Sym}(X)$ and is called the symmetric group of $X$.

Recall also that a function $f : G \to H$ is called a homomorphism when $f(\pi\sigma)=f(\pi)f(\sigma)$ for all $\pi,\sigma \in G$. We say that $f$ preserves the group operation. In particular, $f(1)=1$ or, more precisely, $f(1_G)=1_H$. In words, homomorphisms map the identity of one group to the identity of the other. This is easy to prove, and makes a nice exercise.

A group action is a homomorphism from $G$ to $\mathrm{Sym}(X)$. The set $X$ is called a $G$-Set.

The simplest group action is the trivial action, which maps all elements of $G$ to the identity permutation of $X$,

I know of two reasons why group actions are an interesting concept. First, they let you think of elements of a group $G$ as symmetries of a set $X$ and therefore they partition the set $X$ into equivalence classes. You change the group $G$, you change which elements of $X$ are "essentially the same." Second, once you have group actions for $X$ and $Y$ you also have group actions for $X \times Y$ and $X+Y$ and $X^*$. So group actions build up nicely.

The picture I have in my mind when I talk about G-Sets is a bunch of points (the elements of $X$) with a bunch of arcs between them. Each arc is labelled by an element of $G$. When we look at all arcs labelled by $\pi$ (where $\pi \in G$) we get the picture of a permutation on $X$. That is, a bunch of (disjoint) cycles. When we look at the arcs labelled by $1_G$ we see a loop on each point. In general, we see $|G|$ arcs getting out of each point. This picture has a few more properties. If there is an arc from $x$ to $y$ and one from $y$ to $z$, then there is an arc from $x$ to $z$. In other words, the digraph I'm describing is transitively closed. Proving this property makes another nice exercise. In yet other words, for all $x$ and $y$ either there is an arc from $x$ to $y$ or there is no path from $x$ to $y$. The (strongly) connected components of this graph are the equivalence classes that $G$ determines on $X$. The equivalence class of an element $x \in X$ is called its orbit. Another property of this graph (the Burnside lemma) is that the points in each orbit have exactly $|G|$ loops in total. (Well, I admit: This is not quite the standard formulation.)

To discuss how group actions build up it's useful to finally introduce some (common) notation. We write $x \cdot \pi = y$ when the permutation associated with $\pi \in G$ maps $x \in X$ to $y \in Y$. So, in a way, the dot $\cdot$ represents the (still nameless) group action. Typically, we use the following.

  • $(x,y) \cdot \pi = (x \cdot \pi, y \cdot \pi)$
  • $[x_0,\ldots,x_{n-1}]\cdot \pi = [x_0\cdot\pi,\ldots,x_{n-1}\cdot\pi]$

A subset $Z$ of $X$ is said to be invariant (or equivariant) when it is a union of orbits. Equivalently, $Z$ is said to be invariant when each $\pi \in G$ permutes its elements, that is $Z \cdot \pi = Z$. (The graph picture is very handy to figure out why these two definitions are equivalent.)

Now suppose that the $G$-Set is $X \times Y$ and we are given a function $f : X \to Y$. We say that $f$ is invariant when the subset $\{\,(x,f(x))\mid x\in X\,\}$ is invariant. In other words, when $$f(x \cdot \pi) = f(x) \cdot \pi$$ for all $x \in X$ and $\pi \in G$. Invariant functions are closed under composition. $G$-Set is the category of $G$-Sets with the arrows being invariant functions.

How is this at all relevant to finite automata over infinite alphabets? Well, very roughly, a finite automaton over infinite alphabets is one that cannot distinguish between symmetric traces. For example, an automaton might say that "given two iterators $\alpha$ and $\beta$ on a collection $\gamma$, it is illegal to use $\alpha$ to modify $\gamma$ and then continue to use $\beta$." The exact values $\alpha$, $\beta$, and $\gamma$ are irrelevant, so if a trace with some values is accepted, then so are all traces obtained by pointwise permuting the values in the trace. I realize this is rather unclear, but I plan to talk about it in a future post.

In the meantime, there are two other nice references on group actions that I know of: Wikipedia and a recent post on Gower's blog.

September 16, 2011

Education, Sport, Entertainment

The summer Olympic Games are less than a year away. In those Games, athletes compete in numerous events, all physical. I saw the term Olympiad used also for non-physical contests, like the International Olympiad in Informatics. I wondered when did people start to refer to intellectual contests as Olympiads, so I tried Wikipedia and … did not find an answer. Instead, I did find out that 'Olympiad' means 'a period of four years.' Strictly speaking, it is erroneous to say 'Olympiad' for the event itself, especially if the event is annual.

Let's move on from etymology to a more interesting issue: What do these events have in common? Both are contests whose participants demonstrate skill levels that seem super-human. For a normal person, just watching the performances can be inspiring and motivating. I believe that this last similarity is key, because it is what makes the events good.

Let me explain.

If two people say that 'X is good,' they may, nevertheless,disagree. For example, if two people say that Natalie Portman is good, they may mean different things. One may mean she is a good actress but not a particularly good scientist; another may mean she is a good scientist but not a particularly good actress. It is therefore important to say what I mean by good. Briefly, something that benefits humankind.

So, what exactly is good about the Games? It can't be the extra pollution caused by hordes of people flying to London. It can't be the overcrowded tubes, which will make my life miserable. Could it be the entertainment it provides to so many people? It could, if only it were less expensive. I'm sure there are cheaper forms of entertainment, such as watching silly videos on YouTube.

So what is it?

Every performance, such as a movie, a dance, or a football game, makes people feel good while watching. More importantly, some performances have a lasting effect. Some movies haunt you for a long time and make you think about certain aspects of life. Beckham and Hagi inspired a whole generation to play football (in UK and Romania, respectively). That is one healthier generation than it could have been. Yes, the effects of one particular performance on one particular individual are small, but the effects add up.

In my mind, by far the most important contributions to society of the Olympic Games are the following.

  • promoting exercise and hence a healthier life-style
  • promoting fair-play

Identifying elites? A means to an end. The skills that awe us are not even the skills most useful to society. Schumacher is a good driver, but do we really think it would be good for society if all drivers would have his skills? The bus driver who takes children to school for 40 years without any accident is less awe inspiring, but his skills definitely benefit the lives of the many children he transports. Yes, there is some correlation between the two skills.

And so it is with intellectual contests.