Theory and Practice: October 2009

28 October 2009

Irrational

Labels: puzzle

A cute math puzzle that I saw twice in the past week.

There are three parts. The first two are supposed to be very easy.

Are there two irrational numbers a and b such that a+b is rational?
Are there two irrational numbers a and b such that ab is rational?
Are there two irrational numbers a and b such that a^b is rational?

Sources: The Princeton Companion and Richard Lipton's blog, quoting a talk by Manny Blum.

22 October 2009

Overflow

Labels: random

How do you use StackOverflow and MathOverflow?

I sometimes find answers on StackOverflow while googling for technical problems. For example, yesterday I wanted to know what is a good way to center code typeset with the LaTeX package listings. I didn't find out, so I'm stuck with nesting four environments: figure, center, tabular, lstlisting. Instead, I found someone asking on StackOverflow how to top-align two listings that go on the same row of a tabular environment. There were a few overly complicated answers, so I added mine (\lstset{boxpos=t}). Notice that it's more of a coincidence that StackOverflow got involved—I never planned to answer a question there. In fact, whenever I feel like helping random people and I go to StackOverflow or MathOverflow and click on "Unanswered" I end up sad because I never know how to answer any question.

So, do you use these websites? If yes, then how do you interact with them?

PS: In other news, expect this blog to be quiet until next year. I'm suppose to write a dissertation. Wish me luck.

16 October 2009

Reducible Flowgraphs 0

Labels: graphs, learning, research

A summary of some basic results on reducible flowgraphs. Feedback is welcome. In particular, the ‘easy’ and ‘obvious’ parts are notorious for hiding mistakes. So let me know if something doesn’t look as ‘easy’ as I claim. This post is also available in PDF form.

Preliminaries. A flowgraph is a digraph with an initial node 0 from which all other nodes are reachable. Typically nodes represent statements in some low-level language (like Java bytecode) and edges represent possible successors in an execution. There are various restricted classes of flowgraphs that have been studied and are important in practice. For example, if a flowgraph is a dag (has no cycles) then the program terminates for all inputs; if a flowgraph is a two-terminal series–parallel graph then it corresponds to programs written using only two simple composition operations: sequence (;) and choice ([]). The subject of this post, reducible flowgraphs, is a class of flowgraphs that correspond to programs that can be written without using goto — loops like while and for suffice.

Definition 1 A flowgraph is reducible when it does not have a strongly connected subgraph with two (or more) entries.

A graph is strongly connected when there is a path x ↝ y for all nodes x and y. Node x is an entry for a subgraph H when there is a path 0 ↝^p x such that p ∩ H = {x}. (By notation abuse, p and H stand for sets of nodes, including the endpoints for the path.)

There are at least three other equivalent definitions, as Theorem 1 shows. Those definitions use a few more graph-theoretic concepts. A node x dominates node y when all paths 0 ↝ y contain node x. An edge x → x (for some node x) is a cycle-edge. A depth first search (DFS) of a flowgraph partitions its edges into tree edges (making up a spanning tree known as a DFS tree), forward edges (pointing to a successor in the spanning tree), back edges (pointing to a predecessor in the spanning tree, plus cycle-edges), and cross edges (the others). Tree edges, forward edges, and cross edges form a dag known as a DFS dag.

Theorem 1 ([1, 2]) The following statements about a flowgraph are equivalent:

It is reducible.
Every back edge has its source dominated by its target, for all DFS trees.
It has a unique DFS dag.
It can be transformed into a single node by repeated application of the transformations T₁ and T₂:
- T₁: Remove a cycle-edge.
- T₂: Pick a non-initial node y that has only one incoming edge x → y and glue nodes x and y.

Proof. It is not hard to show (by taking the contrapositive) that (1) ⇒ (2) and (2) ⇒ (3) and (3) ⇒ (1) and (4) ⇒ (1). It remains to show that (4) is implied by one of the others, which I’ll do, again, by looking at the contrapositive.

(1) ⇒ (4): If T₁ and T₂ cannot reduce a graph to a single node then the graph must have a strongly connected subgraph with two entries. This task can be split in two: First, show that if T₁ and T₂ cannot be applied at all then there is a strongly connected subgraph with two entries; second, show that applications of T₁⁻¹ and T₂⁻¹ maintain the existence of a strongly connected subgraph with two entries.

First part: If T₁ and T₂ cannot be applied then each non-initial node has ≥ 2 (distinct) predecessors. The initial node must have ≥ 2 (distinct) successors, let’s call them 1,…,k. Let’s look at nodes x and y ranging over this set. Because node y must have a second incoming edge there has to be a path x ↝ y for some x. If we can find x ≠ y for all y then we are done, since at least two of 1,…,k must be in strongly connected subgraph. Otherwise, let’s examine some node y whose second incoming edge comes from a cycle y ↝ y. The other nodes in this cycle must themselves have some second incoming edge which might come from some node x ≠ y, and we are done as before. Otherwise, we apply recursively the same reasoning on the subflowgraph induced by the nodes of the cycle y ↝ y (with node y as the initial node).

Second part: Easy. (Some would say ‘exercise’.) ☐

The loop forest. Typically a programmer writes in a language with while, which gets desugared, and then various tools (like optimizers or program verifiers) must reconstruct the loop structure of the original program. You may wonder why not keep the while constructs around in the low-level language:

Fewer language constructs means other algorithms are easier.
If the programmer is abusing goto we still want to know what he meant.

An important property of reducible flowgraphs is that there is a natural definition for loops. Intuitively, we want loops to be strongly connected subgraphs and, in reducible flowgraphs, such subgraphs can be associated with their entry.

Definition 2 In a reducible flowgraph, the loop L_x associated with the node x is the unique maximal strongly connected subgraph for which node x is an entry.

Such a graph always exists because a strongly connected subgraph of a reducible flowgraph is dominated by its unique entry. In other words, Definition 2 says that the loop L_x is the set of nodes y that are dominated by node x and also have a path y ↝ x back to the entry. (The path 0 ↝ x that witnesses node x being an entry cannot contain any dominated node y ≠ x.)

Proposition 1 Loops are strongly connected.

Proposition 2 Two loops are either disjunct or one is included in the other.

Proof. If node x dominates node y and there exists a node z ∈ L_x ∩ L_y then L_x ∩ L_y = L_y. If there is no domination between node x and node y then there is no node dominated by both so L_x ∩ L_y = ∅. ☐

When we extend the definition of loops to non-reducible graphs we’ll make sure that Propositions 1 and 2 still hold. Thus, it always makes sense to talk about a loop forest (in which there is a path L_x ↝ L_y when L_x ⊇ L_y).

The structure of loops in reducible flowgraphs can be analyzed by looking at what back edges they contain. A cycle is a graph of the form 1 → 2 → ⋯ → k → 1. Every cycle (which is a subgraph) of a flowgraph must contain one back edge. In reducible flowgraphs a stronger statement holds.

Lemma 1 Every cycle in a reducible flowgraph contains exactly one back edge.

Proof. Suppose there is a cycle x′ → x ↝^p y′ → y ↝^q x′, which contains back edges x′ → x and y′ → y. By definition, the nodes of a cycle are distinct, but we allow paths p and q to be (not necessarily both) of length 0 so it is possible that x = y′ and it is possible that y = x′. Every path 0 ↝ x′ must contain node x according to statement (2) in Theorem 1. Since there are paths 0 ↝ y ↝^q x′, this implies that all paths 0 ↝ y must contain node x (because path q does not contain node x, since they are part of a cycle). In other words, node x dominates node y. Symmetrically, node y dominates node x. This is only possible if x=y, which is a contradiction. ☐

Figure 1: flowgraph examples

Because of Lemma 1, Hecht and Ullman [2] originally defined loops in reducible graphs by associating them to back edges: The loop L_{y → x} is the union of paths x ↝ y that do not use node x as an intermediate node, plus the back edge itself. This shows that there isn’t really only one natural way of defining loops in reducible flowgraphs. For the graph in Figure 1(left) the old definition gives loop L_{1 → 0}={0,1} nested in loop L_{2 → 0}={0,1,2}, while our definition only finds the loop L₀=L_{2 → 0} (plus the uninteresting L₁={1} and L₂={2}). The problem with the old definition, however, is that it does not imply Proposition 2 as can be seen in Figure 1(right).

Definition 2 is always more coarse grained than that of Hecht and Ullman [2].

Lemma 2 In a reducible flowgraph, we have L_x = ∪L_{y → x}, where the union ranges over back edges y → x.

def dfs_forward(x):
  assert status[x] == unseen
  status[x] = seen
  for y in succ[x]:
    if status[y] == unseen:
      dfs_forward(y)
    elif status[y] == seen:
      backpred[y] += [x]
  for y in backpred[x]:
    dfs_backward(y, x)
  status[x] = done

def dfs_backward(y, z):
  y = find(y)
  if inner[y] != -1 or y == z:
    return
  if status[y] != done:
    print ’NOT reducible!’
    exit(1)
  inner[y] = z
  union(y, z)
  for x in pred[y]:
    dfs_backward(x, z)

Figure 2: finding loops in reducible flowgraphs

Tarjan [4] described an algorithm for recognizing reducible flowgraphs. Ramalingam [3] presents a modified version that identifies the loop forest. (Tarjan [4] says that Ullman discovered independently the same algorithm, although he didn’t publish it. Ramalingam [3] doesn’t say that he presents a modified algorithm. Perhaps the modification is folklore?) The core of a Python implementation appears in Figure 2. After executing dfs_forward(0) we have x==inner[y] when L_x ⊃ L_y and there’s no other loop in between. In other words, the array inner contains the parent pointers of the loop forest. The array succ represents the flowgraph through adjacency lists; the array pred represents the reversed graph through adjacency lists. The functions union and find are the standard union-find with the added trick that find(x) is guaranteed to return the z in the last call union(y, z) that involved the set to which x now belongs. (Figuring out a nice way to implement this trick is not hard, but fun.)

Remark 1 The recursive implementation looks nice but it tends to crash the stack, especially in Python, which has a rather low default recursion limit (1000 on my computer).

It is hopefully not too hard to see why this algorithm works after all those lemmas and theorems.

More to come. I plan three more posts: (1) finding loops in graphs that are not reducible, (2) making graphs reducible, and (3) summary of advanced results.

References

[1]: Mattew S. Hecht and Jeffrey D. Ullman. Flow graph reducibility. 1972.
[2]: Mattew S. Hecht and Jeffrey D. Ullman. Characterizations of reducible flow graphs. 1974.
[3]: Ganesan Ramalingam. Identifying loops in almost linear time. 1999.
[4]: Robert Tarjan. Testing flow graph reducibility. 1974.

Thanks to Claudia Grigore for feedback.

15 October 2009

Time, learning, teaching, markup

Labels: opinion

A rant about blog-writing technology.

This term I am not teaching so that I can focus my energies on writing a thesis. Instead, I found other creative ways of wasting time. I have recently succumbed to peer-pressure and started using Facebook and Twitter. I am also writing blog posts.

I realized recently that my knowledge about reducible flowgraphs is shallow. Normally I’d go and read about them from a textbook or some old articles and perhaps even do a few exercises. This time I decided to apply an advice I heard often: “The best way to learn a subject is to teach it.” Since I don’t teach, the next best thing is to write a blog post about it. (That’s not entirely true: Even if I would teach a blog post would be a better option, because I’m usually either demonstrating and have little say on the content, or I’m in charge of a course on a subject like Unix programming — good luck with fitting graph theory in there.)

The obvious question is: Why is this post not about reducible flowgraphs and instead it rants about an inexistent post about reducible flowgraph? OK, perhaps phrased like that it isn’t quite ‘obvious’. Maybe this works better: Get to the point!

Well, I’m technology frustrated. The mechanics of writing and publishing should take an insignificant amount of time compared to the time it takes to produce the content. Alas, that’s not the case when you want to convey information using:

text
drawings
code (in a programming language)
(math) formulas

Let’s see what’s available. (La)TeX is very good for text and formulas. You just go and write your text. A new paragraph is achieved by 2 enter keys, marking up a block of text with italics takes 5 extra characters, inserting a link takes 9 extra characters. For formulas, it is superb: I can typeset a_b with 5 keys, including the 2 delimiters. (If you think I’m crazy to count keys, wait until I tell you how many you need in the xML family of languages.) For drawings (La)TeX is again very good if you use the TiKZ package. Describing pictures in that language is so cool that I often fantasize about using it to introduce kids to programming. For code, the LaTeX(-only) package listings does a decent job: You have to bracket the piece of code with the right environment (34 keys), but sometimes you need to configure it to get nice results.

However, there is a big problem with LaTeX: browsers don’t know how to render it. Browsers know how to render xML: HTML, XML with style-sheets, MathML, SVG, and so on. Unfortunately, writing in these languages is a nightmare because they were designed to be easily consumed and produced by programs, not by people. Of course, you may use an editor that hides the details, but there are two problems with that approach. First, I know of no good editor that handles all those four types of content. Second, switching between keyboard and mouse takes time and disrupts focus so wysiwyg editors are out of the question for me. The corollary to that is that what I want is a language that lets me type easily the content. After that is available, fancy editors can bring minor improvements.

But let me be more explicit about why using xML is horrible. For text it is fairly OK. A new paragraph is achieved by a minimum of 3 keys (or 7 keys if you want proper bracketing of tags) but you usually type the 2 extra enters anyway to have the paragraphs nicely layed out while editing. Marking up a block of text with italics takes 7 extra characters. Inserting a link takes 14 characters (5 more than for LaTeX, which is ironic considering that when people think HTML they think links). But it gets worse. The xML way to typeset a_b is to use MathML and it takes… 89 characters! (Plus, locally the file must have the extension xml so I’m not sure it will work on Blogger.) Call me crazy, but I’m not willing to trade 5 characters for 89 characters no matter how better structured and more explicit the latter is. For drawings there’s something called SVG, which is similarly verbose, although perhaps not quite as bad when compared to TiKZ (or metapost). For typesetting code the xML solution is actually quite good: You just put the code in the proper environment (31 keys) and then you let a script do its job. I never saw it fail (although the implementation is quite hackish).

There is of course the option of writing LaTeX and converting that to xML. There are a few tools around to do just that. The best, in my opinion, is HeVeA. It handles text perfectly. For TiKZ drawings you can add an environment (again, around 30 keys) and it will make PNGs out of those. Cool. For formulas it uses plain HTML (no MathML) so the result works on older browsers but looks uglier than it could on newer ones. (MathML support is experimental, and it doesn’t work on my files.) You also have the option of transforming selected formulas to images (just like you do with TiKZ pictures). This makes sense if the formulas are complicated, in which case the HTML-only rendering is likely to suck. For code, HeVeA has some built-in special handling of the listings package, which produces results that I find quite ugly. HeVeA is almost ideal, except it is hard to customize, even though it’s supposed to be easy. For example, I want to translate the lstlisting LaTeX environment into a pre block with a certain class and let the script I mentioned do the syntax highlight. Why? It looks better and it is backed by Google, meaning that it’s likely to be more up-to-date than the internals of HeVeA. (HeVea has one great developer, but he is still one.) This customization seems to require changing the OCaml code of HeVeA and that is not easy. I also want to change some uses of df and dd tags and some purple (yuck!) things it generates but, again, I need to spend some time to figure out how to do it: It’s not easy. Of course, one solution for code is to turn it into images: That would look OK (though no colors) but it precludes readers to do copy and paste.

Another LaTeX to HTML converter is latex2wp. It handles text OK. It handles formulas by exploiting Wordpress support, which makes it unusable on Blogger or other sites (unless you are willing to piss off Wordpress for using their resources to drive traffic to a competitor). This brings up another point about HeVeA: If you are a Wordpress user it makes sense to exploit their support for LaTeX. But, good luck with convincing HeVeA to translate $a_b$ into $latex a_b$ . Finally, if I remember correctly, latex2wp has no support for drawings or code.

HeVeA is a successor written by Luc Maranget of a convertor written by the Xavier Leroy. (And, if you don’t know who the latter is, then you are not a computer scientist.) Xavier remarked that an ideal solution for writing technical documentation would be to design a new language that is easily convertible to both LaTeX (to produce printable PDFs) and xML (for easy viewing in browsers), but practically most people writing technical stuff would prefer to use LaTeX and will be reluctant to learn a new language. That was before Wikipedia: Its markup language is convertible to both HTML and PDF (through LaTeX?). Alas, I think their tools are not publicly available. (If they are and are easy to use on Linux I want to know!) Supposing that the tools are available, there is a bigger problem: While text, code, and math are supported, drawings have to be written in the verbose SVG.

That was the motivation. Now comes the challenge.

Design a concise language for describing documents with text, drawings, code, and formulas. Make it feel familiar by having syntax similar to existing languages. Implement translations to LaTeX and xML. Make sure the implementation is very easy to customize. (Keep in mind the failings of HeVeA in this area.) The usage of the tool should be as easy as tool -tex docname.

Now, why didn’t I think of using that as a project when I was teaching?

PS: In case you wonder, I wrote this in LaTeX and used HeVeA.