Theory and Practice: Reducible Flowgraphs 0

A summary of some basic results on reducible flowgraphs. Feedback is welcome. In particular, the ‘easy’ and ‘obvious’ parts are notorious for hiding mistakes. So let me know if something doesn’t look as ‘easy’ as I claim. This post is also available in PDF form.

Preliminaries. A flowgraph is a digraph with an initial node 0 from which all other nodes are reachable. Typically nodes represent statements in some low-level language (like Java bytecode) and edges represent possible successors in an execution. There are various restricted classes of flowgraphs that have been studied and are important in practice. For example, if a flowgraph is a dag (has no cycles) then the program terminates for all inputs; if a flowgraph is a two-terminal series–parallel graph then it corresponds to programs written using only two simple composition operations: sequence (;) and choice ([]). The subject of this post, reducible flowgraphs, is a class of flowgraphs that correspond to programs that can be written without using goto — loops like while and for suffice.

Definition 1 A flowgraph is reducible when it does not have a strongly connected subgraph with two (or more) entries.

A graph is strongly connected when there is a path x ↝ y for all nodes x and y. Node x is an entry for a subgraph H when there is a path 0 ↝^p x such that p ∩ H = {x}. (By notation abuse, p and H stand for sets of nodes, including the endpoints for the path.)

There are at least three other equivalent definitions, as Theorem 1 shows. Those definitions use a few more graph-theoretic concepts. A node x dominates node y when all paths 0 ↝ y contain node x. An edge x → x (for some node x) is a cycle-edge. A depth first search (DFS) of a flowgraph partitions its edges into tree edges (making up a spanning tree known as a DFS tree), forward edges (pointing to a successor in the spanning tree), back edges (pointing to a predecessor in the spanning tree, plus cycle-edges), and cross edges (the others). Tree edges, forward edges, and cross edges form a dag known as a DFS dag.

Theorem 1 ([1, 2]) The following statements about a flowgraph are equivalent:

It is reducible.
Every back edge has its source dominated by its target, for all DFS trees.
It has a unique DFS dag.
It can be transformed into a single node by repeated application of the transformations T₁ and T₂:
- T₁: Remove a cycle-edge.
- T₂: Pick a non-initial node y that has only one incoming edge x → y and glue nodes x and y.

Proof. It is not hard to show (by taking the contrapositive) that (1) ⇒ (2) and (2) ⇒ (3) and (3) ⇒ (1) and (4) ⇒ (1). It remains to show that (4) is implied by one of the others, which I’ll do, again, by looking at the contrapositive.

(1) ⇒ (4): If T₁ and T₂ cannot reduce a graph to a single node then the graph must have a strongly connected subgraph with two entries. This task can be split in two: First, show that if T₁ and T₂ cannot be applied at all then there is a strongly connected subgraph with two entries; second, show that applications of T₁⁻¹ and T₂⁻¹ maintain the existence of a strongly connected subgraph with two entries.

First part: If T₁ and T₂ cannot be applied then each non-initial node has ≥ 2 (distinct) predecessors. The initial node must have ≥ 2 (distinct) successors, let’s call them 1,…,k. Let’s look at nodes x and y ranging over this set. Because node y must have a second incoming edge there has to be a path x ↝ y for some x. If we can find x ≠ y for all y then we are done, since at least two of 1,…,k must be in strongly connected subgraph. Otherwise, let’s examine some node y whose second incoming edge comes from a cycle y ↝ y. The other nodes in this cycle must themselves have some second incoming edge which might come from some node x ≠ y, and we are done as before. Otherwise, we apply recursively the same reasoning on the subflowgraph induced by the nodes of the cycle y ↝ y (with node y as the initial node).

Second part: Easy. (Some would say ‘exercise’.) ☐

The loop forest. Typically a programmer writes in a language with while, which gets desugared, and then various tools (like optimizers or program verifiers) must reconstruct the loop structure of the original program. You may wonder why not keep the while constructs around in the low-level language:

Fewer language constructs means other algorithms are easier.
If the programmer is abusing goto we still want to know what he meant.

An important property of reducible flowgraphs is that there is a natural definition for loops. Intuitively, we want loops to be strongly connected subgraphs and, in reducible flowgraphs, such subgraphs can be associated with their entry.

Definition 2 In a reducible flowgraph, the loop L_x associated with the node x is the unique maximal strongly connected subgraph for which node x is an entry.

Such a graph always exists because a strongly connected subgraph of a reducible flowgraph is dominated by its unique entry. In other words, Definition 2 says that the loop L_x is the set of nodes y that are dominated by node x and also have a path y ↝ x back to the entry. (The path 0 ↝ x that witnesses node x being an entry cannot contain any dominated node y ≠ x.)

Proposition 1 Loops are strongly connected.

Proposition 2 Two loops are either disjunct or one is included in the other.

Proof. If node x dominates node y and there exists a node z ∈ L_x ∩ L_y then L_x ∩ L_y = L_y. If there is no domination between node x and node y then there is no node dominated by both so L_x ∩ L_y = ∅. ☐

When we extend the definition of loops to non-reducible graphs we’ll make sure that Propositions 1 and 2 still hold. Thus, it always makes sense to talk about a loop forest (in which there is a path L_x ↝ L_y when L_x ⊇ L_y).

The structure of loops in reducible flowgraphs can be analyzed by looking at what back edges they contain. A cycle is a graph of the form 1 → 2 → ⋯ → k → 1. Every cycle (which is a subgraph) of a flowgraph must contain one back edge. In reducible flowgraphs a stronger statement holds.

Lemma 1 Every cycle in a reducible flowgraph contains exactly one back edge.

Proof. Suppose there is a cycle x′ → x ↝^p y′ → y ↝^q x′, which contains back edges x′ → x and y′ → y. By definition, the nodes of a cycle are distinct, but we allow paths p and q to be (not necessarily both) of length 0 so it is possible that x = y′ and it is possible that y = x′. Every path 0 ↝ x′ must contain node x according to statement (2) in Theorem 1. Since there are paths 0 ↝ y ↝^q x′, this implies that all paths 0 ↝ y must contain node x (because path q does not contain node x, since they are part of a cycle). In other words, node x dominates node y. Symmetrically, node y dominates node x. This is only possible if x=y, which is a contradiction. ☐

Figure 1: flowgraph examples

Because of Lemma 1, Hecht and Ullman [2] originally defined loops in reducible graphs by associating them to back edges: The loop L_{y → x} is the union of paths x ↝ y that do not use node x as an intermediate node, plus the back edge itself. This shows that there isn’t really only one natural way of defining loops in reducible flowgraphs. For the graph in Figure 1(left) the old definition gives loop L_{1 → 0}={0,1} nested in loop L_{2 → 0}={0,1,2}, while our definition only finds the loop L₀=L_{2 → 0} (plus the uninteresting L₁={1} and L₂={2}). The problem with the old definition, however, is that it does not imply Proposition 2 as can be seen in Figure 1(right).

Definition 2 is always more coarse grained than that of Hecht and Ullman [2].

Lemma 2 In a reducible flowgraph, we have L_x = ∪L_{y → x}, where the union ranges over back edges y → x.

def dfs_forward(x):
  assert status[x] == unseen
  status[x] = seen
  for y in succ[x]:
    if status[y] == unseen:
      dfs_forward(y)
    elif status[y] == seen:
      backpred[y] += [x]
  for y in backpred[x]:
    dfs_backward(y, x)
  status[x] = done

def dfs_backward(y, z):
  y = find(y)
  if inner[y] != -1 or y == z:
    return
  if status[y] != done:
    print ’NOT reducible!’
    exit(1)
  inner[y] = z
  union(y, z)
  for x in pred[y]:
    dfs_backward(x, z)

Figure 2: finding loops in reducible flowgraphs

Tarjan [4] described an algorithm for recognizing reducible flowgraphs. Ramalingam [3] presents a modified version that identifies the loop forest. (Tarjan [4] says that Ullman discovered independently the same algorithm, although he didn’t publish it. Ramalingam [3] doesn’t say that he presents a modified algorithm. Perhaps the modification is folklore?) The core of a Python implementation appears in Figure 2. After executing dfs_forward(0) we have x==inner[y] when L_x ⊃ L_y and there’s no other loop in between. In other words, the array inner contains the parent pointers of the loop forest. The array succ represents the flowgraph through adjacency lists; the array pred represents the reversed graph through adjacency lists. The functions union and find are the standard union-find with the added trick that find(x) is guaranteed to return the z in the last call union(y, z) that involved the set to which x now belongs. (Figuring out a nice way to implement this trick is not hard, but fun.)

Remark 1 The recursive implementation looks nice but it tends to crash the stack, especially in Python, which has a rather low default recursion limit (1000 on my computer).

It is hopefully not too hard to see why this algorithm works after all those lemmas and theorems.

More to come. I plan three more posts: (1) finding loops in graphs that are not reducible, (2) making graphs reducible, and (3) summary of advanced results.

References

[1]: Mattew S. Hecht and Jeffrey D. Ullman. Flow graph reducibility. 1972.
[2]: Mattew S. Hecht and Jeffrey D. Ullman. Characterizations of reducible flow graphs. 1974.
[3]: Ganesan Ramalingam. Identifying loops in almost linear time. 1999.
[4]: Robert Tarjan. Testing flow graph reducibility. 1974.

Thanks to Claudia Grigore for feedback.

Theory and Practice

16 October 2009

Reducible Flowgraphs 0

No comments:

Post a Comment