A summary of some basic results on reducible flowgraphs. Feedback is welcome. In particular, the ‘easy’ and ‘obvious’ parts are notorious for hiding mistakes. So let me know if something doesn’t look as ‘easy’ as I claim. This post is also available in PDF form.
Preliminaries. A flowgraph is a digraph with an initial node 0 from which all other nodes are reachable. Typically nodes represent statements in some low-level language (like Java bytecode) and edges represent possible successors in an execution. There are various restricted classes of flowgraphs that have been studied and are important in practice. For example, if a flowgraph is a dag (has no cycles) then the program terminates for all inputs; if a flowgraph is a two-terminal series–parallel graph then it corresponds to programs written using only two simple composition operations: sequence (;) and choice ([]). The subject of this post, reducible flowgraphs, is a class of flowgraphs that correspond to programs that can be written without using goto — loops like while and for suffice.
Definition 1
A flowgraph is reducible when it does not have a
strongly connected subgraph with two (or more) entries.
A graph is strongly connected when there is a
path x ↝ y for all nodes x
and y. Node x is an entry for a subgraph H when
there is a path 0 ↝^{p} x such that p ∩
H = {x}. (By notation abuse, p and H stand for sets of
nodes, including the endpoints for the path.)There are at least three other equivalent definitions, as Theorem 1 shows. Those definitions use a few more graph-theoretic concepts. A node x dominates node y when all paths 0 ↝ y contain node x. An edge x → x (for some node x) is a cycle-edge. A depth first search (DFS) of a flowgraph partitions its edges into tree edges (making up a spanning tree known as a DFS tree), forward edges (pointing to a successor in the spanning tree), back edges (pointing to a predecessor in the spanning tree, plus cycle-edges), and cross edges (the others). Tree edges, forward edges, and cross edges form a dag known as a DFS dag.
Theorem 1 ([1, 2])
The following statements about a flowgraph are equivalent:
Proof. It is not hard to show (by taking the contrapositive) that (1)
⇒ (2) and (2) ⇒ (3) and (3) ⇒ (1) and (4) ⇒
(1). It remains to show that (4) is implied by one of the
others, which I’ll do, again, by looking at the contrapositive.- It is reducible.
- Every back edge has its source dominated by its target, for all DFS trees.
- It has a unique DFS dag.
- It can be transformed into a single node by
repeated application of the transformations T_{1} and T_{2}:
- T_{1}: Remove a cycle-edge.
- T_{2}: Pick a non-initial node y that has only one incoming edge x → y and glue nodes x and y.
(1) ⇒ (4): If T_{1} and T_{2} cannot reduce a graph to a single node then the graph must have a strongly connected subgraph with two entries. This task can be split in two: First, show that if T_{1} and T_{2} cannot be applied at all then there is a strongly connected subgraph with two entries; second, show that applications of T_{1}^{−1} and T_{2}^{−1} maintain the existence of a strongly connected subgraph with two entries.
First part: If T_{1} and T_{2} cannot be applied then each non-initial node has ≥ 2 (distinct) predecessors. The initial node must have ≥ 2 (distinct) successors, let’s call them 1,…,k. Let’s look at nodes x and y ranging over this set. Because node y must have a second incoming edge there has to be a path x ↝ y for some x. If we can find x ≠ y for all y then we are done, since at least two of 1,…,k must be in strongly connected subgraph. Otherwise, let’s examine some node y whose second incoming edge comes from a cycle y ↝ y. The other nodes in this cycle must themselves have some second incoming edge which might come from some node x ≠ y, and we are done as before. Otherwise, we apply recursively the same reasoning on the subflowgraph induced by the nodes of the cycle y ↝ y (with node y as the initial node).
Second part: Easy. (Some would say ‘exercise’.) ☐
The loop forest. Typically a programmer writes in a language with while, which gets desugared, and then various tools (like optimizers or program verifiers) must reconstruct the loop structure of the original program. You may wonder why not keep the while constructs around in the low-level language:
- Fewer language constructs means other algorithms are easier.
- If the programmer is abusing goto we still want to know what he meant.
Definition 2
In a reducible flowgraph, the loop L_{x} associated with
the node x is the unique maximal strongly connected subgraph
for which node x is an entry.
Such a graph always exists because a strongly connected
subgraph of a reducible flowgraph is dominated by its unique
entry. In other words, Definition 2 says that
the loop L_{x} is the set of nodes y that are dominated by
node x and also have a path y ↝ x back to the entry.
(The path 0 ↝ x that witnesses node x being an entry
cannot contain any dominated node y ≠ x.)Proof. If node x dominates node y and there exists a node z ∈ L_{x} ∩ L_{y} then L_{x} ∩ L_{y} = L_{y}. If there is no domination between node x and node y then there is no node dominated by both so L_{x} ∩ L_{y} = ∅. ☐
When we extend the definition of loops to non-reducible graphs we’ll make sure that Propositions 1 and 2 still hold. Thus, it always makes sense to talk about a loop forest (in which there is a path L_{x} ↝ L_{y} when L_{x} ⊇ L_{y}).
The structure of loops in reducible flowgraphs can be analyzed by looking at what back edges they contain. A cycle is a graph of the form 1 → 2 → ⋯ → k → 1. Every cycle (which is a subgraph) of a flowgraph must contain one back edge. In reducible flowgraphs a stronger statement holds.
Proof. Suppose there is a cycle x′ → x ↝^{p} y′ → y ↝^{q} x′, which contains back edges x′ → x and y′ → y. By definition, the nodes of a cycle are distinct, but we allow paths p and q to be (not necessarily both) of length 0 so it is possible that x = y′ and it is possible that y = x′. Every path 0 ↝ x′ must contain node x according to statement (2) in Theorem 1. Since there are paths 0 ↝ y ↝^{q} x′, this implies that all paths 0 ↝ y must contain node x (because path q does not contain node x, since they are part of a cycle). In other words, node x dominates node y. Symmetrically, node y dominates node x. This is only possible if x=y, which is a contradiction. ☐
Because of Lemma 1, Hecht and Ullman [2] originally defined loops in reducible graphs by associating them to back edges: The loop L_{y → x} is the union of paths x ↝ y that do not use node x as an intermediate node, plus the back edge itself. This shows that there isn’t really only one natural way of defining loops in reducible flowgraphs. For the graph in Figure 1(left) the old definition gives loop L_{1 → 0}={0,1} nested in loop L_{2 → 0}={0,1,2}, while our definition only finds the loop L_{0}=L_{2 → 0} (plus the uninteresting L_{1}={1} and L_{2}={2}). The problem with the old definition, however, is that it does not imply Proposition 2 as can be seen in Figure 1(right).
Definition 2 is always more coarse grained than that of Hecht and Ullman [2].
Lemma 2
In a reducible flowgraph, we have L_{x} = ∪L_{y → x},
where the union ranges over back edges y → x.
Tarjan [4] described an algorithm for recognizing reducible flowgraphs. Ramalingam [3] presents a modified version that identifies the loop forest. (Tarjan [4] says that Ullman discovered independently the same algorithm, although he didn’t publish it. Ramalingam [3] doesn’t say that he presents a modified algorithm. Perhaps the modification is folklore?) The core of a Python implementation appears in Figure 2. After executing
def dfs_forward(x): assert status[x] == unseen status[x] = seen for y in succ[x]: if status[y] == unseen: dfs_forward(y) elif status[y] == seen: backpred[y] += [x] for y in backpred[x]: dfs_backward(y, x) status[x] = donedef dfs_backward(y, z): y = find(y) if inner[y] != -1 or y == z: return if status[y] != done: print ’NOT reducible!’ exit(1) inner[y] = z union(y, z) for x in pred[y]: dfs_backward(x, z)
dfs_forward
(0)
we
have x
==
inner
[
y
]
when L_{x} ⊃ L_{y} and
there’s no other loop in between. In other words, the
array inner
contains the parent pointers of
the loop forest. The array succ
represents the
flowgraph through adjacency lists; the array pred
represents the reversed graph through adjacency lists. The
functions union
and find
are the
standard union-find with the added trick that
find
(
x
)
is guaranteed to return the z
in the last call union
(
y
,
z
)
that involved the set
to which x
now belongs. (Figuring out a nice way to
implement this trick is not hard, but fun.)Remark 1 The recursive implementation looks nice but it tends to crash the stack, especially in Python, which has a rather low default recursion limit (1000 on my computer).
It is hopefully not too hard to see why this algorithm works after all those lemmas and theorems.
More to come. I plan three more posts: (1) finding loops in graphs that are not reducible, (2) making graphs reducible, and (3) summary of advanced results.
References
- [1]
- Mattew S. Hecht and Jeffrey D. Ullman.
Flow graph reducibility.
1972.
- [2]
- Mattew S. Hecht and Jeffrey D. Ullman.
Characterizations of reducible flow graphs.
1974.
- [3]
- Ganesan Ramalingam.
Identifying loops in almost linear time.
1999.
- [4]
- Robert Tarjan.
Testing flow graph reducibility.
1974.
Thanks to Claudia Grigore for feedback.
No comments:
Post a Comment
Note: (1) You need to have third-party cookies enabled in order to comment on Blogger. (2) Better to copy your comment before hitting publish/preview. Blogger sometimes eats comments on the first try, but the second works. Crazy Blogger.