A summary of some basic results on reducible flowgraphs. Feedback is welcome. In particular, the ‘easy’ and ‘obvious’ parts are notorious for hiding mistakes. So let me know if something doesn’t look as ‘easy’ as I claim. This post is also available in PDF form.
Preliminaries. A flowgraph is a digraph with an initial node 0 from which all other nodes are reachable. Typically nodes represent statements in some low-level language (like Java bytecode) and edges represent possible successors in an execution. There are various restricted classes of flowgraphs that have been studied and are important in practice. For example, if a flowgraph is a dag (has no cycles) then the program terminates for all inputs; if a flowgraph is a two-terminal series–parallel graph then it corresponds to programs written using only two simple composition operations: sequence (;) and choice ([]). The subject of this post, reducible flowgraphs, is a class of flowgraphs that correspond to programs that can be written without using goto — loops like while and for suffice.
Definition 1
A flowgraph is reducible when it does not have a
strongly connected subgraph with two (or more) entries.
A graph is strongly connected when there is a
path x ↝ y for all nodes x
and y. Node x is an entry for a subgraph H when
there is a path 0 ↝p x such that p ∩
H = {x}. (By notation abuse, p and H stand for sets of
nodes, including the endpoints for the path.)There are at least three other equivalent definitions, as Theorem 1 shows. Those definitions use a few more graph-theoretic concepts. A node x dominates node y when all paths 0 ↝ y contain node x. An edge x → x (for some node x) is a cycle-edge. A depth first search (DFS) of a flowgraph partitions its edges into tree edges (making up a spanning tree known as a DFS tree), forward edges (pointing to a successor in the spanning tree), back edges (pointing to a predecessor in the spanning tree, plus cycle-edges), and cross edges (the others). Tree edges, forward edges, and cross edges form a dag known as a DFS dag.
Theorem 1 ([1, 2])
The following statements about a flowgraph are equivalent:
Proof. It is not hard to show (by taking the contrapositive) that (1)
⇒ (2) and (2) ⇒ (3) and (3) ⇒ (1) and (4) ⇒
(1). It remains to show that (4) is implied by one of the
others, which I’ll do, again, by looking at the contrapositive.- It is reducible.
- Every back edge has its source dominated by its target, for all DFS trees.
- It has a unique DFS dag.
- It can be transformed into a single node by
repeated application of the transformations T1 and T2:
- T1: Remove a cycle-edge.
- T2: Pick a non-initial node y that has only one incoming edge x → y and glue nodes x and y.
(1) ⇒ (4): If T1 and T2 cannot reduce a graph to a single node then the graph must have a strongly connected subgraph with two entries. This task can be split in two: First, show that if T1 and T2 cannot be applied at all then there is a strongly connected subgraph with two entries; second, show that applications of T1−1 and T2−1 maintain the existence of a strongly connected subgraph with two entries.
First part: If T1 and T2 cannot be applied then each non-initial node has ≥ 2 (distinct) predecessors. The initial node must have ≥ 2 (distinct) successors, let’s call them 1,…,k. Let’s look at nodes x and y ranging over this set. Because node y must have a second incoming edge there has to be a path x ↝ y for some x. If we can find x ≠ y for all y then we are done, since at least two of 1,…,k must be in strongly connected subgraph. Otherwise, let’s examine some node y whose second incoming edge comes from a cycle y ↝ y. The other nodes in this cycle must themselves have some second incoming edge which might come from some node x ≠ y, and we are done as before. Otherwise, we apply recursively the same reasoning on the subflowgraph induced by the nodes of the cycle y ↝ y (with node y as the initial node).
Second part: Easy. (Some would say ‘exercise’.) ☐
The loop forest. Typically a programmer writes in a language with while, which gets desugared, and then various tools (like optimizers or program verifiers) must reconstruct the loop structure of the original program. You may wonder why not keep the while constructs around in the low-level language:
- Fewer language constructs means other algorithms are easier.
- If the programmer is abusing goto we still want to know what he meant.
Definition 2
In a reducible flowgraph, the loop Lx associated with
the node x is the unique maximal strongly connected subgraph
for which node x is an entry.
Such a graph always exists because a strongly connected
subgraph of a reducible flowgraph is dominated by its unique
entry. In other words, Definition 2 says that
the loop Lx is the set of nodes y that are dominated by
node x and also have a path y ↝ x back to the entry.
(The path 0 ↝ x that witnesses node x being an entry
cannot contain any dominated node y ≠ x.)Proof. If node x dominates node y and there exists a node z ∈ Lx ∩ Ly then Lx ∩ Ly = Ly. If there is no domination between node x and node y then there is no node dominated by both so Lx ∩ Ly = ∅. ☐
When we extend the definition of loops to non-reducible graphs we’ll make sure that Propositions 1 and 2 still hold. Thus, it always makes sense to talk about a loop forest (in which there is a path Lx ↝ Ly when Lx ⊇ Ly).
The structure of loops in reducible flowgraphs can be analyzed by looking at what back edges they contain. A cycle is a graph of the form 1 → 2 → ⋯ → k → 1. Every cycle (which is a subgraph) of a flowgraph must contain one back edge. In reducible flowgraphs a stronger statement holds.
Proof. Suppose there is a cycle x′ → x ↝p y′ → y ↝q x′, which contains back edges x′ → x and y′ → y. By definition, the nodes of a cycle are distinct, but we allow paths p and q to be (not necessarily both) of length 0 so it is possible that x = y′ and it is possible that y = x′. Every path 0 ↝ x′ must contain node x according to statement (2) in Theorem 1. Since there are paths 0 ↝ y ↝q x′, this implies that all paths 0 ↝ y must contain node x (because path q does not contain node x, since they are part of a cycle). In other words, node x dominates node y. Symmetrically, node y dominates node x. This is only possible if x=y, which is a contradiction. ☐
Because of Lemma 1, Hecht and Ullman [2] originally defined loops in reducible graphs by associating them to back edges: The loop Ly → x is the union of paths x ↝ y that do not use node x as an intermediate node, plus the back edge itself. This shows that there isn’t really only one natural way of defining loops in reducible flowgraphs. For the graph in Figure 1(left) the old definition gives loop L1 → 0={0,1} nested in loop L2 → 0={0,1,2}, while our definition only finds the loop L0=L2 → 0 (plus the uninteresting L1={1} and L2={2}). The problem with the old definition, however, is that it does not imply Proposition 2 as can be seen in Figure 1(right).
Definition 2 is always more coarse grained than that of Hecht and Ullman [2].
Lemma 2
In a reducible flowgraph, we have Lx = ∪Ly → x,
where the union ranges over back edges y → x.
Tarjan [4] described an algorithm for recognizing reducible flowgraphs. Ramalingam [3] presents a modified version that identifies the loop forest. (Tarjan [4] says that Ullman discovered independently the same algorithm, although he didn’t publish it. Ramalingam [3] doesn’t say that he presents a modified algorithm. Perhaps the modification is folklore?) The core of a Python implementation appears in Figure 2. After executing
def dfs_forward(x): assert status[x] == unseen status[x] = seen for y in succ[x]: if status[y] == unseen: dfs_forward(y) elif status[y] == seen: backpred[y] += [x] for y in backpred[x]: dfs_backward(y, x) status[x] = donedef dfs_backward(y, z): y = find(y) if inner[y] != -1 or y == z: return if status[y] != done: print ’NOT reducible!’ exit(1) inner[y] = z union(y, z) for x in pred[y]: dfs_backward(x, z)
dfs_forward
(0)
we
have x
==
inner
[
y
]
when Lx ⊃ Ly and
there’s no other loop in between. In other words, the
array inner
contains the parent pointers of
the loop forest. The array succ
represents the
flowgraph through adjacency lists; the array pred
represents the reversed graph through adjacency lists. The
functions union
and find
are the
standard union-find with the added trick that
find
(
x
)
is guaranteed to return the z
in the last call union
(
y
,
z
)
that involved the set
to which x
now belongs. (Figuring out a nice way to
implement this trick is not hard, but fun.)Remark 1 The recursive implementation looks nice but it tends to crash the stack, especially in Python, which has a rather low default recursion limit (1000 on my computer).
It is hopefully not too hard to see why this algorithm works after all those lemmas and theorems.
More to come. I plan three more posts: (1) finding loops in graphs that are not reducible, (2) making graphs reducible, and (3) summary of advanced results.
References
- [1]
- Mattew S. Hecht and Jeffrey D. Ullman.
Flow graph reducibility.
1972.
- [2]
- Mattew S. Hecht and Jeffrey D. Ullman.
Characterizations of reducible flow graphs.
1974.
- [3]
- Ganesan Ramalingam.
Identifying loops in almost linear time.
1999.
- [4]
- Robert Tarjan.
Testing flow graph reducibility.
1974.
Thanks to Claudia Grigore for feedback.
No comments:
Post a Comment
Note: (1) You need to have third-party cookies enabled in order to comment on Blogger. (2) Better to copy your comment before hitting publish/preview. Blogger sometimes eats comments on the first try, but the second works. Crazy Blogger.