A summary of some basic results on reducible
flowgraphs. Feedback is welcome. In particular, the ‘easy’ and
‘obvious’ parts are notorious for hiding mistakes. So let me know
if something doesn’t look as ‘easy’ as I claim. This post is also
available in PDF form.
Preliminaries. A
flowgraph is a
digraph with an
initial node 0 from
which all other nodes are reachable. Typically nodes represent
statements in some low-level language (like Java bytecode)
and edges represent possible successors in an execution.
There are various restricted classes of flowgraphs that have
been studied and are important in practice. For example, if
a flowgraph is a
dag
(has no cycles) then the program terminates for all inputs;
if a flowgraph is a
two-terminal series–parallel graph then it corresponds to
programs written using only two simple composition operations:
sequence (
;) and choice (
[]). The subject
of this post, reducible flowgraphs, is a class of flowgraphs
that correspond to programs that can be written without using
goto — loops like
while and
for
suffice.
Definition 1
A flowgraph is reducible
when it does not have a
strongly connected subgraph with two (or more) entries.
A graph is
strongly connected when there is a
path x ↝
y for all nodes
x
and
y. Node
x is an
entry for a subgraph
H when
there is a path 0 ↝
p x such that
p ∩
H = {
x}. (By notation abuse,
p and
H stand for sets of
nodes, including the endpoints for the path.)
There are at least three other equivalent definitions, as
Theorem 1 shows. Those definitions use a few
more graph-theoretic concepts. A node
x dominates
node
y when all paths 0 ↝
y contain node
x. An
edge
x →
x (for some node
x) is a
cycle-edge.
A
depth first search (DFS) of a
flowgraph partitions its edges into
tree edges (making
up a spanning tree known as a
DFS tree),
forward
edges (pointing to a successor in the spanning tree),
back
edges (pointing to a predecessor in the spanning tree, plus
cycle-edges), and
cross edges (the others). Tree edges,
forward edges, and cross edges form a dag known as a
DFS
dag.
Theorem 1 ([1, 2])
The following statements about a flowgraph are equivalent:
-
It is reducible.
- Every back edge has its source dominated by its target,
for all DFS trees.
- It has a unique DFS dag.
- It can be transformed into a single node by
repeated application of the transformations T1 and T2:
-
T1: Remove a cycle-edge.
- T2: Pick a non-initial node y that has only one
incoming edge x → y and glue nodes x and y.
Proof. It is not hard to show (by taking the contrapositive) that (1)
⇒ (2) and (2) ⇒ (3) and (3) ⇒ (1) and (4) ⇒
(1). It remains to show that (4) is implied by one of the
others, which I’ll do, again, by looking at the contrapositive.
(1) ⇒ (4): If
T1 and
T2 cannot reduce a graph to
a single node then the graph must have a strongly connected
subgraph with two entries. This task can be split in two:
First, show that if
T1 and
T2 cannot be applied at all then
there is a strongly connected subgraph with two entries; second,
show that applications of
T1−1 and
T2−1 maintain the
existence of a strongly connected subgraph with two entries.
First part: If
T1 and
T2 cannot be applied then each
non-initial node has ≥ 2 (distinct) predecessors. The initial
node must have ≥ 2 (distinct) successors, let’s call them
1,…,
k. Let’s look at nodes
x and
y ranging over this
set. Because node
y must have a second incoming edge there
has to be a path
x ↝
y for some
x. If we can find
x ≠
y for all
y then we are done, since at least two of
1,…,
k must be in strongly connected subgraph. Otherwise,
let’s examine some node
y whose second incoming edge comes
from a cycle
y ↝
y. The other nodes in this cycle must
themselves have some second incoming edge which might come from
some node
x ≠
y, and we are done as before. Otherwise, we
apply recursively the same reasoning on the subflowgraph induced
by the nodes of the cycle
y ↝
y (with node
y as the
initial node).
Second part: Easy. (Some would say ‘exercise’.) ☐
The loop forest.
Typically a programmer writes in a language with
while,
which gets desugared, and then various tools (like optimizers or
program verifiers) must reconstruct the loop structure of the
original program. You may wonder why not keep the
while
constructs around in the low-level language:
-
Fewer language constructs means other algorithms
are easier.
- If the programmer is abusing goto we still want to
know what he meant.
An important property of reducible flowgraphs is that there is a
natural definition for loops. Intuitively, we want loops to be
strongly connected subgraphs and, in reducible flowgraphs, such
subgraphs can be associated with their entry.
Definition 2
In a reducible flowgraph, the loop
Lx associated with
the node x is the unique maximal strongly connected subgraph
for which node x is an entry.
Such a graph always exists because a strongly connected
subgraph of a reducible flowgraph is dominated by its unique
entry. In other words, Definition 2 says that
the loop
Lx is the set of nodes
y that are dominated by
node
x and also have a path
y ↝
x back to the entry.
(The path 0 ↝
x that witnesses node
x being an entry
cannot contain any dominated node
y ≠
x.)
Proposition 1
Loops are strongly connected.
Proposition 2
Two loops are either disjunct or one is included in the other.
Proof. If node
x dominates node
y and there exists a node
z ∈
Lx
∩
Ly then
Lx ∩
Ly =
Ly. If there is no domination
between node
x and node
y then there is no node dominated by
both so
Lx ∩
Ly = ∅. ☐
When we extend the definition of loops to non-reducible
graphs we’ll make sure that Propositions 1
and 2 still hold. Thus, it always makes sense
to talk about a
loop forest (in which there is a path
Lx
↝
Ly when
Lx ⊇
Ly).
The structure of loops in reducible flowgraphs can be analyzed
by looking at what back edges they contain. A
cycle is a
graph of the form 1 → 2 → ⋯ →
k → 1. Every cycle
(which is a subgraph) of a flowgraph must contain one back edge.
In reducible flowgraphs a stronger statement holds.
Lemma 1
Every cycle in a reducible flowgraph contains exactly
one back edge.
Proof. Suppose there is a cycle
x′ →
x ↝
p y′
→
y ↝
q x′, which contains back edges
x′
→
x and
y′ →
y. By definition, the nodes of a cycle are
distinct, but we allow paths
p and
q to be (not necessarily
both) of length 0 so it is possible that
x =
y′ and it is
possible that
y =
x′. Every path 0 ↝
x′ must contain
node
x according to statement (2) in Theorem 1.
Since there are paths 0 ↝
y ↝
q x′,
this implies that all paths 0 ↝
y must contain node
x
(because path
q does not contain node
x, since they are
part of a cycle). In other words, node
x dominates node
y.
Symmetrically, node
y dominates node
x. This is only possible
if
x=
y, which is a contradiction. ☐
Figure 1: flowgraph examples
|
|
|
Because of Lemma 1, Hecht and
Ullman [2] originally defined loops in reducible
graphs by associating them to back edges: The loop
Ly → x
is the union of paths
x ↝
y that do not use node
x
as an intermediate node, plus the back edge itself. This shows
that there isn’t really only one natural way of defining loops in
reducible flowgraphs. For the graph in Figure 1(left)
the old definition gives loop
L1 → 0={0,1} nested in
loop
L2 → 0={0,1,2}, while our definition only finds the
loop
L0=
L2 → 0 (plus the uninteresting
L1={1} and
L2={2}). The problem with the old definition, however, is
that it does not imply Proposition 2 as can be
seen in Figure 1(right).
Definition 2 is always more coarse grained
than that of Hecht and Ullman [2].
Lemma 2
In a reducible flowgraph, we have Lx = ∪Ly → x,
where the union ranges over back edges y → x.
def dfs_forward(x):
assert status[x] == unseen
status[x] = seen
for y in succ[x]:
if status[y] == unseen:
dfs_forward(y)
elif status[y] == seen:
backpred[y] += [x]
for y in backpred[x]:
dfs_backward(y, x)
status[x] = done
def dfs_backward(y, z):
y = find(y)
if inner[y] != -1 or y == z:
return
if status[y] != done:
print ’NOT reducible!’
exit(1)
inner[y] = z
union(y, z)
for x in pred[y]:
dfs_backward(x, z)
Figure 2: finding loops in reducible flowgraphs
|
Tarjan [4] described an
algorithm for recognizing reducible flowgraphs.
Ramalingam [3] presents a
modified version that identifies the loop forest.
(Tarjan [4] says that Ullman discovered
independently the same algorithm, although he didn’t publish it.
Ramalingam [3] doesn’t say that he presents
a modified algorithm. Perhaps the modification is folklore?) The
core of
a Python implementation appears in Figure 2.
After executing
dfs_forward(0) we
have
x==inner[y] when
Lx ⊃
Ly and
there’s no other loop in between. In other words, the
array
inner contains the parent pointers of
the loop forest. The array
succ represents the
flowgraph through adjacency lists; the array
pred
represents the reversed graph through adjacency lists. The
functions
union and
find are the
standard
union-find with the added trick that
find(x) is guaranteed to return the
z
in the last call
union(y, z) that involved the set
to which
x now belongs. (Figuring out a nice way to
implement this trick is not hard, but fun.)
Remark 1
The recursive implementation looks nice but it tends to crash
the stack, especially in Python, which has a rather low default
recursion limit (1000 on my computer).
It is hopefully not too hard to see why this algorithm works
after all those lemmas and theorems.
More to come. I plan three more posts: (1) finding
loops in graphs that are not reducible, (2) making graphs
reducible, and (3) summary of advanced results.
References
-
[1]
- Mattew S. Hecht and Jeffrey D. Ullman.
Flow graph reducibility.
1972.
- [2]
- Mattew S. Hecht and Jeffrey D. Ullman.
Characterizations of reducible flow graphs.
1974.
- [3]
- Ganesan Ramalingam.
Identifying loops in almost linear time.
1999.
- [4]
- Robert Tarjan.
Testing flow graph reducibility.
1974.
Thanks to Claudia Grigore for feedback.