You want to print
Node(Leaf, [{ra=1;rb="a"};{ra=2;rb="b"}], Node(Leaf, [{ra=3;rb=["foo";"bar"]}], Leaf))
as "1 * a * 2 * b * 3 * foo * bar". How do you do it?
Let us begin with the simpler problem of printing a star-separated list of integers.
open Format let simple_print = function -> | [] -> () | [x] -> printf "%d" x | x :: xs -> printf "%d@ * "; simple_print xs
Since there is one less star than there are integers, the last integer in the list is treated in a special way. Of course, we may want to also print lists of strings in this way; or we may want to use some other separator than a star; or we may want to use a different formatter.
let better_print separator pp ppf = function -> | [] -> () | [x] -> fprintf ppf "%a" pp x | x::xs -> fprintf ppf "%a@ %s " pp x separator; better_print separator pp ppf xs let pp_string ppf x = fprintf ppf "%s" x let pp_int ppf x = fprintf ppf "%d" x better_print "||" pp_string std_formatter ["foo"; "bar"]
Now comes the interesting part. Suppose that what we really want to print
are the primitive values that hang in a complicated tree of nested data
structures like lists, variants, records, sets, and so on. For example, the
value mentioned in the summary has type t
.
type r = { ra : int; rb : string list } type t = Leaf | Node of t * r list * t let v = Node (Leaf, [{ra=1;rb=["a"]};{ra=2;rb=["b"]}], Node (Leaf, [{ra=3;rb=["foo";"bar"]}], Leaf))
We'll probably have a pretty-printing function pp_r
for type r
and a pretty-printing function pp_t
for type
t
. Let's think about the latter. For Node(left,data,right)
we'd be tempted to say
"print left, print star, print data, print star, print right." Of course,
left
might be a leaf, in which case we
should omit the first "print star." Or perhaps, the list data
is empty, in which case we should omit
one of the two "print star"s. In a more complicated scenario data
might be a list of sets and, although the
list is not empty, all of its sets may be. In order to figure whether this is
the case we'd need to traverse the whole list. Or perhaps we have a record
with $n$ fields. After printing the first $k$ fields we should print a star
when one of the first $k$ fields is nonempty and one of the last $n-k$ fields
is nonempty. So we'd better start by asking once in the beginning each
field if it is empty: Otherwise we may end up asking each field $\sim n$
times and we know that checking emptiness may take linear time. (It would be
pretty stupid for pretty-printing of a big structure with few leafs to take
quadratic time.) Or perhaps, …
OK. That's a nightmare, so let's try to take a step back. We want:
- a way to traverse the data structure,
- a way to print each possible leaf, and
- a way to know when we get to a leaf whether it is the first (or last).
That's easy! It sounds like what we want is a fold. Well, almost, because when we hit a string leaf we need to call one function and when we hit an integer leaf we need to call another. So it is a specialized fold but, nevertheless, our implementation may look very similar to what we'd do for a fold.
let pp_string' ppf first x = if not first then fprintf ppf "@ * "; pp_string ppf x; false let pp_int' ppf first x = if not first then fprintf ppf "@ * "; pp_int ppf x; false let pp_r ppf first {ra=ra; rb=rb} = let first = pp_int' ppf first ra in List.fold_left pp_string' first rb let pp_t ppf first = function | Leaf -> first | Node (left, data, right) -> let first = pp_t ppf first left in let first = List.fold_left (pp_r ppf) first data in pp_t ppf first right let pp_r' ppf = pp_r ppf false let pp_t' ppf = pp_t ppf false
This works and is already much nicer than the previous option. However, There are still problems.
- The functions
pp_string'
andpp_int'
are obtained from their unprimed counterparts in a very systematic way, so we shouldn't write the same code once for each leaf type. - Similarly for
pp_r'
andpp_t'
. - Note that for lists we used
List.fold_left
so for the typet
we should be able to do something similar: Implement a generic fold, and then use it. - The star is hardcoded. What if we want a different separator?
So let's see how can we encode the recipe for obtaining
pp_string'
and
pp_int'
. We may be tempted to try the
following.
let pp_sep separator pp = fun ppf first x -> if not first then fprintf ppf "@ %s " separator; false
Yay: We also solved the problem with the star being hard-coded! What
we can do is to add a (first) parameter pp
to
pp_r
and
pp_t
. Then we redefine the primed
versions.
let pp_r' separator ppf = pp_r (pp_primitive separator) ppf false let pp_r' separator ppf = pp_r (pp_primitive separator) ppf false
We then replace calls to pp_int'
by pp pp_int
. Everything's sweet, right?
Well, … no, this doesn't typecheck. The reason is in
OCAML's FAQ.
You may want to try these modifications to convince yourself that they
indeed do not work. Anyway, the workaround is in OCAML's FAQ. So here's
how to do it.
open Format (* example data structures *) type r = { ra : int; rb : string list } type t = Leaf | Node of t * r list * t (* a value to test with *) let v = Node (Leaf, [{ra=1;rb=["a"]};{ra=2;rb=["b"]}], Node (Leaf, [{ra=3;rb=["foo";"bar"]}], Leaf)) (* how to print 'leafs' *) let pp_string ppf x = fprintf ppf "%s" x let pp_int ppf x = fprintf ppf "%d" x (* a recipe for making leaf printing foldable *) type sep_wrapper = { primitive : 'a. (formatter->'a->unit)->formatter->bool->'a->bool } let pp_sep separator = { primitive = fun pp ppf first x -> if not first then fprintf ppf "@ %s " separator; pp ppf x; false } (* folds for our homogeous data structures *) let rec fold_t f s = function | Leaf -> s | Node (l, xs, r) -> fold_t f (List.fold_left f (fold_t f s l) xs) r (* pretty printing workers, used mostly internally *) let pp_r pp ppf first {ra=ra; rb=rb} = let first = pp.primitive pp_int ppf first ra in List.fold_left (pp.primitive pp_string ppf) first rb let pp_t pp ppf = fold_t (pp_r pp ppf) (* pretty printing for the typical user *) let pp_whole pp = fun separator ppf x -> ignore (pp (pp_sep separator) ppf true x) let pp_r' = pp_whole pp_r let pp_t' = pp_whole pp_t (* a small test *) let _ = printf "@[%a@." (pp_t' "*") v
Note that the strategy of implementing a general fold and then use it
does not work for type r
because it holds
two kinds of leafs (integers and strings). Also note that what I called leafs (or leaves, whatever) do not need to be OCAML primitive types.
No comments:
Post a Comment
Note: (1) You need to have third-party cookies enabled in order to comment on Blogger. (2) Better to copy your comment before hitting publish/preview. Blogger sometimes eats comments on the first try, but the second works. Crazy Blogger.