Theory and Practice: A Certain Type of Pretty-Printing in OCAML

You want to print Node(Leaf, [{ra=1;rb="a"};{ra=2;rb="b"}], Node(Leaf, [{ra=3;rb=["foo";"bar"]}], Leaf)) as "1 * a * 2 * b * 3 * foo * bar". How do you do it?

Let us begin with the simpler problem of printing a star-separated list of integers.

open Format
let simple_print = function ->
  | [] -> ()
  | [x] -> printf "%d" x
  | x :: xs -> printf "%d@ * "; simple_print xs

Since there is one less star than there are integers, the last integer in the list is treated in a special way. Of course, we may want to also print lists of strings in this way; or we may want to use some other separator than a star; or we may want to use a different formatter.

let better_print separator pp ppf = function ->
  | [] -> ()
  | [x] -> fprintf ppf "%a" pp x
  | x::xs -> 
      fprintf ppf "%a@ %s " pp x separator; 
      better_print separator pp ppf xs
let pp_string ppf x = fprintf ppf "%s" x
let pp_int ppf x = fprintf ppf "%d" x
better_print "||" pp_string std_formatter ["foo"; "bar"]

Now comes the interesting part. Suppose that what we really want to print are the primitive values that hang in a complicated tree of nested data structures like lists, variants, records, sets, and so on. For example, the value mentioned in the summary has type t.

type r = { ra : int; rb : string list }
type t = Leaf | Node of t * r list * t
let v =
  Node
    (Leaf,
    [{ra=1;rb=["a"]};{ra=2;rb=["b"]}],
    Node
      (Leaf,
      [{ra=3;rb=["foo";"bar"]}],
      Leaf))

We'll probably have a pretty-printing function pp_r for type r and a pretty-printing function pp_t for type t. Let's think about the latter. For Node(left,data,right) we'd be tempted to say "print left, print star, print data, print star, print right." Of course, left might be a leaf, in which case we should omit the first "print star." Or perhaps, the list data is empty, in which case we should omit one of the two "print star"s. In a more complicated scenario data might be a list of sets and, although the list is not empty, all of its sets may be. In order to figure whether this is the case we'd need to traverse the whole list. Or perhaps we have a record with $n$ fields. After printing the first $k$ fields we should print a star when one of the first $k$ fields is nonempty and one of the last $n-k$ fields is nonempty. So we'd better start by asking once in the beginning each field if it is empty: Otherwise we may end up asking each field $\sim n$ times and we know that checking emptiness may take linear time. (It would be pretty stupid for pretty-printing of a big structure with few leafs to take quadratic time.) Or perhaps, …

OK. That's a nightmare, so let's try to take a step back. We want:

a way to traverse the data structure,
a way to print each possible leaf, and
a way to know when we get to a leaf whether it is the first (or last).

That's easy! It sounds like what we want is a fold. Well, almost, because when we hit a string leaf we need to call one function and when we hit an integer leaf we need to call another. So it is a specialized fold but, nevertheless, our implementation may look very similar to what we'd do for a fold.

let pp_string' ppf first x = 
  if not first then fprintf ppf "@ * "; pp_string ppf x; false
let pp_int' ppf first x =
  if not first then fprintf ppf "@ * "; pp_int ppf x; false
let pp_r ppf first {ra=ra; rb=rb} =
  let first = pp_int' ppf first ra in
  List.fold_left pp_string' first rb
let pp_t ppf first = function
  | Leaf -> first
  | Node (left, data, right) ->
      let first = pp_t ppf first left in
      let first = List.fold_left (pp_r ppf) first data in
      pp_t ppf first right
let pp_r' ppf = pp_r ppf false
let pp_t' ppf = pp_t ppf false

This works and is already much nicer than the previous option. However, There are still problems.

The functions pp_string' and pp_int' are obtained from their unprimed counterparts in a very systematic way, so we shouldn't write the same code once for each leaf type.
Similarly for pp_r' and pp_t'.
Note that for lists we used List.fold_left so for the type t we should be able to do something similar: Implement a generic fold, and then use it.
The star is hardcoded. What if we want a different separator?

So let's see how can we encode the recipe for obtaining pp_string' and pp_int'. We may be tempted to try the following.

let pp_sep separator pp = 
  fun ppf first x -> if not first then fprintf ppf "@ %s " separator; false

Yay: We also solved the problem with the star being hard-coded! What we can do is to add a (first) parameter pp to pp_r and pp_t. Then we redefine the primed versions.

let pp_r' separator ppf = pp_r (pp_primitive separator) ppf false
let pp_r' separator ppf = pp_r (pp_primitive separator) ppf false

We then replace calls to pp_int' by pp pp_int. Everything's sweet, right? Well, … no, this doesn't typecheck. The reason is in OCAML's FAQ. You may want to try these modifications to convince yourself that they indeed do not work. Anyway, the workaround is in OCAML's FAQ. So here's how to do it.

open Format

(* example data structures *)
type r = { ra : int; rb : string list }
type t = Leaf | Node of t * r list * t

(* a value to test with *)
let v =
  Node
    (Leaf,
    [{ra=1;rb=["a"]};{ra=2;rb=["b"]}],
    Node
      (Leaf,
      [{ra=3;rb=["foo";"bar"]}],
      Leaf))

(* how to print 'leafs' *)
let pp_string ppf x = fprintf ppf "%s" x
let pp_int ppf x = fprintf ppf "%d" x

(* a recipe for making leaf printing foldable *)
type sep_wrapper = { 
  primitive : 'a. (formatter->'a->unit)->formatter->bool->'a->bool
}
let pp_sep separator = {
  primitive = fun pp ppf first x -> 
    if not first then fprintf ppf "@ %s " separator; pp ppf x; false
}

(* folds for our homogeous data structures *)
let rec fold_t f s = function
  | Leaf -> s
  | Node (l, xs, r) -> fold_t f (List.fold_left f (fold_t f s l) xs) r

(* pretty printing workers, used mostly internally *)
let pp_r pp ppf first {ra=ra; rb=rb} =
  let first = pp.primitive pp_int ppf first ra in
  List.fold_left (pp.primitive pp_string ppf) first rb
let pp_t pp ppf = 
  fold_t (pp_r pp ppf)

(* pretty printing for the typical user *)
let pp_whole pp = 
  fun separator ppf x -> ignore (pp (pp_sep separator) ppf true x)
let pp_r' = pp_whole pp_r
let pp_t' = pp_whole pp_t

(* a small test *)
let _ =
  printf "@[%a@." (pp_t' "*") v

Note that the strategy of implementing a general fold and then use it does not work for type r because it holds two kinds of leafs (integers and strings). Also note that what I called leafs (or leaves, whatever) do not need to be OCAML primitive types.

Theory and Practice

15 September 2010

A Certain Type of Pretty-Printing in OCAML

No comments:

Post a Comment