Simple Generators: Alternative to Lazy Evaluation

Generators: the API for traversal and non-determinism
Introduction
The splendors and miseries of lazy evaluation
Simple generators in OCaml
Simple generators in Haskell
Streams of random data: a pitfall of lazy evaluation
Incremental, Linear Pretty-printing

Introduction

Incremental stream processing, pervasive in practice, makes the best case for lazy evaluation. Lazy evaluation promotes modularity, letting us glue together separately developed stream producers, consumers and transformers. Lazy list processing has become a cardinal feature of Haskell. It also brings the worst in lazy evaluation: its incompatibility with effects and unpredictable and often extraordinary use of memory. Much of the Haskell programming lore are the ways to get around lazy evaluation.

We propose a programming style for incremental stream processing based on typed simple generators. It promotes modularity and decoupling of producers and consumers just like lazy evaluation. Simple generators, however, expose the implicit suspension and resumption inherent in lazy evaluation and hence are robust in the presence of effects. Simple generators let us accurately reason about memory consumption and latency. They can be realized in a strict or lazy language. The remarkable implementation simplicity and efficiency of simple generators strongly motivates investigating and pushing the limits of their expressiveness.

To substantiate our claims we give a new solution to the notorious pretty-printing problem. Like earlier solutions, it is linear, backtracking-free and with bounded latency. It is also modular, structured as a cascade of separately developed stream transducers, which makes it simpler to write, test and to precisely analyze latency, time and space consumption. It is compatible with effects including IO, letting us read the source document from a file, and format it as we read.

Joint work with Simon Peyton-Jones and Amr Sabry.

Version

The current version is December 2012

References

yield-pp.pdf [242K]
Lazy v. Yield: Incremental, Linear Pretty-printing
The full version of the paper published in the Proceedings of the 10th Asian Symposium on Programming Languages and Systems. Kyoto, Japan, December 11-13, 2012.
Springer's LNCS 7705, pp. 190-206

talk.pdf [184K]
The annotated slides of the talk presented at APLAS 2012 on December 12, 2012 in Kyoto, Japan.

The splendors and miseries of lazy evaluation

Let us recall the universally agreed upon greatest attractions and greatest drawbacks of lazy evaluation. John Hughes in his famous paper regarded lazy evaluation as one of the main reasons why functional programming matters. By decoupling data producers and consumers, lazy evaluation enables code reuse and promotes programming by composing combinators. Alas, this greatest asset is incompatible with effects, and we pay for it with the excruciating difficulty of estimating the space requirements of a program and plugging memory leaks.

To illustrate, we take the main point example from Lennart Augustsson's well-discussed May 2011 blog post. The example is the Haskell function any, which tells if an element of a list satisfies a given predicate.

    any :: (a -> Bool) -> [a] -> Bool

When the predicate returns True for some element of the list, any should return True and stop scanning the list. The function can be defined by the composition of two already written, standard Haskell functions or and map:

    any f = or . map f
   
    or  :: [Bool] -> Bool
    map :: (a -> b) -> [a] -> [b]

Whenever the list contains an element satisfying the predicate, thus defined any indeed returns True -- even if the list is infinite:

    t1 = any (>10) [1..]
    -- True

We will not get any result in a strict language, where arguments must be fully evaluated before the application and hence or gets to work only after map f finishes -- which, if it is applied to an infinite list, does not. The infinite list is not a frivolity: Although all practical programs ever examine a finite prefix of an infinite list, the size of that prefix is often difficult to tell beforehand. Lazy evaluation spares us from guessing the limits and imposing arbitrary restrictions. Rather, lazy evaluation generates as much of the list as needed for a given task, on demand.

The on-demand evaluation helps even if the list is finite. In a strict language, map f must first build the intermediate list of Booleans before or can search through it. In Haskell, or consumes a new element of the intermediate list as soon as map f produces it. The intermediate list is never constructed in full.

The any processing chain can be grown farther. For example, the input list of numbers could be the result of parsing a column of text.

    t2 = any (>10) . map read . lines $ "2\n3\n5\n7\n11\n13\nINVALID"
    -- True

Splitting into lines, parsing, comparing -- all happens incrementally and on demand. Indeed, t2 returns True even though "INVALID" is unparseable as a number. After the satisfying element is found, there is no further demand on the list of integers, and rest of it is not constructed.

But what if the string to parse and search is to be read from a file? Assuming read_string to be an IO action that reads a string from a file, we may re-write the code as

    (any (>10) . map read . lines) `fmap` (read_string fname :: IO String)

How much of the file read_string has to read however? For as long as any needs data. However, any is a pure function consuming the result of an IO action read_string, the action that returns the result after it fully finishes. An action cannot yield the result half-way. Therefore, any cannot give the feedback to read_string while it is still reading. Without the feedback, read_string has no choice but to read the whole file. We have lost the incrementality and the early termination, and we have to completely re-write the code to regain them.

But an action can yield the result half-way, suspending and resuming as needed! It is called Lazy IO: the standard Haskell function hGetContents arranges for the on-demand reading of a file:

    t2_lazy = do
      h   <- openFile test_file ReadMode
      str <- hGetContents h
      let result = any (>10) . map read . lines $ str
      return result

Lazy IO is not a solution however: it is a problem. Its many pitfalls have been described in detail elsewhere (see the FLOPS 2012 paper). For example: the above code has the action to open a file but no action to close it. Since the point of the example is to return True as soon as the satisfying line is found, the file in general will not be read completely. Therefore, we have to rely on the garbage collector to close the file, when it gets around to it, if ever. The Lazy IO code thus leaks a scarce resource, the file handle. Lazy IO is especially problematic when the input comes from a pipe or a network channel. It is not specified how much hGetContents really reads. It usually reads by buffer-full and it may read-ahead arbitrarily, which for communication channels can (and does) lead to a deadlock.

Let's re-write the reading-parsing-checking pipeline t2 in a strict language, embedded in Haskell via the (strict) IO monad.

    t2_strict = bracket (openFile test_file ReadMode) hClose (loop (>10))
     where
      loop f h = do
        r <- liftM f $ liftM read $ hGetLine h
        if r then return True else loop f h

The code reads as little of the file as needed to decide if the file has a line with the number greater than 10. We attain the incremental processing and early termination. The code now includes hClose: the file will definitely be closed as soon as the result is determined. A strict language lets us reason about resources such as memory, file handles, etc. and tightly control them. However, the compositionality is lost. The program t2_strict is not written by combining standard list-processing functions. We had to write the recursion explicitly and inline or, hence forgoing code reuse. Lennart Augustsson expounds on this principal drawback of strict languages in his blog article.

There is one more option: incremental IO via a stream, or monadic list:

    data ListV m a = Nil | Cons a (List m a)
    type List m a = m (ListV m a)

Like Lazy IO, stream processing lets us reuse code and build programs by pipe-lining previously written fragments. Also like Lazy IO, stream processing leaks resources such as file handles; see the accompanying code for details.

To summarize: Non-strict evaluation promotes modularity and compositionality by decoupling producers and consumers. It is incompatible with effects and makes it very difficult to reason about memory and other resources. Strict languages, in contrast, have the excellent resource cost model and are compatible with effects, but couple data consumers and producers and hence break modularity and inhibit reuse. However, there is a way to untangle producers and consumers even in strict languages: generators.

Version

The current version is December 2012

References

Lennart Augustsson: More points for lazy evaluation, May 2011
<http://augustss.blogspot.com/2011/05/more-points-for-lazy-evaluation-in.html>

YAny.hs [9K]
The complete code for the example

The APLAS paper briefly talks about tying the knot -- a fascinating application on one hand, which also lets us tie up ourselves with our own rope.

Preventing memoization in (AI) search problems
Lazy evaluation is a vexing obstacle for non-deterministic, probabilistic programming and general AI search problems, where memoization is exactly the wrong trade-off.

describe.pdf [334K]
The full version of the paper presented at FLOPS 2012. The first half of the paper describes many problems of Lazy IO.

Simple generators in Haskell

Generators bring the main attraction of lazy evaluation -- modularity and compositionality -- to strict, call-by-value languages, and they are compatible with effects. This page is about simple generators, of the following interface. (Simple generators in OCaml are even simpler, to the point of triviality).

    type GenT e m
    instance MonadTrans (GenT e)
   
    type Producer m e            = GenT e m ()
    type Consumer m e            = e -> m ()
    type Transducer m1 m2 e1 e2  = Producer m1 e1 -> Producer m2 e2
   
   
    yield   :: Monad m => e -> Producer m e
    runGenT :: Monad m => Producer m e -> Consumer m e -> m ()
    foldG   :: Monad m => (s -> e -> m s) -> s -> Producer (StateT s m) e -> m s

The interface is simple: one abstract type GenT e m a and three functions. GenT e is a monad transformer for generators that yield the value of type e in some monad m. Intuitively, yielding is like tracing, from printf or similar debug tracing statements embedded in the code. As the program runs it sends traced intermediate results, usually to a log file. A debugger can intercept the traced data to analyze them.

A Producer m e is a computation executed only for its tracing, of values of type e; the overall result is irrelevant. The generator interface provides one primitive producer, yield. Other producers are built by monadic composition: for example, a producer that reads a file and yields its characters one-by-one.

    fileG :: (GBracket m, MonadIO m) => FilePath -> Producer m Char
    fileG fname = gbracket (liftIO $ openFile fname ReadMode) (liftIO . hClose) loop
     where
      loop h = do
       b <- liftIO $ hIsEOF h
       if b then return () else liftIO (hGetChar h) >>= yield >> loop h

The overall result is indeed irrelevant: one runs fileG for the data it yields. The code looks like the strict IO code t2_strict of the running example described earlier. Emphatically, the file handle is closed as soon as the result (or an exception) is obtained. As in strict IO, the generator lets us account for and tightly control resources. Unlike t2_strict code, fileG does no processing of file data -- it merely yields them. The function fileG is a only producer. The consumption of file data is clearly untangled; the same fileG producer can be used with many consumers.

A Consumer m e is an action to be executed on `intercepted trace data'. The standard function putChar is one, simple consumer. Its type is indeed Consumer IO Char. The primitive runGenT connects a producer and a consumer, acting like debugger's setting an action to do at a trace point, and running a program. Hooking up fileG and putChar gives us the cat program that copies a file to the standard output.

    catG :: FilePath -> IO ()
    catG fname = fileG fname `runGenT` putChar

Simple generators have very simple semantics, embodied in the equation

    runGenT C[yield v] consumer === C[consumer v]

where C[] is a (monadic) evaluation context containing no embedded runGenT. The consumer action is executed right at the trace point. In the case of catG, the read character is immediately written out; no intermediate data structures are built.

A more interesting consumer is the generator analogue of the function or :: [Bool] -> Bool on lists.

    orG :: Monad m => Producer (ErrT Bool m) Bool -> m Bool
    orG gen = either id (const False) `liftM` runErrT (runGenT gen orC)
     where
      orC :: MonadError Bool m => Consumer m Bool
      orC True  = throwError True
      orC False = return ()

The semantics of Haskell's or is purposely undetermined: or should stop and return True whenever a True element is found. However, or does not have to stop right away. It may speculatively examine a few more elements of the list, so long as this speculation is bounded. With effects we cannot afford such ambiguity: Any speculation becomes observable. By throwing the exception, orG makes it clear that the evaluation really stops, as soon as the True element is found.

What makes simple generators interesting is that yield is not only a producer but is also a consumer. Indeed, yield also has the type Consumer (GenT e m) e. In other words, a consumer of data may itself use yield, to produce other data. The equational semantics of simple generators derives the following law:

    runGenT gen yield           === gen

where gen is any generator. Thus yield is the left and the right unit of runGenT.

Since yield is both a consumer and a producer, we can write transducers: Transducer m1 m2 e1 e2 consumes data of the type e1 and yields e2. If one regards Producer m e as an effectful version of a list [e], then Transducer m1 m2 e1 e2 is a list transformer. A simple list transformer is map. Here is the effectful version of it:

    mapG :: Monad m => (e1 -> e2) -> Transducer (GenT e2 m) m e1 e2
    mapG f gen = runGenT gen (yield . f)

Once gen produced e1, it is passed to the function f whose result is immediately yielded. Therefore, mapG transduces in lock-step. It keeps no transformation state. If the running time of f is bounded, so is the latency of mapG f (the time from obtaining the input to producing the output). The function foldG -- which is the analogue of the (left) list fold, as its signature suggests -- lets us write stateful transducers. The transducer linesG below transforms a producer of Char to the producer of lines. It uses the internal state -- the look-ahead buffer -- to accumulate characters until it sees a newline, at which point the accumulated string is yielded. The transducer is the analogue of Data.List.lines.

    type TrState m = StateT String (GenT String m)
    linesG :: Monad m => Transducer (TrState m) m Char String
    linesG gen = foldG tr [] gen >>= at_end
     where
     tr s '\n' = yield (reverse s) >> return []
     tr s c    = return (c:s)
     at_end [] = return ()
     at_end s  = yield (reverse s)

We can now re-write the any example from the previous section:

    anyG f = orG . mapG f
   
    t2_gen :: IO Bool
    t2_gen = anyG (>10) . mapG read . linesG $ fileG test_file

The code looks remarkably like that for lazy evaluation, t2. Complex producers and consumers are built by literally composing previously written components. There is clearly code reuse. Unlike the lazy evaluation code t2, we read from a file, and we definitely close the file as soon as the result is obtained.

Our generators are intentionally simple. They are asymmetric, one-shot, implicit suspensions. They are asymmetric because yield only passes data into the context without accepting anything from it. Simple generators are one-shot since the computation suspended by yield can be resumed only once. Unlike delimited control operators, simple generators do not explicitly capture their context as a first-class object. For these reasons, simple generators are limited in expressiveness. For example, two simple generators cannot run side-by-side; we cannot solve the same fringe problem with simple generators. But they are very easy to implement: simple generators are essentially restartable exceptions. Their implementation in Haskell is trivial, directly following from the equational specification.

    type GenT e m = ReaderT (e -> m ()) m
    
    yield :: Monad m => e -> Producer m e
    yield e = ask >>= \foreach -> lift $ foreach e
    
    runGenT :: Monad m => Producer m e -> Consumer m e -> m ()
    runGenT m f = runReaderT m f

Simple generators were first introduced in 1975 in the programming language CLU (where they are called iterators). In CLU, yield amounts to a regular function call: ``a yield effectively calls the loop body, which returns to the iterator when it is finished. Imposing the limitations on iterators was done to get the efficient, single stack implementation, albeit at the expense of some expressive power.'' (Barbara Liskov: A History of CLU). The CLU implementation was the inspiration for our Haskell library of simple generators, which are likewise lightweight. The insight of the CLU implementation is yield calling the consumer as a regular function, which it must have obtained from the environment.

We have shown how simple generators decouple data producers and consumers in a (embedded) strict language, and how simple they are and easy to implement. Aren't they too simple? Quite surprisingly, they turn out expressive enough for the notorious pretty-printing problem.

Version

The current version is December 2012

References

Generators: the API for traversal and non-determinism
Generators in general

GenT.hs [4K]
The interface and implementation for simple generators

Y4.hs [6K]
Running example from the APLAS paper (tab expansion) and simple illustrative examples of generators and transducers

Y1.hs [13K]
A more detailed and slow derivation of simple generators and an example of repmin-like problem (tree traversals) implemented with simple generators

Simple generators in OCaml

The programming style of simple generators, which unbundles data consumers and producers and promotes modularity and compositionality, can be realized in many languages, strict or lazy. In fact, simple generators were first introduced in a strict language, CLU, as mentioned earlier. Here we show the implementation in OCaml. It is simple to the point of triviality. To convince ourselves, we derive it in detail. However trivial, simple generators are up to solving the thorny linear pretty-printing problem.

The running example (the same as in the PPYield paper) is tab expansion: to read and print a file expanding all tabs. We do not assume the file is made of lines of bounded width. We implement two tab expansion algorithms -- naive and more sophisticated -- keeping an eye on the ease of replacing one with the other.

We start with an analogue of non-strict (lazy) evaluation in OCaml: sequences, provided by the Seq standard library module. If the input stream is char Seq.t, the following function naively expands its tabs: replacing each tab character with 8 spaces:

    let rec tabX0 : char Seq.t -> char Seq.t = fun s ->
      let open Seq in
      match uncons s with
      | None -> empty
      | Some ('\t',rest) -> append (init 8 (Fun.const ' ')) (tabX0 rest)
      | Some (c,rest)    -> cons c @@ tabX0 rest

The more sophisticated version adds spaces up to the next tab stop (multiple of 8). It needs local state, for the current position.

    let tabX1 : char Seq.t -> char Seq.t = fun s ->
      let open Seq in
      let rec loop pos s = 
        match uncons s with
        | None -> empty
        | Some ('\t',rest) -> 
             let pos' = (pos + 8) - pos mod 8 in
             append (init (pos' - pos) (Fun.const ' ')) (loop pos' rest)
        | Some (c,rest) -> cons c @@ loop (if c = '\n' then 0 else pos + 1) rest
      in loop 0 s

To apply either tabX0 or tabX1 to a file, we first have to convert the file contents to a sequence. The standard library does not provide such a function (for a good reason, as we shall see soon). One may attempt to implement it as follows:

    let seq_of_file : string -> char Seq.t = fun fname ->
      let ch = open_in fname in
      Seq.unfold 
          (fun ch -> Option.map (fun c -> (c,ch)) @@ In_channel.input_char ch) ch

We then easily solve the problem:

    let expand_file_lazy fname = 
      seq_of_file fname |> tabX0 |> Seq.iter print_char

There is a lot to like about this solution: it is clearly modular, built of three components: one is only concerned with data source (file), another only with data transformation, and the third one with consumption. It is easy to swap the transformer tabX0 with tabX1, or abstract over it. The processing is incremental: the file is read on-demand; a read character is immediately transformed and printed. Therefore, we may process arbitrarily large files in bounded memory (amortized constant memory: we still need to create and dispose of closures.)

There are also many problems. The function seq_of_file opens the file, but does not close it -- because it cannot know when a read character will be handled and when we are finished with the sequence. Therefore, we have to rely on finalization to close the file, which may happen late and hence leak the limited resource (file handle). More serious is error handling: there is none. If an I/O error occurs, an exception is raised. It is raised however not in seq_of_file but in tabX0, which does not, and should not care about I/O and hence cannot deal with I/O exceptions.

To properly handle errors and promptly close the file, we have to use `strict IO':

    let expand_file_strict fname = 
      let rec loop ch = 
        match In_channel.input_char ch with
        | None -> ()
        | Some '\t' -> print_string (String.make 8 ' '); loop ch
        | Some c    -> print_string (String.make 1 c); loop ch
      in
      In_channel.with_open_text fname loop

The input file is now closed, as soon as its processing is finished, and even in case of an I/O error. By wrapping the last line in a try-block we may easily report I/O errors. Alas, reading the file and tab expansion are now intertwined. We can no longer easily extend the naive tab expansion with a more sophisticated one: we have to find a way to initialize and pass around its state. One clearly sees what John Hughes meant when saying that strict evaluation entangles consumers and producers and inhibits modularity.

It is not all bad: one may still factor out the handling of the read character

    let expand_file_strict fname = 
      let consume = function
        | '\t' -> print_string (String.make 8 ' ')
        | c    -> print_string (String.make 1 c) 
      in
      let rec loop ch = 
        match In_channel.input_char ch with
        | None -> ()
        | Some c -> consume c; loop ch
      in
      In_channel.with_open_text fname loop

and then abstract it out

    let expand_file_gen0 fname consume = 
      let rec loop ch = 
        match In_channel.input_char ch with
        | None -> ()
        | Some c -> consume c; loop ch
      in
      In_channel.with_open_text fname loop

giving us the function of the type string -> (char -> unit) -> unit. The last part of the type is remarkable and evokes many associations. It is worth giving it a name:

    type 'e gen = ('e -> unit) -> unit

This is the type of functions that take consumers of 'e data and feed them data, invoking repeatedly. (There are other ways to view this type.) With this in mind, we rewrite expand_file_gen0 more suggestively

    let file_gen fname : char gen = fun yield -> 
      let rec loop ch = 
        match In_channel.input_char ch with
        | None -> ()
        | Some c -> yield c; loop ch
      in
      In_channel.with_open_text fname loop

This is what we call a simple generator. Just like seq_of_file, it is concerned only with getting data, leaving the processing to the consumer yield, received as an argument. Unlike seq_of_file, all the reading occurs within the dynamic extent of file_gen: that is, file_gen knows when the processing is finished; if an I/O exception occurs, file_gen is in a position to handle it.

The abstracted out consumer

    let tabY0 : char -> unit = function
      | '\t' -> print_string (String.make 8 ' ')
      | c    -> print_string (String.make 1 c)

is the function to yield to. It is quite like tabX0. However, there is no rest: tabY0 deals only with the current character. Relatedly, tabY0 is not recursive.

The more sophisticated tab expansion needs local state:

    let tabY1 : char -> unit = 
      let pos = ref 0 in
      function
        | '\t' -> let pos' = (!pos + 8) - !pos mod 8 in
                  print_string (String.make (pos' - !pos) ' ');
                  pos := pos'
        | c    -> print_string (String.make 1 c); 
                  pos := if c = '\n' then 0 else !pos + 1

(one could re-write it in explicit state-passing style, with unclear benefits. The state is inherent in the algorithm and has to be present one way or the other. In tabY1 it is encapsulated and not visible from outside.)

One may refactor further, separating the tab expansion from writing out the result:

    let tabY1' : (string -> unit) -> (char -> unit) = fun yield ->
      let pos = ref 0 in
      function
        | '\t' -> let pos' = (!pos + 8) - !pos mod 8 in
                  yield (String.make (pos' - !pos) ' ');
                  pos := pos'
        | c    -> yield (String.make 1 c); 
                  pos := if c = '\n' then 0 else !pos + 1

obtaining a transducer: transformer of consumers. The overall tab expansion becomes

    let expand_file_gen fname = 
       file_gen fname @@ tabY1' @@ print_string

which is just as modular as expand_file_lazy -- but with proper error handling. It is also more efficient: there is no need to keep creating, invoking and disposing of thunks (`cons cells' of Seq.t).

A further generalization may extend the type of the generator ('e -> unit) -> unit to ('e -> 'w) -> 'w where 'w is a monoid. For example, file_gen then becomes:

    type 'w monoid = {unit: 'w; app: 'w -> 'w -> 'w}
    
    let file_genm fname : 'w monoid -> (char->'w) -> 'w = fun m yield -> 
      let rec loop ch = 
        match In_channel.input_char ch with
        | None   -> m.unit
        | Some c -> m.app (yield c) (loop ch)
      in
      In_channel.with_open_text fname loop

One can't help but think of monoid comprehensions.

In conclusion: simple generator is a function that takes a consumer, typically called yield, as an argument, and invokes it repeatedly. It is that simple. Looking back, yield as an ordinary function call was exactly how generators were implemented in CLU.

I thank Daniel Bünzli for helpful discussions and encouragement.

Version

The current version is August 2023

References

y1.ml [7K]
The complete code for the derivation of simple generators, with tests

ppyield.ml [26K]
Incremental, linear pretty-printer
Quite more extensive and realistic application of simple generators

Leonidas Fegaras, David Maier: Towards an Effective Calculus for Object Query Languages. SIGMOD 1995.
The paper that introduced and developed monoid comprehensions, which deserve to be much more widely known.

Incremental, Linear Pretty-printing

Although simple generators are intentionally limited in expressiveness, they are adequate for many tasks -- more tasks than one might think. A new solution to the pretty-printing problem is a non-trivial illustration of simple generators. The new solution is a contribution by itself -- a surprising contribution since simple generators are believed to be too simple for this task. The solution shows how can we reason about and accurately predict the time and space performance of a program. The Haskell version of our pretty-printer demonstrates straightforward reasoning about space, in Haskell. The OCaml version is even simpler.

The pretty-printing problem was posed by Oppen in 1980 as a `nice' formatting of a tree-structured document within a fixed page width. Oppen has abstracted the problem to the bare minimum; his formulation is used in almost all other subsequent works on the topic. In Haskell, the source document is represented by the data type:

    data Doc = Text String | Line | Doc :+: Doc | Group Doc

The source document is composed of text with advisory line breaks Line, which are to be interpreted consistently within a Group. Lines in a group (excluding embedded groups) are to be interpreted either as all spaces or as all line breaks. Lines should be formatted as spaces if the entire group fits within the remainder of the current line. For example, a sample document

    Group (Text "A" :+: Line :+: 
             Group (Text "B" :+: Line :+: Text "C"))

should we formatted on one line if the page width w is 5 or more, on two lines if the page width is 3 or 4, and on three lines otherwise:

    A B C              A                 A
                       B C               B
                                         C

The desired optimal algorithm should have:

running time linear in document size n;
running time independent of the line width w;
bounded latency: the amount of time from the moment a piece of data such as A is seen by the formatter till the moment it is placed in the output document; The latency is bound by the page width w.
run in constant memory, when the input document is taken from a file.

A realistic library would provide additional document primitives, e.g., for setting margins and indentation -- which are easy to add. In this paper however we concentrate on the core of the problem.

Oppen has demonstrated the first optimal algorithm, with co-routines. Oppen's work has inspired a large number of attempts to derive the optimal algorithm equationally (unsuccessfully so far) or to find a simpler algorithm. The problem is indeed complex: the naive implementation has the running time exponential in document size and the unbounded latency -- and so has the backtracking. Memoization, or pre-computing the widths of all groups, reduces the running time to linear in n but requires likewise linear in n extra space; the latency remains unbounded. Satisfying all the requirements has proved to be challenging. A particular challenge is finding a modular way to add the second optimization, pruning (computing the widths of groups only up to w).

Our new solution is distinguished by modularity and ease and precision of the analysis about latency, time, and especially space. It does not use lazy evaluation in any guise. We build an efficient pretty-printer by combining two key optimizations: (i) avoiding re-computations of group width by memoization or pre-computation and (ii) pruning, computing the width of a group only as far as needed to determine if the group fits on the line. These optimizations are present in one form or another in all optimal pretty-printing implementations. Our development is distinguished by a systematic, modular and compositional application of the optimizations. We build the pretty-printer as a cascade of separately developed, tested and analyzed stream transducers. We stress the ease of analysis and its composability. We can swap the components, for example, replacing the component that traverses an in-memory document with a generator that reads the document from a file. The new structure helped us discover edge cases overlooked in previous implementations.

Version

The current version is December 2012

References

talk.pdf [184K]
The APLAS 2012 talk outlines the simple-generator solution and shows benchmark results. The predicted time and space complexities well agree with the observed ones. It is possible and easy, after all, to accurately predict the space usage of a Haskell program, if it is written in a particular style (e.g., with generators).

yield-pp.pdf [242K]
The full version of the paper (Sections 4 and 5 and Appendix A) explain the solution in detail, demonstrating reasoning about time, space, and latency.

PPYield.hs [31K]
The complete Haskell code for the solution, and the benchmarks

ppyield.ml [26K]
The complete OCaml code for the solution, and the benchmarks

Streams of random data: a pitfall of lazy evaluation

A real-life example posted on Haskell-Cafe highlighted both the attraction of lazy evaluation as a demand-driven programming for free, and the eventual disappointment. Haskell lazy lists seem like the ideal interface between producers and consumers of a series of data. Producers do not have to worry about generating the `right' amount of data; they keep producing, putting the results into an `infinite' list. A consumer will pick as many elements as it needs, effectively stopping the producer afterwards. Since data are produced on demand, they are consumed as soon as they are created and do not have to be buffered. Alas, even small modifications to the program disrupt the synchronization; consumer suddenly waits until producer finishes -- which it never does, busily generating the infinite list and eventually crashing the program. The example posted by Dale Jordan on Haskell-Cafe did exactly that.

We use the example to show off explicit streams and simple generators. They too separate consumers and producers. They are not lazy and hence robust. Unlike Haskell lazy lists, they are reliably composable and work in the presence of arbitrary effects. Coding the example lead to a surprise: simple generators form a monad in yet another way.

Dale Jordan hoped to use a Haskell list as a synchronization mechanism: his producer will repeatedly generate chunks (finite lists) of (random) numbers and the ultimate consumer will pick however many it needs. The example relies on State to keep the current state of a random number generator. The State monad is implemented in pure Haskell; inlining the monadic bind should give us pure Haskell code, theoretically. There should not be any impediment to lazy evaluation. And yet we see that even benign effects interfere badly with lazy evaluation.

Dale Jordan message used the following sample chunk producer, generating a length-n list of random numbers uniformly distributed within [m,m+9]:

    type Rand r = State r
   
    something :: RandomGen g => Int -> Int -> Rand g [Int]
    something n m = sequence . replicate n . state $ randomR (m,m+9)

The combinator iterateR runs the chunk producer repeatedly, concatenating the chunks into an infinite list:

    iterateR :: RandomGen g => Rand g [Int] -> Rand g [Int]
    iterateR m = do
      chunk <- m
      (chunk ++) <$> iterateR m

The ultimate consumer will pick however many elements it needs. It really works: the consumer drives the producer and no infinite lists are actually constructed. The comment line below shows the result.

    run1 = evalState (take 10 <$> (iterateR (something 2 0))) $ mkStdGen 42
    -- [1,1,7,4,6,1,8,1,8,5]

Alas, if we start composing iterateR computations we run into a problem. Neither of the two following expressions terminate. They loop and try to build an infinite list, which quickly crashed the program.

    run2 = evalState
              (take 10 <$> (iterateR (something 2 0) >>
                            iterateR (something 3 10)))
              $ mkStdGen 42
    
    run3 = evalState
              (take 10 <$> (iterateR (something 2 0) >>=
                            iterateR . (something 3 . head)))
              $ mkStdGen 42

Dale Jordan's message was entitled ``Why doesn't laziness save the day here?''

The question is easy to answer. Recall, iterateR is a State computation: it receives the current state of the random number generator, and produces a list and the final state. In the expressions run2 and run3, the second iterateR, before it can run, has to receive the state of the random number generator from the first iterateR. However, iterateR is meant to run forever; it never produces the final state. When we compose the iterateR computation like iterateR ... >>= k, we should bear in mind two entangled data dependencies between iterateR ... and k. One is on the list result of iterateR, which can be computed lazily and consumed lazily. If k also generates random numbers, there is the second dependency, on the state of the generator. Before the generator in k is to yield anything, in needs the fully computed state from the previous generator. Hence the data produced by k depend both on the list and on the state of iterateR .... It is the second dependency that causes the divergence, that breaks the intended lazy list synchronization. The second dependency is hidden inside the State monad and the implementation of the random number generator, which makes it difficult to see. The overall problem with lazy evaluation is that the data dependencies, which drive the computation, may be numerous, may be hidden in abstraction layers, and are not visible in types.

The first solution is to use a proper stream, which makes it explicit that producing each element requires an effect.

    data List m a = Nil | Cons a (ListM m a)
    type ListM m a = m (List m a)

The monadic stream ListM is not a Haskell list. We have to write our own ListM-processing functions like headL, replicateL, appendL, etc. (There is probably a Hackage package for it.) The chunk producer and the iterateR combinator from the original example keep the same form (modulo the replacement of replicate, take and other list-processing functions with their stream equivalents):

    somethingL :: RandomGen g => Int -> Int -> ListM (Rand g) Int
    somethingL n m = sequenceL . replicateL n . state $ randomR (m,m+9)
    
    iterateRL :: RandomGen g => ListM (Rand g) Int -> ListM (Rand g) Int
    iterateRL m = appendL m (iterateRL m)

All three run functions from the original example now work, as expected; here is the third one:

    run3L = evalState
              (takeLL 10 (iterateRL (somethingL 2 0) >>=
                          iterateRL . (somethingL 3 . headL)))
              $ mkStdGen 42
    -- [2,8,5,7,2,9,2,9,6,6]

One may use a fancier stream data type:

    data RList m a = RList [a]              -- known finite prefix
                           [m [a]]          -- a stream of producing actions

Dale Jordan aptly characterized this solution as ``reifying the implicit continuation in my iterateR's recursive definition into a data structure''

Simple generators, described elsewhere on this page, offer another solution. Recall the relevant definitions (which are almost the entire implementation of the simple generators):

    type GenT e m = ReaderT (e -> m ()) m
    type Producer m e = GenT e m ()
    type Consumer m e = e -> m () 
    
    yield :: Monad m => e -> Producer m e
    yield e = ask >>= \f -> lift $ f e

Clearly GenT e m is a monad and GenT e is a monad transformer. It may be surprising that Producer m e is also a monad, whose return operation is yield and whose (>>=) we will write as bindG. Re-writing Dale Jordan's code with generators hardly changes the form of the code; iterateR however becomes particularly simple:

    somethingG :: (RandomGen g, MonadState g m) =>
                  Int -> Int -> Producer m Int
    somethingG n m = sequence_ . replicate n .
                     (>>= yield) . lift . state $ randomR (m,m+9)
    
    iterateRG :: Monad m => Producer m Int -> Producer m Int
    iterateRG m = m >> iterateRG m

All three run functions in the original code finish, with the expected result. Here is the third one:

    run3G = evalState
              (takeG 10 (iterateRG (somethingG 2 0) `bindG`
                          (iterateRG . (somethingG 3))))
              $ mkStdGen 42
    -- [2,8,5,7,2,9,2,9,6,6]

The attentive reader may noticed the curious absence of head in run3G (cf. run3 and run3L). That mystery is left as an exercise to the reader.

In conclusion: lazy evaluation is indeed driven by the demand for data, which is its strength -- and downfall: data dependencies can be numerous, entangled, hidden behind abstractions such as monad and interfaces, and not at all expressed in types. Surprises abound. With one hand lazy evaluation promotes modularity, with the other hand, breaks it. Streams and generators separate producers and consumers and promote modularity. With them, however, the evaluation order is explicit and predictable. We have seen that the stream or generator-based code may be just as elegant as the code with lazy lists -- sometimes even more so. As a bonus, it works.

Version

The current version is March 2014

References

Randoms.hs [8K]
The complete code for the example, with stream- and generator-based solutions

Dale Jordan: Why doesn't laziness save the day here?
The message posted on Haskell-Cafe on Mon Jan 4 20:31:20 EST 2010

Simple Generators: Alternative to Lazy Evaluation

Introduction

The splendors and miseries of lazy evaluation

Simple generators in Haskell

Simple generators in OCaml

Incremental, Linear Pretty-printing

Streams of random data: a pitfall of lazy evaluation

Last updated August 11, 2023