previous   next   up   top

Generators: the API for traversal and non-determinism

 

Generators have become popular because they give the benefit of lazy evaluation -- modular on-demand processing -- in strict languages or when lazy evaluation does not apply because the computation has observable effects such as IO. Our main result clarifies the meaning of generators and may lead to new ways of implementing them: computations that yield decompose into exceptions and non-determinism. To be precise if abstruse, a monad of generators is an exception-monad--transformed LogicT monad. The one-line Haskell definition of yield should clarify:

     yield x = raise x `mplus` return ()

We come to the main result after an overview of the origins of generators as the encapsulation of traversal of a real or virtual collection; a non-deterministic computation being an example of the latter. We focus on the convenience of defining generators for new collections and on building generators by composition. Yield then arises as a mode of composition.

Our main result sprouts the implementations of generators in Haskell, logical programming and ML -- the implementations that are correct and working on the first try. Iterators like for-loop are programmed-in rather than built-in. We program not only customary traversals but also accumulating ones, not only generator statements but also value-producing generator expressions, not only nesting of generators but also running them side-by-side.

Overview
Main result
Implementations of generators
Iteratees: Incremental multi-level input processing and collection enumeration

I am grateful to Roman Cheplyaka for helpful discussions and suggestions.


 

Origins

Modern generators first appeared in the language Alphard developed in the the group of Mary Shaw at CMU in mid-1970s. They were inspired by `generators' in IPL-V and mapping functions in Lisp. In 1975 generators, under the name of iterators, migrated to the language CLU. Both Alphard and CLU stressed data abstraction. Alphard in particular was aimed at the development of verifiably reliable software. Abstraction was indispensable in making (Hoare-style) proofs simpler and modular. Since iterative computations invariably loop over the elements of some collection (be it as simple as the range of integers), it made sense to separate operations on the current element (the loop body) from obtaining the current element (the loop control). The goal was to hide the details of the collection, encapsulate the state of the enumeration, and make iterative computations modular and easier to prove terminating and correct. As Shaw et al. wrote, generators abstract over control as functions abstract over operations.

About two years later, generators as the fundamental control structure appeared in the programming language Icon. Icon, as its predecessor SL5, focused on goal-directed evaluation, expressing non-deterministic computations. A generator in Icon is an expression that may produce several values on demand. The expression's context determines how many values will actually be produced. Again there was a separation between obtaining (candidate) values and consuming the values, possibly demanding more. A non-deterministic operation hides the details of generating the candidates and encapsulates the generation state. It represents the traversal of a virtual collection whose elements are computed during the enumeration as needed.

However different Alphard and Icon are, they both have for-loop, implemented in terms of the generator producing integers from i through j on demand -- called upto(i,j) in Alphard and i to j in Icon.

I thank Richard O'Keefe for pointing out early AI languages.

References
Mary Shaw, William A. Wulf and Ralph L. London
Abstraction and verification in Alphard: defining and specifying iteration and generators.
Communications of the ACM, v20, N8, pp. 553-564. August 1977.
< http://portal.acm.org/citation.cfm?id=359782 >

Barbara Liskov: A History of CLU.
MIT LCS Technical Report 561. April l992.
Section 3.10. Iterators
< http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-561.pdf >

Ralph E. Griswold: An Overview of the Icon Programming Language; Version 9. March 2, 1996.
Section 2.2. Generators
< http://www.cs.arizona.edu/icon/docs/ipd266.htm >

Daniel G. Bobrow and Bertram Raphael: New Programming languages for AI research
Xerox Palo Alto Research Center Report CSL-73-2. August 20, 1973.
< http://www.bitsavers.org/pdf/xerox/parc/techReports/CSL-73-2_New_Programming_Languages_For_AI_Research.pdf >
Generators also appeared in many AI languages of early 1970s such as SAIL, CONNIVER, QLISP/INTRELISP, POPLER -- alongside backtracking, co-routines and other control facilities. Icon stands out in the systematic use of generators as the fundamental control structure, on which backtracking and other control facilities are built.

 

Simple generators and lazy lists

We introduce generators on several simple examples of search and enumeration, and relate them to lazy lists, in particular, Haskell lists.

Generators in Alphard and CLU encapsulate the enumeration of a collection. They have a simple interface with the operations to test for the end of the enumeration, to obtain the current element and advance the enumeration. This is the interface of streams, or lazy lists (lists whose cons-cells and elements are produced on demand).

Generators in Icon at first blush seem different. Here is an excerpt from Section 2.2 of `An Overview of the Icon Programming Language.'

The results that a generator produces depend on context. In a situation where only one result is needed, the first is produced, as in
     sentence := "Store it in the neighboring harbor"
     i := find("or", sentence)
which assigns the value 3 to i. If the result produced by a generator does not lead to the success of an enclosing expression, however, the generator is resumed to produce another value. An example is
     	if (i := find("or", sentence)) > 5 then write(i)
Here the first result produced by the generator, 3, is assigned to i, but this value is not greater than 5 and the comparison operation fails. At this point, the generator is resumed and produces the second position, 23, which is greater than 5. The comparison operation then succeeds and the value 23 is written.

These simple examples are straightforward to implement with lazy lists, even more so in Haskell, whose native lists are lazy and which has a rich (lazy) list processing library. First we write the analogue of Icon's find function, to obtain the index of an occurrence of the pattern pat in the string str.

     findIL :: String -> String -> [Int]
     findIL pat str = findIndices (isPrefixOf pat) . tails $ str
Since there may be several or none such occurrences, Icon's find is non-deterministic, and so is our findIL, as seen from its type. We take advantage of the standard library function tails, to compute the list of all suffixes of str, from str itself down to the empty string. The standard library function findIndices checks each element against the predicate isPrefixOf pat and returns the indices of the matching elements, that is, the positions of all occurrences of pat. The clarity of the algorithm gives confidence in its correctness, but also worry about its efficiency. The appeal of lazy lists is the removal of that worry: none of these suffix and index computations are done unless demanded. The demand, in Icon and in Haskell, comes from the context. In our example, the context needs a value to bind to i.
     sentence = "Store it in the neighboring harbor"
     find_ex1 = do
      case filter (>4) $ findIL "or" sentence of
       (i:_) -> print i
       []    -> return ()
The case expression asks for the first element in the list of indices that is greater than 4 (indexing in Haskell starts with 0). The filter expression asks for the first index, filters it out, and ask for the second one, 22. The latter index passes the test, incorporated in the result list and eventually bound to i and printed. Since no more indices are demanded, the sentence is not searched past the second occurrence of "or". The example Icon and Haskell code indeed execute similarly.

The context of a generator may also ask for all values in generator's collection. Here is Icon again:

A generator can be resumed repeatedly to produce all its results by using the every-do control structure. An example is
     every i := find("or", sentence)
        do write(i)
which writes all the positions at which "or" occurs in sentence. For the example above, these are 3, 23, and 33. Generation is inherited like failure, and this expression can be written more concisely by omitting the optional do clause:
     every write(find("or", sentence))

A lazy list likewise can actually produce all of its elements, in the context of an operation that fully traverses the list and prints its elements. The similarity of the Haskell and Icon code here is striking.

     mapM_ print $ findIL "or" sentence

It is regrettable that the similarity between Haskell lazy lists and Icon generators holds only for simple examples. We need better abstractions to chain and combine the generators. Haskell lazy lists become less suitable for generators with a lot of state -- and of no use when the generation has an observable effect such as IO.

 

Generators and LogicT

Haskell lazy lists, although similar to generators on simple examples, fall short of an adequate model. Chaining of generators calls for better abstractions, which become necessary for generators with observable effects. The obvious next candidate model, MonadsPlus, is not sufficient either. At the very least, we need LogicT.

We again turn to `An Overview of the Icon Programming Language,' this time, for larger examples:

There are several other control structures related to generation. One is alternation,
     expr1 | expr2
which generates the results of expr1 followed by the results of expr2. Thus
     every write(find("or", sentence1) |
        find("or", sentence2))
writes the positions of "or" in sentence1 followed by the positions of "or" in sentence2. Again, this sentence can be written more compactly by using alternation in the second argument of find():
     every write(find("or", sentence1 | sentence2))
Another use of alternation is illustrated by
     (i | j | k) = (0 | 1)
which succeeds if any of i, j, or k has the value 0 or 1.
Writing nested generators like those above with Haskell lists becomes sufficiently cumbersome, prompting us to adopt a better abstraction. First, we change the primitive building blocks: instead of the standard list constructors [] and (:), we will use mzero for the empty generator, return for the singleton one, mplus to model alteration and (>>=) to model chaining. These are the four operations of MonadPlus. Although the operations can be expressed in terms lists -- lists are one implementation of MonadPlus -- we shall maintain the abstraction and treat the operations as primitive.

We easily write the last Icon's example, as tcomp below. First we define the equivalent of Icon equality comparison, which fails if the equality does not hold.

     equal :: MonadPlus m => m Int -> m Int -> m Int
     equal i j = do
                 iv <- i
                 jv <- j
                 if iv == jv then return iv else mzero
     
     tcomp i j k = (i `mplus` j `mplus` k) `equal` (return 0 `mplus` return 1)
We could have simplified the notation by writing 0 instead of return 0, taking advantage of overloaded numeric literals in Haskell. Although MonadPlus lets us write tcomp, it does not, however, let us use it, for example, in a conditional operator. We cannot implement the conditional, the negation-as-failure, or the collector of all generated values -- all the operations that test, query and advance the generator. MonadPlus lets us build non-deterministic computations, but not examine them. We need a richer interface, such as LogicT. The latter extends MonadPlus with the operation msplit that implements the minimal generator interface, of testing for emptiness, obtaining the current element and advancing. The LogicT library includes the Icon-style conditional, called ifte, written in terms of msplit. We are now able to use our tcomp:
     tcomp_ex1 :: (LogicT t, Monad m, MonadPlus (t m)) => t m String
     tcomp_ex1 = ifte (tcomp (return 2) (return 1) (return 3))
                      (\i -> return $ "Yes: " ++ show i)
                      (return "No")
To implement the first Icon example in the quoted excerpt, we need the analogue of Icon's find. We `lift' our earlier implementation in terms of lists to an arbitrary MonadPlus:
     -- The MonadPlus analogue of the lazy cons
     mcons :: MonadPlus m => a -> m a -> m a
     mcons h t = return h `mplus` t
     
     -- The operation foldr below replaces nil with mzero and cons with mcons
     findIM :: MonadPlus m => String -> String -> m Int
     findIM pat str = foldr mcons mzero $ findIL pat str
To make the example better, we pretend that sentence1 and sentence2 read their sentences from a file. (We only pretend, printing a trace message instead.) Our generators now have an observable effect, ruling out Haskell lists. Fortunately, LogicT has other implementations. In fact, LogicT is a monad transformer, parameterized by the monad responsible for the underlying effects.
     sentence1, sentence2 :: (MonadIO (t IO), LogicT t) => t IO String
     sentence1 = liftIO (putStrLn "sentence1") >> return sentence
     sentence2 = liftIO (putStrLn "sentence2") >> return "Sort of"
     
     twosen = observe $ iter Nothing $
              (liftIO . print =<< findIM "or" =<< sentence1 `mplus` sentence2)
The Haskell code looks quite similar to the Icon code, especially if we view =<< as a (call-by-value) function application. The function iter is a variant of bagofN of the LogicT library. The latter accumulates all (if the first argument is Nothing) or the first several values produced by a non-deterministic computation, returning the accumulated list. The function iter throws away the result since the computations were executed for their side effects. Both bagofN and iter are written in terms of msplit.
Version
The current version is May 2011.
References
LogicT: backtracking monad transformer with reification
The LogicT library provides several implementations, which can be ported to other languages.

GenCode.hs [16K]
The complete code for the Haskell examples.

 

Suspended computations as generators

This section describes the salient, nowadays, feature of generators -- the suspension, or yield. We introduce typical examples and show several unsatisfactory attempts to implement them with the already introduced tools such as LogicT. In the languages with generators, yield is a keyword, a special syntactic form with the dedicated support by the run-time system. We would like however to define yield, as a regular library function, in terms of the existing language features. In this section, we try several such features, without full satisfaction. We achieve our goal in the next section.

The generators so far have been built `by hand' (as lists) or by chaining and alternation. Although theoretically sufficient, these methods become ungainly when building generators encapsulating traversals with large amount of state. In a recursive traversal function, the state is implicit in the local variables of recursive invocations of the function. When writing generators, or zippers, the state has to be explicated, to be manipulated by the three operations of the generator interface (emptiness test, query and advance). Generators encapsulating traversal are closely related to zippers. Huet zippers however deal with algebraic data structures; the algebraic structure helps writing zippers, essentially by differentiation. Generators are more general, encapsulating traversals of virtual collections, with arbitrary effects.

When CLU borrowed generators from Alphard it introduced a new way of building them. Rather than writing the emptiness test, the query and the advance operations and packaging them up, a CLU programmer would use yields. The only way to write a generator (called iterator) in CLU was by suspending, usually an iterative computation. Since all iterative computations in CLU were built around iterators, yields was a mode of composing them. The inspiration for yields came from Simula 67 coroutines. Icon has a similar form, called suspend, borrowed from SL5, also inspired by Simula's coroutines.

Here is an example of suspend in Icon from the Icon's overview. Applying the procedure findodd defined below to the strings s1 and s2 non-deterministically gives all odd-valued positions at which s1 occurs in s2.

     procedure findodd(s1, s2)
        every i := find(s1, s2) do
           if i % 2 = 1 then suspend i
     end
Since find is itself a generator in Icon, findodd is a derived generator (in our case, a `filtered' generator), which gives us an idea of writing findodd with just a MonadPlus:
     findodd :: MonadPlus m => String -> String -> m Int
     findodd s1 s2 = do
       i <- findIM s1 s2
       if i `mod` 2 == 1 then return i else mzero
Alas, the translation from Icon to Haskell is not idiomatic; we would like to reproduce the explicit iteration over find's collection. We accomplish that in the next section.

The construct suspend, often called yield, has become the emblem of generators, spreading to many other languages. Our next example comes from Python, which, since version 2.2, has generators and the keyword yield to build them. The example is quoted from David Mertz's Charming Python article, as an elegant, exemplary generator. The similarity to Icon is quite striking.

     >>>> # A recursive generator that generates Tree leaves in in-order.
     >>> def inorder(t):
     ...     if t:
     ...         for x in inorder(t.left):
     ...             yield x
     ...         yield t.label
     ...         for x in inorder(t.right):
     ...             yield x

We describe several attempts at implementing this example in Haskell. First, we need a tree datatype

     type Label = Int
     data Tree = Leaf | Node Label Tree Tree deriving Show
and a sample complete tree tree1 that looks as follows:
     Node 1 (Node 2 (Node 4 Leaf Leaf) 
                    (Node 5 Leaf Leaf)) 
            (Node 3 (Node 6 Leaf Leaf) 
                    (Node 7 Leaf Leaf))

The first attempt builds the in_order generator by alternation. The current element may come from the left branch, from Node's label, or from the right branch:

     in_order1 Leaf = mzero
     in_order1 (Node label left right) =
        in_order1 left `mplus` 
        return label `mplus`
        in_order1 right
We can either collect the labels of all traversed nodes in a list, printing it afterwards, or print the labels as we go:
     in_order1_r  = observe (bagofN Nothing $ in_order1 tree1) >>= print
     -- [4,2,5,1,6,3,7]
     
     in_order1_r' = observe $ iter Nothing $ 
                    in_order1 tree1 >>= liftIO . print
This attempt is not satisfactory. Our translation has replaced chaining, implicit in the Python code, by alternation. That trick will not work in general, e.g., when traversing the tree post-order and yielding at each node the sum of the labels, including all descendants. We need the result of a branch traversal, and so we cannot replace chaining (bind) with alternation (mplus).

Traversing with yield looks quite similar to the logging of values as we compute -- to the Writer monad. It is tantalizing to implement yield as tell. It indeed works, in the first approximation. Here is our traversal generator:

     in_orderW :: (MonadWriter [Label] m, MonadIO m) => Tree -> m ()
     in_orderW Leaf = return ()
     in_orderW (Node label left right) = do
        in_orderW left
        liftIO . putStrLn $ "traversing: " ++ show label
        tell [label]
        in_orderW right

We have added the statement to print the label, the trace, right before `yielding.' Alas, this code is too strict, making the traversal not incremental. We have to complete the traversal before we get hold of the accumulated `log'. The following makes the problem clear:

     in_orderW_r = do
       (_,labels) <- runWriterT $ in_orderW tree1 
       let some_labels = take 3 labels
       print some_labels

We only wanted the first three labels encountered during the traversal. The printed trace shows that we have traversed the whole tree nevertheless.

Our last attempt uses `final zippers', implemented with the library of delimited control, the CC monad.

     in_orderCC :: Monad m => Tree -> CC (P m Label) m ()
     in_orderCC Leaf = return ()
     in_orderCC (Node label left right) = do
         in_orderCC left
         yield label
         in_orderCC right
The code is clear and idiomatic. Yet we hoped to avoid using delimited control explicitly, counting at gaining insight into delimited control and its idioms. We hoped that LogicT might turn out sufficient.
Version
The current version is May 2011.
References
David Mertz
Charming Python: Iterators and simple generators.
Autodidact, Gnosis Software, Inc. September 2001.
< http://www.ibm.com/developerworks/library/l-pycon.html >

Generator1.hs [3K]
Generator2.hs [7K]
Implementing Generators directly with the delimited continuations, using the CCMonad libraries.

GenCode.hs [16K]
The complete code for the Haskell examples.

 

Deriving yield

The earlier unsuccessful attempts inspire us to derive yield by equational reasoning.

The earlier examples have likened yield to raising an exception: both yield and raise set up a side-channel, for `side-results' of a computation. Both yield and raise leak values from the depths of an expression. Unlike raise, yield gives a choice to continue the computation. We need non-determinism then. Thus, we need a monad with the operations raise and mplus. The former should satisfy the law

     raise e >>= k === raise e
formalizing that raising an exception aborts the rest, k, of the computation. The operation mplus should satisfy the law of left-distributivity:
     (m1 `mplus` m2) >>= k === (m1 >>= k) `mplus` (m2 >>= k)
formalizing that a choice splits the computation. Monad transformers let us build the monad with the required properties, by applying the exception monad transformer, ErrorT, to an instance of MonadPlus. (Traditionally, the transformer ErrorT e has required the type of the exception e to be a member of the Error class. Since a generator should be able to yield values of any type, we need an error monad transformer free from this silly restriction. Such a transformer is called EitherT.) Since both monads have control effects, the order of the composition matters: applying a MonadPlus transformer to an exception monad fails the left distributivity (we leave showing that as an exercise).

Let e be a complex computation, which can be written like

     e = m >>= k
or, eta-expanded,
     e = m >>= \x -> k x
Let us regard m as an `already computed' part of e, with k being the remaining part. Since m is already computed, it is equivalent to return v, where v is the (intermediate) result, to be bound to x and used in the remainder of e.

Let us calculate:

     ey =def=  m >>= \x -> (raise x `mplus` return ()) >> k x
     ===   m >>= \x -> (raise x `mplus` return ()) >>= \() -> k x
           {left distributivity}
     ===   m >>= \x -> (raise x >>= \() -> k x) `mplus` (return () >>= \() -> k x)
           {the law of exception}
     ===   m >>= \x -> raise x `mplus` (return () >>= \() -> k x)
           {return is the left unit of bind}
     ===   m >>= \x -> raise x `mplus` (k x)
           {recall that m === return v}
     ===   (raise v) `mplus` m >>= k
     ===   (raise v) `mplus` e
Thus ey is a non-deterministic computation that gives us a choice of e and of the intermediate result of e, up to the point m. We have obtained that
     yield :: MonadPlus m => e -> EitherT e m ()
     yield x = raise x `mplus` return ()
The exception carries the yield's argument `out of band', on the emergency channel so to speak; the non-deterministic choice mplus effectively lets the computation continue despite raising the exception. We have unwittingly obtained the typing of generators: MonadPlus m => EitherT e m a is the type of a generator returning the result of type a while yielding intermediate results of type e.

The derived yield let us implement the examples of generators from the previous section -- this time, idiomatically.

Version
The current version is May 2011.
References
MonadPlus - HaskellWiki
< http://www.haskell.org/haskellwiki/MonadPlus >

 

Generators with yield in Haskell

We have achieved our goal of defining yield, expressing it in terms of exceptions and alternation. We can implement the examples of building generators by suspension -- this time, with satisfaction. We start with the example of the in-order tree traversal, yielding the node labels encountered along the way. The code below looks as it sounds. We have added the statement to print the trace of the encountered nodes, so we can later tell how far we have gone.
     in_order2 :: (MonadIO m, MonadPlus m) => Tree -> EitherT Label m ()
     in_order2 Leaf = return ()
     in_order2 (Node label left right) = do
        in_order2 left
        liftIO . putStrLn $ "traversing: " ++ show label
        yield label
        in_order2 right

We emphasize that in Python, Ruby, etc., yield is a keyword and generators are built-in. In Haskell, yield is a regular, small, user-defined function, and generators are programmed-in.

Recall that a yielded value is transmitted as an exception. To use the value, we have to catch the exception -- to `run' the EitherT monad:

     catchError :: Monad m => EitherT e m e -> m e
     catchError (EitherT m) = m >>= check
      where
      check (Left x)  = return x
      check (Right x) = return x
A reader familiar with delimited continuations may notice that catchError is essentially reset. Merging the `side-channel' into the `main channel' is the function of the exception handling form, or reset.

If the generator only yields, returning no useful result, the following variant of catchError is fitting. The code supports the standard, in CLU, Icon and other languages, convention that the normal return from a generator is the end of the traversal.

     catchError' :: MonadPlus m => EitherT e m () -> m e
     catchError' (EitherT m) = m >>= check
      where
      check (Left x)  = return x
      check (Right x) = mzero
We traverse our sample tree1 completely:
     in_order2_r :: IO ()
     in_order2_r = observe $ iter Nothing $ do
       i <- catchError' (in_order2 tree1)
       liftIO . putStrLn $ "Generated: " ++ show i
or partially, asking only for the first two elements:
     in_order2_r' :: IO ()
     in_order2_r' = observe $ iter (Just 2) $ do
       i <- catchError' (in_order2 tree1)
       liftIO . putStrLn $ "Generated: " ++ show i
The trace confirms the incremental, on-demand traversal. The trace of using a node label is printed right after the trace of encountering the label during the traversal. In the second example, the traversal has stopped right after the desired two elements have been consumed.

We implement the more complex post-order example, yielding at each node the sum of the labels for the subtree rooted at the node. The generator now returns a useful value (the computed sum for all labels in the tree), in addition to yielding the intermediate results. We print the intermediate results, as they are computed, and the final result.

     post_order :: MonadPlus m => Tree -> EitherT Label m Label
     post_order Leaf = return 0
     post_order (Node label left right) = do
       sum_left  <- post_order left
       sum_right <- post_order right
       let sum = sum_left + sum_right + label
       yield sum
       return sum
     
     post_order_r = observe $ iter Nothing $ 
                    catchError (post_order tree1) >>= liftIO . print

We finish with the Icon example of yielding while consuming the results of another generator.

     procedure findodd(s1, s2)
        every i := find(s1, s2) do
           if i % 2 = 1 then suspend i
     end
Our translation now preserves the structure of Icon's code. The function iterE below is our old iter `lifted' to the EitherT monad, propagating exceptions.
     findodd2 :: (Monad m, LogicT t, MonadPlus (t m)) =>
                 String -> String -> EitherT Int (t m) ()
     findodd2 s1 s2 = iterE Nothing $ do
       i <- findIM s1 s2
       if i `mod` 2 == 1 then yield i else return ()
Version
The current version is May 2011.
References
GenCode.hs [16K]
The complete code for the Haskell examples.

 

Generator from any Foldable

We have seen that yield turns any traversal to a generator. We confirm this result, literally. Any abstract data type that implements the Data.Foldable interface implements the generator interface. In fact, any (collection) type T with the operation like mapM_ :: Monad m => (Element T -> m b) -> T -> m () that lets a monadic action examine its elements (of the type Element T) implements the generator interface. The claims holds regardless of the implementation of the data type, whether it is a data structure or if the elements to traverse are computed on-the-fly.

The proof of the claim is constructive, short, and trivial:

     foldable_gen :: (MonadPlus m, F.Foldable t) => t a -> EitherT a m ()
     foldable_gen = F.mapM_ yield
The type says it all: any Foldable is a generator.

As an example, the following code uses the generator to extract the first three elements of a foldable and return them in a list. The traversal is done only to the extent needed to obtain the desired three elements.

     trav :: (Monad m, F.Foldable t) => t a -> m [a]
     trav t = observe $ bagofN (Just 3) $
              catchError' $ foldable_gen t
Version
The current version is March 2014.
References
GenCode.hs [16K]
The complete code for the Haskell examples.

Zipper from any Traversable
Zipper does need Traversable, or the operation like mapM :: Monad m => (a -> m b) -> t a -> m (t b), since the result of unzipping has to be a (possibly modified) input data structure t b. Generator merely visits the elements of the data structure; it does not have to rebuild and return it. Thanks to Roman Cheplyaka for this observation of the difference between the generator and the zipper.

Simple Generators v. Lazy Evaluation
The operation foldG is one of the basic operations of simple generators

 

Generators in Logic Programming

Logic Programming is built around non-determinism; logic programming languages also have exceptions. Logic programming has generators then? Prolog has come tantalizingly close to generators. Unfortunately, raising an exception in ISO Prolog cuts choice points. That is, in ISO Prolog, non-determinism and exception effects are composed like BactTrackT (EitherT e Identity) a, essentially in the monad Either e [a]. The computation with the raised exception becomes simply Left e; the list of choices [a] is forgotten. It does not have to be that way: we have implemented exception handling in the logic programming language Kanren that preserves choice points. In other words, Kanren's monad is essentially [Either e a]; raising an exception when evaluating a goal does not affect the other choices. It immediately follows that Kanren supports generators, which we demonstrate below. Hopefully the implementors of other logic programming systems might consider an option to preserve choice points upon exceptions.

The poster example of generators, the in-order traversal of a tree, is still implementable in Prolog. The example in_order1 described earlier on this page made clear that a stateless traversal with suspensions is implementable with non-determinism only. Here is the code in Prolog. Assume that make_full_tree(+Depth,-Tree) makes the full node-labeled binary tree with the given depth:

     ?- make_full_tree(3,T).
     % T = node(1, node(2, node(4, leaf, leaf), node(5, leaf, leaf)), 
     %             node(3, node(6, leaf, leaf), node(7, leaf, leaf))).

We define the predicate in_order1(+Tree,?Label) that holds holds if the Tree has the node with the given Label. The tree is traversed in-order.

     in_order1(node(LabN,Left,Right),Label) :-
       in_order1(Left,Label);
       LabN = Label;
       in_order1(Right,Label).
We can traverse the tree interactively
     ?- make_full_tree(3,Tree1), !, in_order11(Tree1,L).
     % L = 4 ;
     % L = 2 ;
     % L = 5 ;
     % ...
accumulate all traversal results in a list
     ?- make_full_tree(3,Tree1), !, findall(L,in_order11(Tree1,L),B).
     %   B = [4, 2, 5, 1, 6, 3, 7].
or print labels until the label 5 is encountered, which stops the traversal.
     ?- make_full_tree(3,Tree1), !, in_order11(Tree1,L), write(L), nl, L == 5, !.
     % 4
     % 2
     % 5

Alas, we cannot implement the post_order example, of an accumulating traversal yielding intermediate results. The running sum is the state of the traversal, threaded through the whole computation. Prolog's disjunctions launch `processes' that execute, at least conceptually, in parallel and so do not share state (without stooping to imperative features like record). Accumulating traversals cannot be implemented using only the `parallel composition'.

Discarding choice points upon raising an exception was Prolog's design choice rather than a necessity. The choice points can be preserved, as we demonstrated in Kanren. To implement generators in Kanren, the only thing remains is to define yield:

     (define (yield term)
       (anye (throw term) succeed))
The definition transcribes the earlier derived formula yield x = raise x `mplus` return () in the most straightforward way. All examples of generators on the present page can thus be written in Kanren. We show the challenging post_order, traversing a tree post-order, yielding the running sum of the current label and the labels in the left and the right branches (that is, the sum of the labels for the subtree rooted at the current node).
     (define (post-order* tree running-sum)
      (conde
        ((== tree #f) (== running-sum 0))
        (else
          (fresh (label left right sum-left sum-right)
            (== tree (list label left right))
            (post-order* left  sum-left)
            (post-order* right sum-right)
            (project (sum-left sum-right label)
              (let ((sum (+ sum-left sum-right label)))
                (printf "traversing: sum ~a~n" sum)
                (== running-sum sum)))
            (yield `(*post-order* ,running-sum))
            ))))
The predicate post-order +Tree ?Sum below holds if there is a branch in a Tree whose labels sum to Sum.
     (define (post-order tree sum)
      (fresh (s)
        (catch (post-order* tree sum)
          `(*post-order* ,s)
          (== s sum))))
We traverse the sample tree1 completely, obtaining the running sums of its labels:
     tree1
     ;; (1 (2 (4 #f #f) (5 #f #f)) 
     ;;    (3 (6 #f #f) (7 #f #f)))
     
     (run #f (q) (post-order tree1 q))
     ;; traversing: sum 4
     ;; traversing: sum 5
     ;; traversing: sum 11
     ;; traversing: sum 6
     ;; traversing: sum 7
     ;; traversing: sum 16
     ;; traversing: sum 28
     ;; (4 5 11 6 7 16 28 28)
The second argument of post-order does not have to be a variable. In other words, it could be the output or the input parameter. If we specify Sum as a number, the very same procedure searches the tree for the occurrence of a branch whose labels sum to the given number. We stop the search as soon as the branch is found.
     (run 1 (q) (post-order tree1 16) (== q 'yes))
     ;; traversing: sum 4
     ;; traversing: sum 5
     ;; traversing: sum 11
     ;; traversing: sum 6
     ;; traversing: sum 7
     ;; traversing: sum 16
     ;; (yes)
Version
The current version is June 2011.
References
generators-kanren.scm [4K]
The complete code for the Kanren examples.

Exceptions in logic programming

 

Generators in ML

We derive a direct-style implementation of generators in OCaml, defining yield and a variety of iterator operators. We implement simple and accumulating traversals and the challenging Icon examples.

Generators are in essence lazy streams of possibly effectful computations. OCaml has so-called streams (whose special syntax is now supported by a camlp4 syntax extension), used for writing parsers. These streams are limited: for example, there does not seem to be a robust way to join two streams, to implement Icon's alteration or Haskell's mplus. OCaml also supports lazy evaluation; however, throwing exceptions from lazy computations (let alone doing other control effects) is undefined.

The formula ``yield = exceptions + non-determinism'' gives us generators with the well-defined behavior in the presence of effects and which we can concatenate and transform. For non-determinism, the initial implementation uses Hansei, a probabilistic programming language embedded as a library in OCaml. For exceptions, we use native OCaml exceptions. The general theory applies immediately:

     let yield x = if flip 0.5 then raise x
It just works. The ordinary in_order tree traversal and the challenging, and rare, accumulating tree traversal, post_order, are straightforward. They work on the first try.

We systematically optimize the implementation, to avoid the overhead of Hansei and probabilities, not relevant for generators. The semantics of Hansei primitives leads, after a bit of equational re-writing, to the specification for the generator primitives yield and msplit (msplit reifies a generator expression into a stream):

     type 'a stream = Nil | Cons of 'a * (unit -> 'a stream)
     
     msplit (fun () -> exp : unit) === Nil
if exp is a value or evaluates without yield, and
     msplit (fun () -> ctx[yield x]) === 
      Cons (x, fun () -> msplit (fun () -> ctx[()]))
if exp has the form ctx[yield x] where ctx[] is the evaluation context, not crossing the msplit boundary. Contrasting the equations with the laws of delimited control operators shift0 and push_prompt from the delimcc library gives the implementation below. (Although Hansei is written in terms of delimcc as well, the Hansei implementation differs.)
     (* val yield : 'a stream Delimcc.prompt -> 'a -> unit *)
     let yield p x = shift0 p (fun k -> Cons (x,k))
     
     (* val msplit : 'a stream Delimcc.prompt -> (unit -> unit) -> 'a stream *)
     let msplit p thunk = push_prompt p (fun () -> thunk (); Nil)

In Python, yield is a keyword. In OCaml, it is a regular OCaml function -- a user-defined one-liner. Various loop iterators also become regular user-defined functions. For example, here is the for-loop:

     let rec stream_iter (f : 'a -> unit) = function
       | Nil        -> ()
       | Cons (x,t) -> f x; stream_iter f (t ())
     
     let for_loop p gen f = stream_iter f (msplit p gen)

We may likewise define the while loop, or the loop that takes at most n elements from the stream, etc. Generator expressions and streams may nest. A prompt, the value of the type 'a stream Delimcc.prompt, identifies the enumerator of a stream for yield to generate. The prompt plays the role of a `loop label' in Perl, for example, letting loop control primitives to reach beyond the inner-most loop.

The challenging post_order example, of traversing a tree post-order yielding the running sum of labels of tree branches takes the following form:

     let pint = new_prompt ()
     
     let rec post_order = function
       | Leaf -> 0
       | Node (label, left, right) ->
          let sum_left  = post_order left in
          let sum_right = post_order right in
          let sum = sum_left + sum_right + label in
          yield pint sum; sum
     
     for_loop pint (fun () -> Printf.printf "Final: %d\n" (post_order tree1))
       (fun x -> Printf.printf "Got %d\n" x)
Unlike Python generators, for example, post_order tree1 is a generator expression: besides yielding intermediate results, it also produces a useful value.

We finish with the Icon example of yielding while consuming the results of another generator:

     procedure findodd(s1, s2)
        every i := find(s1, s2) do
           if i % 2 = 1 then suspend i
     end
In OCaml, the example reads:
     let findodd s1 s2 =
       for_loop pfind (fun () -> find s1 s2)
       (fun i -> if i mod 2 == 1 then yield pfind i)
where find s1 s2 generates the positions at which the string s1 occurs in s2. The function find is not a built-in; rather, it is written in terms of yield. The OCaml translation preserves the structure of the Icon code. We run the findodd example as follows, showing off the nested generation:
     for_loop pfind (fun () -> findodd "a" "abracadabra")
      (fun i -> Printf.printf "found index %d\n" i)
Version
The current version is June 2011.
References
generator-hansei.ml [6K]
The straightforward implementation of generators in OCaml, relying on Hansei for non-determinism. The code contains several easy and complex examples of generators, including the post-order.

generator.mli [2K]
The interface of the generator library

generator.ml [8K]
The complete derivation of the implementation of the generator library, from the general theory of generators and the semantics of Hansei primitives.

test_gen.ml [7K]
Makefile [2K]
Many examples of using generators, borrowed from Python and Icon, and the challenging post_order example. The test code illustrates defining custom loop operators.

HANSEI: Embedded Probabilistic Programming

A direct implementation of delimited continuations for byte- and native-code OCaml



Last updated March 9, 2014

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML