Monadic Programming in Scheme

Introduction
Serially-numbering monad in Scheme
Example
Generalizations to several monads
References and pointers to further discussion

An example of a monadic programming in Scheme that juxtaposes Haskell code for a particular state monad with the corresponding Scheme code.

Introduction

This article sprang from a question posted on comp.lang.functional about building of trees whose nodes are tagged with unique integers. "Ideally every term/node has to have its id associated with it at construction time," the poster said. "An obvious way to implement it in this [referentially-transparent] way is to have a collection of state which is passed along all the time to every function. But this seems rather tedious."

An imperative solution is obvious: introduce a global variable -- current_id -- and use its value to tag every allocated tree node, incrementing the variable afterwards. A functional solution is relatively straightforward too: every function that creates or operates on trees must take an extra argument -- the current value of the tagging counter. The function should return multiple values: the result of its computation and (perhaps incremented) value of the counter. The latter solution does seem to be tedious, requiring multiple values, and ugly. Is there a way to solve the problem purely functionally and elegantly?

A proposed Haskell solution [SN-monad] relied on monads to hide the counter holding the tag for a new node. The solution did seem to reach the goal. Building a tagged tree looked just as simple as building an untagged one. The counter variable was well-hidden. And yet it was there, invisibly threading through all computations. Rendering this solution in Scheme can be fun.

Serially-numbering monad in Scheme

We will cite the Haskell code in our Scheme code, for comparison and documentation. We will identify Haskell code by ;-- comments.

We start by defining a datatype for integer-tagged values. We could use records or associative lists; However, a simple pair will suffice:

             ;-- type Numbered a = (Int,a)
     (define (make-numbered-value tag val) (cons tag val))
             ; accessors of the components of the value
     (define (nvalue-tag tv) (car tv))
     (define (nvalue-val tv) (cdr tv))

We need to define a counting monad and its two fundamental operations: bind (aka >>=) and return. In Haskell, we write

     ;-- newtype NumberedM a = NumberedM (Int -> Numbered a)
     ;-- instance Monad NumberedM where
     ;--    NumberedM m >>= f = NumberedM $ \n -> let (n1,v)       = (m n)
     ;--                                              NumberedM m' = f v
     ;--                                              (n1',v')     = (m' n1)
     ;--                                          in (n1',v')
     ;--    return x = NumberedM $ \n -> (n,x)

There are two ways of rendering that in Scheme. In this section, we will be discussing an easy way, which suffices as long we are working with a single monad, or do not mix objects of different monads. In that case, all the type tagging apparent in the Haskell solution is unnecessary. Our monad is simply a function Int -> Numbered a. A monad is a delayed computation: a lambda-expression in our case. It takes the value of the counter, performs whatever computation it is meant to do, and returns the result tagged with the (perhaps incremented) counter.

The two fundamental operations are introduced as:

             ;-- return:: a -> NumberedM a
     (define (return val)
       (lambda (curr_counter)
         (make-numbered-value curr_counter val)))

and

           ;-- (>>=):: NumberedM a -> (a -> NumberedM b) -> NumberedM b
     (define (>>= m f)
       (lambda (curr_counter)
         (let* ((m_result (m curr_counter))
                (n1 (nvalue-tag m_result))   ; result of the delayed computation
                (v  (nvalue-val m_result))   ; represented by m
     
                (m1 (f v))                   ; feed the result to f, get another m1
                )
           (m1 n1))))                        ; The result of the bigger monad

The bind operator, >>=, makes a "bigger" monad out of a monad m and a function f (which yields a monad when applied to a value). That bigger monad is a delayed computation that incorporates both computations: those represented by m and by f. The job of the bind operator is to merge these computations and to take care of passing the state -- the counter in our case -- from one computation to the other. The code of the bind operator first applies the monad m to the curr_counter. This executes the delayed computation and gives us its result and the new counter. We apply f to the result of the delayed computation m and get another monad, m1. The latter is a function, which we apply to the counter returned by m and get a tagged value -- the result of the combined monad.

It is easy to verify that monad axioms are satisfied. Indeed, return is the left and the right unit of bind, that is,

     (>>= (return v) f) ==is-the-same-as==> (f v)
     
     (>>= m (lambda (v) (return v))) ==is-the-same-as==> m

Indeed, the direct inspection shows that (>>= m (lambda (v) (return v))) takes the result of (m counter) into two parts and puts them back together again.

The bind operation is clearly associative,

     (>>= m (lambda (v) (>>= (f v) g)))
       <==is-the-same-as==>
     (>>= (>>= m f) g)  ; v is not free in g nor in f

Thus we have indeed built a monad. Besides the canonical return and bind operations, we need one more operation (so-called morphism):

     ;-- get the current id and increment it
     ;-- incr:: NumberedM Int
     ;-- incr = NumberedM $ \n -> (n+1,n)
     
     (define incr 
       (lambda (n)
         (make-numbered-value (+ 1 n) n)))

We also need an operation to run the monad, that is, to take the delayed computation represented by the monad and execute it. In our case, a delayed computation needs a value -- the initial value of the tagging counter.

     ;-- run_numberedM:: NumberedM a -> Int -> Numbered a
     ;-- run_numberedM (NumberedM m) init_count = m init_count
     
     (define (runM m init-counter)
       (m init-counter))

The result of runM is a numbered value, a pair of the final counter and the final value.

Example

Let us now see what this painstakingly defined monad can do for us. Let us consider trees

     ;-- data Tree a = Nd a (Forest a) deriving Show
     ;-- type Forest a = [Tree a]

In Scheme terms, a node is (value . forest) where forest is (node ...). A node is a list whose head is the node's value and the tail is the list of its children. A forest is a list of zero or more nodes.

Let's define a function to make a node -- a node that is tagged with a unique integer:

     (define (make-node val kids)
       (>>=
        incr
        (lambda (counter)
          (return (cons (make-numbered-value counter val) kids)))))

That's what the original posted wanted: associate with each node a unique integer at node-construction time.

The code is hardly surprising: we obtain the value of the "global" counter, increment the counter and build a node out of the tagged value and the list of kids. Only there is no global counter and no mutation.

Now let's try to build a full binary tree, where the value of each node is that node's height:

     ;-- make_btree 0 = make_node 0 []
     ;-- make_btree depth = do { 
     ;--                        left <- make_btree (depth -1);
     ;--                        right <- make_btree (depth -1);
     ;--                        make_node depth [left, right]           
     ;--                        }

This function that takes the desired depth and returns a monad, which -- when run -- produces a tagged binary tree.

The first attempt In Scheme is:

     (define (build-btree-r depth)
       (if (zero? depth) (make-node depth '())
           (>>=
            (build-btree-r (- depth 1))
            (lambda (left-branch)
              (>>=
               (build-btree-r (- depth 1))
               (lambda (right-branch)
                 (make-node depth (list left-branch right-branch))))))))

A syntactic sugar is direly needed. First, we introduce letM:

     (letM ((name initializer)) expression) ==> 
        (>>= initializer (lambda (name) expression))

Compare with a regular let:

     (let ((name initializer)) body) ==>
         (apply (lambda (name) body) (list initializer))

There are some differences, however: for one thing, letM takes only one binding, while the regular let may take several. The body of letM is a single expression that has to evaluate to a monad. The let's body can have several expressions. There is a deep reason for these differences. If a let-form has several bindings, their initializing expressions are evaluated in an undefined order. Such non-determinism does not exits in the monadic world: since the evaluation of our monad involves threading of a state through all computations, the computations must execute in the precisely defined order, one after another. Monads introduce sequencing (single-threading) into the functional world. In that sense, monadic computation emulates imperative computation. However, in the imperative world, statements execute in the order they are written because such is the semantics of an imperative language. We trust the system honoring our statement order, because the order of executing mutations is crucial to knowing the global state of the system at any point in time. In contrast, in monadic world there is no really global state. A state is explicitly passed from one computation to another. Two computations are executed in order because the second needs the result of the first.

     (define-macro letM
       (lambda (binding expr)
         (apply
          (lambda (name-val)
            (apply (lambda (name initializer)
                     `(>>= ,initializer (lambda (,name) ,expr)))
                   name-val))
          binding)))

We can also introduce an even sweeter form letM*:

     (letM* (binding binding ...) expr) ==> 
     (letM (binding) (letM* (binding ...) expr))

which relates to letM exactly as let* relates to let.

     (define-macro letM*
       (lambda (bindings expr)
         (if (and (pair? bindings) (pair? (cdr bindings)))
             `(letM ,(list (car bindings))
                    (letM* ,(cdr bindings) ,expr))
             `(letM ,bindings ,expr))))

With these sweets, we can re-write our build-btree as

     (define (build-btree depth)
       (if (zero? depth) (make-node depth '())
           (letM* ((left-branch (build-btree (- depth 1)))
                   (right-branch (build-btree (- depth 1))))
                  (make-node depth (list left-branch right-branch)))))

Note the code does not explicitly mention any counter at all! Let's see how it runs:

     > (pp (runM (build-btree 3) 100))
     (115
      (114 . 3)
      ((106 . 2)
       ((102 . 1) ((100 . 0)) ((101 . 0)))
       ((105 . 1) ((103 . 0)) ((104 . 0))))
      ((113 . 2)
       ((109 . 1) ((107 . 0)) ((108 . 0)))
       ((112 . 1) ((110 . 0)) ((111 . 0)))))

Each node of the tree is uniquely tagged indeed. The counter starts at 100 and counts forward. The value of the node (the second component of a pair) is that node's height. The node tagged with 114 is the root node; its value is 3. Number 115 is the final value of the counter.

It is interesting to compare the build-btree code with a code that constructs a regular, non-tagged full binary tree:

     (define (build-btree-c depth)
       (if (zero? depth) (cons depth '())
           (let ((left-branch (build-btree-c (- depth 1)))
                 (right-branch (build-btree-c (- depth 1))))
                  (cons depth (list left-branch right-branch)))))
     
     > (pp (build-btree-c 3))
     (3 (2 (1 (0) (0)) (1 (0) (0))) (2 (1 (0) (0)) (1 (0) (0))))

The similarity is staggering. It seems we have achieved our goal: building of a tagged tree looks almost identical to that of an un-tagged tree. The tagging counter is well-hidden from view. The counter does not get in the way and does not clutter the look and feel of the computation. The letM* form reminds us however that the counter does exist. The form emphasizes the important difference between the two functions. The function build-btree-c constructs left and right branches in an indeterminate order. The branches could even be built in parallel, hardware permitting. The order does not matter -- the result is the same regardless of the sequence. The form letM* however makes the computation in the function build-btree strictly sequential: there, the right branch is constructed strictly after the left branch, with their parent node following.

Generalizations to several monads

Thanks to type classes, Haskell monads are more general than the previous sections showed. We can mix several monads in the same expression:

     f = do {
             putStrLn "Enter a number: ";
             n <- readLn;
             all_n <- return [1..n];
             evens <- return $
                             all_n >>= (\i -> if (i `rem` 2) == 0 then return i
                                                                  else fail "odd");
             return evens
             }
     main = f

This silly code "prints" all even numbers up to the one entered by the user. There are two monads in this fragment: IO a monad and [a] monad: a list is also a monad. Expression return [1..n] returns an object of a type IO [Int] whereas return i yields a monad [Int]. The outer return in the same expression returns IO [Int]. It is remarkable that a Haskell compiler is able to figure out which monad each return function yields. The compiler bases its decision on the type of the result expected in each case. To render the above code in Scheme, we have to explicitly identify monad operations of different monads:

     (define f
       (IO::>> (put-line "Enter a number: ")
        (IO::>>= (read-int)
          (lambda (n)
            (IO::>>= (IO::return (iota 1 n))
              (lambda (all-n)
                (IO::>>=
                  (IO::return
                     (List:>>= all-n
                       (lambda (i)
                         (if (even? i) (List::return i) (List::fail "odd")))))
                  (lambda (evens) (IO::return evens)))))))))

However, even in this pathologically mixed case we note that some sub-expressions use only the IO monad, and one subexpression uses only the List monad. Therefore, we can write:

     (define f
      (let ((>>= IO::>>=) (return IO::return)) ; define the "current" monad
       (beginM
          (put-line "Enter a number: ")
          (letM*
             ((n (read-int))
              (all-n (return (iota 1 n)))
              (evens (return
                     ; re-define the current monad
                     (let ((>>= List::>>=) (return List::return)
                           (fail List::fail))
                       (letM ((i all-n))
                         (if (even? i) (return i) (fail "odd"))))))
             )
             (return evens)))))

This is not as elegant as in Haskell, yet is readable. With a bit of syntactic sugar and a module system similar to that of DrScheme, we could even replace

     (let ((>>= IO::>>=) (return IO::return))

with (using IO).

References and pointers to further discussion

[SN-monad] A serially-numbering monad.
<../Computation/monads.html#serially-numbering-monad>

[monads] Monads in Haskell and other languages
<../Computation/monads.html>

[MShell] UNIX pipes as IO monads. Monadic i/o and UNIX shell programming.
<../Computation/monadic-shell.html>

[MScheme-IO] Monadic scheme of I/O
<misc.html#monadic-io>

[MP-Scheme] Monadic programming in Scheme: an example.
An article posted on a newsgroup comp.lang.scheme on Wed, 10 Oct 2001 20:27:21 -0700.

[MV-Reif] Multiple values as an effect. Monadic reflection and reification.
<misc.html#multiple-value-effect>

Last updated July 4, 2005

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from SXML by SXML->HTML