Monads -- the IO monad in particular -- encode IO and other effectful
computations in lambda-calculi and pure functional programs. Monads
cleanly separate out pure, order-insensitive simplifications from
imperative executions and help us formally reason about effectful
Once a computation is written monadically, to add a new effect
(exceptions or non-determinism) one merely needs to introduce another
constant such as
fail rather than to change the overall
The unabated debate about exactly how much category theory one needs to know to understand that strange beast of IO prompts a thought if monads, like related continuations [Reynolds-1993], are the things that are destined to be rediscovered time and time again.
An old (1994) paper on category theory monads and functional
programming included an interesting historical side-note. It turns out
that the essence of monads has been fully grasped back in 1965,
by at least one person. That person has also discovered that
imperative, control-flow-dependent computations can be embedded into
calculus by turning control flow into data flow. That person
is Peter Landin. His 1965 paper [Landin-1965]
anticipated not only IO but also
State and Writer monads,
call/cc, streams and delayed evaluations,
the relation of streams with co-routines, and even stream fusion.
Here is the historical aside, cited from [Hill-Clarke-1994].
3 An historical aside
Monads are typically equated with single-threadedness, and are therefore used as a technique for incorporating imperative features into a purely functional language. Category theory monads have little to do with single-threadedness; it is the sequencing imposed by composition that ensures single-threadedness. In a Wadler-ised monad this is a consequence of bundling the Kleisli star and flipped compose into the bind operator. There is nothing new in this connection. Peter Landin in his Algol 60 used functional composition to model semi-colon. Semi-colon can be thought of as a state transforming operator that threads the state of the machine throughout a program. The work of Peyton-Jones and Wadler has turned full circle back to Landin's earlier work as their use of Moggi's sequencing monad enables real side-effects to be incorporated into monad operations such as print. This is similar to Landin's implementation of his sharing machine where the
assignandholdfunction can side-effect the store of the sharing machine because of the sequencing imposed by functional composition. Landin defined that `Imperatives are treated as null-list producing functions' [In Landin's paper,
()is the syntactic representation of the empty list and not the unit.]. The
assignandholdimperative is subtly different in that it enables Algol's compound statements to be handled. The function takes a store location and a value as its argument, and performs the assignment to the store of the sharing machine, returning the value assigned as a result of the function. Because Landin assumed applicative order reduction, the
Kcombinator was used to return
(), and the imperative was evaluated as a side effect by the unused argument of the
K-combinator. Statements are formed by wrapping such an imperative in a lambda expression that takes
()as an argument. Two consecutive Algol-60 assignments would be encoded in the lambda calculus as:Algol 60 | Lambda Calculus x:= 2; | ( (\() -> K () (assignandhold x 2)) . x:= -3; | (\() -> K () (assignandhold x (-3))) ) ()
By using a lambda with
()as its parameter,
()can be thought of as the `state of the world' that is threaded throughout a program by functional composition.
()-passing trick is described in full on p100 of Landin's paper,
with remarkable clarity:
Statements. Each statement is rendered as a 0-list- transformer, i.e. a none-adic function producing the nullist for its result. It achieves by side-effects a transformation of the current state of evaluation. ... Compound statements are considered as functional products (which we indicate informally by infixed dots).
We should contrast Landin's phrase with a more modern explanation [Peyton-Jones-2000]:
A value of type IO a is an "action" that, when performed, may do some input/output, before delivering a value of type a. This is an admirably abstract statement, and I would not be surprised if it means almost nothing to you at the moment. So here is another, more concrete way of looking at these "actions":
type IO a = World -> (a, World)This type definition says that a value of type IO a is a function that, when applied to an argument of type
World, delivers a new
Worldtogether with a result of type
a. The idea is rather program-centric: the program takes the state of the entire world as its input, and delivers a modified world as a result, modified by the effects of running the program. [p5]
Simon Peyton-Jones then goes on to explain [p14] that the functional
world-passing realization of the IO monad is indeed the implementation
of the monad in GHC. The IO monad operation
(>>) is a
a functional composition then -- just like it is in Landin's representation.
One may object that
World is abstract in GHC but is represented as
the empty list
() in Landin's paper. Whereas GHC threads the
through, Landin's encoding discards and re-generates the
token. Furthermore, GHC uses call-by-need evaluation. Landin relied on
call-by-value: although the
K combinator discards the value of its
second argument, the side-effects of its evaluation are performed
The differences are not that great. First of all, the
trick appears only in the encoding of Algol primitives such as the
assignment. One may as well introduce
assignandhold' x v = \() -> K () (assignandhold x v)
and take it as a new primitive. Regardless, the sequencing of compound
statements is ensured by functional composition, and hence does not
depend on the choice of call-by-name or call-by-value strategy.
Strictly speaking, we should treat
World as an abstract data type,
to prevent the compiler from optimizing the composition of two
World -> World functions to a single function. We also need an
additional constraint that the
World is to be used linearly, so
the compiler will not be tempted to replicate
Landin's calculus, induced by his sharing machine, does not allow such
optimizations. The implementation of call-by-need in GHC does have the
additional constraint of not replicating redices.
Incidentally, call-by-name (CBN) and call-by-value (CBV) were introduced ten years later, in [Plotkin-1975], in the context of Landin's ISWIM.
Both evaluation strategies have the notion of value and both take
lambda-abstraction to be a value. The body of a function is evaluated
only when the function is applied to an argument. Landin wrote
``the use of lambda, and in particular (to avoid an
irrelevant bound variable) of
lambda (), to delay and possibly avoid
evaluation is exploited repeatedly in our model of Algol 60. A
function that requires an argument-list of length zero is called a
non-adic function'' [p90]. We now call that a thunk.
Peter Landin's 1965 paper had many other discoveries.
First the reader will notice the
where notation. Peter Landin even
anticipated the debate on
``The only consideration in choosing between
will be the relative convenience of writing an auxiliary
definition before or after the expression it qualifies.''
A remark ``However, input/output devices can be modeled as named lists, with special, rather restricted functions associated. ... Writing is modeled by a procedure that, operates on a list, and appends a new final segment derived from other variables. (Alternatively, a purely functional approach can be contrived by including the transformed list among the results.)'' anticipated the State and Writer monads as well.
Another remark, in the section on Streams, ``This correspondence [laws of head/tail/cons] serves two related purposes. It enables us to perform operations on lists (such as generating them, mapping them, concatenating them) without using an `extensive,' item-by-item representation of the intermediately resulting lists; and it enables us to postpone the evaluation of the expressions specifying the items of a list until they are actually needed. The second of these is what interests us here.'' and footnote 6, ``It appears that in stream-transformers we have a functional analogue of what Conway calls `co-routines.' '' show that Peter Landin understood streams, on-demand evaluation and even stream fusion.
P. J. Landin: A correspondence between ALGOL 60 and Church's lambda notation.
Communications of the ACM, 8(2):89-101, February 1965.
(Part 2 in CACM Vol 8(2) 1965, pages 158-165.)
< http://portal.acm.org/citation.cfm?id=363749 >
Eugenio Moggi and Sonia Fagorzi: A Monadic Multi-stage Metalanguage.
Proc. FoSSaCS 2003: Foundations of Software Science and Computational Structures.
LNCS 2620, pp. 358--374
< http://www.disi.unige.it/person/MoggiE/ftp/fossacs03.pdf >
Jonathan M. D. Hill and Keith Clarke: An Introduction to Category Theory, Category Theory Monads, and Their Relationship to Functional Programming.
< http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.6497 >
John C. Reynolds: The Discoveries of Continuations.
Lisp and Symbolic Computation, 1993, v6, N3-4, pp. 233--247.
< ftp://ftp.cs.cmu.edu/user/jcr/histcont.ps.gz >
Simon L. Peyton Jones: Tackling the Awkward Squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell.
Tutorial given at the Marktoberdorf Summer School 2000.
< http://research.microsoft.com/en-us/um/people/simonpj/papers/marktoberdorf/mark.pdf >
Gordon D. Plotkin: Call-by-Name, Call-by-Value and the lambda-Calculus.
Theoretical Computer Science, 1975, v1, N3, pp. 125--159.
< http://homepages.inf.ed.ac.uk/gdp/publications/cbn_cbv_lambda.pdf >
An early version of this essay -- ``The IO monad is 45 years old'' -- has was posted on the Haskell-Cafe mailing list on Wed, 29 Dec 2010 01:13:13 -0800 (PST).
oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML