Haskell Programming


 

Provably perfect random shuffling and its pure functional implementations

We show two purely functional programs that perfectly, randomly and uniformly shuffle a sequence of arbitrary elements. We prove that the algorithms are correct. We also show why a commonly used sort-based shuffling algorithm is not perfect.
The article first formally defines the perfect random shuffle, outlines the algorithm, and discusses the well-known imperative shuffler that swaps elements in the array. We present and analyze two purely functional implementations. The naive one runs in O(n^2) time. A more sophisticated code has O(n*logn) time complexity. The article contains the full code to arrange a sequence in a complete binary tree and to extract an arbitrary element from the tree maintaining the completeness of the remaining tree.
Version
The current version is 1.2, Jul 14, 2002.
References

perfect-shuffle.txt [plain text file]
The article with the full source code for the two implementations, including the code for building and deconstructing a complete binary tree. It was originally posted as Provably perfect shuffle algorithms on a newsgroup comp.lang.functional on Mon, 3 Sep 2001 13:59:46 -0700

Dave Bayer, Persi Diaconis: Trailing the dovetail shuffle to its lair
Ann. Appl. Probab. 2 (1992), no. 2, pp. 294-313.
In the context of card shuffling, `perfect shuffle' means a deterministic interlacing of the two halves of the deck. The paper studies the perfect and other card shuffles in detail.

 

Pure-functional transformations of cyclic graphs and the Credit Card Transform

Cycles certainly make it difficult to transform graphs in a pure non-strict language. Cycles in a source graph require us to devise a way to mark traversed nodes -- however we cannot mutate nodes and cannot even compare nodes with a generic ('derived') equality operator. Cycles in a destination graph call for keeping track of already constructed nodes so we can make back-references. An obvious solution is to resort to a state monad or IORefs. There is also a monad-less solution, which involves maintaining a dictionary of already constructed nodes. This solution is less obvious: seemingly we cannot add a node to the dictionary until we have built the node. This fact means that we cannot use the updated dictionary when building the descendants of the node -- which need the updated dictionary to link back. The problem can be overcome however with a credit card transform (a.k.a. "buy now, pay later" transform). To avoid hitting the bottom, we just have to "pay" by the "due date".

For illustration, the article below considers the problem of printing out a non-deterministic finite automaton (NFA) and transforming it into a deterministic finite automaton (DFA). Both NFA and DFA are represented as cyclic graphs.

Version
The current version is 1.3, Aug 18, 2002.
References

CCard-transform-DFA.lhs [10K]
A literate Haskell code expounding the technique. The code was first posted in a message Transformations of cyclic graphs [Was: Efficiency of list indices in definitions] on Thu, 08 Aug 2002 16:11:30 -0700 on the Haskell-Cafe mailing list.

Tying the knot: How to build a cyclic data structure
<http://haskell.org/wiki/wiki?TyingTheKnot>
A CommonHaskellIdioms Haskell Wiki page.

Trying to tie the knot
<http://www.mail-archive.com/haskell@haskell.org/msg10687.html>
A message on the Haskell mailing list about a fixed-point combinator helping to tie a knot with forward links. Posted on Fri, 12 Apr 2002 18:44:07 -0700.

 

From enumerators to cursors: turning the left fold inside out

The fold-stream article below is about traversing collections, including files, databases and generating functions. The article introduces a non-recursive left-fold and argues that it is an optimal traversal interface in a language without first-class continuations. Indeed, given the non-recursive left-fold we can:

Both instantiation procedures are generic, as evidenced by their polymorphic types.

The article arose during the discussion of a good database interface for Oracle. For portability, the article illustrates the enumerator inversion technique on an example of a file considered a collection of characters. Haskell provides a stream interface to that collection: hGetChar. We implement a left fold enumerator. We then turn that enumerator back into a stream: we implement a function myhgetchar only in terms of the left fold enumerator.

All the code in the article is in Haskell98. No unsafe operations are used.

Version
The current version is 1.5, Dec 31, 2003.
References

fold-stream.lhs [14K]
The article as a well-commented literate Haskell code. An earlier version was posted on the Haskell mailing list on Tue, 23 Sep 2003 23:59:45 -0700 (PDT).

Towards the best collection traversal interface
The extended abstract of a LL3 presentation. It lists advantages of enumerators for traversing collections and discusses the enumerator inversion procedure in a broad context.

 

De-biforestation

Deforestation is usually defined as the elimination of an intermediate data structure, often a collection of items, sent from a single producer to a single consumer. Jorge Adriano showed an interesting example of one producer feeding two independent consumers. The two streams of data are mutually dependent. Furthermore, the rate of production is non-uniform: Generating an element in one stream may require generating trillion items in the other stream. If the first consumer is being evaluated, that consumer causes the evaluation of the producer until the latter yields an item. The trillion of items incidentally produced in the other stream have to be stored somewhere. In more OS-centric terms, the evaluator can't generally reduce two expressions in parallel. If one stream consumer is running, the other consumer sits in the ready queue. Therefore, data sent to the second consumer must be buffered, sometimes at great expense.

We will consider the problem of one producer and two consumers in the general setting. We derive a deforested version, which no longer needs to buffer any produced items. When we run that version, the memory retaining profile shows no space leaks. We also discuss a ``parallel'' writing of several streams into two distinct files. Our solution is safe and yet the i/o is effectively interleaved. The deforested solution is derived, via a sequence of equivalent transformations. In fact, the derived code worked on the first try.

Version
The current version is 1.2, Dec 5, 2002.
References

de-biforestation.lhs [12K]
The article as a well-commented literate Haskell code. An earlier version was posted on the Haskell-Cafe mailing list on Tue, 3 Dec 2002 22:05:05 -0800 (PST).

Jorge Adriano: producing and consuming lists
A message posted on the Haskell-Cafe mailing list on Tue, 5 Nov 2002 16:04:58 +0000. The message describes the problem of consuming two inter-dependent streams while avoiding space leaks.

 

Similarity between instruction scheduling and imperative functional programming

The article makes the case that a transformation of a typical imperative code into functional code is similar to the data and control dependency analyses in compilers for modern CPUs. By typical imperative code we mean a "typical numerical" code that involves nested loops and updateable arrays.

We consider three examples of such transformations, including two pure functional implementations of a bubble-sort. The imperative code is actually quite complex: nested loops, data dependencies across iterations, reading from and writing into the array, conditional branches that depend on data and the iteration history, control dependencies across nested loops. We show two solutions in Haskell. Both use fast, imperative STArrays; both try to separate the imperative part from a functional part. The imperative part is performed "in a monad" while the functional part is written in ordinary Haskell. The two solutions are closely related, and written to emphasize their similarity. One of the solution is an interpreter for a trivial "virtual machine".

The analysis of data and control dependencies across loop iterations, scheduling of instructions and interlocking of their results, pre-fetching of data -- is the job of an advanced compiler or the CPU. On the other hand, we have to do the same kind of analyses to separate the functional part of a computation from its imperative part. It appears functional programming is closer to "the bare metal" as one would have thought.

The assertion that a dependency analysis of imperative array code is related to extracting a functional program from an imperative one has been stated by Andrew W. Appel, in his paper "SSA is Functional Programming". The present article has shown several illustrations of the thesis. We have done manual dependency analyses of several code snippets that update arrays in place. The analyses let us extract pure-functional parts of imperative programs. The rest requires serialization and can be handled by an ST monad.

Version
The current version is 1.2, Feb 10, 2002.
References

The article with the full source code [plain text file]
The article itself. It was originally posted as FP vs assembly programming for super-scalar CPUs [Was: Nested Loops in Clean] on a newsgroup comp.lang.functionalon Thu, 16 Aug 2001 18:24:24 -0700

Andrew W. Appel SSA is Functional Programming
ACM SIGPLAN Notices v. 33, no. 4, pp. 17-20, April 1998.

 

Takusen: a DBMS access library built around a left-fold enumerator

Takusen is a joint project with Alistair Bayley. Takusen is a DBMS access library. It has a backend for Oracle on Unix, Linux or Windows via OCI, and a stub to test the library without any database. The infrastructure and the stub let one interface natively with other databases.

The distinguished feature of Takusen is processing query results using a left-fold enumerator. The user supplies an iteratee function, which receives rows one-at-a-time from the result-set. The number of the arguments to the iteratee is the number of the columns in the result-set, plus the seed. Each column in the result-set has its own Haskell type. The latter could be a Maybe type if the particular iteratee wishes to process NULLs.

The benefits are: more convenient and intuitive enumeration, iteration, and accumulation (see tests for examples); the retrieved data are not merely strings but have Haskell types: Int, Float, Date, etc.; buffer preallocation; pre-fetching; support for both enumerators and cursors, proper handling of all errors including various IO errors. No unsafe operations are used.

Takusen is a part of the Haskell-libs project at SourceForge.

Version
The current version is 1.5, May 10, 2004.
References
From enumerators to cursors: turning the left fold inside out
The invertible non-recursive enumerator designed in that article is the foundation of the Takusen interface.
<http://cvs.sourceforge.net/viewcvs.py/haskell-libs/libs/takusen/>
 

Controlled memoization

The problem is adding memoization to a recursive function Int -> Int. That seems simple, until we realize that finding the value of that particular function at the argument x requires evaluating the function for smaller arguments and may require evaluating the function for larger arguments. Thus straight-forward memoization leads to a large and sparsely populated memoization table. As the original poster observed, the memoized function begins thrashing sooner than the original code.

The solution is to balance the cost of recomputing the value versus the cost of maintaining the memoization table and doing the lookup. It is worth memoize the values of the function for some small x -- which are being used frequently. For larger x, re-computation is better. The parameter sc_upb in the code determines the balance, the largest argument value for which we still memoize. The simple benchmark shows that this controlled memoization does improve both the space and time performances.

A notable feature of the code is the transparency of adding the controlled memoization. The body of the algorithm remains essentially the same.

Version
The current version is 1.1, Mar 16, 2005.
References
memo-array.lhs [2K]
The literate Haskell code with explanations and the benchmark
The code was originally posted as Re: (Newbie) Dynamic Programming, Memoizing Etc. on Haskell-Cafe mailing list on Wed, 16 Mar 2005 18:57:48 -0800 (PST)
 

How to make a function strict without changing its body

This is a simple trick of making a function strict in some or all of its arguments, without affecting the body of the function.

     iterate2 f x n | seq f $ seq x $ seq n $ False = undefined
     iterate2 f x n = --- as before
That is, we add one simple line to the definition of the function, without changing the proper body of the function at all. We prepend an extra clause with the guard that always yields False. Before failing, however, the guard forces the evaluation of the selected arguments of the function. If needed, seq can be replaced with deepSeq.

Making (some of the) arguments of a function strict can make the function run in constant space, and so an iteration (Runge-Kutta iteration in the above example) can proceed without running out of (stack) space. The function iterate2 (and a similar function rk4Next) were applied to arguments like (n-1) and y + a1/6 -- which is a red flag. These are exactly the kinds of expressions that lead to memory exhaustion. Perhaps it is because the size of an unevaluated thunk for (n-1) is so much bigger than the size of the evaluated thunk. It seems that arithmetic expressions are the best candidates for some opportunistic evaluation...

Version
The current version is 1.1, Oct 2004.
References

Re: How do I get a long iteration to run in constant space
<http://www.haskell.org/pipermail/haskell-cafe/2004-October/006983.html>
Discussion thread on the Haskell-Cafe mailing list, where the message was originally posted on Tue, 5 Oct 2004 00:44:40 -0700 (PDT)

Partial signatures
Another application of the trick of adding a clause with an always failing guard

 

Logic programming in Haskell optimized for reasoning

We demonstrate an executable model of the evaluation of definite logic programs, i.e., of resolving Horn clauses presented in the form of definitional trees. Our implementation, DefinitionTree, is yet another embedding of Prolog in Haskell. It is distinguished not by speed or convenience. Rather, it is explicitly designed to formalize evaluation strategies such as SLD and SLD-interleaving, to be easier to reason about and so help prove termination and other properties of the evaluation strategies. The main difference of DefinitionTree from other embeddings of Prolog in Haskell is the absence of name-generation effects. We need neither gensym nor the state monad to ensure the unique naming of logic variables. Since naming and evaluation are fully separated, the evaluation strategy is no longer concerned with fresh name generation and so is easier to reason about equationally.

Version
The current version is January 2008.
References

DefinitionTree.hs [14K]
The commented source code

Pure, declarative, and constructive arithmetic relations
DefinitionTree has been used to describe SLD and SLD-interleave evaluation strategies and to prove the basic properties of solution sets.

 

Newer FastCGI and the memory-efficient IO interface

We present an efficient library for writing CGI and FastCGI programs. Our goal is the least latency and the least memory consumption. The library underlies a production web application server Metcast. The server has been running since February 2007, sending large amounts of text and binary data in response to a continuous stream of requests. The server processes a request in constant and small memory imposing little latency, encoding and sending to the client chunks of data as soon as the database delivers them. The server handles hundreds of such requests without allocating memory; in fact, it uses only one 16KB buffer for all of its I/O. With the exception of an occasional rank-2 type and extended pattern guards, all of the code is in Haskell98. Not a single unsafeHaskell function occurs in the code.

The library is originally based on the code written by Jesse Tov, who in turn used NewCGI by Bjorn Bringert et al. Most of the code has been re-written. The biggest change is minimizing the amount of state. The library is centered around generalized input and output ports: a simple IO interface to read, write and copy via a single, large, once allocated buffer. The buffer and its dimensions are hidden, to discourage aliasing. We use this interface for the IO to and from the client, to and from files or the PostgreSQL database, and for reading from external servers.

A generalized input port is a procedure to read at most the specified number of bytes into a given buffer. The procedure should return the number of bytes actually read, or 0 on EOF. It may throw various exceptions if reading didn't go well. The procedure is like hGetBuf partially applied to a handle.

     newtype Input = Input (forall m. EMonadIO m => Ptr Word8 -> Int -> m Int)
     inputFd       :: Fd -> Input
     inputStr      :: EMonadIO m => String -> m Input
     inputCombined :: EMonadIO m => Input -> Input -> m Input
One can construct an Input from a Posix file descriptor or a String. One can combine two Inputs into one, which reads from the first generalized port until EOF and then reads from the second. The frequently mentioned EMonadIO is a class of monads permitting IO along with throwing and catching of arbitrary errors. The IO monad and most of its transformations are in that class. EMonadIO lets us write gthrow, gcatch, gbracket, etc. without even thinking of the current monad.

Dually, a generalized output port is a procedure to write the specified number of bytes from the given buffer. It may throw various exceptions if writing didn't go well.

     newtype Output = Output (forall m. EMonadIO m => Ptr Word8 -> Int -> m ())
     outputFd :: Fd -> Output
     newtype BCopy = BCopy (forall m. EMonadIO m => 
                             Input -> Output -> Maybe Int -> m (Maybe Int))
The CGI monad exports (Fast)CGI output and error streams as Output, and lets us access request content as Input. We can use the generalized ports for reading and writing strings. The ports are intended though to be connected via BCopy, which copies from Input to Output the desired number of bytes or till EOF.
Version
The current version is February 2007.
References

NewerCGI.hs [29K]
The commented source code of the library for writing CGI and FastCGI programs
The code was first mentioned in a message Takusen and large PostgreSQL blobs [was: Handling custom types in Takusen] on the Haskell-Cafe mailing list on Fri, 27 Jul 2007 20:34:29 -0700 (PDT)

FastCGI.hsc [3K]
Bindings to the FastCGI C client library

 

Catching exceptions in a general MonadIO

Handling exceptions via catch, bracket, catchDyn, etc. in a MonadIO other than the IO itself has been a fairly frequently requested feature. The requests tend to recur, probably because these functions cannot be defined for a general MonadIO. However, we can define generic exception handling for a large and interesting subset of MonadIO, which includes various (repeated) transformations of IO by ReaderT, WriterT, ErrorT, StateT, and newtype wrapping.

The generic catch has been successfully used since 2006 in a database library Takusen, where many operations work in a monad ReaderT Session IO; the database session data are always available as the environment. Many other foreign libraries are structured around sessions, connections, library environments, which can be encapsulated in a monad. We should nevertheless be able to handle IO errors and user exceptions that arise in such computations.

Back in 2006 Jesse Tov has done an admirably thorough job of implementing exception handling in general monads. His code defines two classes: EMonad and EMonadIO -- which contain most of the interesting monads. The latter is the subclass of the former, permitting arbitrary IO via liftIO. In either case, we use gthrow, gbracket, gcatch, ghandle, gfinally, etc. -- without even thinking which Monad we are in and how error handling is actually implemented, via ErrorT or via IO exceptions. It works universally for most of monads of interest. The experience of using EMonadIO since 2006 in large production code has been most positive.

Version
The current version is 1.1, February 2006.
References

CaughtMonadIO.lhs [6K]
The complete literate Haskell code
The code includes two tests, illustrating throwing and catching of (dynamic) exceptions in monads obtained from IO by several applications of ReaderT, WriterT and ErrorT.
The code was originally posted as generic catch in a MonadIO on the Haskell mailing list on Tue Feb 7 22:48:24 EST 2006

Takusen's MonadIO, with gtry, gtryJust, gbracket, gfinally.The code supports both the new- and the old-style exceptions. Joint work with Alistair Bayley.
<http://code.haskell.org/takusen/Control/Exception/MonadIO.hs>

Discussion threads:

Haskell' ticket, introduced as the result of the above Haskell-Cafe discussions:
<http://hackage.haskell.org/trac/haskell-prime/ticket/110>

 

Simple and reliable uni- and bi-directional pipes

MySysOpen module offers a reliable, proven way of interacting with another local or remote process via a unidirectional or bidirectional channel. It supports pipes and Unix and TCP sockets. MySysOpen is a simple and explicit alternative to the multi-threaded IO processing of the GHC run-time system. The module is the Haskell binding to sys_open -- the extended, user-level file opening interface.

The second half of MySysOpen.hs contains several bi-directional channel interaction tests. One checks repeated sending and receiving of data; the amount of received data is intentionally large, about 510K. Two other tests interact with programs that are not specifically written for interactive use, such as sort. The latter cannot produce any output before it has read all of the input, accepting no input terminator other than the EOF condition. One test uses shutdown to set the EOF condition. The other test programs the handler for a custom EOF indicator, literally in the file name of the communication pipe.

Version
The current version is December 2008.
References

MySysOpen.hs [7K]
The code for the sys_open interface and tests

sys_open.c [20K]
The complete, well-commented code for sys_open

Applications of sys_open

 

A character-efficient Mastermind game code

The following code is a space-efficient implementation of the game Mastermind. It is efficient not in terms of the heap or stack space but in the source text space. Jan Rochel wrote the original code and put it in his e-mail signature. He asked for a shorter version. The present code relies on a different algorithm to compute exact and inexact matches in two strings. The new algorithm made the code a bit faster, but more importantly, a bit (7%) shorter. The new code is also more obscure and exhibits some interesting patterns, in particular f x=(x\\).(x\\).

A version to put in a .signature (285 chars):

     import Random;import List;import Char;p=putStrLn;u=uncurry;f x=(x\\).(x\\)
     main=mapM(\x->randomRIO(49,54))[1..4]>>=n 0.map chr>>=p.("Tries: "++).show
     e=((partition$u(==)).).zip;h(p,q)=['*'|x<-p]++['+'|x<-(u f)$unzip q]
     n a s=getLine>>=m where{m i|i==s=return a;m i=p(h$e i s)>>n(a+1)s}

An expanded version that is easier to read:

     import Random;import List;import Char
     
     main = mapM(\x->randomRIO(49,54))[1..4]
            >>= n 0 . map chr >>= p . ("Tries: "++) . show
     p=putStrLn
     u=uncurry
     e=((partition$u(==)).) . zip
     f x=(x\\).(x\\)
     h(p,q)=['*'|x<-p] ++ ['+'|x<-(u f) $ unzip q]
     n a s=getLine>>=m a s
     m a s i | i==s = return a
     m a s i = p(h$e i s) >> n(a+1)s
The code runs on GHC(i).
Version
The current version is 1.1, May 24, 2003.
References
The code was originally posted as an article Haskell: Mastermind code for a .sig on a newsgroup comp.lang.functional on Sat, 24 May 2003 00:04:40 -0700
 

How to implement AND without pattern-matching

How to implement a function AND, i.e., (&&), without explicit or even implicit pattern-matching? Note that the functions not and (||) are defined in the Prelude with the use of pattern-matching. The if function may rely on pattern-matching as well. The article below gives several answers.
One of the suggested solutions relies on pointer arithmetics, in Haskell.
Version
The current version is 1.1, Jul 13, 2001.
References
The article with the source code for three solutions. [plain text file]
The article itself. It was originally posted as Re: bothTrue on a newsgroup comp.lang.functional on Fri, 13 Jul 2001 17:45:22 -0700


Last updated July 4, 2010

This site's top page is http://okmij.org/ftp/

oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from SXML by SXML->HTML