General-purpose algorithms and data structures, illustrated in Haskell
The article first formally defines the perfect random shuffle,
outlines the algorithm, and discusses the well-known imperative
shuffler that swaps elements in the array. We present and analyze two
purely functional implementations. The naive one runs in O(n^2) time. A more sophisticated code has the O(n*logn)
time complexity. The article contains the full code to
arrange a sequence in a complete binary tree and to extract an
arbitrary element from the tree maintaining the completeness of the
remaining tree.
Dave Bayer, Persi Diaconis: Trailing the dovetail shuffle to its lair
Ann. Appl. Probab. 2 (1992), no. 2, pp. 294-313.
In the context of card shuffling, `perfect shuffle' means a deterministic interlacing of the two halves of the deck. The paper studies the perfect and other card shuffles in detail.
int. A type system is sound if it
correctly approximates the dynamic behavior and predicts its outcome:
if the static semantics predicts that a term has the
type int, the dynamic evaluation of the term, if it
terminates, will yield an integer.As object language, we use simply-typed and let-polymorphic lambda calculi with integers and integer operations as constants. We use Haskell as a metalanguage in which to write evaluators, type checkers, type reconstructors and inferencers for the object language.
We explore the deep relation between parametric polymorphism and `inlining'. Polymorphic type checking then is an optimization allowing us to type check a polymorphic term at the place of its definition rather than at the places of its use.
Joint work with Chung-chieh Shan.
EvalN.hs [4K]
Evaluator for the untyped (nominal) lambda-calculus with integers and zero-comparison
TEvalNC.hs [4K]
Type reconstruction for the simply-typed Church-style lambda-calculus with integers and zero-comparison
Source terms have type annotations on bound variables. Type reconstruction is implemented as an evaluator with a non-standard semantics.
TEvalNR.hs [4K]
A version of TEvalNC.hs that reports not only the inferred type of the whole term but also the inferred types for each subterm
TInfT.hs [7K]
Type reconstruction for the simply-typed Curry-style lambda-calculus with integers and zero-comparison
Source terms have no type annotations whatsoever.
TInfTM.hs [10K]
A version of TInfT.hs hiding the single-threaded state through simple algebraic transformations (beta-expansions)
TInfLetI.hs [10K]
The Curry-style calculus with Let : polymorphic type checking via `inlining'
TInfLetP.hs [14K]
Let-polymorphism via type schemes
This is a non-canonical presentation of Hindley-Milner type inference, treated as an optimization of polymorphic type checking by inlining.
TInfTEnv.hs [10K]
Simply-typed Curry-style lambda-calculus with integers and zero-comparison: reconstructing not only types but also the type environment
This approach lets us type check open terms. A type checking failure means the term cannot be typed in any type environment.
One often hears a complaint against typing: we can evaluate terms we can't type check. This code shows that we can type check terms we can't even evaluate. We cannot evaluate an open term at all, but we can type check it, inferring both its type as well as the requirement on the environments in which the term must be used.
EvalTaglessI.hs [9K]
Initial Tagless Typed lambda-calculus with integers and the conditional in the higher-order abstract syntax
Haskell itself ensures the object terms are well-typed. This code relies on GADT.
EvalTaglessF.hs [8K]
Final Tagless Typed lambda-calculus with integers and the conditional in the higher-order abstract syntax
Haskell itself ensures the object terms are well-typed. No GADTs are used. Apart from switching off the monomorphism restriction for convenience, the code is in Haskell98.
IORefs. There is also a
monad-less solution, which maintains a dictionary of
already constructed nodes. The latter is less obvious: seemingly we
cannot add a node to the dictionary until we have built the node. Therefore,
we cannot use the updated dictionary when building the
descendants of the node -- which need the updated dictionary to link
back. The problem can be overcome however with a credit card
transform (a.k.a. "buy now, pay later" transform). To avoid
hitting the bottom, we just have to "pay" by the "due date".For illustration, the article below considers the problem of printing out a non-deterministic finite automaton (NFA) and transforming it into a deterministic finite automaton (DFA). Both NFA and DFA are represented as cyclic graphs.
Tying the knot: How to build a cyclic data structure
A CommonHaskellIdioms Haskell Wiki page.
Trying to tie the knot
< http://www.mail-archive.com/haskell@haskell.org/msg10687.html >
A message on the Haskell mailing list about a fixed-point combinator helping to tie a knot with forward links. Posted on Fri, 12 Apr 2002 18:44:07 -0700.
We consider the problem of one producer and two consumers in the general setting. We derive a deforested version, which no longer needs to buffer any produced items. When we run that version, the memory retaining profile shows no space leaks. We also discuss a ``parallel'' writing of several streams into two distinct files. Our solution is safe and yet the i/o is effectively interleaved. The deforested solution is derived, via a sequence of equivalent transformations. The derived code worked on the first try.
Jorge Adriano: producing and consuming lists
A message posted on the Haskell-Cafe mailing list posted on Tue, 5 Nov 2002 16:04:58 +0000. The message describes the problem of consuming two inter-dependent streams while avoiding space leaks.
DefinitionTree, is yet another
embedding of Prolog in Haskell. It is distinguished not by speed or
convenience. Rather, it is explicitly designed to formalize
evaluation strategies such as SLD and SLD-interleaving, to be easier
to reason about and so help prove termination and other properties of
the evaluation strategies. The main difference of the DefinitionTree from
other embeddings of Prolog in Haskell is the absence of
name-generation effects. We need neither gensym nor the state monad to
ensure the unique naming of logic variables. Since naming and
evaluation are fully separated, the evaluation strategy is no longer
concerned with fresh name generation and so is easier to reason
about equationally.Pure, declarative, and constructive arithmetic relations
DefinitionTree has been used to describe the SLD and SLD-interleave evaluation strategies and to prove the basic properties of solution sets.
Int->Int function.
That seems simple until we realize that
finding the value of that particular function at the argument x requires evaluating the function for smaller arguments and may require evaluating the function for larger arguments. Thus
straight-forward memoization leads to a large and sparsely populated
memoization table. As the original poster observed, the memoized
function begins thrashing sooner than the original code.
The solution is to balance the cost of recomputing the value versus
the cost of maintaining the memoization table and doing the lookup. It
is worth memoize the values of the function for some small x --
which are used frequently. For larger x, re-computation is
better. The parameter sc_upb in the code determines the balance, the
largest argument value for which we still memoize. The simple
benchmark shows that this controlled memoization does improve both the
space and time performances.
A notable feature of the code is the transparency of adding the controlled memoization. The body of the algorithm remains essentially the same.
We consider three examples of such transformations,
including two pure functional implementations of a bubble-sort. The
imperative code is quite complex: nested loops, data
dependencies across iterations, reading from and writing into the
array, conditional branches that depend on data
and the iteration history, control dependencies across nested loops.
We show two solutions in Haskell. Both use fast, imperative STArrays;
both try to separate the imperative part from a functional part. The
imperative part is performed in a monad while the functional part
is written in ordinary Haskell. The two solutions are closely related,
and written to emphasize their similarity. One of the solution is
an interpreter for a trivial virtual machine.
The analysis of data and control dependencies across loop iterations, scheduling of instructions and interlocking of their results, pre-fetching of data -- all are the job of an advanced compiler or the CPU. On the other hand, we have to do the same kind of analyses to separate the functional part of a computation from its imperative part. Functional programming appears closer to ``the bare metal'' than one would have thought.
The relationship of the dependency analysis of the imperative array
code with the extraction of a functional program from an imperative
one has been stated by Andrew W. Appel, in his paper "SSA is
Functional Programming". The present article has shown several
illustrations of his thesis. We have done manual dependency analyses
of several code snippets that update arrays in place. The analyses let
us extract pure-functional parts of imperative programs. The rest
requires serialization and can be handled by an ST monad.
fp-arrays-assembly.txt [12K]
The article with the full source code
It was originally posted as FP vs assembly programming for super-scalar CPUs [Was: Nested Loops in Clean] on the newsgroup comp.lang.functional on Thu, 16 Aug 2001 18:24:24 -0700
Andrew W. Appel SSA is Functional Programming
ACM SIGPLAN Notices v. 33, no. 4, pp. 17-20, April 1998.
The following code is a space-efficient implementation of the
game Mastermind. It is efficient not in terms of the heap or stack
space but in the source text space. Jan Rochel wrote the original
code and put it in his e-mail signature. He asked for a shorter
version. The present code relies on a different algorithm to compute
exact and inexact matches in two strings. The new algorithm made the
code a bit faster, but more importantly, a bit (7%) shorter. The new
code is also more obscure and exhibits some interesting patterns, in
particular f x=(x\\).(x\\).
The version to put in a .signature (275 chars):
import Random;import List;p=putStrLn;u=uncurry;f x=(x\\).(x\\)
main=mapM(\x->randomRIO(49,54))[1..4]>>=n 0.map toEnum>>=p.("Tries: "++).show
e=((partition$u(==)).).zip;h(p,q)=['*'|x<-p]++['+'|x<-u f$unzip q]
n a s=getLine>>=m where{m i|i==s=return a;m i=p(h$e i s)>>n(a+1)s}
An expanded version that is easier to read:
import Random;import List
main = mapM(\x->randomRIO(49,54))[1..4]
>>= n 0 . map toEnum >>= p . ("Tries: "++) . show
p=putStrLn
u=uncurry
e=((partition$u(==)).) . zip
f x=(x\\).(x\\)
h(p,q)=['*'|x<-p] ++ ['+'|x<-u f $ unzip q]
n a s=getLine>>=m a s
m a s i | i==s = return a
m a s i = p(h$e i s) >> n(a+1)s
The code runs on GHC(i).oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML