(Int->)^{n} ([]^{n} e)->...
id id id id id True -- True id (\\k -> k 1) (\\v k -> k (v,True)) (\\v k -> k (v,"str")) id -- ((1,True),"str") let go = flip ($) begin = go () push x s = go (x,s) add (x,(y,s)) = go (x+y,s) mul (x,(y,s)) = go (x*y,s) in begin (push 1) (push 2) (push 3) mul (push 4) add add id -- (11,())
The trick is to arrange for the return type of the function to be a type variable. It can always be instantiated to an arrow type, letting the function accept one more argument. Functions in the continuation-passing style are naturally polymorphic in the return, that is, the answer-type. Danvy's Functional Unparsing is the clearest demonstration of the technique.
On this page we collect more elaborate applications. The key idea is that the concrete type of a polymorphic term is determined by the context into which the term is placed. Pattern-matching on that concrete type lets the term determine the context of its use: specifically, the term can determine the number and the types of the values it receives as arguments, and what is expected in return. The term uses that information to chose the appropriate operation. For example, given
class C a where f :: String -> athe term
f "Hello, "
has the polymorphic type forall a. C a => a
. Placed in the contextputStrLn $ f "Hello, " True " world" '!'the type variable a| is instantiated to the concrete type
Bool -> String -> Char -> String
. Hopefully, there is an instance
of C
with this type, defining the required operation. Such instances
can be built inductively:instance C x => C (Char -> x) where f a x = f (a ++ [x]) instance C x => C (Bool -> x) where f a x = f (a ++ show x) instance C x => C (String -> x) where f a x = f (a ++ x)
This class C
, which lets the term f
pattern-match on its continuation, is the template for most
of the code on this page. The pattern-matching will be more
elaborate. For example, the arguments do not have to be processed in
the order they are received: the term can rearrange its arguments into
some canonical order before performing the requested operation. That
is the idea behind the implementation of keyword arguments.
The code referenced below shows that defining polyvariadic
functions takes only a few lines of Haskell code, and requires only
the most common extension of multiparameter classes with functional
dependencies. Here is an example of using a polyvariadic function, build
, which takes an arbitrary number of arguments and
returns them in a list. The function build
is first
class, and so can be passed as an argument to other functions, such as use_build
below.
use_build::(forall r a. (BuildList a r) => a -> r) -> x -> x -> x -> x -> [[x]] use_build bb a b c d = let t1 = bb a -- bb is indeed polyvariadic t2 = bb a b t3 = bb a b c t4 = bb a b c d t5 = bb a b c d a in [t1,t2,t3,t4,t5] test_ub = use_build build 'a' 'b' 'c' 'd' -- result: ["a","ab","abc","abcd","abcda"]
The source code accompanying the HList paper ``Strongly typed
heterogeneous collections'' demonstrates a similar polyvariadic
function hBuild
to make heterogeneous lists and records. Not only hBuild
has a variable number of arguments, those
arguments may all have different types.
Strongly typed heterogeneous collections
A polyvariadic function of a non-regular type (Int->)^{n} ([]^{n} e)->...
Lambda abstractions in C++ vs. Scheme. Currying
Unexpectedly, the Haskell realization of functions with the unlimited number of variously typed arguments turns out almost identical to the implementation of ``currying'' in C++, developed five years prior.
Here's one example: assigning a name to a partially applied keyword argument function without defaults:
tests3 = let f x = kw make_square HNil Color Red x in f Origin (0::Int,10::Int) Size (1::Int)
(tp "The following example illustrates defaults, to be used for" "omitted keyword arguments:") ([mup_code| tests4 = let defaults = Origin .*. (0::Int,10::Int) .*. RaisedBorder .*. True .*. HNil in kw make_square defaults Size (1::Int) Color (RGBColor 0 10 255)
The gist of our implementation is a type-level reflection of a function.
Our solution requires no special extensions to Haskell and works with the existing Haskell compilers; it is tested on GHC 6.0.1 and 6.2.1. The overlapping instances extension is not necessary (albeit it is convenient).
Strongly typed heterogeneous collections
The keyword argument code depends on the HList library, which
implements strongly-typed polymorphic open records. In fact, the
keyword argument code is included as one of the examples in the
HList library.
Functions with the variable number of (variously typed) arguments
Macros with keyword (labeled) arguments
Another example of using the ``compile-time'' reflection to implement keyword arguments. Keyword resolution and argument re-ordering are done at macro-expand time.
(Int->)^{n} ([]^{n} e)->...
Int
, must match the dimensionality
of the list. In other words, the desired function has all of
the following signatures all at the same time.replace :: e -> e -> e -- replacing in a 0-dim list replace :: Int -> [e] -> e -> [e] replace :: Int -> Int -> [[e]] -> e -> [[e]] replace :: Int -> Int -> Int -> [[[e]]] -> e -> [[[e]]] ...Here
e
is the type of list elements; the function
returns the updated list. For example, replace (2::Int) (1::Int) strs 'X'
takes the list of strings
and replaces the second character of the third string with the 'X'
.The gist of the solution is the case analysis of the type of function's continuation. Or: bottom-up deterministic parsing of the type of the continuation.
This problem was posed as a `Type class puzzle' by Yitzchak
Gale on the Haskell-Cafe mailing list in Oct 2006. As a matter of
fact, Yitzchak Gale asked for a function with a slightly different
signature, with the different order of the last two arguments, e.g., replace :: Int -> Int -> e -> [[e]] -> [[e]]
, etc. If we consider
the set of the signatures as a language, we notice the problem:
its LALR-like grammar has a shift/reduce conflict. Indeed,
the type of list elements may itself be Int
. Thus, after
scanning two Int
arguments and looking ahead at another Int
, we cannot tell if we are dealing with the
replacement in a 2D Int
list, or the replacement in a
higher-dimensional non-|Int| list. This problem in the
grammar would compel the use of overlapping instances and other
typeclass tricks. Since the order of replace
's last two
arguments seems to have been chosen quite arbitrarily by the
original poster, it makes sense for us to pick the order that
leads to a LALR(1) language and the simple solution, with no need for
overlapping instances.
This puzzle demonstrates the benefits of bringing the tools from a different area of computer science -- parsing and grammars -- to seemingly unrelated type class programming. Types and parsing are deeply related: see type logical grammars.
The implementations of replace
and the
explanation of the parsing problem and the solutions have been written
by Chung-chieh Shan.
puzzle.hs [3K]
The first, direct-style solution and tests
puzzle-cps.hs [3K]
Another, continuation-passing-style solution and tests
zipWith
elementwise combines (zips)
lists [a]
and [b]
using the combining function a->b->c
,
producing the list [c]
. To likewise combine three lists, the library
offers a separate function zipWith3
. For four lists, zipWith4
,...,
for seven lists, zipWith7
. Writing seven separate functions that do
essentially the same thing is tedious (which is probably why the
authors stopped at seven), let alone inelegant. Incidentally, map
,
which can be called zipWith1
, is also a member of the family (and so
can be repeat
, as zipWith0
). We may also want to combine
elementwise other collections, such as Data.Sequence
or arrays, or
even zip a list with an array obtaining either a new array or a new
list. Should we write the multitude of zipWith
functions for each
of these collections, and for each combination of collections?
We describe an alternative: the single function to elementwise
combine arbitrarily many collections of arbitrary sorts (lists,
sequences, etc). We hence present the double-generic zipWith
--
generic in the number of arguments and in the sort of the collections.
Different collections may have elements of different types; in some
implementations, different collections may be of different sorts --
some arrays, some lists, some unpacked arrays or sequences. We do not
have to separately specify the count of the collections to zip. Before
we go any further, here are a few examples. First, we sum up two or
three lists:
test1 :: [Int] test1 = zipN succ [1,2,3] test2 :: [Int] test2 = zipN (+) [1,2,3] [10,11,12] test21 :: [Int] test21 = zipN (((+) .) . (+)) [1,2,3] [10,11,12] [100,200,300]Different collections may have different element types, as the next example, of selecting elements from two arrays based on the boolean mask, shows. In the example,
mask
is a boolean array with bounds 1 through 10; arr1
and arr2
are Float
arrays with bounds (-1,7)
and (2,20)
respectively. The resulting Float
array has the bounds (2,7)
:a3 = zipN (+) arr1 arr2 :: Array Int Float ar = zipN (\x y z -> if x then y else z) mask arr1 a3 :: Array Int FloatThe last example also selects from two collections according to the boolean mask. It uses the most general
zipN
implementation and
demonstrates that the collections to zip do not have to be of the same
sort: the boolean mask is an Integer
, one collection to select from
is a ByteString
and the other is an ordinary String, that is, [Char]
. The result is returned as a ByteString
:bstr :: B.ByteString test51 = zipN B.pack (0x5::Integer) bstr "xyz" (With $ \x y z -> if x then y else (fromIntegral . ord $ z))
We present three implementations of zipWith
-- in the classical
triad of thesis, antithesis and synthesis. All three are usable,
differing in the sorts of collections they support and if all
collections to zip must be of the same sort. They also greatly differ
in efficiency: the last, synthesis implementation is most efficient,
creating no intermediate collections, filling in the result as it
steps through the argument collections in parallel.
We start with the most obvious, and the least efficient implementation. Recall, our goal is to generalize the family
map :: (a -> b) -> [a] -> [b] zipWith :: (a -> b -> c) -> [a] -> [b] -> [c] zipWith3 :: (a -> b -> c -> d) -> [a] -> [b] -> [c] -> [d] ...replacing it with the single function
zipN :: (a1 -> a2 -> a3 -> ... -> w) -> c a1 -> c a2 -> ... -> c wHere, all collections to zip must be of the same sort
c
, which has to be pairwise zippable.
That is, it should be possible to zip a pair of collections:class Zip2 c where zipW2 :: (a->b->r) -> c a -> c b -> c rClearly, lists are pairwise zippable, and so are arrays, for example.
The idea of the first implementation comes from the similarity between
zipping and the Applicative Functor application: zipN f a1 a2 ... an
is essentially f <$> a1 <*> a2 ... <*> an
,
assuming the collection type implements the Applicative interface, or at
least the (<*>)
operation. Anything that
implements the Zip2 interface is such an unpointed Applicative: (<*>) = zipW2 ($)
. The problem hence reduces to writing a function
that takes an arbitrary number of arguments and
``inserts (<*>)
between them''.
The Introduction section of the page has already described the solution: the type class that pattern-matches on the return type, the type of the continuation. Here it is, adjusted to our circumstances:
class ZipN ca r where zip_n :: ca -> r -- The return type is (c r): produce the result instance (a ~ r) => ZipN (c a) (c r) where zip_n = id -- The return type is (c a' -> r): we are given a new -- collection. We combine it with the accumulator using <*> instance (Zip2 c, a ~ a', ZipN (c b) r) => ZipN (c (a -> b)) (c a' -> r) where zip_n cf ca = zip_n (zipW2 ($) cf ca) -- Our desired zipN is hence zipN f c1 = zip_n (fmap f c1)
This was the entire implementation. The just defined zipN
can be used as:
mask :: Array Int Bool a1, a2 :: Array Int Float a3 = zipN (+) a1 a2 :: Array Int Float -- element-wise addition ar = zipN (\x y z -> if x then y else z) mask a1 a3 :: Array Int Float -- selection according to the mask
In this very simple implementation of zipN
, all collections to zip
must be of the same sort (e.g., all lists) and implement Zip2
and Functor
. Signatures are mandatory, to tell zipN
there are no more
arguments. This is a drawback. The implementation breaks down for
collections that are functions: the instances of ZipN
become
overlapping. A more serious problem is allocating large amounts of
working memory. For example, the zipping of three collections zipN (f::t1->t2->t3->r) (a1::c t1) (a2::c t2) (a3::c t3)
proceeds as:
let w1 = fmap f a1 :: c (t2->t3->r) w2 = w1 <*> a2 :: c (t3 -> r) w3 = w2 <*> a3 :: c r in w3and so produces two intermediate collections (of closures)
w2
and w3
, becoming garbage at the end. (The implementation hence cannot
work with unboxed collections that may not have closures.)
As the antithesis, we negate some design decisions and improve
the type inference, avoiding the cumbersome type annotations.
We make the combining function the last argument of zipN
, and wrap
it in the unique newtype With
. The type of zipN
now has
the following general shape
zipN :: c a1 -> c a2 -> .... c an -> (With (a1 -> a2 -> ... an -> r)) -> c r
With
does not let us confuse the combining function with an argument
collection, even for collections that are functions. We can reliably
tell the end of the argument list without needing type
annotations. With the combining function now coming at the end, the
earlier trick of partially applying (zipping) it through argument
collections no longer works. We use a different trick, of building
pairs alongside the corresponding curring function. That is, zipN (a1::c t1) (a2::c t2) (a3::c t3) (With (f::t1->t2->t3->r))
now
proceeds aslet w1 = a1 :: c t1 cnv1 = id :: forall u. (t1->u) -> (t1->u) w2 = zipW2 (,) w1 a2 :: c (t1,t2) cnv2 = \g -> \ (a1,a2) -> cnv1 g a1 a2 :: forall u. ((t1 -> t2 -> u) -> ((t1,t2) -> u) w3 = zipW2 (,) w2 a3 :: c ((t1,t2),t3) cnv3 = \g -> \ (a12,a3) -> cnv2 g a12 a3 :: forall u. ((t1 -> t2 -> t3 -> u) -> (((t1,t2),t3) -> u) in fmap (cnv3 f) w3The accompanying source code shows the type classes that implement this stepwise curring/pairing for the arbitrary number of arguments. (The reader is encouraged to try writing those typeclasses as an exercise.) The previous example now looks as:
a3 = zipN arr1 arr2 (With (+)) ar = zipN mask arr1 a3 (With (\x y z -> if x then y else z))
The annoying type signatures are gone: all the types are inferred. The
argument collections still have to be of the same sort and be zippable
-- but they can now be functions. We are building collections of
tuples rather than of closures, which is marginally better: tuples
take less space. Nevertheless, we are still building the intermediate
collections w2
and w3
, wasting time and space. There has to be a
better way.
As synthesis, we look back to the starting point,
the standard library function zipWith
. Thanks to lazy lists, it
traverses its two argument lists in parallel: it takes an element from each
list, combines them into the result element, and only then
looks further through the argument lists. Thus zipping up multiple
lazy lists avoids building intermediate lists thanks to laziness. To
handle other collections just as efficiently, we have to lazily
convert them to lazy lists. In other words, the proper zipping should
deal not with the collections themselves but with their views, as
lazy element streams. Following this insight, we define the interface
for a streaming view of a collection:
class Streamable c where type StreamEl c :: * toStream :: c -> [StreamEl c]All
Foldable
collections are Streamable
; other collections, such
as bits and ByteString
are Streamable
too. The view removes the
restriction that all collections to zip must be of the same sort. They can
be anything, so long as they are Streamable
. The result of zipping
is now a stream, which has to be converted to a suitable collection.
We pass that converter as the first argument to the new zipN
, which
otherwise has the interface of the second implementation. Our
running example zipN (tr :: [r] -> c r) (a1::c1 t1) (a2::c2 t2) (a3:: c3 t3)
(With (f::t1->t2->t3->r))
now
proceeds aslet w1 = toStream a1 :: [t1] cnv1 = id :: forall u. (t1->u) -> (t1->u) w2 = zip w1 (toStream a2) :: [(t1,t2)] cnv2 = \g -> \ (a1,a2) -> cnv1 g a1 a2 :: forall u. ((t1 -> t2 -> u) -> ((t1,t2) -> u) w3 = zip w2 (toStream a3) :: [((t1,t2),t3)] cnv3 = \g -> \ (a12,a3) -> cnv2 g a12 a3 :: forall u. ((t1 -> t2 -> t3 -> u) -> (((t1,t2),t3) -> u) in tr $ map (cnv3 f) w3The changes to the second
zipN
implementation are minor.
We have described the double-generic zipN
that efficiently combines
arbitrarily many collections of arbitrary types. The key ideas are
pattern-matching on the result type (the type of the continuation);
defining the interface of zipN
to make this pattern-matching unambiguous;
operating on the streaming view of collections rather than the collections
themselves.
Exercises How to verify that the efficient zipN
really traverses
its arguments in parallel and does not build large intermediate collections?
How to avoid relying on laziness of Haskell strings? We could then deal
with effectful collections such as files and mutable arrays, and also
be sure of the evaluation order. What are the drawbacks of Foldable
? How
to zip arrays and finite maps, like Data.Map
, that is,
collections with indexed (keyed) elements?
Daniel Fridlender and Mia Indrika: Do We Need Dependent Types? (Functional Pearl)
J. Functional Programming, 2000, v10, N4, pp. 409-415
The paper describes the polyvariadic zipN, for lists only. The argument count has to be given explicitly, as a Church-like numeral.
We considered zipN
with N>0
. For lists in
Haskell, one may as well define zipN
even for zero N
, as repeat
. For
an example, see Fridlender and Indrika's paper above. We do not take such
choice here since we deal with arbitrary collections (including arrays),
for which the necessarily lazy repeat
does not make sense.
At first glance, Chris Okasaki's 2002 Haskell puzzle of writing RPN, HP-calculator--like expressions looks impossible:
begin push 5 push 6 add end begin push 5 push 6 push 7 add add end(The expressions are to be evaluated to 11 and 18, resp.) The function
begin
has to take an arbitrary number of arguments, which seems
incompatible with static typing. After one realizes that the simplest
identity id
is already polyvariadic, the solution is immediate, as
we have seen in the Introduction section. For reference, it is
repeated below:begin :: (() -> a) -> a push :: st -> Int -> ((Int,st) -> a) -> a add :: (Int, (Int, st)) -> ((Int,st) -> a) -> a end :: (a,st) -> a begin k = k () push st x k = k ((x::Int),st) add (n1,(n2,st)) k = k (n1+n2,st) end (top,st) = topThe types not only give away the implementation, they also give out the insight:
(t -> a) -> a
is the telltale sign of continuations. Danvy's Functional
Unparsing explains this pattern well. Okasaki's 2002 paper elaborated the
simple solution to a real postfix language, complete with conditionals, side
effects and user-defined recursive procedures.How frustrating it is to realize that this insightful approach is useless! The relatively simple expression
begin push 1 push 2 push 3 push 4 push 5 push 6 push 7 push 8 push 9 push 10 add add add add add add add add add endtakes 5.6 seconds to compile (with no optimizations). Adding one more
push 11
and one more add
blows the GHC compilation time to 19.2 seconds,
the 3-time increase. The exponential blow-up indeed. Okasaki ended his
paper by noting that type checking of the ubiquitous Fibonacci written in
his postfix language crashed GHC after a long wait.We write the long missing final chapter of this story: the embedding of postfix language one can actually use. This page is authored in HSXML, as many others on this site, and relies on the implementation of polyvariadic functions that scales to hundreds of arguments.
Before we get to the solution, we should determine the source of the poor
performance. Let us see the type meant for begin
in the simplest version
of the problematic example:
t11 = (begin `asTypeOf` _b) (push `asTypeOf` _p1) 1 (push `asTypeOf` _p2) 2 (add `asTypeOf` _add) (end `asTypeOf` _end) Found hole ‘_b’ with type: (() -> Int -> ((Int, ()) -> Int -> ((Int, (Int, ())) -> ((Int, ()) -> Int) -> Int) -> ((Int, ()) -> Int) -> Int) -> Int -> ((Int, (Int, ())) -> ((Int, ()) -> Int) -> Int) -> ((Int, ()) -> Int) -> Int) -> Int .... {- elided the other half -}Here,
_b
is a type hole, the recent, very
helpful GHC feature. The size of the type tells it all.
A related problem is the huge and unhelpful type error
messages. The root cause is, alas, the continuation-passing style itself.
Recall the type of begin :: (() -> a) -> a
. In the
expression begin push 1 push 2 add end
, the type variable a
is
instantiated to the (large) type of everything that follows begin
push
, that is, it will be Int -> type-of-push -> Int ... ->
Int
. Remember, in the type of begin
, this type variable a
occurs twice. Thus the common approach of embedding typed postfix languages, in the
continuation-passing style, has the inherent performance flaw.The Introduction section has described a different approach: pattern-matching on the type of the context, with a type class. Here is how we solve the Okasaki puzzle now:
-- Start with the 'stack' and then continue class Forth stack r where build :: stack -> r data End = End instance (a ~ stack) => Forth stack (End -> a) where build stack _ = stack data Add = Add -- Start with (Int, (Int, stack)), see Add and continue with (Int,stack) instance Forth (Int,stack) r => Forth (Int,(Int,stack)) (Add -> r) where build (n1,(n2,stack)) _ = build (n1+n2,stack) data Push = Push instance (a ~ Int, Forth (Int,stack) r) => Forth stack (Push -> a -> r) where build stack _ n = build (n,stack) begin = build () end = End add = Add push = Push
The new solution is also relatively compact and
simple, once we get the basic idea of pattern-matching on the
context. The problematic example, of pushing 11 integers and then
adding them up, now takes only 0.23 seconds, compared to 19.2 seconds
earlier. That is two orders of magnitude improvement! It is
instructive to see what type is now inferred for the subterms of t11
above:
Found hole ‘_b’ with type: Push -> Int -> Push -> Int -> Add -> End -> t Found hole ‘_p1’ with type: Push
In conclusion, the well-known embedding of postfix languages is also well-known to be very slow, because it results in exponentially large types which take exponentially long (with the size of the expression) to handle. The problem is inherent in the reliance on the continuation-passing style. We have demonstrated a new approach, based on the pattern-matching of the type of the context. The approach linearly scales and has been used in practice, for example, to format this page. Whereas originally postfix expressions with about 20 primitives crash the compiler, we now easily handle 10-20-times longer expressions.
Chris Okasaki: Techniques for embedding postfix languages in Haskell
Haskell Workshop, 2002
newtype MyVal = MyVal Int deriving Show -- A sample TIP tip1 = MyVal 20 .*. (1::Int) .*. True .*. emptyTIP -- TIP (HCons (MyVal 20) (HCons 1 (HCons True HNil))) -- Update the Int component of tip1 to 2. The Int component must -- exist. Otherwise, it is a type error tip2 = ttip (2::Int) tip1 -- TIP (HCons (MyVal 20) (HCons 2 (HCons True HNil))) -- Negate the boolean component of tip1 tip3 = ttip not tip1 -- TIP (HCons (MyVal 20) (HCons 1 (HCons False HNil))) -- Update the Int component using the values of two other components tip4 = ttip (\(MyVal x) y -> x+y) tip1 -- TIP (HCons (MyVal 20) (HCons 21 (HCons True HNil))) -- Update the MyVal component from the values of three other components tip5 = ttip (\b (MyVal x) y -> MyVal $ if b then x+y else 0) tip1 -- TIP (HCons (MyVal 21) (HCons 1 (HCons True HNil)))" -- The same but with the permuted argument order. -- The order of arguments is immaterial: the values are looked up by -- their types tip5' = ttip (\b y (MyVal x)-> MyVal $ if b then x+y else 0) tip1 -- TIP (HCons (MyVal 21) (HCons 1 (HCons True HNil)))
Strongly typed heterogeneous collections
The HList paper describes type-indexed products (TIP) in Section 7. The TIPTransform code is included as an example in the HList distribution.
max3 = max2 . (max2 * id) max4 = max2 . (max2 * max2) max5 = max2 . ((max2 * id) . (max2 * max2)) max7 = max3 . ((max2 * max3) * max2)where
*
is a categorical product and .
is a categorical composition. We show the code for the product and the
composition that lets us write the above formulas in Haskell as they
are.
A potentially useful side-effect is an automatic deep uncurrying of
arbitrarily complex pairs, such as ((1,2),(3,(4,5)))
.
categorical-max3.txt [4K]
An older approach
It was originally posted as Re: On products and max3 on the newsgroup comp.lang.functional on Thu, 12 Apr 2001 02:14:27 +0000
mcomp
for
the farthest composition of functions:f:: a1->a2-> .... ->cp g:: cp->d f `mcomp` g:: a1->a2-> .... ->dTo resolve the ambiguity in the above definition we assume that
cp
is not a function type.The problem was originally posed on comp.lang.functional by Immanuel Litzroth, who wrote:
The problem arose quite naturally when I was working on something todo with partial orderPO:: A -> B -> Maybe Ordering
and now I wantedgroupBy (isJust . PO)
.
Categorical products in Haskell
Another application of the combinator mcomp
oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML