Functions with the variable number (of variously typed) arguments
* Motivation
* Implementation
* Vararg functions are first-class
* Variable number of variably typed arguments
* Variable number of really variably typed arguments
* How does that work?
* The function build and polymorphic Num constants
* Motivation
In a message 'how to write a list builder? fixpoint?' posted on the
Haskell mailing list on June 1, 2004, Ben Yu wrote:
] Is it possible to write a function to build a list [a]?
] so that I can write [a,b,c,d] as "getBuilt $ build a b c d"?
Essentially this is the question about the possibility of polyvariadic
functions in Haskell. At first blush, the answer seems
negative. Indeed, footnote 2 of the article "The Web" by Ralf Hinze
and Johan Jeuring (JFP, The Functional Pearl) says: "Since Haskell
currently has no support for defining variadic functions, the Web
library only supplies con0, ..., con_max where max is some fixed upper
bound." This message shows that Haskell with the commonest extension
(multi-parameter type classes and functional dependencies) has
supported functions with the indefinite number of arguments all along.
Furthermore, the implementation takes only few lines.
* Implementation
> {-# OPTIONS -fglasgow-exts #-}
>
> module VarArg where
> import Data.FiniteMap -- for an example below
>
> class BuildList a r | r-> a where
> build' :: [a] -> a -> r
>
> instance BuildList a [a] where
> build' l x = reverse$ x:l
>
> instance BuildList a r => BuildList a (a->r) where
> build' l x y = build'(x:l) y
>
> --build :: forall r a. (BuildList a r) => a -> r
> build x = build' [] x
That's it. It works both in GHC and Hugs.
*VarArg> build True :: [Bool]
[True]
*VarArg> build True False :: [Bool]
[True,False]
*VarArg> build True False False ++ build True False False
[True,False,False,True,False,False]
*VarArg> build 'a' 'b' 'c' 'd' 'e' :: [Char]
"abcde"
*VarArg> build (1::Int) (2::Int) :: [Int]
[1,2]
*VarArg> build (1::Int) (2::Int) (3::Int) :: [Int]
[1,2,3]
Note that the type annotation [Bool] or a similar hint at the end is
required: it is the delimiter of the list. Who would have thought that
the type annotation can play the role of Nil...
We must also note that for the function build', it is the type of its
continuation that determines the type of its argument.
* Vararg functions are first-class
Polyvariadic functions are first-class and can be passed to other
functions as arguments. We should keep in mind however that
polyvariadic functions are actually class methods and are therefore
polymorphic. A function that takes a polymorphic function and uses it
polymorphically has a higher-ranked type. Higher-ranked types cannot
be inferred (in general) and must be declared explicitly. In great
detail, this issue is discussed in Ken Shan's survey ``Sexy types
in action,'' ACM SIGPLAN Notices 39(5):15-22, 2004:
http://www.eecs.harvard.edu/~ccshan/cs252/usage.pdf
The following code illustrates passing of the above 'build' function
as an argument to other functions. We can use the latter argument
polyvariadically.
> -- Higher-ranked type: the signature is required
> use_build::(forall r a. (BuildList a r) => a -> r)
> -> x -> x -> x -> x -> [[x]]
> use_build bb a b c d =
> let t1 = bb a
> t2 = bb a b
> t3 = bb a b c
> t4 = bb a b c d
> t5 = bb a b c d a
> in [t1,t2,t3,t4,t5]
>
> test_ub = use_build build 'a' 'b' 'c' 'd'
>
> -- *VarArg> test_ub
> -- ["a","ab","abc","abcd","abcda"]
The following example demonstrates another aspect of first-classness
of 'build' -- storing it in a data structure. The code also requires
fewer signatures. In the example, 'build' builds itself. The function
really has quite a few faces...
> data W = W (forall r a. (BuildList a r) => (a->r))
>
> test2 = let t1 = build (W build)
> t2 = build (W build) (W build)
> t3 = t1 ++ t2
> f (W b) = b (1::Int) ++
> b (1::Int) (2::Int) ++
> b (1::Int) (2::Int) (3::Int)
> in map f t3
>
*VarArg> test2
[[1,1,2,1,2,3],[1,1,2,1,2,3],[1,1,2,1,2,3]]
* Variable number of variably typed arguments
The following simple variation defines a polyvariadic function to
construct associative lists or finite maps:
> class Builder2 c a b r | r -> a where
> build2 :: (a->b->c a b->c a b) -> c a b -> a -> b -> r
> instance Builder2 c k v (c k v) where
> build2 acc seed k v = acc k v seed
> instance Builder2 c a b r => Builder2 c a b (a->b->r) where
> build2 acc seed a b x y = build2 acc (acc a b seed) x y
>
> newtype AL a b = AL [(a,b)] deriving Show
>
> test_bb1::AL String Bool
> test_bb1 = build2 (\x y (AL l) -> AL $ (x,y):l) (AL []) "a" True "b" False
>
> test_bb2:: FiniteMap String Bool
> test_bb2 = build2 (\x y m -> addToFM m x y) emptyFM
> "a" True "b" False "c" True
The function build2 not only has the variable number of
arguments. These arguments don't even have to be of the same type.
* Variable number of really variably typed arguments
Functions with the variable number of variably typed arguments are
especially useful for creating strongly-typed heterogeneous
collections such as lists and records:
http://homepages.cwi.nl/~ralf/HList/
The following is a snippet from the source code accompanying the HList
paper (Haskell Workshop, 2004):
HList> let x = hBuild True in hEnd x
HCons True HNil
HList> let x = hBuild True 'a' in hEnd x
HCons True (HCons 'a' HNil)
HList> let x = hBuild True 'a' "ok" in hEnd x
HCons True (HCons 'a' (HCons "ok" HNil))
HList> hEnd (hBuild (Key 42) (Name "Angus") Cow (Price 75.5))
HCons (Key 42) (HCons (Name "Angus") (HCons Cow (HCons (Price 75.5) HNil)))
HList> hEnd (hBuild (Key 42) (Name "Angus") Cow (Price 75.5)) == angus
True
Here we used a trivial function hEnd for list termination:
hEnd t@(HCons x y) = t
Note:
- x :: HCons a b
means: forall a b. x :: HCons a b
- hEnd x
means: exists a b. x :: HCons a b
* How does that work?
So the key here appears to be that the type-checker can distinguish
between function application and other uses of an expression?
Indeed, the trick is straightforward -- merely basic typechecking.
Let us recall the definition of build'
*> class BuildList a r | r-> a where
*> build' :: [a] -> a -> r
*>
*> instance BuildList a [a] where
*> build' l x = reverse$ x:l
*>
*> instance BuildList a r => BuildList a (a->r) where
*> build' l x y = build'(x:l) y
Let us consider the expression
build' [] 'a' 'b' ++ []
or, in a more abstract syntax
(++) ((build' [] 'a') 'b') []
We make use of the familiar typing rule
G |- e1: t1->t2 G |- e2: t1 <===> G |- (e1 e2) : t2
where G is the typing environment (which associates types with the variables
that appear free in expressions e1 and e2). The typing rule can be applied
in either direction.
Operator (++) has the type [t1] -> [t1] -> [t1]. Given that fact and
applying the typing rule twice, we infer that ((build' [] 'a') 'b')
has the type [t1], [] has the type [t1], and so the whole expression
has the type [t1]. We have inferred that
((build' [] 'a') 'b') :: [t1]
Applying the rule in reverse direction, we find that
(build' [] 'a'):: t3->[t1] 'b':: t3
Or, because 'b' has the type Char, t3 unifies with Char, which gives
us
(build' [] 'a') :: Char->[t1]
Applying the typing rule several more times, we find
[]:: t4
'a' :: t5
build':: t4 -> t5 -> (Char->[t1])
or
[]::t4
build' :: t4 -> Char -> (Char->[t1])
From the declaration of the class BuildList, we know that
build' :: [a] -> a -> r
By unifying the two signatures, we find
build' :: [Char] -> Char -> (Char->[t1])
Of the two instances of BuildList, the second obviously matches.
The functional dependency r->a essentially tells us that 'r' is the
primary key in the instance selection. That is, if we use (Char->[t1])
to match with the second parameter of the instance and found the
matching instance, we can be sure that it is the only matching instance
-- no matter how many instances are added in the future or contained
in other modules.
So, we commit to the second instance and
assign the type to build' in the original expression as
build'_2 :: [Char] -> Char -> (Char->[t1])
However, the second instance of BuildList includes the term
"build' (x:l) y", which we have to type in turn. We know that
(x:l)::[Char]
(build' (x:l) y)::[t1]
from which we determine, with the same typing rule,
build'::[Char] -> t6 -> [t1]
or, unifying with the signature for build',
build'::[Char] -> Char -> [t1]
Again, we search for the appropriate instance of BuildList. Now, the
first instance matches. Again, we commit to it. Unifying
'a' with Char and [t1] with [a] gives us
build'_1 :: [Char] -> Char -> [Char]
and the type of the whole (++) expression as [Char]. Typechecking succeeds,
and we were able to unambiguously choose instances.
The details of the actual typechecking algorithm most likely differ,
yet the general idea -- and the outcome -- are the same.
* The function build and polymorphic Num constants
Why just "build 1 2 3" or "build' [] 1 2 3" don't work?
Well,
*VarArg> build 1 ++ []
[1]
*VarArg> :t build 1 ++ []
build 1 ++ [] :: forall a. (Num a) => [a]
work as expected. Just "build 1" per se cannot work because
the typechecker selects the instance of BuildList based on the return
type. In the case of a mere "build 1", the return type is
indeterminate: it could be a function or could be a list. The
typechecker needs either the explicit type specification, e.g.,
::[Int], or at least a hint (which " ++ []" above provides).
If we specify more than one polymorphic constant, we have to give more
precise hints:
*VarArg> build (1::Int) (2::Int) ++ []
Or, if we wish to keep the constants (and hence the result)
polymorphic:
*VarArg> build (1::Num a =>a) (2::Num a =>a) ++ []
[1,2]
*VarArg> build (1::Num a =>a) (2::Num b =>b) (3::Num c=>c)++ []
[1,2,3]
One may question the need for those annotations: if we can say
[1,2,3], which types to Num a => [a], why can't we say "build 1 2 3"
and have the inferred type to be likewise Num a => [a]?
Here [1,2,3] is a syntactic sugar for 1:(2:(3:[])). The operator (:) has
the type
(:) :: forall a. a -> [a] -> [a]
Since (:) is actually a function, the type of the result (which is
[a]->[a]) is unambiguously determined by the type of the argument,
'a'. Informally, the type of a function has a functional dependency.
The function build' is a member of a class such that the type of the
_result_ of build' unambiguously determines the type of the
argument. Note the reverse implication. Therefore, once we specify
the type of the result, everything works out. And it does:
VarArg> build 1 2 3 4 :: [Int]
[1,2,3,4]
In Hugs. But not in GHC (6.0.1). My impression is that GHC assumes
overlapping instances when choosing instances -- even if no
-fallow-overlapping-instances flag is present. In order to get the
example work in GHC, we have to assure that all arguments to build
have the same type in some other way, e.g., using local type
variables:
*VarArg> let (a::t)=1 in let b=(2::t); c=(3::t) in build a b c a b c ::[t]
[1,2,3,1,2,3]
The result is actually polymorphic, Num t => [t].
One may say that giving each polymorphic argument of build an explicit
signature is a lot of keystrokes. We should keep in mind however that
these problems typically arise only when we enter expressions
involving polymorphic functions at GHCi's prompt. If we use the same
expressions in real code, very often enough type information is
available for the typechecker to successfully resolve overloading. For
example, people often give signatures to top level functions (for
better documentation, for explicitly stating the function's contract
and other social reasons). These signatures also help the typechecker.