From oleg-at-okmij.org Sun Nov 19 16:54:34 2006
Subject: Type-class overloaded functions: second-order typeclass programming
with backtracking
From: oleg-at-pobox.com
Message-ID: <20061120005434.2B3A9AB40@Adric.metnet.fnmoc.navy.mil>
Date: Sun, 19 Nov 2006 16:54:34 -0800 (PST)
To: haskell@haskell.org
Status: OR
We demonstrate functions polymorphic over classes of types. Each
instance of such (2-polymorphic) function uses ordinary 1-polymorphic
methods, to generically process values of many types, members of that
2-instance type class. The typeclass constraints are thus manipulated
as first-class entities. We also show how to write typeclass instances
with back-tracking: if one instance does not apply, the typechecker
will chose the `next' instance -- in the precise meaning of
`next'. The code only uses stable Haskell extensions (undecidable and
overlapping instances) and is tested with GHC 6.4.1 - 6.8.2.
Our running example is a function whose type (modulo the constraint) is
approx_eq :: a -> b -> Bool
It applies to values of any two types. If both arguments are of the
same type and are Fractionals, the function checks if the values are
the same within given tolerance, 0.5. We test the values of other
numeric types for equality give or take 1. For a class of other types
supporting equality, we test the values with the Eq's (==)
operation. For all other types, including the cases when the two
arguments are of different types, approx_eq returns False.
The complete code for this article is in the file poly2.hs of this
directory.
The first inspiration for this message comes from generic
programming. In approaches like SYB1 or Smash, the functions that
process (sub)terms of specific types must be monomorphic. In SYB1, for
example, that requirement comes from the fact that TypeRep only
supports monomorphic types (see also a note in Data.Dynamic
documentation). One would like, for example, to traverse a term and
apply a polymorphic numeric function to all subterms that are members
of the class Num -- without the need to monomorphise the function. The
second inspiration comes from Haskell-Cafe requests for a function
that does something for any Fractional and does something else for a
Num and something else entirely for a value whose type is in the class
Ord. The common solution to those problems is to monomorphise --
enumerate all needed types. That leads to a notable amount of
boilerplate. We'd like to process all Num values, for example, with
one polymorphic function -- without manually instantiating this
function for each Num type. Another drawback of the common solution is
the need to enumerate Num and Fractional classes again should we need
another function that does something for Fractionals and something
else for Nums.
We show how to eliminate that redundancy. We offer a way to describe
classes of types in a _concise_ way, using unions and class
differences. These classes of types may be either closed or open
(extensible). After that set up, we can write arbitrarily many
functions overloaded over these type classes. An instance of our
function for a specific type class may use polymorphic functions to
generically process all members of that type class. Our functions are
hence second-order polymorphic.
Again, the class membership has to be described -- but only once and
for all. Furthermore, we offer quite an expressive notation and the
opportunity of augmenting a class at a later time. For our running
example, we define the following type classes:
> type Fractionals = Float :*: Double :*: HNil
> type Nums = Int :*: Integer :*: AllOf Fractionals :*: HNil
> type Ords = Bool :*: Char :*: AllOf Nums :*: HNil
> type Eqs = AllOf (TypeCl OpenEqs) :*: AllOfBut Ords Fractionals :*: HNil
> data OpenEqs
> instance TypeCls OpenEqs () HTrue -- others can be added in the future
In particular, Eqs specifies the class of all of Ords except
Fractionals, plus the open OpenEqs. The latter currently includes (),
with more types can be added at any point. We excluded Fractionals
from Eqs just because we can -- and to hint that exact equality on
Fractionals is unreliable.
Incidentally, while the following naive definition does not type
type Russel = AllOfBut () Russel :*: HNil
because type synonyms can't be recursive, a more elaborated variant
does work, with the expected result. Thus when we say type _class_
(rather than type _set_), we really mean it.
Our implementation is a rather trivial extension of the Apply class of
HList:
class Apply f a r | f a -> r where
apply :: f -> a -> r; apply = undefined
Our 2-polymorphic functions (and their 1-polymorphic specializations)
are all instances of the Apply class. They are all identified by
labels (singular datatypes) for the reasons to be explained shortly.
Regular typeclasses match on types; to match on a type class, we need
a guard: a type function that decides the membership. We also need the
ability to `backtrack' and select another instances should the guard
fail.
Let us start with an example: a simple 2-polymorphic function
"a -> Bool" that returns True if its argument is a member of the type class
Eqs defined above. It returns False for any other type.
We start by introducing the label for our function:
> data IsAnEq = IsAnEq
Next we define a pair of matching instances. The first instance, of a
typeclass
> class GFN n f a pred | n f a -> pred
specifies the guard for the instantiation number `n' of the
2-polymorphic function named `f' when applied to an argument of the
type 'a'. The guard itself is a label, which, when Apply'ed to 'a'
yields either HTrue or HFalse. Again, we shall see soon why we need so
many labels.
> instance GFN Z IsAnEq a (Member Eqs)
> instance Apply (GFnA Z IsAnEq) a Bool where
> apply _ _ = True
We are saying here that the instantiation number Z of the function named
IsAnEq has the guard (Member Eqs) -- which tests the class membership
of the type 'a' in Eqs. Should this guard succeed, we `apply'
(GFnA Z IsAnEq) to the argument to get the desired result -- in this
case, True.
The second instance is the catch-all, it applies when the guard
(Member Eqs) fails:
> -- the default instance
> instance TypeCast pred Otherwise => GFN n IsAnEq a pred
> instance Apply (GFnA n IsAnEq) a Bool where
> apply _ _ = False
where the Otherwise membership predicate always succeeds. We now write
> test1 = [apply (GFn IsAnEq) (), apply (GFn IsAnEq) (1.0::Double),
> apply (GFn IsAnEq) 'a']
which evaluates to [True,False,True]. Recall that we explicitly
excluded Fractionals from Eqs.
Suppose that we wish to extend the above function to pairs, that is,
given a pair (x,y) it should return True if both 'x' and 'y' are in
Eqs. The simplest way is to extend the type class Eqs accordingly,
given that we specifically made it open. We choose a different
approach however, to illustrate that the function IsAnEq is itself
extensible, and can be recursively applied.
We add the following pair of instances:
> instance GFN (S Z) IsAnEq (x,y) Otherwise
> instance (Apply (GFn IsAnEq) x Bool,
> Apply (GFn IsAnEq) y Bool)
> => Apply (GFnA (S Z) IsAnEq) (x,y) Bool where
> apply (GFnA f) (x,y) = apply (GFn f) x && apply (GFn f) y
We wedge this instance `between' the existing two.
> test2 = [apply (GFn IsAnEq) (True,'a'),
> apply (GFn IsAnEq) (1.0::Double,True)]
> -- [True, False]
Let us move to the main example: the approximate equality: test the
Fractionals with the tolerance 0.5, test other Nums with the tolerance
1, and test Eqs with the exact equality. For the values of all other
types (including the case when the values to compare have different
types), the result is False. The order of guard tests obviously
matters as all of Fractionals are also Nums. If we want to
discriminate Fractionals, we should test for them first.
> data ApproxEq = ApproxEq -- define the label
>
> data PairOf t -- lift membership to pairs
> instance Apply t x r => Apply (PairOf t) (x,x) r
> instance TypeCast r HFalse => Apply (PairOf t) x r
>
> instance GFN Z ApproxEq (x,x) (PairOf (Member Fractionals))
> instance (Fractional x, Ord x) =>
> Apply (GFnA Z ApproxEq) (x,x) Bool where
> apply _ (x,y) = abs (x - y) < 0.5
>
> instance GFN (S Z) ApproxEq (x,x) (PairOf (Member Nums))
> instance (Num x, Ord x) =>
> Apply (GFnA (S Z) ApproxEq) (x,x) Bool where
> apply _ (x,y) = abs (x - y) < 2
>
> instance GFN (S (S Z)) ApproxEq (x,x) (PairOf (Member Eqs))
> instance (Eq x) =>
> Apply (GFnA (S (S Z)) ApproxEq) (x,x) Bool where
> apply _ (x,y) = x == y
-- recursion over pairs skipped...
> -- The default instance
> instance TypeCast pred Otherwise => GFN n ApproxEq a pred
> instance Apply (GFnA n ApproxEq) a Bool where
> apply _ _ = False
>
> -- A convenient abbreviation
> approx_eq x y = apply (GFn ApproxEq) (x,y)
*Poly2> :t approx_eq
approx_eq :: (Apply (GFn ApproxEq) (a, b) r) => a -> b -> r
> test3 = [approx_eq (1.0::Double) (1.5::Double),
> approx_eq (1.0::Float) (1.1::Float),
> approx_eq (1::Integer) (2::Integer),
> approx_eq (1::Int) True,
> approx_eq (Just ()) [],
> approx_eq ((2::Integer),(2::Integer)) ((1::Integer),(2::Integer)),
> approx_eq ((1::Integer),(2::Integer)) ((1::Integer),(2::Integer)) ]
> -- [False,True,True,False,False,True,True]
Now why we need so many labels and why we have to separate the guard
and the real computation across two different typeclasses?
First of all, why can't we write something like the following (as in
`Smash your boilerplate') and so use real functions rather than labels?
approx_eq2 x y = sapply (scons (undefined::PairOf (Member Fractionals)) eqfrac
(scons (undefined::PairOf (Member Nums)) eqnum
(scons (undefined::PairOf (Member Eqs))
(uncurry (==))
snil))) (x,y) False
where eqfrac (x,y) = abs (x - y) < 0.5
eqnum (x,y) = abs (x-y) < 2
If we do that and write
> testeq2 = approx_eq2 (1.0::Double) (1.5::Double)
we get a type error, `Ambiguous Eq b constraint'. Indeed, the use of
(==) in the code above gives rise to an Eq b constraint. That 'b' is
not directly related to the type of the arguments to approx_eq2 -- the
function (==) will be applied only if the guard `PairOf (Member Eqs)'
succeeds. Thus we have no direct way to instantiate the type 'b' in
general, and so the constraint 'Eq b' is left hanging, eventually
leading to a type error.
Second, why can't we define the guard and the corresponding
computation within the same instance? For example, we could have
declared a typeclass
> class GFN n f a pred b | n f a -> pred, n f a -> b where
> gfn :: n -> f -> a -> b
Instances of GFN define a 2-polymorphic function as a set of
clauses. The clauses are `numbered' via the 'n' parameter, which is of
a kind numeral (Z, S Z, etc). Pred is the guard, and the numbering
establishes the order, so we can find the `next' instance if the
guard, applied to the argument 'a', fails (that is, `Apply guard a
HFalse' is derivable). Alas, to find the guard `pred', we have to
select the corresponding instance. We replace a failure with the
sequence of successes, selecting one instance after another, until we
find the one with the succeeding guard. Those instances may come with
the constraints; as we select one instance after another, the
constrains accumulate (forming the conjunction). But these constraints
may be contradictory.
The two failing approaches show the complication we have to struggle
with. A 2-polymorphic function is defined by cases where each case is,
in general, a 1-polymorphic function. Each 1-polymorphic function
comes with a set of constraints. We must make sure that our
2-polymorphic function makes a _disjoint_ union of those constraints
rather than an intersection. Thus, unless we have committed to a
particular 1-polymorphic case (the guard succeeded), we must not
evaluate the corresponding constraints. Our solution has this
property. We notice in the ApproxEq cases above that none of the GFN
instances have any instance constraints. The corresponding Apply
instances do have constraints -- which, fortunately, only take effect
when the guard succeeded and the instance is selected. The name
of the game is to delay the imposition of constraint. We take
advantage of the fact that a label such as ApproxEq _stands_ for a set
of constraints but does not have any actual constraints. We play on
the difference between the notation and denotation.
The need to spread the guard and the corresponding computation across
two instances is less satisfying. Fortunately, the type system watches
for the consistency between the two, that is, the guard admitting no
more types that can be used in the real computation.
It seems appropriate to end this message with 14 lines of
implementation. The complete code with more examples is available at
the URL given above.
> newtype GFn f = GFn f
> newtype GFnA n f = GFnA f
>
> newtype GFnTest n f flag = GFnTest f
>
> instance (GFN Z f a pred, Apply pred a flag,
> Apply (GFnTest Z f flag) a b)
> => Apply (GFn f) a b where
> apply (GFn f) a = apply ((GFnTest f)::GFnTest Z f flag) a
>
> instance Apply (GFnA n f) a b -- guard succeeded
> => Apply (GFnTest n f HTrue) a b where
> apply (GFnTest f) a = apply ((GFnA f) :: GFnA n f) a
>
> instance (GFN (S n) f a pred, Apply pred a flag, -- else chose the next inst
> Apply (GFnTest (S n) f flag) a b)
> => Apply (GFnTest n f HFalse) a b where
> apply (GFnTest f) a = apply ((GFnTest f)::GFnTest (S n) f flag) a