previous   next   up   top

Type equality predicates: from OverlappingInstances to overcoming them

 

Type equality predicates decide if two types are equal and return the result of the comparison as a type-level boolean. A type equality predicate ought to be total for ground types, that is, types with no free type variables. A more general type matching predicate compares the shape of two types, their top-level type constructors, etc.

Type equality predicates are deeply related to overlapping type-class instances. OverlappingInstances is one of the most controversial Haskell extensions, whose semantics is the least clear and whose interactions with existentials, separate compilation and other Haskell features leave lingering doubts of soundness. Yet OverlappingInstances are too useful to remove -- in part because they decide type equalities. Several examples below tell how useful OverlappingInstances really are.

Yet there is a well-behaving, semantically clear way to implement type equalities and general type pattern-matching that not only avoids OverlappingInstances but subsumes them. The code using OverlappingInstances can be simply and mechanically modified to avoid that extension. We thus propose to eliminate OverlappingInstances as redundant.

Doing without OverlappingInstances

 

Forms of type equalities

One can identify several type equality predicates, differing along several axes: behavior on unground types, supported kinds, simple comparison or pattern-matching. Any type equality predicate ought to be decidable for ground types (at least, for simple ground types -- types without any type variables, bound or free).

The most common, and historically the first, type equality predicate TypeEq was introduced in HList. The type checker simplifies the type expression (originally, constraint) TypeEq t1 t2 to the dedicated type HTrue if the type checker considers the types t1 and t2 the same. If the type checker regards t1 and t2 as different, TypeEq t1 t2 is simplified to the dedicated type HFalse. For ground types -- types without free type variables -- TypeEq t1 t2 always simplifies, to either HFalse or HTrue. If the type under comparison contains a free type variable, the type checker may be still be able to decide disequality: for example, when comparing the unground types Maybe a and [b]. Otherwise, the type checker defers the simplification of TypeEq t1 t2, waiting for t1 and t2 become sufficiently instantiated by the context.

The TypeEq predicate is used all throughout the HList and OOHaskell libraries. In fact, TypeEq is as ubiquitous in HList as Eq is ubiquitous in the standard Data.List library. Most operations on type-level collections -- such as look-ups, deletions, replacements, checking for duplicates -- require equality comparisons.

The predicate TypeEqTotal t1 t2 is like TypeEq t1 t2 but always simplifies to either HTrue or HFalse, regardless of how well the types t1 and t2 are instantiated. A type variable is considered equal only to itself. Therefore, TypeEqTotal decides the identity of two (unground) types: TypeEqTotal t1 t2 simplifies to HTrue if and only if t1 and t2 are equal in every possible grounding.

A type equality assertion is not a predicate, but is closely related. Whereas an equality predicate applied to two types simplifies to HTrue or HFalse depending on the success of the comparison, the assertion simplifies to nothing (or, gets discharged) when the types are unifiable, that is, equal or could be made equal by instantiating type variables. If the types are not unifiable the assertion -- and the overall type checking -- fails. If two types can be made equal by instantiating type variables, the assertion does such an instantiation, as a side effect. The type equality assertion was also introduced in HList, under the name TypeCast t1 t2, which nowadays can be written as t1 ~ t2. For ground types t1 and t2, TypeEq t1 t2 ~ HTrue is equivalent to t1 ~ t2 and TypeEq t1 t2 ~ HFalse asserts the disequality of the two types.

One might wish for a predicate version of TypeCast t1 t2, which we call TypeUni t1 t2. If the types t1 and t2 are equal or could be made equal by instantiating type variables, TypeUni t1 t2 simplifies to HTrue and, like TypeCast, applies the unifying substitution to t1 and t2. Otherwise, TypeUni t1 t2 simplifies to HFalse and the type-checking continues. Alas, TypeUni is incoherent. For example, the type expression (TypeUni x y, TypeUni x (), TypeUni y Bool) (assuming previously uninstantiated x and y) simplifies to (HTrue,HTrue,HFalse) if the simplifications are done left-to-right. If the type checker chooses the opposite order, the result will be (HFalse,HTrue,HTrue).

References
HList-ext.pdf [166K]
Strongly typed heterogeneous collections (Expanded version)
Joint work with Ralf Laemmel and Keean Schupke.
Proc. ACM SIGPLAN 2004 workshop on Haskell, pp. 96 - 107.
doi:10.1145/1017472.1017488
Expanded version: CWI Technical report SEN-E0420, ISSN 1386-369X, CWI, Amsterdam, August 2004.
TypeEq is described in Sec 9.

TypeCast: Type improvement constraint, local functional dependencies, and a type-level typecase

Type-level type introspection, equality and matching
The overlapping-free implementation of TypeEQ that also supports wildcards

Reversing Haskell typechecker: converting from undefined to defined
Implementation of TypeEqTotal, which was called TypeEq, as a functional relation. See the section `Equality predicate of type schemas' in the accompanying literal Haskell code.

 

Early implementations of TypeEq

The HList paper, which introduced TypeEq, described its three implementations. We summarize the description for ease of reference.

The naive implementation of TypeEq specifies the predicate by enumerating all the cases, as shown below. It does not scale.

     data HTrue;  data HFalse
     type family TypeEq x y :: *
     
     type instance TypeEq Bool Bool = HTrue
     
     type instance TypeEq Bool Char = HFalse
     type instance TypeEq Char Bool = HFalse
     type instance TypeEq Char Char = HTrue

A better approach is to relate each type to a unique type-level representation that is easy to analyze and compare. For example, we map each type to a type-level natural number, using an auxiliary injective type function TypeCode:

     data Z;  data S n             -- Peano numerals
     
     type family TypeCode x :: *   -- Type -to- Nat
     
     type instance TypeCode ()   = Z
     type instance TypeCode Bool = S Z
     type instance TypeCode Char = S (S Z)
Composite types like Either t1 t2 can be encoded similarly (e.g., with Goedel numbers). The TypeEq predicate then is defined generically, in a single type instance:
     type family TypeEq x y :: *
     type instance TypeEq x y = TypeEqNat (TypeCode x) (TypeCode y)
We assume a type-level function TypeEqNat to compare Peano numerals. To extend the universe of comparable types with a new type, we merely need to add one instance of TypeCode; the definition of TypeEq is unaffected. This approach does scale. We still need to manually coordinate the assignment of TypeCodes across modules and libraries to ensure uniqueness. Goedel numbers are big, opaque, and cumbersome to analyze. Later we introduce a better representation that overcomes these drawbacks and also supports matching.

The `best' implementation of TypeEq, in three lines of code, works for all types, present or declared in the future. In the HList paper, TypeEq was introduced as a functional relation (a multi-parameter type class with a functional dependency). TypeEq could also be defined as a type function (in GHC parlance, `type family'), as we have just shown. The most generic implementation below, however, can be defined only as a functional relation, since it relies on overlapping instances. Type families do not permit overlapping, for a good reason.

     class  TypeEq x y b | x y -> b
     instance               TypeEq x x HTrue
     instance b ~ HFalse => TypeEq x y b
If the type checker can see that two types t1 and t2 are the same so that the first TypeEq instance can be chosen, the constraint TypeEq t1 t2 b simplifies to b ~ HTrue. Otherwise the third argument of TypeEq is unified with HFalse. The word `otherwise' betrays the complementation, `the default case', which is at the core of overlapping instances. We see the first intimations of the usefulness of overlapping instances and of their relation with type equalities.

 

Comparing types by their shape

Many real-life problems call for overloading with special cases for the types of a particular shape. For example, the implementation of vector spaces as functions has to distinguish function types from other, `scalar' types. We need an operation to compare the top-level type constructors of two types, equating, for example Int->Bool with ()->Char because both types have the same top constructor (->). That `distinguishing the types of one shape from the others' is again about the default case and a specific exception. OverlappingInstances are seemingly required; on the other hand, that extension could be impossible to use. In the vector-space example, the signature of the scalar-vector multiplication includes the type of the scalar field, which is uniquely determined by, or associated with, the vector space. Alas, associated data types cannot be used with OverlappingInstances; even functional dependencies are problematic.

Here we resolve the problem by demonstrating how to use OverlappingInstances-based type-constructor equality with associated types. Later on we re-implement type-shape comparison without resorting to OverlappingInstances.

Our implementation here reduces the type-shape comparison to the previously defined TypeEq, applied to `normalized' types. The normalization replaces `insignificant' parts of the composite type with (), as follows. (Alas, the procedure cannot be defined generically for all arities since type classes have fixed kinds.)

     class Normalize t r | t -> r
     
     instance (c ())   ~ r     => Normalize (c x) r
     instance (c () ()) ~ r    => Normalize (c x y) r
     instance (c () () ()) ~ r => Normalize (c x y z) r
     -- add instances for type constructors of higher arities
     instance y ~ x => Normalize x y   -- The default instance
The last instance is the catch-all for all the types not handled by more specific transformations. Again we resort to overlapping instances.
References
ConEQ.hs [2K]
The complete Haskell code with examples
The code was originally posted as Re: topEq for types on the Haskell-Cafe mailing list on Mon, 20 Nov 2006 23:42:02 -0800 (PST)

NotAFunction.hs [4K]
An application of the type shape comparison: distinguishing function types from any other types when implementing vector spaces as function spaces. This code demonstrates the mixing of functional relations, TypeEq and Normalize, with associated type families. The mixing is possible, but difficult and verbose. The difficulty prompts us to seek the implementation of TypeEq without overlapping instances. TypeEq could be defined as a type function then.

Type-level type introspection, equality and matching
Re-implementation of type matching, including the functional vector spaces example, without overlapping instances.

Nicolas Frisby: topEq for types
A message posted on the Haskell-Cafe mailing list on Thu Nov 16 18:59:05 EST 2006. The message motivated comparing types by shape, and described one solution.

How to write an instance for not-a-function: an earlier, ad hoc solution.

 

Special case and the default -- the characteristic of overlapping

Overlapping instances are so practically appealing because they express the common pattern of adding a special case to an existing set of overloaded functions. We illustrate the pattern on a real-life example of optimizing a generic library. The example demonstrates the conflict between two practically useful features, overlapping instances and associated data types. We give a work-around.

In June 2010, Ryan Newton has posed a problem of optimizing a GMap package of finite maps with the common interface and keys of various types. The module GMap defines the type class GMapKey and optimized instances for Int, Int32, pairs, sums, lists, etc. When the key is a pair (Int16,Int16), it is worthwhile to pack the key into a single Int32 value and use the GMap Int32 instance, rather than using the general GMap (a,b) instance for pairs. Overlapping instances are ideal for such specific overrides to general behavior. Alas, the class GMapKey has associated data types and so does not permit overlapping instances.

We demonstrate an ungainly work-around, which moves overlapping into an auxiliary type class with a functional dependency. The type class `classifies' keys into disjoint subsets, with overlapping to distinguish special cases (of keys that could be packed into a single word) from the default case. The result of the classification is handled by the original GMapKey class.

Later we re-implement the example without OverlappingInstances; the work-around will not be needed then.

Version
The current version is June 2010.
References
GMapSpec.hs [5K]
The complete Haskell code with an example
The code was originally posted as Tricks with GMap -- question about conflicts w/ indexed type families on the Haskell-Cafe mailing list on Tue, 8 Jun 2010 19:24:33 -0700 (PDT)

GMapSpecT.hs [6K]
Overlapping-free implementation of the example, in the approach described below.

 

Objections to overlapping instances

With OverlappingInstances, GHC lets us add a special case to an existing set of overloaded functions. Giving the general case and a set of exceptions is truly useful. The examples on this page -- and many more on Hackage -- speak for such mode of programming and hence for overlapping instances. Why are they so controversial?

Overlapping instances are controversial because they straddle a contradiction. They embody ``negation as failure'': the general type class instance is chosen for the given type only when all more specific instances failed to match the type. Negation-as-failure presupposes closed world, or a fixed set of instances. However, type classes are open: the user may add more instances at any time, in the same or different modules. The unlimited specialization is indeed possible, for example, in the following infinite chain of polymorphic types a > [a] > [[a]] > [[[a]]] > ..... When compiling a module, GHC cannot know if a more specific instance exists in another, perhaps not yet written, module. Still, GHC may have to choose an instance when compiling (monomorphic) terms in the module. Since abandoning separate compilation is not an option, GHC employs various heuristics, which could occasionally lead to surprising results. GHC User's Guide, in the section about overlapping instances, describes an example of incoherence -- of different instance choices made in different parts of the program for the same type. The interaction of overlapping instances with existential quantification is another gray area. If a type class includes associated data or types, overlapping is unsound -- as in segmentation-fault unsound. This bodes ill for OverlappingInstances.

Since implementations of overlapping instances are based on heuristics rather than a sound theory, different implementations have different behavior. For example, Hugs treats overlapping in subtly different ways from GHC; therefore, the type equality predicate of HList, based on overlapping, sometimes works in Hugs and sometimes doesn't. HList is hardly useful in Hugs.

References
Simon Peyton-Jones: Fundeps and overlapping instances
A message posted on the Haskell-Cafe mailing list on Wed, 7 Jul 2010 21:14:31 +0000
The message describes many vexing examples of overlapping, including the unsoundness of overlapping with associated types.

The Glorious Glasgow Haskell Compilation System User's Guide, Version 7.0.4. Section 7.6.3.4. Overlapping instances

 

Type-level type introspection, equality and matching

Overlapping instances are avoidable. Here we re-implement the type equality and matching without OverlappingInstances. These type equality predicates later let us get rid of overlapping in the general case.

The scalable, overlapping-free type-level equality predicate has been described already in the HList paper, recalled earlier on this page. The key was relating each type to a unique type-level representation that is easy to analyze and compare. Type-level numerals are one such representation, which suffers from two drawbacks: the difficulty of implementing matching, and the cumbersome manual process to ensure unique representation. We now describe the representation that overcomes the first drawback and alleviates the second. Template Haskell or a bit of GHC support can ensure unique representation automatically.

The new type representation TYPEREP is much easier to work with than Goedel numbers. It is a type-level equivalent of Data.Typeable.TypeRep and can be informally described by the following declarations:

     data TyCon_name -- name associated with each (registered) type constructor
     data LIST a  = NIL | a :/ LIST a
     data TYPEREP = TYPEREP (TyCon_name, LIST TYPEREP)
TYPEREP is however a type-level data-structure. Strictly speaking, we need data kinds to define it. For now, the above declarations should serve as a guidance. Also as an informal guide, we write kind signatures for our type-level functions. Like type signatures of ordinary functions, kind signatures aid understanding. For now, they are informal and have to be given in comments.

To compare type constructor names, we injectively map them to something easily comparable, for example, natural numbers (Data.Typeable uses a similar trick):

     -- TC_code :: TyCon_name -> NAT
     type family TC_code tycon :: *

The total type function TREPEQ compares two TYPEREP for equality. The function is defined below in closed form, once and for all:

     -- TYPEREP -> TYPEREP -> BOOL
     type family TREPEQ x y :: *
     
     type instance TREPEQ (tc1, targ1) (tc2, targ2) =
        AND (NatEq (TC_code tc1) (TC_code tc2))
     	(TREPEQL targ1 targ2)
     
     -- LIST TYPEREP -> LIST TYPEREP -> BOOL
     type family TREPEQL xs ys :: *
     
     type instance TREPEQL NIL NIL      = HTrue
     type instance TREPEQL NIL (h :/ t) = HFalse
     type instance TREPEQL (h :/ t) NIL = HFalse
     type instance TREPEQL (h1 :/ t1) (h2 :/ t2) = 
        AND (TREPEQ h1 h2) (TREPEQL t1 t2)
where the total NatEq :: NAT -> NAT -> BOOL compares two type-level natural numbers. TYPEREP supports matching, or wildcards, represented by a type ANY (with a distinguished TC_code, e.g., zero). The type function TREPEQW :: TYPEREP -> TYPEREP -> BOOL, the variant of TREPEQ, compares two TYPEREP assuming that the ANY type is equal to any other type. A type-level list membership predicate
     -- ((*,*) -> BOOL) -> * -> LIST * -> BOOL
     type family Member f x l :: *
checks for an occurrence of an element in a type-level list, using the given element equality predicate, specified as AC_TREPEQ (for equality) or AC_TREPEQW (for matching).

The injective type function TYPEOF :: * -> TYPEREP, the analogue of Data.Typeable.typeOf, returns the representation of a type. The function is defined by instances. Here are a few:

     data ANY 				-- The wildcard, for matching
     
     data TRN_any
     type instance TC_code TRN_any = Z       -- distinguished for TRN_any
     type instance TYPEOF ANY = (TRN_any, NIL)
     
     data TRN_unit
     type instance TC_code TRN_unit = S Z
     type instance TYPEOF () = (TRN_unit, NIL)
     
     data TRN_maybe
     type instance TC_code TRN_maybe = S (S Z)
     type instance TYPEOF (Maybe a) = (TRN_maybe, (TYPEOF a) :/ NIL)
     
     data TRN_arrow
     type instance TC_code TRN_arrow = S (S (S Z))
     type instance TYPEOF (a -> b) = (TRN_arrow, (TYPEOF a) :/ (TYPEOF b) :/ NIL)

The instances for TYPEOF and TC_code are regular and could be, hopefully, derived by GHC or a similar tool automatically, similar to the deriving of Data.Typeable. For now, we can use Template Haskell to derive the TYPEOF instances for a new data type. For example, for Foo :: *, $(tderive ''Foo) may expand to something like this:

     data TRN_package_name_module_name_Foo
     type instance TC_code TRN_package_name_module_name_Foo = 
      <very long type-level numeral that spells package_name_module_name_Foo>
     type instance TYPEOF Foo = (TRN_package_name_module_name_Foo, NIL)

Long unary numerals spelling full type constructor names are clearly unappealing. Help is on the way: GHC will soon support efficient numeric kinds and type-level strings. Ideally, GHC would support TYPEREP and TYPEOF the way it currently supports Data.Typeable -- deriving TYPEOF, assigning unique TC_codes and efficiently comparing them.

In principle, GHC could even support ground type equality as a built-in type-level function. However, mere equality comparison of types is not enough to eliminate overlapping instances. We need type comparison at all kinds, and we need matching. The TYPEREP approach fulfills all the needs.

In conclusion, the injective type function TYPEOF reifying a type to its representation TYPEREP gives rise to the type equality comparison and matching. Template Haskell or other such automated deriving of TYPEOF instances can ensure the uniqueness of the representation. Numeric and string kinds, coming to GHC, will greatly improve the efficiency of type-level comparison.

Version
The current version is October 2011.
References
TTypeable.hs [6K]
The TTypeable library, defining the type-level representation TYPEREP, type-level TypeOf, and type-level functions for comparison and matching -- along with a small library for higher-order functional programming at the type level. The library uses neither overlapping instances nor functional dependencies. The code includes a few tests.

NotAFunctionT.hs [3K]
An example of type matching: re-implementing NotAFunction.hs, this time without overlapping instances.

 

Overcoming Overlapping

We have described the TTypeable library of type comparison and matching that avoids overlapping instances. It turns out the library lets us eliminate OverlappingInstances altogether. We re-write the earlier code that relied on overlapping into the overlapping-free code, in a regular and almost mechanical way. We gain a sound overlapping of type families and associated data types. The new approach requires no changes to GHC and does not affect the FC calculus or the instance resolution algorithm. It works right now.

We may still define overloaded functions by giving the default general instance and a set of special cases. The contradiction of OverlappingInstances is avoided because the set of special cases is closed.

Our first example is eliminating overlapping in the overloaded function that converts lists to strings, with the special cases for lists of booleans and characters: [True,False] should be converted to the string "10":

     class ShowList a where
        showl :: [a] -> String
     
     -- General instance
     instance Show a => ShowList a where ...
     
     -- Special instances, for lists of specific types
     instance ShowList Bool where ...
     instance ShowList Char where showl = id

To eliminate overlapping, we re-write the example as follows. We rely on the TTypeable library for type introspection, TYPEOF, and for the type-level list membership predicate Member.

     class ShowList a where
        showl :: [a] -> String
     
     -- Enumeration of special cases
     type Special = TYPEOF Bool :/ TYPEOF Char :/ NIL
     
     -- Discriminator
     instance (flag ~ (Member AC_TREPEQ (TYPEOF a) Special), ShowList' flag a)
        => ShowList a where
        showl = showl' (undefined::flag)
     
     -- A replica of the original class and its instances
     class ShowList' flag a where showl' :: flag -> [a] -> String
     
     -- General instance
     instance Show a => ShowList' HFalse a where ...
     
     -- Special instances, for lists of specific types
     instance ShowList' HTrue Bool where ...
     instance ShowList' HTrue Char where showl' _ = id

The single ShowList instance checks to see if the type in question is mentioned in the list of exceptions Special, thus discriminating between the general and special cases. The class ShowList' and its instances are the replica of the original ShowList, with the extra parameter flag, which resolves overlapping. Modulo the dummy flag parameter, the code of ShowList's members is unchanged. Special defines the list of specializations. The members of the list may contain wildcards ANY and so the set of types to specialize among could be infinite -- but their lower specialization bound is fixed. Adding new special instances does not require modifications to the existing instances but does require changes to the code to extend Special, adding TYPEREPs of those special types. The type alias Special may be in a module of its own, along with other configuration parameters.

In general, suppose we have declared class C a with N special, mutually non-overlapping instances, and one catch-all instance C a. To eliminate overlapping, we define an auxiliary class C' flag a with N special and one catch-all instance, all non-overlapping, and a type-level list Special. The original class C will have one, dispatching instance. Our approach thus calls for one extra class, one extra instance and one type-level list. In a more complex case of full overlap

     class C a
     instance C [Int]
     instance C [a]
     instance C a
the members of Special include wildcards. We use matching rather than the simple equality when searching through Special. The rest remains the same.

The approach lets us implement hypothetical closed type instances. Such an extension is frequently mentioned -- but not implemented in GHC. The implementation will be non-trivial, requiring the change in the instance resolution algorithm. Our approach gives closed type families now.

     -- hypothetical code
     type instance F where
      F Int = Bool
      F a = [a]
Here is the complete code emulating the same closed type family with overlapping, in the existing GHC 7.0.4.
     type family F a :: *
     
     type instance F x = F' (TREPEQ (TYPEOF x) (TYPEOF Int)) x
     type family F' flag x
     type instance F' HTrue  x = Bool
     type instance F' HFalse x = [x]
     
     newtype W x = W (F x)
     w1 = W True  :: W Int
     w2 = W "abc" :: W Char
We stress that TREPEQ is the total equality comparison predicate defined in closed form, once and for all.

There have been other proposals to eliminate overlapping instances: closed type families, instance chains, disequality constraints. Only instance chains have been implemented, in an experimental Haskell-like system. None of these proposal have been realized in GHC and none are easily implementable, requiring significant changes to the instance resolution algorithm or System FC, with unknown interactions with other features. All these proposals to sound overlapping also make the set of special cases closed, thus taking the closed-world assumption.

We have described the re-writing procedure that eliminates the use of OverlappingInstances. It is the only presently available way to sound overlapping, requiring no changes to the type checker or its theory. We advocate abolishing overlapping instances, claiming it results in no loss of expressiveness.

Version
The current version is October 2011.
References
ShowList.hs [<1K]
ShowListNO.hs [2K]
The complete code for the introductory example, written with and without overlapping

ExampleD.hs [4K]
Implementing the challenging example posed by David Mazi`eres. The example seemed to require overlapping type constructor instances, and hence *->*-type equality.

GMapSpecT.hs [6K]
Adding a special case to an existing library of overloaded functions: re-implementing the example GMapSpec.hs, this time without overlapping instances.

The described new approach to type equality and overlapping elimination was first discussed on the Haskell' mailing list in June-August 2011.



Last updated January 1, 2012

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML