From oleg-at-okmij.org Thu Aug 10 21:16:34 2006 Date: 10 Aug 2006 21:16:34 -0000 Message-ID: <20060810211634.71649.qmail@eeoth.pair.com> To: haskell@haskell.org Subject: Smash your boiler-plate without class and Typeable Status: RO We describe a new generic programming approach, which is powerful enough to traverse a term and return another term of a different type, determined by the original term's type/structure. The example is a generic function that replaces all Floats with Doubles in an arbitrary term. The approach combines the features of the original `Scratch your boilerplate' (SYB1, presented at TLDI2003) and the third version SYB3 (ICFP05): -- like SYB1, our generic function is an ordinary function rather than a typeclass method. Unlike SYB3, the end user of our library defines no typeclasses. We do not need any extensions like recursive instances either. -- unlike SYB1, we need neither Typeable nor (safe) typecast -- unlike SYB1 and SYB3, we do not need higher-rank types -- like SYB3, the library needs overlapping and undecidable instances. Unlike SYB3, the end user needs no such extensions. In fact, the use of these extensions is confined to a small part of the library, namely, to the implementation of TypeEq. The essence of our approach is the _typelevel_ typecase. It is inspired by the deepest functor, which has http://okmij.org/ftp/Haskell/typecast.html#deepest-functor This message and code have greatly benefited from discussions with Chung-chieh Shan, who also suggested the title. Our generic function is composed of a specific and generic parts. A specific part tells us what to do if the input term happens to be of some particular type. The generic part tells us what to do with all other terms. The specific part is just an HList of ordinary functions, with different argument types. If the type of the input term matches the argument type of one of the functions in the list, we apply that function. If no specific function applies, we do the generic action, implicit in the traversal strategy (for example, apply the generic function to the subterms and reduce the results). Generic programming described in the original SYB1 paper introduced two fundamental ideas: -- generic functions like |gmapQl op seed f term|, which traverses an arbitrary term and applies the user-defined function f to each subterm and reduces the results with a left-associative operator op. -- generic function f that transforms arbitrary (sub)terms of arbitrarily types. That transformation is specified as a composition of specific transformations (which apply only to the values of some specific types) and a generic, default transformation for all other types. In SYB1, the function gmapQl is implemented as a method of a class Data; the SYB library provides instances for all (interesting) types. The function 'f', OTH, is essentially a `typecase' operator, to tell one specific type from the rest. SYB relies on a run-time typecase, depending on the run-time type representation -- which is provided by the typeclass Typeable. The typeclass also has the method `cast' for a safe cast from the value of `generic type' to the value of a specific type. We observe that the typecase, _on the type level_, has always existed in Haskell (although that fact was not perhaps realized until HList). That typecase, which does not need any `cast' operation, is the type equality predicate TypeEq. We consider here two classes of generic functions. The first class of generics takes a term and returns a value of some fixed type. The example is a generic size function, counting the number of data constructors in a term with the specific size measure assigned to values of particular types, e.g., strings. The second class of generics takes a term and returns a potentially different term -- in value as well as in type. For example, replacing Floats with Doubles or vice versa, or replacing all tuples of integers in a complex term with a 2-element array. Our first example uses a generic query function gmapq, which recursively traverses a term in the depth-first order, and applies reducer to the list of term's children. First, we define a function that counts the number of data constructors and values of primitive types in any term: > gsize a = gmapq SNil (\l -> 1 + sum l) a > test1 = gsize (1::Int) -- 1 > test2 = gsize [1::Int,2,3] -- 7 == 3 integers + three (:) + one [] > test3 = gsize "abc" -- 7, as above, with Char instead of Int > test4 = gsize ["abc"] -- 9: one extra (:) and one extra [] Now, we shall treat strings specially: we assign each string size 999. We only need a small modification to gsize': > gsize' a = gmapq (SCons (\ (_::String) -> (999::Int)) SNil) (succ . sum) a > test1' = gsize' (1::Int) > test2' = gsize' [1::Int,2,3] -- 7 > test3' = gsize' "abc" -- 999 > test4' = gsize' ["abc"] -- 1001 In a different example, we check if a given data structure contains somewhere the letter 'a' (or an integer with the 'a' code): > hasa a = gmapq (SCons (== 'a') (SCons (== (fromEnum 'a')) SNil)) or a > testh = (hasa ('x',False), > hasa ('x',97::Int), hasa 'a', hasa [[["cde"],["abc"]]]) > -- (False,True,True,True) For generic term transformations, we take the following term: > term1 = ([1::Int,2], (True,('2',[(3::Int,4::Int)]))) Let us increment all integers in that term: > inci a = gtmapq (HCons (\ (x::Int) -> x + 1) HNil) a > testi1 = inci term1 -- result: ([2,3],(True,('2',[(4,5)]))) Let us replace all Int tuples (x,y) with an array [x,y], and negate all booleans > p2l a = gtmapq (HCons (\ (x::Int,y) -> [x,y]) (HCons not HNil)) a > test_p2l = p2l term1 -- result: ([1,2],(False,('2',[[3,4]]))) Let us replace an Int with a Double everywhere > i2d a = gtmapq (HCons (fromIntegral::Int->Double) HNil) a > test_i2d = i2d term1 -- result: ([1.0,2.0],(True,('2',[(3.0,4.0)]))) The complete implementation along with the examples is available at http://okmij.org/ftp/Haskell/syb4.hs The following is the key part of the implementation of gmapq. First, we introduce a heterogeneous list of functions a->w where all functions in the list have the same return type w. The argument type 'a' varies from one list member to another. The property that each function in the list has the same result type is guaranteed by the construction of the datatype. > data SNil w = SNil > data SCons a b w = SCons (a->w) (b w) Given the datum of the type 'a', we check if any function in the list 'spec w' has the argument type 'a' and so can be applied. If there is such a function, we apply it. Otherwise, we return the default, the third argument of sapply. > class SApply spec a where > sapply :: spec w -> a -> w -> w > > instance SApply SNil a where > sapply _ _ deflt = deflt > > instance (TypeEq a a' bf, SApply' bf (SCons a' r) a) > => SApply (SCons a' r) a where > sapply = sapply' (undefined::bf) > > class SApply' bf spec a where > sapply' :: bf -> spec w -> a -> w -> w > > instance SApply' HTrue (SCons a r) a where > sapply' _ (SCons p _) x _ = p x > > instance SApply r a => SApply' HFalse (SCons a' r) a where > sapply' _ (SCons _ r) x deflt = sapply r x deflt