From oleg-at-okmij.org Tue Jun 5 00:09:02 2007 To: haskell@haskell.org Subject: Smash along your boilerplate; how to traverse a non-existent term Message-ID: <20070605070902.1DFDEAD43@Adric.metnet.fnmoc.navy.mil> Date: Tue, 5 Jun 2007 00:09:02 -0700 (PDT) Status: OR We describe a generalization of the Smash-your-boilerplate generic programming approach to extensible traversal strategies. To the pre-defined gmap, reduction, and two-terms-in-lockstep strategies, the programmer may at any time add a new class of traversals. The present message shows one such extension, lazy gmap, and its application to mapping of terms that do not exist. This lets us implement gminimum and gmaximum -- which traverse `undefined' and produce the fully-defined smallest (resp. largest) term of a desired type: gminimum () :: (Maybe Int,Either Bool (Maybe Char)) ==> (Nothing,Left False) gmaximum () :: (Maybe Int,Either Bool (Maybe Char)) ==> (Just 2147483647,Right (Just '\1114111')) The result of traversing (undefined::(Maybe Int,Either Bool (Maybe Char))) is (Nothing,Left False). This meaningful traversal of non-existing terms is another benefit of Haskell's non-strictness. The complete code is available at http://darcs.haskell.org/generics/comparison/SmashA/ The code includes sample applications and examples, within the generic programming comparison framework by Alexey Rodriguez Yakushev, Alex Gerdes, and Johan Jeuring. The code is tested with GHC 6.4.1 and 6.6. The Smash approach is based on the typecase-like function stapply defined at the beginning of Syb4A.hs -- which expresses (or, reifies) the typeclass selection in Haskell itself: stapply :: spec -> a -> d -> w Given a heterogeneous list |spec| of functions |ai->wi| (where |ai| and |wi| vary from one function to another in the list) and a datum of the type |a|, we check to see if there is a function in that list whose argument type |ai| is equal to |a|. If so, we apply that function to the datum and return its result of the type |wi|. If no such function is found, we return the supplied default value, of the type |d|. Thus the result type |w| of stapply is a (type-level) function of the types |spec|, |a|, and |d|. The function stapply is dual to typeclasses: for example, the following typeclass method > class C a where fn :: a -> Int > instance C Bool where fn x = if x then 10 else 20 > instance C Char where fn x = fromEnum x is dually > data ResFailure > testtyc_fn x = > stapply ((\ (x::Bool) -> if x then (10::Int) else 20) :+: > (\ (x::Char) -> fromEnum x) :+: > HNil) > x > (undefined::ResFailure) > > testtyc_fn1 = testtyc_fn True > testtyc_fn2 = testtyc_fn 'x' > -- testtyc_fn3 = show $ testtyc_fn () -- type error, can't show ResFailure Here ResFailure is an abstract data type with no non-bottom values and no defined operations. An attempt to actually use it will trigger a type error. (One may say that testtyc_fn corresponds to a closed typeclass. That is true in the above definition. However, HLists may be open too). Our most general generic function is |gapp|, which applies a generic function to a term. A generic function is (quite literally) made of two parts. First, there is a term traversal strategy, identified by a _label_ |tlab|. One strategy may be to `reduce' a term using a supplied reducing function (cf. fold over a tree). Another strategy may rebuild a term. The second component of a generic function is |spec|, the list of `exceptions'. Each component of |spec| is a function that tells how to transform a term of a specific type. Exceptions override the generic traversal. The function |gapp| is defined generically. The code below says: first, check to see if any of the exceptions apply. If not, do the generic traversal. > class (STApply spec a df w, LDat tlab spec a df) > => GAPP tlab spec a df w | tlab spec a -> df w where > gapp :: tlab -> spec -> a -> w > > instance (STApply spec a df w, LDat tlab spec a df) > => GAPP tlab spec a df w where > gapp tlab spec x = stapply spec x (gin tlab spec x) The function |gin| does a generic traversal. It (may) invoke |gapp| on the children of the term, if any (and if the traversal strategy calls for traversing the children). This class is similar to the class Data of SYB: > class LDat tlab spec x w | tlab spec x -> w where > gin :: tlab -> spec -> x -> w However, different traversal strategies of Data are methods of that class: gfold, gunfold, gmapT, etc. Here, different strategies are identified by the label |tlab| and so the set of strategies is _extensible_. In SYB, for each new data type we need to define an instance of Data with several methods. Here, for each new data type we need to define several instances of LDat with one method each. The extensible strategies were discovered when generalizing the classes Dat and TDat in the original Smash approach. The file Syb4A.hs defines the following popular strategies - Reconstruct the term: used in generic mapping > data TL_recon = TL_recon - Reduce the term: used in folding over a term > newtype TL_red w = TL_red ([w]->w) - The same as above but the reducer accepts the name of the term's constructor, a string. This is used for generic show > newtype TL_red_ctr w = TL_red_ctr (String -> [w]->w) - Traverse and reduce two terms in lock-step: we traverse the term (Couple x y) where x and y have the same type, and reduce the result using the supplied function 'f'. This is used for generic equality > data Couple a = Couple a a > data TL_red_lockstep w = TL_red_lockstep w ([w] -> w) The TL_recon strategy is the generalization of fmap. Here are a few instances, for a primitive type and products and recursive sums (list). The analogy with fmap should be obvious. It should also be obvious that the instances below can be automatically generated from the definition of a datatype. That is, LDat instances could be automatically derived. > instance LDat TL_recon spec Int Int where > gin _ spec x = x > instance (GAPP TL_recon spec a dfa wa, GAPP TL_recon spec b dfb wb) > => LDat TL_recon spec (a,b) (wa,wb) where > gin tlab spec (x,y) = (gapp tlab spec x, gapp tlab spec y) > instance (GAPP TL_recon spec [a] [w] [w], GAPP TL_recon spec a dfa w) > => LDat TL_recon spec [a] [w] where > gin tlab spec [] = [] > gin tlab spec (x:xs) = (gapp tlab spec x):(gapp tlab spec xs) We can define the generic gmap in one line: > gmap f = gapp TL_recon (f :+: HNil) It takes a function |f::a->b| and a term and returns a term with all values of the type |a| replaced with the corresponding values of the type |b|. The type of the result is computed. When the input term is a list, gmap is the ordinary map. Given a term > termc = (["ab"],(Just 'c', Just 'd')) we can replace all Chars with their Int equivalents > testc1 = gmap ord termc where ord (c::Char) = fromEnum c *SmashA.Syb4A> testc1 ([[97,98]],(Just 99,Just 100)) Alternatively, we can replace only those Chars that occur in Maybe Char > testc2 = gmap maybeord termc > where maybeord (Just (c::Char)) = Right (fromEnum c) > maybeord Nothing = Left 0 *SmashA.Syb4A> testc2 (["ab"],(Right 99,Right 100)) Unlike the regular fmap, we can fuse two traversals > testc3 = gapp TL_recon (ord :+: maybeord :+: HNil) termc > where maybeord (Just (c::Char)) = Right (fromEnum c) > maybeord Nothing = Left 0 > ord (c::Char) = fromEnum c We can even traverse and replace under functions: > testt3 = gmap (\x -> if x then 'a' else 'b') > ([not], (True,('2',[(&&),(||)]))) whose inferred type is *SmashA.Syb4A> :t testt3 testt3 :: ([Bool -> Char], (Char, (Char, [Bool -> Bool -> Char]))) The source code shows equality on nested data types and various reductions (summing all Ints in a term; finding the size of the term; printing the term, treating Strings differently from other lists, etc). The programmer may introduce a new traversal strategy. For illustration, let us define a lazy version of the strategy TL_con. The motivation is determining the smallest or the largest term of a given type. The first stab at such a generic minimum may be gmin x = gmap (\ (x::Bool) -> False) x which replaces all Bools with the smallest Bool in a sample term x. For example, |gmin (True,False)| should give (False,False). The mapping function is clearly non-strict in its argument, so the same result can be achieved by applying gmin to the sample term (undefined::Bool,undefined::Bool). The sample term merely gives the structure of the desired type -- the structure to traverse. One may observe that we don't even need that! We can apply gmin to the sample term that is just (undefined::(Bool,Bool)) -- hence we don't need to supply any real sample term. The key is that (fst (undefined::(Bool,Bool))) is a correct expression, which evaluates to (undefined::Bool). Hence, "undefined" is deconstructible and traversable, standing for a term of any structure we may want. It can be traversed and mapped with real results -- providing the mapping and traversal functions are not strict. The TL_recon strategy above is strict, however, which is apparent from the instance of LDat for pairs: > gin tlab spec (x,y) = (gapp tlab spec x, gapp tlab spec y) That is easy to fix: just replace (x,y) with an irrefutable match ~(x,y). Alas, the problem is more complex in the case of sums (cf the instance LDat for lists above): > gin tlab spec [] = [] > gin tlab spec (x:xs) = (gapp tlab spec x):(gapp tlab spec xs) If we replace patterns with irrefutable match ~[] and ~(x:xs), then the first alternative will always match, and so the second clause will never be effective. We should recognize that sums give us a choice. We should propagate this choice back to the users: we should report the available alternatives, and let the users make the choice. We arrive thus at the following design, implemented in Syb4ABuild.hs in the DARCS repository above. We define a new traversal label > newtype TL_recon_lazy = TL_recon_lazy (forall a. [a] -> a) where the argument of TL_recon_lazy is a user-supplied choice function. The instance for pairs is simple as we don't have any choice here: > instance (GAPP TL_recon_lazy spec a dfa wa, > GAPP TL_recon_lazy spec b dfb wb) > => LDat TL_recon_lazy spec (a,b) (wa,wb) where > gin tlab@(TL_recon_lazy f) spec ~(x,y) = > f [(gapp tlab spec x, gapp tlab spec y)] We do have the choice for sums, e.g., lists: > instance (GAPP TL_recon_lazy spec [a] [w] [w], > GAPP TL_recon_lazy spec a dfa w) > => LDat TL_recon_lazy spec [a] [w] where > gin tlab@(TL_recon_lazy f) spec x = > f [[], > (gapp tlab spec (head x)):(gapp tlab spec (tail x))] The term to traverse may be either an empty list, or non-empty list. In the latter case, we speculatively map its head and tail. We let the user make the choice. Due to non-strictness, we don't actually traverse anything unless it is required. This lets us define generic minimum simply as > __ = error "nonexistent" > gminimum () = r > where r = gapp (TL_recon_lazy head) lst (__ `asTypeOf` r) > lst = (mb (__::Int)) :+: (mb (__::Bool)) :+: (mb (__::Char)) > :+: HNil > mb :: Bounded t => t -> t -> t > mb _ _ = minBound The definition for gmaximum is similar, modulo replacing minBound with maxBound and head with last. We thus obtain the results mentioned at the beginning. Other examples: > test_gmin3 = gminimum () :: (Maybe Int,[Bool]) *SmashA.Syb4ABuild> test_gmin3 (Nothing,[]) > test_gmax1 = take 5 $ gmaximum () :: [Bool] *SmashA.Syb4ABuild> test_gmax1 [True,True,True,True,True] The presence of "take 5" is telling. The largest term of the type [Bool] is the infinite list of True -- which gmaximum faithfully computes. The function gminimum is akin to the de-typechecker since both convert `undefined' to `defined'. De-typechecker produces only polymorphic functions, whereas gminimum yields data (including monomorphic functions).