Data.Data
module of the GHC base.
Our gmap
is optimal: the parts of the data
type that are not affected by mapping are not even traversed. The code
has not a single unsafe operation. Furthermore, if the function to map
is total and Data.Data
instances are properly implemented, gmap
is total. Although written in the parametricity-breaking
framework SYB
that relies on run-time type-introspection,
our gmap
is parametric.Generic map is the generalization of the list map and related functions:
map :: (a->b) -> [a] -> [b] map2 :: (a->b) -> [[a]] -> [[b]] mapl :: (a->b) -> Either a w -> Either b w mapx :: (a->b) -> Tree (Maybe (a,Int)) -> Tree (Maybe (b,Int)) ...All these diverse mapping operations become particular instances of a single function
gmap
. It is also a generic version of Functor
's fmap
:class Functor c where fmap :: (a->b) -> c a -> c bNormally, we have to write code for
fmap
every time we define a new
instance of Functor
. The generic map is written once and for all, for any
data type that is a member of Data.Data
. The gmap
code is the
`boilerplate', which expresses in an abstract way what it means to
map an (a->b)
function over a data type.
For a long time, it was considered impossible to write gmap
in SYB.
Although SYB has many functions whose names contain `map', none of
them generalize List.map
. At first blush, the generic fold gfold
,
a generalization of list fold, lets us easily define something quite close
to a generic map, namely
gmappish :: (Data a, Data (c a)) => (a -> a) -> c a -> c a gmappish f x = traverse x where traverse :: Data a => a -> a traverse x = unID $ gfoldl k z x z = ID k (ID ca) b | Just fb <- castfn f = ID (ca (fb b)) k (ID ca) b = ID (ca (traverse b)) newtype ID a = ID{unID :: a} castfn :: (Typeable a, Typeable b, Typeable c, Typeable d) => (a -> b) -> Maybe (c -> d) castfn f = cast fThe operation
gfold
traverses the immediate subcomponents of a value.
If a subcomponent can be passed as an argument to the transforming function f
(i.e., the types match), it is transformed.
Otherwise, we traverse it recursively. The function gmappish
does seem to act like gmap
when f
is type-preserving. Alas, there are two problems: gmappish
does not
generalize to mappings that do change the type of the transformed value.
Furthermore, gmappish
is not truly a map: it is not parametric.
The generalization failure is easy to see, from the type of gfoldl
on
which gmappish
is based:
gfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b) -> (forall g. g -> c g) -> a -> c aThe function
gfoldl
folds over a value of type a
and produces
the result c a
, with the type constructor c
providing the context
of the traversal. Crucially the result is parameterized by the type
of the value to fold. For generic map however we need the result of
a different type, related to the input type in complex ways, see map2
, mapl
, mapx
signatures before. Generic fold gfoldl
is
not generic enough.
The non-parametricity of gmappish
is subtle, so we explain it
on the example of mapping over a Tricky
data type:
data Tricky a = Tricky a Int tricky_mappish :: Data a => (a->a) -> Tricky a -> Tricky a tricky_mappish = gmappish trickish1 = tricky_mappish not (Tricky True 1) -- Tricky False 1 trickish2 = tricky_mappish (+1) (Tricky (0::Int) 1) -- Tricky 1 2Here
tricky_mappish
is just an instance of gmappish
to map over Tricky a
; for example, trickish1
maps not
over Tricky Bool
with the expected result. The example trickish2
maps successor over Tricky Int
-- and so it increments every Int
it can find in the
data type, whether that Int
is an instance of the type parameter of Tricky
, or an unrelated Int
component. Hence gmappish
has a
`side-effect' of transforming more components of the data type than
intended. It cannot be reasoned as fmap
at all: gmappish f . fmap g
/= fmap (f . g)
. It is not parametric and breaks fusion laws.
The two key insights of our solution address the two problems of gmappish
. First is the inability of gfoldl
to express mappings
that change the type of the result. A
serialize-transform-deserialize trick gets around this
problem. One may think of gmap
as writing its argument to a file,
transforming the file systematically and reading the result back, at
possibly a different type. Generic show and read are well-supported
by SYB. Of course we do not actually write to any file; our
`serialization':
type Serialized = (Constr,[Dyn]) data Dyn = forall a. Data a => Dyn a data Kids a = Kids{growUp:: [Dyn]} serialize :: Data a => a -> Serialized serialize x = (toConstr x, growUp $ gfoldl k (const (Kids [])) x) where k (Kids l) a = Kids (l ++ [Dyn a])`writes' a value as the data constructor descriptor followed by the list of its arguments, which are lazily `serialized' (that is, wrapped in
Dyn
). The deserialization step, written
in terms of gunfold
, reassembles the
value from its possibly transformed subcomponents. The function gmapt
(described below) maps over the subcomponents. Since gmapt
needs a template, to be discussed next, the deserialization
handles the serialized value and the template in parallel.data UnfldStateT a = UnfldStateT a [Dyn] [Dyn] rebuild :: (Data a, Data b, Data t) => (a->b) -> Serialized -> Serialized -> t rebuild f (tcon, tkids) (con, kids) = case gunfold k (\g -> UnfldStateT g tkids kids) con of UnfldStateT a [] [] -> a where k (UnfldStateT ca (tkid:tkids) ((Dyn kid):kids)) = UnfldStateT (ca (gmapt f tkid kid)) tkids kids
The compositionality problem of gmappish
comes from its promiscuity:
when the polymorphic gmappish :: (a->a) -> c a -> c a
is
instantiated (say, to a concrete type Int
for a
and a concrete Tricky
for c
), it tries to map any component whose type happens to be Int
, whether that Int
came from instantiating a
for Int
, or
not. The solution is to distinguish the two, using a template, which
is an instance c X
for some private type X
. The mapping function
traverses the value (of the type Tricky Int
, say), and the template
(of the type Tricky X
) in parallel. When we encounter an Int
component
and wish to transform it, we check if the corresponding component in the
template has the type X
. This ``X marks the spot'' idea was first
described by Claus Reinke. We develop it further: our template is ephemeral,
that is, undefined
.
We are only interested in its type and the type of its components. Therefore,
our approach avoids any unsafe operations. We build the template
incrementally, as we further traverse the real value and learn its structure
layer-by-layer. Here is the rest of the gmap
code. The workhorse is gmapt
, which takes, in addition to the value to transform, its template,
of the type Dyn
.
gmap :: forall a b c . (Data a, Data b, Data (c a), Data (c b), Data (c X)) => (a -> b) -> c a -> c b gmap f = gmapt f (Dyn (undefined::c X)) data X = X deriving (Data,Typeable) gmapt :: (Data a, Data b, Data x, Data y) => (a -> b) -> Dyn -> x -> y gmapt f trep = maybe (\x -> traverse (trep,x)) ifmarked $ castfn f where hasmark :: Dyn -> Bool hasmark (Dyn x) = typeOf x == typeOf X -- The gmapped value x has the right type to be transformed by 'f' -- We do the transformation only if 'x' has the mark ifmarked f x | hasmark trep = f x ifmarked f x = traverse (trep,x) -- optimization: t has no mark, there is nothing to map under it traverse (Dyn t,x) | typeOf t == typeOf x = maybe (error "traverse1") id $ cast x traverse (Dyn t,x) | (tcon,tkids) <- splitTyConApp (typeOf t), (con,kids) <- splitTyConApp (typeOf x), not (length tkids == length kids && tcon == con) = error $ unwords ["template type", show (typeOf t), "inconsistent with value type", show (typeOf x)] traverse (Dyn t,x) = rebuild f (serialize t1) xdyn where xdyn@(con,kids) = serialize x t1 = fromConstr con `asTypeOf` t -- build the ephemeral template
The code has many internal self-consistency checks that should never fire.
The code includes an optimization: if the template (after removing Dyn
)
has the same type as the mapped value, there is no X
in there and hence
there is nothing to map. Therefore, we immediately return the value
as it is, with no transformation or even traversal.
In conclusion, we have presented a surprising application of Data
/Typeable
to define Functor's fmap
generically, once and
for all. As behooves to the generic fmap
, our gmap
is total (for
the total mapping function) and parametric. It is also optimal,
traversing only the needed parts of the value. Our approach is a
synthesis of a fake serialization-deserialization with an sham `X
marks the spot' template. Claus Reinke has summarized the approach
aptly: ``if I had followed my own choice of metaphor to the end, I
would have known that X marks the spot only on a map, not on the real
thing (or else everyone could find the treasure..). Your shallow,
incrementally built not-quite copy is just another representation of a
map, and since both map and X are hidden from users of the function,
everyone is happy.''.
I greatly appreciate many critical and inspiring discussions with Claus Reinke and Alexey Rodriguez-Yakushev on the Hs-Generics mailing list in June-July 2008.
Threads entitled ``gMap in SYB1'' and ``compositional gMap in SYB1'' on the Hs-Generics mailing list, July 2008.
< http://www.haskell.org/pipermail/generics/2008-July/000349.html >
< http://www.haskell.org/pipermail/generics/2008-July/000362.html >
Claus Reinke: Traversable Functor Data,or: X marks the spot
Message posted on the Hs-Generics mailing list on Tue Jun 24 14:45:18 EDT 2008
< http://www.haskell.org/pipermail/generics/2008-June/000343.html >
That message proposed the idea of `X marking the spot'.
It has been further developed in gmap
, specifically to avoid
any unsafe operations.
Ralf LĂ¤mmel and Simon L. Peyton Jones: Scrap Your Boilerplate: A Practical Design Pattern for Generic Programming TLDI 2003, pp. 26-37
< http://www.research.microsoft.com/~simonpj/papers/hmap/ >
This is the paper that introduced SYB.
Claus Reinke: Re: compositional gMap in SYB1
Message posted on the Hs-Generics and Haskell-Libraries mailing lists on Tue Jul 29 15:25:02 EDT 2008
< http://www.haskell.org/pipermail/generics/2008-July/000367.html >
The message turns gmap
into a generic traverse, that is,
Traversable.traverse
implemented once and for all.
Data.Data
, which is
at the core of the generic programming framework SYB. Data.Data
and
the related Data.Typeable
are part of the GHC base. As an
illustration of our approach we answer Wadler's and Clayden's
questions.
Reifying a data type may seem easy. Given a value of the data type, we
traverse it with gMapQ
to find out all its components and their
types. For example, the traversal of [(Just 'a', True)]
might
produce the tree
TData "[((Maybe Char),Bool)]" [TCon "(:)" [TData "((Maybe Char),Bool)" [TCon "(,)" [TData "Maybe Char" [TCon "Just" [TData "Char" []]], TData "Bool" [TCon "True" []]]], TCon "[]" []]]What if the only values we have are
[(Nothing, True)]::[(Maybe Char,Bool)]
or []::[(Maybe Char,Bool)]
? Given just the type Maybe Char
,
with no concrete value, we have to determine the data constructors, Just
and Nothing
, and the types of their arguments, and their
data constructors, etc. The question still seems easy: Data.Data
offers the function dataTypeOf :: Data a => a -> DataType
that describes the data type a
without requiring any concrete value
of it (in other words, dataTypeOf
does not evaluate its argument).
The datatype description has the name of the data type, its sort
(algebraic, Integer
, Float
, etc.) and the names of its constructors.
Alas, the description gives just the names of data constructors --
but not the number and the types of their
arguments. In other words, dataTypeOf
gives a shallow description
of the data type, with too little information to construct its graph.
Fortunately, Data.Data
has another way to describe a data type
without needing its value:
gunfold :: (forall b r. Data b => c (b -> r) -> c r) -> (forall r. r -> c r) -> Constr -> c aPer se,
gunfold
does only the shallow traversal, but it can be applied
recursively. Its signature is a bit strange. Let's
look at one of its instances, when a
is Maybe Char
:gunfold :: (forall b r. Data b => c (b -> r) -> c r) -> (forall r. r -> c r) -> Constr -> c (Maybe Char) gunfold k z c | c == constructor_Nothing = z Nothing gunfold k z c | c == constructor_Just = k (z Just)In the last line,
k
receives z Just
of the type c (b->r)
where r
is Maybe Char
and b
is Char
; b
is exactly what we need:
the argument type of the constructor Just
. Not only k
, which we
provide, finds out the constructor argument type -- it is assured
by the gunfold
signature that the type is a member of Data
and
hence can be introspected and gunfold
ed in turn.
That is the general idea of our solution. It is implemented
via a pair of mutually recursive functionsdescribe_type' :: Data a => IgnoreTypes -> SeenTypes -> CTTree a describe_ctor :: Data a => IgnoreTypes -> SeenTypes -> Constr -> CCList a type SeenTypes = [TypeRep] type IgnoreTypes = [TypeRep]The former finds out the names of constructors and the latter describes the arguments of a given constructor. Together, they reify a data type
a
as
a term of the following structuredata TRepTree = TData TRep [TCon] | BackRef TypeRep data TRep = forall a. Data a => TRep a data TCon = TCon Constr [TRepTree] -- a data constructor and its argsWhereas
TRep
tells the data type name, sort, and the names of its
constructors, TCon
describes one data constructor, named by Data.Data.Constr
, with all its arguments.
Since a data type can be recursive the representation may contain cycles,
which are encoded with BackRef
. Our functions have to keep track
of the already described data types (the argument SeenTypes
) and
build BackRef
accordingly. Finally, we let the user prevent
the traversal of particular data types. For example,
one may wish to treat [Char]
as an atomic data type of strings rather than
as a list of characters. The function describe_type
puts it all
together:describe_type :: forall a. Data a => IgnoreTypes -> a -> TRepTree describe_type ignore _ = tree where CTTree tree = describe_type' ignore [] :: CTTree a
The data type from the initial example is described as follows, with the
result shown in the comments. Compared to the description at the beginning
of the article, the one below has more detail: we now see all data
constructors of Bool
and of Maybe Char
.
t1 = describe_type [] ([] :: [(Maybe Char, Bool)]) {- TData [((Maybe Char),Bool)] [TCon [] [], TCon (:) [TData ((Maybe Char),Bool) [TCon (,) [TData Maybe Char [TCon Nothing [], TCon Just [TData Char []]], TData Bool [TCon False [],TCon True []]]], BackRef [((Maybe Char),Bool)]]] -}
Once the TRepTree
is constructed, answering Wadler's and Clayden's
questions is easy. The former was asking if a data type has a
recursive subcomponent; if not, its maximum depth. For example, Pair
Int (Either Int Int)
has depth 3. The depth computation is the
standard depth-first traversal of a TRepTree
accumulating the
depth maximum over all branches. If we encounter a BackRef
, the
data type is recursive and the depth is infinity. Clayden's question,
of picking out all newtype descriptions, is also answered by the
traversal.
We have seen that the often-mysterious gunfold
tells us about
arguments of a data constructor just from its name and its data type,
with no concrete value at hand. Moreover, gunfold
can be applied
recursively on the arguments. In the result, we reify a data type
as a TRepTree
term, the graph of the data type. We have answered
Wadler's general question of type introspection:
``whether the information about types could be turned into a graph
and manipulated as such''.
Our polymorphic variants are literally open co-products: dual of extensible polymorphic records of HList. The duality is negation:
NOT (A | B) -> (NOT A & NOT B)This implication is a part of the deMorgan law that holds both in classical, and, more importantly, in intuitionistic logics. Our encoding of sums is the straightforward Curry-Howard image of that law.
Our implementation of polymorphic variants in terms of HList records uses no type classes, no type-level programming or any other type hacking. In fact, there are no type annotations, type declarations, or any other mentioning of types, except in the comments.
open-coproduct.txt [4K]
Polymorphic variants (extensible, recursive sums) with HList
The message posted on the Haskell mailing list on Sun, 2 Jul 2006 18:35:12 -0700 (PDT)
< http://guppy.eng.kagawa-u.ac.jp/~kagawa/PVH >
Koji Kagawa: Polymorphic Variants in Haskell
Presented at the Haskell Workshop 2006
This paper proposes a quite different solution to the expression problem, encoding constructors of open `data types' as class methods. Our example is patterned after the running example of this paper.
The code files below closely follow the layout of the running example in Laemmel and Peyton-Jones' ICFP2005 talk. The comments describe which part of the code is the `Library', which part of the code is the (data-structure--independent) `generic function', and which part of the code overrides the general processing of the generic function. The code has two examples; one of them is gsize of the original SYB3 paper.
Our variation of SYB3 takes advantage of the fact that
type-class-bounded and higher-rank polymorphisms can be partially
traded for each other. To be more precise, we use type codes for
functions and dictionaries. The type code must then be interpreted by a type
class similar to Apply
of HList (cf. HMap
).
This extra interpretation step is the main conceptual deviation
from SYB3, largely taking the important idea of function combinators
out of it.
dat2.hs [4K]
With a little bit of CPS, we now eliminate recursive instance dependency even for recursive datatypes. This is our final solution.
Re: Scrap your boilerplate and Variants
The discussion thread started from a message posted on the Haskell-Generics mailing list on Tue Oct 24 23:28:24 EDT 2006
Our inspiration, SYB3: Scrap your boilerplate with class: extensible generic functions by Ralf Laemmel and Simon Peyton-Jones. In: Proceedings of ICFP 2005.
SYB1, when traversing terms and invoking user functions for subterms of a particular type, relies on a run-time typecase. The latter requires run-time type representation, which is provided by the typeclass Typeable. The typeclass implements the method `cast' for a safe cast from the `generic type' to the specific type. We observe that the typecase, at the type level, has always existed in Haskell: it is the type equality predicate TypeEq
of HList.
Our generic functions, like those of SYB1, are ordinary functions. Since the transformation for each subterm is chosen at compile-time, however, we do not need any casts, run-time type representations, or Typeable. We do not need higher-ranked types either. Also unlike SYB1, the set of traversal strategies is extensible: To the pre-defined strategies for gmap, gshow, gfold and geq, the programmer may at any time add a new class of traversals.
Our most general generic function is gapp
, which applies a generic function to a term. A generic function is, quite literally, made of two parts. First is a term traversal strategy, identified by a label. One strategy may be to `reduce' a term using a supplied reducing function (cf. fold over a tree). Another strategy rebuilds a term. The second component of a generic function is spec
, the HList of `exceptions', or ad hoc transformations. Each element of spec
is an ordinary function that tells how to transform a term of a specific type. Exceptions override the generic traversal: If the type of the input term matches the argument type of one of the functions in spec
we apply that function. If no specific function applies, we do the generic action implicit in the traversal strategy.
The algorithm that selects for each input subterm the appropriate transformer is tantamount to overloading resolution algorithm. Smash thus implements typeclass instance selection algorithm in the typechecker itself. The messages below describe the implementation of typeclasses in terms of Smash, with an additional flexibility of reordering, inspecting and deleting `instances'.
smash-first.txt [8K]
Smash your boiler-plate without class and Typeable
The message describing the original Smash approach, posted on the Haskell mailing list on 10 Aug 2006 21:16:34 -0000
smash-along.txt [12K]
Smash along your boilerplate; how to traverse a non-existent term
The improved and generalized approach, described in the message posted on the Haskell mailing list on Tue, 5 Jun 2007 00:09:02 -0700 (PDT)
< http://www.haskell.org/haskellwiki/Applications_and_libraries/Generic_programming/Smash >
`Smash' on Haskell Wiki
undefined
and produce the fully-defined smallest
(resp. largest) term of the desired type:gminimum () :: (Maybe Int,Either Bool (Maybe Char)) -- ==> (Nothing,Left False) gmaximum () :: (Maybe Int,Either Bool (Maybe Char)) -- ==> (Just 2147483647,Right (Just '\1114111'))
The result of traversing
undefined::(Maybe Int,Either Bool (Maybe Char))
is
(Nothing,Left False)
. We take advantage of the fact that undefined
stands for a term of any structure and is deconstructible and
traversable. Haskell's non-strictness lets us
meaningfully traverse non-existing terms.
Traversing non-existing sums such as
undefined::Either Int Char
requires special attention.
Sums give us a choice. We should report the available alternatives,
and let the users make the choice of a component of the sum.
The implementation of gminimum
always chooses the first
available alternative; gmaximum
chooses the last available.
The function gminimum
is akin to the de-typechecker since
both convert `undefined' to `defined'. De-typechecker produces only
polymorphic functions, whereas gminimum
yields data, including monomorphic
functions.
smash-along.txt [12K]
Smash along your boilerplate; how to traverse a non-existent term
The explanation, described in the second half of the message posted on the Haskell mailing list on Tue, 5 Jun 2007 00:09:02 -0700 (PDT)
Serialization can be written generically, with generic fold. The following Smash code says the result of serializing an object is the list of all its primitive fields.
serialize xs = gapp (TL_red concat) primitive_fields_show xs primitive_fields_show = (\ (s::Int) -> [show s]) :+: (\ (s::Float) -> [show s]) :+: (\ (s::String) -> [s]) :+: HNil -- more can be added in the futureFor example, for the following data type of a company and a sample term (from the famous example of SYB papers)
data Company = C [Dept]; data Dept = D Name Manager [Unit] data Unit = PU Employee | DU Dept data Employee = E Person Salary; data Person = P Name Address data Salary = S Float; type Manager = Employee type Name = String; type Address = String genCom = C [D "Research" (E (P "Laemmel" "Amsterdam") (S 8000.0)) [PU (E (P "Joost" "Amsterdam") (S 1000.0)), PU (E (P "Marlow" "Cambridge") (S 2000.0))], D "Strategy" (E (P "Blair" "London") (S 100000.0)) []]
serialize genCom
produces the list["Research","Laemmel","Amsterdam","8000.0","Joost","Amsterdam", "1000.0","Marlow","Cambridge","2000.0","Strategy","Blair", "London","100000.0"]
Hugh Perkins has posed the problem of writing a generic deserializer, which should work for terms of any structure in the domain of a generic programming library. His particular application is complex terms representing XML documents. The deserializer can indeed be written generically: in the first approximation, it is a generic monadic map over a term. It takes a term, the prototype, and the list of serialized primitive fields as a monadic state. The deserializer traverses the prototype replacing each primitive field with the corresponding value from the monadic state. For example, assuming the following list of serialized fields:
retro = ["Metaphysics", "Kant","Koeningsberg","800.0", "Hume","Edinburgh","100.0", "Marlowe","Cambridge","200.0", "Ruling","Thatcher","London","50000.0"]and
genCom
as the prototype, deserialize genCom retro
`upgrades' the company giving us the termC [D "Metaphysics" (E (P "Kant" "Koeningsberg") (S 800.0)) [PU (E (P "Hume" "Edinburgh") (S 100.0)), PU (E (P "Marlowe" "Cambridge") (S 200.0))], D "Ruling" (E (P "Thatcher" "London") (S 50000.0)) []]
In the discussion thread, Jeremy Gibbons has pointed out that McBride and Paterson's idiomatic traverse subsumes monadic gmap. See his and Oliveira's paper below for the detailed discussion.
Hugh Perkins et al. Thread on the Hs-Generics mailing list, June 26-July 3, 2007.
Jeremy Gibbons and Bruno C. d. S. Oliveira. The essence of the Iterator pattern.
< http://web.comlab.ox.ac.uk/jeremy.gibbons/publications/#iterator >
oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML