Data-Generic and Data-Extensible Programming in Haskell

Seemingly impossible generic map in SYB
Polymorphic variants: solving the expression problem
A variation on the theme of SYB3
Smash along your boilerplate
Generic traversal of non-existent terms
Generic de-serialization
Reify a data type as a graph term

Seemingly impossible generic map in SYB

We present the implementation of generic map in the generic programming framework SYB -- which for a long time was considered impossible. We use only the operations provided by the Data.Data module of the GHC base. Our gmap is optimal: the parts of the data type that are not affected by mapping are not even traversed. The code has not a single unsafe operation. Furthermore, if the function to map is total and Data.Data instances are properly implemented, gmap is total. Although written in the parametricity-breaking framework SYB that relies on run-time type-introspection, our gmap is parametric.

Generic map is the generalization of the list map and related functions:

     map   :: (a->b) -> [a] -> [b]
     map2  :: (a->b) -> [[a]] -> [[b]]
     mapl  :: (a->b) -> Either a w -> Either b w 
     mapx  :: (a->b) -> Tree (Maybe (a,Int)) -> Tree (Maybe (b,Int))
     ...

All these diverse mapping operations become particular instances of a single function gmap. It is also a generic version of Functor's fmap:

     class Functor c where fmap :: (a->b) -> c a -> c b

Normally, we have to write code for fmap every time we define a new instance of Functor. The generic map is written once and for all, for any data type that is a member of Data.Data. The gmap code is the `boilerplate', which expresses in an abstract way what it means to map an (a->b) function over a data type.

For a long time, it was considered impossible to write gmap in SYB. Although SYB has many functions whose names contain `map', none of them generalize List.map. At first blush, the generic fold gfold, a generalization of list fold, lets us easily define something quite close to a generic map, namely

     gmappish :: (Data a, Data (c a)) => (a -> a) -> c a -> c a
     gmappish f x = traverse x
      where
      traverse :: Data a => a -> a
      traverse x = unID $ gfoldl k z x
      z = ID
      k (ID ca) b | Just fb <- castfn f = ID (ca (fb b))
      k (ID ca) b = ID (ca (traverse b))
     
     newtype ID a = ID{unID :: a}
     
     castfn :: (Typeable a, Typeable b, Typeable c, Typeable d) =>
               (a -> b) -> Maybe (c -> d)
     castfn f = cast f

The operation gfold traverses the immediate subcomponents of a value. If a subcomponent can be passed as an argument to the transforming function f (i.e., the types match), it is transformed. Otherwise, we traverse it recursively. The function gmappish does seem to act like gmap when f is type-preserving. Alas, there are two problems: gmappish does not generalize to mappings that do change the type of the transformed value. Furthermore, gmappish is not truly a map: it is not parametric.

The generalization failure is easy to see, from the type of gfoldl on which gmappish is based:

     gfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b) ->
               (forall g. g -> c g) -> a -> c a

The function gfoldl folds over a value of type a and produces the result c a, with the type constructor c providing the context of the traversal. Crucially the result is parameterized by the type of the value to fold. For generic map however we need the result of a different type, related to the input type in complex ways, see map2, mapl, mapx signatures before. Generic fold gfoldl is not generic enough.

The non-parametricity of gmappish is subtle, so we explain it on the example of mapping over a Tricky data type:

     data Tricky a = Tricky a Int
     
     tricky_mappish :: Data a => (a->a) -> Tricky a -> Tricky a
     tricky_mappish = gmappish
     
     trickish1 = tricky_mappish not (Tricky True 1)
     -- Tricky False 1
     
     trickish2 = tricky_mappish (+1) (Tricky (0::Int) 1)
     -- Tricky 1 2

Here tricky_mappish is just an instance of gmappish to map over Tricky a; for example, trickish1 maps not over Tricky Bool with the expected result. The example trickish2 maps successor over Tricky Int -- and so it increments every Int it can find in the data type, whether that Int is an instance of the type parameter of Tricky, or an unrelated Int component. Hence gmappish has a `side-effect' of transforming more components of the data type than intended. It cannot be reasoned as fmap at all:

gmappish f . fmap g
/= fmap (f . g)

. It is not parametric and breaks fusion laws.

The two key insights of our solution address the two problems of gmappish. First is the inability of gfoldl to express mappings that change the type of the result. A serialize-transform-deserialize trick gets around this problem. One may think of gmap as writing its argument to a file, transforming the file systematically and reading the result back, at possibly a different type. Generic show and read are well-supported by SYB. Of course we do not actually write to any file; our `serialization':

     type Serialized = (Constr,[Dyn])
     data Dyn = forall a. Data a => Dyn a
     
     data Kids a = Kids{growUp:: [Dyn]}
     serialize :: Data a => a -> Serialized
     serialize x = (toConstr x, growUp $ gfoldl k (const (Kids [])) x)
       where k (Kids l) a = Kids (l ++ [Dyn a])

`writes' a value as the data constructor descriptor followed by the list of its arguments, which are lazily `serialized' (that is, wrapped in Dyn). The deserialization step, written in terms of gunfold, reassembles the value from its possibly transformed subcomponents. The function gmapt (described below) maps over the subcomponents. Since gmapt needs a template, to be discussed next, the deserialization handles the serialized value and the template in parallel.

     data UnfldStateT a = UnfldStateT a [Dyn] [Dyn]
     
     rebuild :: (Data a, Data b, Data t) =>
                (a->b) -> Serialized -> Serialized -> t
     rebuild f (tcon, tkids) (con, kids) = 
            case gunfold k (\g -> UnfldStateT g tkids kids) con of
              UnfldStateT a [] [] -> a
       where k (UnfldStateT ca (tkid:tkids) ((Dyn kid):kids)) = 
                 UnfldStateT (ca (gmapt f tkid kid)) tkids kids

The compositionality problem of gmappish comes from its promiscuity: when the polymorphic gmappish :: (a->a) -> c a -> c a is instantiated (say, to a concrete type Int for a and a concrete Tricky for c), it tries to map any component whose type happens to be Int, whether that Int came from instantiating a for Int, or not. The solution is to distinguish the two, using a template, which is an instance c X for some private type X. The mapping function traverses the value (of the type Tricky Int, say), and the template (of the type Tricky X) in parallel. When we encounter an Int component and wish to transform it, we check if the corresponding component in the template has the type X. This ``X marks the spot'' idea was first described by Claus Reinke. We develop it further: our template is ephemeral, that is, undefined. We are only interested in its type and the type of its components. Therefore, our approach avoids any unsafe operations. We build the template incrementally, as we further traverse the real value and learn its structure layer-by-layer. Here is the rest of the gmap code. The workhorse is gmapt, which takes, in addition to the value to transform, its template, of the type Dyn.

     gmap :: forall a b c .
             (Data a, Data b, Data (c a), Data (c b), Data (c X)) => 
             (a -> b) -> c a -> c b
     gmap f = gmapt f (Dyn (undefined::c X))
     
     data X = X deriving (Data,Typeable)
     
     gmapt :: (Data a, Data b, Data x, Data y) => (a -> b) -> Dyn -> x -> y
     gmapt f trep = maybe (\x -> traverse (trep,x)) ifmarked $ castfn f
      where
       hasmark :: Dyn -> Bool
       hasmark (Dyn x) = typeOf x == typeOf X
     
       -- The gmapped value x has the right type to be transformed by 'f'
       -- We do the transformation only if 'x' has the mark
       ifmarked f x | hasmark trep = f x
       ifmarked f x = traverse (trep,x)
       
       -- optimization: t has no mark, there is nothing to map under it
       traverse (Dyn t,x) | typeOf t == typeOf x = 
                   maybe (error "traverse1") id $ cast x
       traverse (Dyn t,x) | (tcon,tkids) <- splitTyConApp (typeOf t),
                            (con,kids)   <- splitTyConApp (typeOf x),
                            not (length tkids == length kids && tcon == con) =
          error $ unwords ["template type", show (typeOf t),
                           "inconsistent with value type", show (typeOf x)]
       traverse (Dyn t,x) = rebuild f (serialize t1) xdyn
         where xdyn@(con,kids) = serialize x
               t1 = fromConstr con `asTypeOf` t  -- build the ephemeral template

The code has many internal self-consistency checks that should never fire. The code includes an optimization: if the template (after removing Dyn) has the same type as the mapped value, there is no X in there and hence there is nothing to map. Therefore, we immediately return the value as it is, with no transformation or even traversal.

In conclusion, we have presented a surprising application of Data/Typeable to define Functor's fmap generically, once and for all. As behooves to the generic fmap, our gmap is total (for the total mapping function) and parametric. It is also optimal, traversing only the needed parts of the value. Our approach is a synthesis of a fake serialization-deserialization with an sham `X marks the spot' template. Claus Reinke has summarized the approach aptly: ``if I had followed my own choice of metaphor to the end, I would have known that X marks the spot only on a map, not on the real thing (or else everyone could find the treasure..). Your shallow, incrementally built not-quite copy is just another representation of a map, and since both map and X are hidden from users of the function, everyone is happy.''.

I greatly appreciate many critical and inspiring discussions with Claus Reinke and Alexey Rodriguez-Yakushev on the Hs-Generics mailing list in June-July 2008.

Version

The current version is July 2008.

References

GMap.hs [9K]
The complete code with many tests

Threads entitled ``gMap in SYB1'' and ``compositional gMap in SYB1'' on the Hs-Generics mailing list, July 2008.
< http://www.haskell.org/pipermail/generics/2008-July/000349.html >
< http://www.haskell.org/pipermail/generics/2008-July/000362.html >

Claus Reinke: Traversable Functor Data,or: X marks the spot
Message posted on the Hs-Generics mailing list on Tue Jun 24 14:45:18 EDT 2008
< http://www.haskell.org/pipermail/generics/2008-June/000343.html >
That message proposed the idea of `X marking the spot'. It has been further developed in gmap, specifically to avoid any unsafe operations.

Ralf Lämmel and Simon L. Peyton Jones: Scrap Your Boilerplate: A Practical Design Pattern for Generic Programming TLDI 2003, pp. 26-37
< http://www.research.microsoft.com/~simonpj/papers/hmap/ >
This is the paper that introduced SYB.

Claus Reinke: Re: compositional gMap in SYB1
Message posted on the Hs-Generics and Haskell-Libraries mailing lists on Tue Jul 29 15:25:02 EDT 2008
< http://www.haskell.org/pipermail/generics/2008-July/000367.html >
The message turns gmap into a generic traverse, that is, Traversable.traverse implemented once and for all.

Reify a data type as a graph term

Representing a data type as a graph term comes up from time to time. In August 2011 Philip Wadler was interested in finding out if a data type is recursive, and if it is not, what is its maximum depth. In November 2013 Anthony Clayden wished to turn the data type describing a data base table into a term representing the schema of that table (to be put into the schemas table). All such questions are easy to answer once we have a term representation of a data type declaration -- with the descriptions of data constructors and their arguments. Obtaining such a term, that is, reifying a data type is surprisingly hard, as the questioners have pointed out. We present a general solution using Data.Data, which is at the core of the generic programming framework SYB. Data.Data and the related Data.Typeable are part of the GHC base. As an illustration of our approach we answer Wadler's and Clayden's questions.

Reifying a data type may seem easy. Given a value of the data type, we traverse it with gMapQ to find out all its components and their types. For example, the traversal of [(Just 'a', True)] might produce the tree

     TData "[((Maybe Char),Bool)]"
       [TCon "(:)"
         [TData "((Maybe Char),Bool)"
           [TCon "(,)" [TData "Maybe Char" [TCon "Just" [TData "Char" []]],
                        TData "Bool" [TCon "True" []]]],
          TCon "[]" []]]

What if the only values we have are [(Nothing, True)]::[(Maybe Char,Bool)] or []::[(Maybe Char,Bool)]? Given just the type Maybe Char, with no concrete value, we have to determine the data constructors, Just and Nothing, and the types of their arguments, and their data constructors, etc. The question still seems easy: Data.Data offers the function dataTypeOf :: Data a => a -> DataType that describes the data type a without requiring any concrete value of it (in other words, dataTypeOf does not evaluate its argument). The datatype description has the name of the data type, its sort (algebraic, Integer, Float, etc.) and the names of its constructors. Alas, the description gives just the names of data constructors -- but not the number and the types of their arguments. In other words, dataTypeOf gives a shallow description of the data type, with too little information to construct its graph.

Fortunately, Data.Data has another way to describe a data type without needing its value:

     gunfold :: (forall b r. Data b => c (b -> r) -> c r) -> (forall r. r -> c r) -> Constr -> c a

Per se, gunfold does only the shallow traversal, but it can be applied recursively. Its signature is a bit strange. Let's look at one of its instances, when a is Maybe Char:

     gunfold :: (forall b r. Data b => c (b -> r) -> c r) ->
                (forall r. r -> c r) -> Constr -> c (Maybe Char)
     gunfold k z c | c == constructor_Nothing = z Nothing
     gunfold k z c | c == constructor_Just    = k (z Just)

In the last line, k receives z Just of the type c (b->r) where r is Maybe Char and b is Char; b is exactly what we need: the argument type of the constructor Just. Not only k, which we provide, finds out the constructor argument type -- it is assured by the gunfold signature that the type is a member of Data and hence can be introspected and gunfolded in turn. That is the general idea of our solution. It is implemented via a pair of mutually recursive functions

     describe_type' :: Data a => IgnoreTypes -> SeenTypes -> CTTree a
     describe_ctor  :: Data a => IgnoreTypes -> SeenTypes -> Constr -> CCList a
     
     type SeenTypes   = [TypeRep] 
     type IgnoreTypes = [TypeRep]

The former finds out the names of constructors and the latter describes the arguments of a given constructor. Together, they reify a data type a as a term of the following structure

     data TRepTree = TData TRep [TCon] | BackRef TypeRep
     
     data TRep = forall a. Data a => TRep a
     data TCon = TCon Constr [TRepTree]      -- a data constructor and its args

Whereas TRep tells the data type name, sort, and the names of its constructors, TCon describes one data constructor, named by Data.Data.Constr, with all its arguments. Since a data type can be recursive the representation may contain cycles, which are encoded with BackRef. Our functions have to keep track of the already described data types (the argument SeenTypes) and build BackRef accordingly. Finally, we let the user prevent the traversal of particular data types. For example, one may wish to treat [Char] as an atomic data type of strings rather than as a list of characters. The function describe_type puts it all together:

     describe_type :: forall a. Data a => IgnoreTypes -> a -> TRepTree
     describe_type ignore _ = tree
      where
      CTTree tree = describe_type' ignore [] :: CTTree a

The data type from the initial example is described as follows, with the result shown in the comments. Compared to the description at the beginning of the article, the one below has more detail: we now see all data constructors of Bool and of Maybe Char.

     t1 = describe_type [] ([] :: [(Maybe Char, Bool)])
     {-
     TData [((Maybe Char),Bool)]
       [TCon [] [],
        TCon (:)
         [TData ((Maybe Char),Bool)
           [TCon (,)
            [TData Maybe Char [TCon Nothing [], TCon Just [TData Char []]],
             TData Bool [TCon False [],TCon True []]]],
           BackRef [((Maybe Char),Bool)]]]
     -}

Once the TRepTree is constructed, answering Wadler's and Clayden's questions is easy. The former was asking if a data type has a recursive subcomponent; if not, its maximum depth. For example, Pair Int (Either Int Int) has depth 3. The depth computation is the standard depth-first traversal of a TRepTree accumulating the depth maximum over all branches. If we encounter a BackRef, the data type is recursive and the depth is infinity. Clayden's question, of picking out all newtype descriptions, is also answered by the traversal.

We have seen that the often-mysterious gunfold tells us about arguments of a data constructor just from its name and its data type, with no concrete value at hand. Moreover, gunfold can be applied recursively on the arguments. In the result, we reify a data type as a TRepTree term, the graph of the data type. We have answered Wadler's general question of type introspection: ``whether the information about types could be turned into a graph and manipulated as such''.

Version

The current version is December 2013.

References

TypeReflFull.hs [12K]
The complete Haskell code with comments and tests

Polymorphic variants: solving the expression problem

The existing HList records support, without further programming, polymorphic variants, i.e., extensible recursive open sum datatypes, quite similar to polymorphic variants of OCaml. Variant subtyping is automatic. HList thus solves the familiar (in OO, at least) `expression problem' -- the ability to add new alternatives to a datatype and extend old processing functions to deal with the extended variant, maximally reusing old code without changing it. Furthermore, values of unextended datatypes are processed by extended functions without any ado. In contrast, attempting to apply non-extended functions to the values of the extended datatype results in a type error.

Our polymorphic variants are literally open co-products: dual of extensible polymorphic records of HList. The duality is negation:

     	 NOT (A | B) -> (NOT A & NOT B)

This implication is a part of the deMorgan law that holds both in classical, and, more importantly, in intuitionistic logics. Our encoding of sums is the straightforward Curry-Howard image of that law.

Our implementation of polymorphic variants in terms of HList records uses no type classes, no type-level programming or any other type hacking. In fact, there are no type annotations, type declarations, or any other mentioning of types, except in the comments.

Version

The current version is 1.0, Jul 2, 2006.

References

VariantP.hs [5K]
The complete code with many examples. The code is based on (and included in the distribution of) the HList library.

open-coproduct.txt [4K]
Polymorphic variants (extensible, recursive sums) with HList
The message posted on the Haskell mailing list on Sun, 2 Jul 2006 18:35:12 -0700 (PDT)

< http://guppy.eng.kagawa-u.ac.jp/~kagawa/PVH >
Koji Kagawa: Polymorphic Variants in Haskell
Presented at the Haskell Workshop 2006
This paper proposes a quite different solution to the expression problem, encoding constructors of open `data types' as class methods. Our example is patterned after the running example of this paper.

A variation on the theme of SYB3

We describe a variation of SYB3 type-class-based generic programming framework that avoids both higher-rank types and mutually recursive instances. Because of the latter our code, unlike SYB3, works even in Hugs. The code was written back at ICFP05 in Tallinn, as a challenge from Simon Peyton-Jones after I have expressed doubts about the necessity of mutually recursive instances. I am grateful to Chung-chieh Shan for many helpful discussions.

The code files below closely follow the layout of the running example in Laemmel and Peyton-Jones' ICFP2005 talk. The comments describe which part of the code is the `Library', which part of the code is the (data-structure--independent) `generic function', and which part of the code overrides the general processing of the generic function. The code has two examples; one of them is gsize of the original SYB3 paper.

Our variation of SYB3 takes advantage of the fact that type-class-bounded and higher-rank polymorphisms can be partially traded for each other. To be more precise, we use type codes for functions and dictionaries. The type code must then be interpreted by a type class similar to Apply of HList (cf. HMap). This extra interpretation step is the main conceptual deviation from SYB3, largely taking the important idea of function combinators out of it.

Version

The current version is October 5, 2005.

References

dat1.hs [4K]
This file describes an intermediate step, so to make it easier to understand the final code below. This present code still uses recursive instances, but for a different reason and only for recursive data types such as lists. We already manage without higher-rank functions.

dat2.hs [4K]
With a little bit of CPS, we now eliminate recursive instance dependency even for recursive datatypes. This is our final solution.

Re: Scrap your boilerplate and Variants
The discussion thread started from a message posted on the Haskell-Generics mailing list on Tue Oct 24 23:28:24 EDT 2006

Our inspiration, SYB3: Scrap your boilerplate with class: extensible generic functions by Ralf Laemmel and Simon Peyton-Jones. In: Proceedings of ICFP 2005.

Smash along your boilerplate

Smash is a generic programming approach based on a type-level typecase, best understood as a static dual of `Scrap your boilerplate I' (SYB1). The Smash approach is powerful to express traversals where the type of the result is computed from the type of the transformer and the type/structure of the original term. An example is replacing all Floats with Doubles in an arbitrary term, e.g., made of Maybes, tuples, lists; the result type is computed and need not be specified.

SYB1, when traversing terms and invoking user functions for subterms of a particular type, relies on a run-time typecase. The latter requires run-time type representation, which is provided by the typeclass Typeable. The typeclass implements the method `cast' for a safe cast from the `generic type' to the specific type. We observe that the typecase, at the type level, has always existed in Haskell: it is the type equality predicate TypeEq of HList.

Our generic functions, like those of SYB1, are ordinary functions. Since the transformation for each subterm is chosen at compile-time, however, we do not need any casts, run-time type representations, or Typeable. We do not need higher-ranked types either. Also unlike SYB1, the set of traversal strategies is extensible: To the pre-defined strategies for gmap, gshow, gfold and geq, the programmer may at any time add a new class of traversals.

Our most general generic function is gapp, which applies a generic function to a term. A generic function is, quite literally, made of two parts. First is a term traversal strategy, identified by a label. One strategy may be to `reduce' a term using a supplied reducing function (cf. fold over a tree). Another strategy rebuilds a term. The second component of a generic function is spec, the HList of `exceptions', or ad hoc transformations. Each element of spec is an ordinary function that tells how to transform a term of a specific type. Exceptions override the generic traversal: If the type of the input term matches the argument type of one of the functions in spec we apply that function. If no specific function applies, we do the generic action implicit in the traversal strategy.

The algorithm that selects for each input subterm the appropriate transformer is tantamount to overloading resolution algorithm. Smash thus implements typeclass instance selection algorithm in the typechecker itself. The messages below describe the implementation of typeclasses in terms of Smash, with an additional flexibility of reordering, inspecting and deleting `instances'.

Version

The current version is 1.2, Jun 5, 2007.

References

< http://code.haskell.org/generics/comparison/SmashA/Syb4A.hs >
The complete commented code, of the core and of several typical traversal strategies for gmap, gfold, gshow, geq. There are examples of all these strategies. The other files in that directory are sample applications and examples, within the generic programming comparison framework by Alexey Rodriguez Yakushev, Alex Gerdes, and Johan Jeuring. The code is tested with GHC 6.4.1 and 6.6.

smash-first.txt [8K]
Smash your boiler-plate without class and Typeable
The message describing the original Smash approach, posted on the Haskell mailing list on 10 Aug 2006 21:16:34 -0000

smash-along.txt [12K]
Smash along your boilerplate; how to traverse a non-existent term
The improved and generalized approach, described in the message posted on the Haskell mailing list on Tue, 5 Jun 2007 00:09:02 -0700 (PDT)

< http://www.haskell.org/haskellwiki/Applications_and_libraries/Generic_programming/Smash >
`Smash' on Haskell Wiki

Generic traversal of non-existent terms

The Smash generic programming approach lets the programmer add new classes of traversals. We show one such extension, lazy gmap, and its application to mapping of terms that do not exist. This lets us implement gminimum and gmaximum -- which traverse undefined and produce the fully-defined smallest (resp. largest) term of the desired type:

     	gminimum () :: (Maybe Int,Either Bool (Maybe Char))
     	-- ==> (Nothing,Left False)
     	gmaximum () :: (Maybe Int,Either Bool (Maybe Char))
     	-- ==> (Just 2147483647,Right (Just '\1114111'))

The result of traversing undefined::(Maybe Int,Either Bool (Maybe Char)) is (Nothing,Left False). We take advantage of the fact that undefined stands for a term of any structure and is deconstructible and traversable. Haskell's non-strictness lets us meaningfully traverse non-existing terms.

Traversing non-existing sums such as undefined::Either Int Char requires special attention. Sums give us a choice. We should report the available alternatives, and let the users make the choice of a component of the sum. The implementation of gminimum always chooses the first available alternative; gmaximum chooses the last available.

The function gminimum is akin to the de-typechecker since both convert `undefined' to `defined'. De-typechecker produces only polymorphic functions, whereas gminimum yields data, including monomorphic functions.

Version

The current version is 1.1, Jun 5, 2007.

References

< http://code.haskell.org/generics/comparison/SmashA/Syb4ABuild.hs >
Implementation of the lazy gmap strategy and many examples.

smash-along.txt [12K]
Smash along your boilerplate; how to traverse a non-existent term
The explanation, described in the second half of the message posted on the Haskell mailing list on Tue, 5 Jun 2007 00:09:02 -0700 (PDT)

Generic de-serialization

To send a datum across the network or store it in a file we generally need to convert the object (`dump it') into a stream of bytes, strings or other such platform-independent units. We also should be able to reconstruct the object from such a dumped form -- deserialize it. A serialized object should generally represent both the structure and the content of the original. One may often assume however the structure (`schema', in database parlance) to be fixed, so (de)serialization need only deal with the content.

Serialization can be written generically, with generic fold. The following Smash code says the result of serializing an object is the list of all its primitive fields.

     	serialize xs = gapp (TL_red concat) primitive_fields_show xs
     	primitive_fields_show =
     	    (\ (s::Int)    -> [show s]) :+:
     	    (\ (s::Float)  -> [show s]) :+:
     	    (\ (s::String) -> [s])      :+:
     	    HNil                       -- more can be added in the future

For example, for the following data type of a company and a sample term (from the famous example of SYB papers)

     data Company  = C [Dept];          data Dept     = D Name Manager [Unit]
     data Unit     = PU Employee | DU Dept
     data Employee = E Person Salary;   data Person   = P Name Address
     data Salary   = S Float;           type Manager  = Employee
     type Name     = String;            type Address  = String
     
     genCom = C [D "Research" (E (P "Laemmel" "Amsterdam") (S 8000.0))
                  [PU (E (P "Joost" "Amsterdam") (S 1000.0)),
                   PU (E (P "Marlow" "Cambridge") (S 2000.0))],
                 D "Strategy" (E (P "Blair" "London") (S 100000.0)) []]

serialize genCom produces the list

     ["Research","Laemmel","Amsterdam","8000.0","Joost","Amsterdam",
      "1000.0","Marlow","Cambridge","2000.0","Strategy","Blair",
      "London","100000.0"]

Hugh Perkins has posed the problem of writing a generic deserializer, which should work for terms of any structure in the domain of a generic programming library. His particular application is complex terms representing XML documents. The deserializer can indeed be written generically: in the first approximation, it is a generic monadic map over a term. It takes a term, the prototype, and the list of serialized primitive fields as a monadic state. The deserializer traverses the prototype replacing each primitive field with the corresponding value from the monadic state. For example, assuming the following list of serialized fields:

     	 retro = ["Metaphysics", "Kant","Koeningsberg","800.0",
     	          "Hume","Edinburgh","100.0",
     	          "Marlowe","Cambridge","200.0",
     	          "Ruling","Thatcher","London","50000.0"]

and genCom as the prototype, deserialize genCom retro `upgrades' the company giving us the term

     C [D "Metaphysics" (E (P "Kant" "Koeningsberg") (S 800.0))
        [PU (E (P "Hume" "Edinburgh") (S 100.0)),
         PU (E (P "Marlowe" "Cambridge") (S 200.0))],
        D "Ruling" (E (P "Thatcher" "London") (S 50000.0)) []]

In the discussion thread, Jeremy Gibbons has pointed out that McBride and Paterson's idiomatic traverse subsumes monadic gmap. See his and Oliveira's paper below for the detailed discussion.

Version

The current version is 1.2, Jul 3, 2007.

References

< http://code.haskell.org/generics/comparison/SmashA/Deserialize.hs >
Complete code for serializing a term, and deserializing it using monadic generic map.

Hugh Perkins et al. Thread on the Hs-Generics mailing list, June 26-July 3, 2007.

Jeremy Gibbons and Bruno C. d. S. Oliveira. The essence of the Iterator pattern.
< http://web.comlab.ox.ac.uk/jeremy.gibbons/publications/#iterator >

Last updated January 1, 2014

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML