int->string
:
the function to ``show'' an integer. Since there are many such
functions, the user has to register the ``canonical'' value of this
type. In the simplest case, obtaining the canonical value is a
mere look up in the database of registered values. Instead of a
canonical value itself, however, the database may provide a rule how
to make it, from some other registered values: e.g., how to ``show'' a
pair if we can show its components. Querying this database of facts
and rules is quite like the evaluation of a logic-programming query.
From the point of view of the Curry-Howard correspondence, finding a term of a given type is finding a proof of a proposition. This is how this facility was first developed in Coq, as a programmable type inference for proof search. It is actually a general and flexible technique of overloading resolution, and deserves to be known and used in programming at large.
=
sign. Our experience, and experiments, agree that
precision may hinder (human) comprehension, by burying the important in
details. Certain degree of ambiguity appears necessary.
However useful overloading may be for concise communication, eventually the receiving party has to determine what exactly a given ambiguous word refers to -- that is, to resolve the overloading. The context is the key, literally. One can think of overloading resolution as an `intelligent' search in a database of possible referents, indexed by some representation of the context, often a type. To put it another way, overloading resolution is, conceptually, evaluating a logic-programming query. This view unifies various forms of overloading in programming: ad hoc overloading (C++, CLOS, and many others), type classes (Haskell), implicits (Scala and now others).
This page is about Canonical Structures, first appeared in Coq as a programmable unification. In this form of overloading, the resolution is not just conceptually running a Prolog-like query: it is literally so. Furthermore, the logic program -- the database of facts and rules -- is user-programmable. And so are the details of the resolution algorithm: One may allow `overlapping instances' -- or prohibit them, insisting on uniquely resolvable overloading. One may make the search fully deterministic, fully non-deterministic, or something in-between.
This article and the accompanying code are to explain and implement canonical
structures in plain OCaml and encourage experimentation. The title
could have been `Canonical Structures for the working OCaml
(meta) programmer'. The rudiment of canonical structures is already
present in OCaml, in the form of the registry of printers for
user-defined types: #install_printer
. It is available only at the
top-level, however, and deeply intertwined with it. As a modest
practical benefit, this facility is now available for all programs, as
a plain, small, self-contained library, with no compiler or other
magic. We have used this library in a realistic project: an embedded
session-typed DSL for service-oriented programming.
Although canonical structures have already proved useful in OCaml as it is, they would become more convenient when run-time types get compiler support. Canonical structures are also useful in MetaOCaml: they are by nature a meta-programming facility.
Programming as collaborative reference
`Implicits for the masses'
The first version of this article, posted on the
caml-list on Sep 5, 2019
Thus starts the remarkable paper by Mahboubi and Tassi, which was the inspiration for the present page. The paper is a gentle tutorial, which explains how the inferring of omitted details is made part of type inference, how types come into this, how it is all related to proof search, and how to program this type inference in a manner reminiscent of Prolog. (Alas, the paper does not elaborate the latter point and does not show any Horn clauses.) Albeit gentle, the Mahboubi and Tassi's tutorial is a Coq tutorial aimed at Coq users. Yet their insights go beyond Coq and deserve to be known wider.``One of the key ingredients to the concision, and intelligibility, of a mathematical text is the use of notational conventions and even sometimes the abuse thereof. These notational conventions are usually shaped by decades of practice by the specialists of a given mathematical community.... A trained reader can hence easily infer from the context of a typeset mathematical formula he is reading all the information that is not explicit in the formula but that is nonetheless necessary to the precise description of the mathematical objects at stake.''``Formalizing a page of mathematics using a proof assistant requires the description of objects and proofs at a level of detail that is few orders of magnitude higher than the one at which a human reader would understand this description. This paper is about the techniques that can be used to reproduce at the formal level the ease authors of mathematics have to omit some part of the information they would need to provide, because it can be inferred....''
Inspired by Mahboubi and Tassi's paper, the present page is an attempt
to explain canonical structures for programming rather than theorem
proving, and illustrate them in plain OCaml, elaborating the
connection with logic programming. We show Horn clauses explicitly,
embed them in OCaml and present a very simple and modifiable
Prolog-like interpreter. We changed the running example Mahboubi and
Tassi's, from Eq
to Show
.
We will be implementing the overloaded show
function, to convert
values of different types to a string: an analogue of Haskell's
Show
class. To put it another way, given some type t
we would like
to find a function of the type t -> string
. Since that type will
appear often, let's introduce an abbreviation for it:
type 'a show_dict = 'a -> stringThe desired function, call it
find_show
, receives
a type t
and finds and returns the corresponding, or `canonical',
value of the type t show_dict
. Types, however, cannot be passed as
arguments: OCaml is not a dependently-typed
language. What we can pass is a value
representing the type: see the separate article on Type Representations. Let t trep
be the type of the value representing the type t
.
Thus our goal is to implement the function find_show : 'a trep -> 'a show_dict
. The generic overloaded show
is then
let show trep x = (find_show trep) x val show: 'a trep -> 'a -> string = <fun>to be used as
let ex1 = show Int 4 let ex2 = show (Pair (Int,(Pair (Int,Int)))) (10,(11,12))where
Int
of the type int trep
is the representation of the type
int
, and Pair
is the constructor for the representation of pairs.
The type checker can certainly infer the type of 4
and
(10,(11,12))
and could automatically synthesize the type
representations, Int
and Pair (Int,(Pair (Int,Int)))
, resp.,
saving us trouble specifying them explicitly. (Perhaps in near future
OCaml would indeed do such type representation synthesis, as Scala already
does.) What the type checker cannot synthesize is the value of t show_dict
: after all, there are generally many functions of the type
t -> string
. The user has to give a hint: which function is to
regard as the canonical t -> string
function.
The job of find_show
is to follow the hint and find the canonical
function: in effect, find the
evidence that the type t->string
is populated. Keeping Curry-Howard
in mind, find_show
does a proof search.
The Coq implementation of canonical structures provides the global
registry, a database, in which programmers register the t show_dict
-like values that are to be considered canonical for the
given type t
. Here is our first attempt at such registry. The
following is the type of registry entries: an association of a
type (via its representative) and the canonical dictionary for it.
type show_reg_entry = Showable : 'a trep * 'a show_dict -> show_reg_entry (* 'a is existential *)The global database in our toy implementation is a list of entries:
let show_registry : show_reg_entry list ref = ref []The following function is the analogue of the Coq command `Canonical Structure', to register a canonical value in the database. For simplicity we skip the check if a type already has a canonical value registered for it. The check is useful if we insist on the uniqueness of the resolution and do not allow `overlapping instances'.
let show_register : 'a trep -> 'a show_dict -> unit = fun trep dict -> show_registry := Showable (trep,dict) :: !show_registry
Finally, find_show
is the resolution function: it does the simple
database look-up. Here, teq : 'a trep -> 'b trep -> ('a,'b) eq option
is the type equality comparison. If two type representations are
equal, it returns the evidence, in the form of ('a,'b) eq
GADT,
that the types are equal as well.
let find_show : type a. a trep -> a show_dict = fun trep -> let rec loop : show_reg_entry list -> a show_dict = function | [] -> failwith "find_show: resolution failed" | Showable(trep',dict)::t -> match teq trep trep' with | Some Refl -> dict | _ -> loop t in loop !show_registry
After registering the instance of show
for integers:
let () = show_register Int string_of_intwe can run the first example
ex1
.
To review, we have built a database of facts, like the following:
showable(int,string_of_int).and search it, with the query like
?- showable(int,X).
This may look familiar to some.
Example ex2
involves pairs. It is no fun registering canonical
values for each concrete pair type t1 * t2
. We'd rather
register a rule: how to make the canonical show of a pair if we
already know how to show the two components of the pair. In other
words, we can show pairs provided we can show their
components -- which can be written as the Horn clause:
showable(pair(X,Y),R) :- showable(X,DX), showable(Y,DY), R=make_pair_dict(DX,DY).where the function
make_pair_dict
is defined as follows:
let make_pair_dict : 'a show_dict * 'b show_dict -> ('a * 'b) show_dict = fun (dx,dy) -> fun (x,y) -> "(" ^ dx x ^ "," ^ dy y ^ ")"All that remains is to represent this Horn clause in OCaml and modify
find_show
accordingly. The logic variables such as
X
, Y
and R
above may be a hassle to implement,
however. Fortunately, we can take a shortcut. Our
resolution query always has the form ?- showable(t,R).
where t
is
a concrete type. That is, we always treat the first argument of the
showable
relation as the `input' and the second as the `output'.
Instead of the full unification we, hence, get by with a simpler matching.
In our second attempt, we extend the type of registry entries to carry not only facts about dictionaries but also rules for making them. The representation of rules looks quite like a first-class pattern.
type show_reg_entry = | Fact : 'a trep * 'a show_dict -> show_reg_entry (* as before *) | Rule : {pat : 'a. 'a trep -> 'a show_dict option} -> show_reg_entryWe add a function to register a rule:
let show_register' : show_reg_entry -> unit = fun entry -> show_registry := entry :: !show_registry
The new find_show
does an SLD-like resolution -- but with committed
choice and with fixed modes.
let find_show : type a. a trep -> a show_dict = fun trep -> let rec loop : show_reg_entry list -> a show_dict = function | [] -> failwith "find_show: resolution failed" | Fact (trep',dict)::t -> begin match teq trep trep' with | Some Refl -> dict | None -> loop t end | Rule {pat}::t -> match pat trep with | Some dict -> dict | None -> loop t in loop !show_registryOur resolution clearly does no backtracking: if a rule matches, we are committed to it and do not look further. In Prolog, our rules would look as
showable(T,R) :- T = pair(X,Y), !, showable(X,DX), showable(Y,DY), R=make_pair_dict(DX,DY).Canonical Structures in Coq do no backtracking either. There is nothing that stops us from supporting backtracking though: it is left as an experiment for the reader.
We now can register the rule to show pairs, of any showable types:
let () = let pat : type b. b trep -> b show_dict option = function | Pair (x,y) -> Some (make_pair_dict (find_show x, find_show y)) | _ -> None in show_register' @@ Rule {pat}and run both examples
ex1
and ex2
.
Next we present a `production-ready' version of the library, which works beyond `show' and is also easier to experiment with.
Type-class resolution via intensional type analysis: many parallels to canonical structures
The first example is to show values of different types, and the types themselves. This is the extension of the running example of the previous section. We define the two overloaded functions:
module ShowType = MakeResolve(struct type 'a dict = string end) module Show = MakeResolve(struct type 'a dict = 'a -> string end)and their simple instances: `facts' of overloading:
let () = ShowType.register Int "Int" let () = ShowType.register Bool "Bool" let () = Show.register Int string_of_int let () = Show.register Bool string_of_boolWe can now show integers:
Show.find Int 1Since our Canonical Structures are implemented completely outside the compiler, the types of values to look up have to be explicitly specified as values of the
'a trep
data type, which represents types at the value level. For
example, the value Int
represents the type int
(and
itself has the type int trep
). The data type can be easily
extended with representations of user-defined data types (we show
an example later). The trep
values may be
regarded as type annotations; in particular, as with other type
annotations, if the user sets them wrong, the type error is
imminent. Therefore, they are not an additional source of mistakes,
but still cumbersome. If a compiler could somehow ``reflect'' an
inferred type of an expression and synthesize a trep
value, these
annotations could be eliminated.
To show pairs, we register the generic rule
let () = let open Show in let pat : type b. b trep -> b dict option = function | Pair (tx,ty) -> Some (fun (x,y) -> "(" ^ find tx x ^ "," ^ find ty y ^ ")") | _ -> None in register_rule {pat}(The rule for
ShowType
is elided.)
Our library permits `overlapping instances'. To illustrate, we
register a particular show for pairs of booleans.
let () = Show.register (Pair(Bool,Bool)) @@ let btos = function true -> "T" | false -> "F" in fun (x,y) -> btos x ^ btos yTherefore, the nested pair
(1,(false,true))
shows as "(1,FT)"
:
Show.find (Pair(Int, Pair(Bool,Bool))) (1,(false,true)) (* - : string = "(1,FT)" *)
The library is extensible with user-defined data types, for example:
type my_fancy_datatype = Fancy of int * string * (int -> string)After registering the type with trep library and adding the printer
let () = Show.register MyFancy @@ function Fancy(x,y,_) -> string_of_int x ^ "/" ^ y ^ "/" ^ "<fun>"one can print rather complex data with fancy, with no further ado:
Show.find (List(List(Pair(MyFancy,Int)))) [[(Fancy(1,"fanci",fun x -> "x"),5)];[]] (* - : string = "[[(1/fanci/<fun>,5)];[]]" *)As Mahboubi and Tassi would say, proof synthesis at work!
The next example is Haskell-like Read
, to read (or, parse) values
of various types. Overloading here is resolved based on
the return type.
module Read = MakeResolve(struct type 'a dict = Scanf.Scanning.in_channel -> 'a end) let () = Read.register Int (fun ch -> Scanf.bscanf ch "%d" Fun.id) let () = Read.register Bool (fun ch -> Scanf.bscanf ch "%B" Fun.id) let () = let open Read in let pat : type b. b trep -> b dict option = function | Pair (tx,ty) -> Some (fun ch -> Scanf.bscanf ch "(%r, %r)" (find tx) (find ty) (fun x y -> (x,y))) | _ -> None in register_rule {pat}Here is an example (the comment underneath shows the evaluation result):
Read.find (Pair(Int,Pair(Bool,Bool))) (Scanf.Scanning.from_string "(1, (true, false))") (* - : int * (bool * bool) = (1, (true, false)) *)
The accompanying code shows the final example: ReadShow
,
illustrating inheritance. This overloading function ensures that both
show
and read
are available for a particular type. Furthermore,
(if the programmer has defined them well) they are
coherent. ReadShow
also provides the function norm
which is the
composition of read
and show
: it converts a printing
representation of a value into a canonical form.
In conclusion, we have presented the library for canonical structures in OCaml. The benefits (besides the educational):
#install_printer
--like facility, available to all programs
(not just the top-level), implemented as a plain self-contained
OCaml library with no compiler or other magic.
Session types without
sophistry
An application in a large practical library.
The canonical structures are used to look up the code for
serializers and deserializers, to print types, and to implement
infer
to infer session types of process fragments with an arbitrary
number of free endpoint variables.
The drawbacks of our canonical structures library also disappear if we use it in a meta-program (code generator). Overloading resolution errors reported at run-time of the generator are `compile-time' errors from the point of view of the generated program. Our canonical structures library then becomes a type-class/implicits facility, for the generated code -- the facility, we can easily (re)program.
Therefore, several type representation libraries have been developed
over the years, referenced below. They differ in how much they rely on
the compiler/run-time internals, how
many applications they include. Here we present one of the simplest
libraries. It is implemented in pure OCaml, with no dependence or even
knowledge of the OCaml internals, and without any unsafe operations. It is
extensible to arbitrary user-defined data types, also supporting
parameterized and abstract types. It is a pure-OCaml
version of Haskell's Typeable
.
In our library, a type t
is represented by the value of the type t trep
, which is the (generalized) algebraic data type. It is extensible:
type _ trep = ..initially populated as follows (abbreviated):
type _ trep += | Unit : unit trep | Int : int trep | List : 'a trep -> 'a list trep | Fun : 'a trep * 'b trep -> ('a -> 'b) trepThe library also defines the type equality comparison
val teq : type a b. a trep -> b trep -> (a,b) eq optionwhere
eq
is the type equality (propositional equality) witness:
type (_,_) eq = Refl : ('a,'a) eqThat is,
teq
compares two type representations t1 trep
and t2 trep
. If they are equal, it returns the witness that t1
and
t2
are the same type, which could be used to safely cast
(inter-convert) the values of these types. Using OCaml internals,
teq
may be implemented once and for all, efficiently, by comparing
run-time trep
values. Our library however is above the board and
uses no internals magic. For that reason, the library provides the
operation to extend teq
for newly defined types, illustrated later.
Unlike other libraries, trep
offers no printing or constructor
introspection. These facilities are better implemented via
canonical structures (aka, `implicits for the masses'). It is
bare-bone, intended as the foundation for dynamics,
heterogeneous type-indexed collections and implicits for the masses:
programmable overloading resolution.
Here is the simple illustration: generic showing.
val gshow : 'a trep -> 'a -> stringThe function
gshow
takes a value and a representation of its type
and returns the string `showing' the value. The idea of the
pattern-matching on the type representation is clearly seen in the
following auxiliary:
let gshow_init : type a. a trep -> a -> string option = fun t v -> match t with | Unit -> Some "()" | Int -> Some (string_of_int v) | Bool -> Some (string_of_bool v) ...The function
gshow
can handle pairs, lists and option types
(the result of the expression is shown in the comments underneath):
gshow (Pair (Int,Bool)) (1,true) (* - : string = "(1, true)" *) gshow (List (Pair (Int,Bool))) [(1,true); (2,false)] (* - : string = "[(1, true); (2, false)]" *)It is also extensible for user-defined data types. For example, after introducing a new data type
type 'a myres = R of 'a * int | Err of stringand registering it with the trep library
type _ trep += MyRes : 'a trep -> 'a myres trep let () = let teq : type a b. a trep -> b trep -> (a,b) eq option = fun x y -> match (x,y) with | (MyRes x,MyRes y) ->(match teq x y with Some Refl-> Some Refl | _ -> None) | _ -> None in teq_extend {teq}we may extend
gshow
let () = let gshow : type a. a trep -> a -> string option = function MyRes t -> (function R (x,i) -> Some ("R " ^ gshow (Pair (t,Int)) (x,i)) | Err s -> Some ("Err " ^ s)) | _ -> fun _ -> None in gshow_extend {gshow}We now may show the values involving
myres
, among others
let _ = gshow (Pair (Bool,MyRes (List Int))) (true,R ([1;2;3],4)) (* - : string = "(true, R ([1; 2; 3], 4))" *)
We should stress that gshow
is merely an illustration of the trep
library. Generic functions are more convenient to
define with the canonical structures facility, built upon trep
.
trep.ml [3K]
The complete code of the introduced trep library
gshow.ml [4K]
An illustration of the trep library: generic show
Jacques Garrigue and GrĂ©goire Henry: Runtime types in OCaml
OCaml 2013: The OCaml Users and Developers Workshop
<https://ocaml.org/meetings/ocaml/2013/proposals/runtime-types.pdf>
The classical paper on problems and difficult choices
arising from type abstraction.
Ivan Gotovchits. Extensible and very efficient type
representations in the CMU Binary Analysis Project (BAP)
Messages on the caml-list, 4 Sep 2019
``What could be surprising is that the
universe of types actually grew quite large, that large that the linear
search in the registry is not an option for us anymore.... An
important lesson that we have learned is that treps should not only be
equality comparable but also ordered (and even hashable) so that we can
implement our registries as hash tables. It is also better to keep them
abstract so that we can later extend them without breaking user code (to
implement introspection as well as different resolution schemes)''
Type representations in Jane St. library
<https://github.com/janestreet/base/blob/master/src/type_equal.mli>
<https://github.com/janestreet/typerep>
It is quite extensive, and relies on OCaml internals
Patrik Keller, Marc Lasson: LexiFi Runtime Types OCaml 2020 Workshop
<http://github.com/mirage/repr>
Type representation library from Mirage, with combinators
for generic programming