
MetaOCaml -- an OCaml dialect for multi-stage programming

 

BER MetaOCaml is a conservative extension of OCaml for ``writing programs that generate programs''. BER MetaOCaml adds to OCaml the type of code values (denoting ``program code'', or future-stage computations), and two basic constructs to build them: quoting and splicing. The generated code can be printed, stored in a file -- or compiled and linked back to the running program, thus implementing run-time code optimization. A well-typed BER MetaOCaml program generates only well-scoped and well-typed programs: the generated code shall compile without type errors. The generated code may run in the future, but it is typechecked now. BER MetaOCaml is a complete re-implementation of the original MetaOCaml by Walid Taha, Cristiano Calcagno and collaborators.


 

Summary

MetaOCaml is a conservative extension of OCaml with staging annotations to construct and run typed code values. See below for a brief introduction to these features. The generated code can be printed, stored in a file -- or compiled and linked back to the running program, thus implementing run-time code optimization. MetaOCaml code without staging annotations, or with annotations erased, is regular OCaml 4.

MetaOCaml has been successfully used for specializing numeric and dynamic programming algorithms, building FFT kernels, compilers for an image processing DSL, OCaml server pages, generating families of specialized basic linear algebra and Gaussian Elimination routines, and high-performance stencil computations.

MetaOCaml is distinguished from Camlp4 and other such macro-processors by: hygiene (maintaining lexical scope); generating assuredly well-typed code; and the integration with higher-order functions, modules and other abstraction facilities of ML, hence promoting modularity and reuse of code generators. A well-typed BER MetaOCaml program generates only well-typed programs: the generated code shall compile without type errors. There is no more puzzling out a compilation error in the generated code (which is typically large, obfuscated, and full of unhelpful variable names). The errors are reported in terms of the generator rather than the generated code.

MetaOCaml is purely generative: the generated code is treated as a black box and cannot be examined. One can put code together but cannot take it apart. Pure generativity significantly simplifies the type system and strengthens the static assurances. It may seem that pure generativity precludes code optimizations. Fortunately, that is not the case. The literature has many examples of generating optimal code with MetaOCaml.

BER MetaOCaml is in the spirit of the original MetaOCaml by Taha, Calcagno and contributors, but has been completely re-written using different algorithms and techniques. The old MetaOCaml code still runs with little or no change. Overall, BER MetaOCaml aims at the most harmonious integration of staging with OCaml, minimizing the differences; this makes it easier to keep MetaOCaml up-to-date and ensures its long-term viability.

BER MetaOCaml has been re-structured to minimize the amount of changes to the OCaml type-checker and to separate the `kernel' from the `user-level'. The kernel is a set of patches and additions to OCaml, responsible for producing and type-checking code values. The processing of built code values -- so-called `running' -- is user-level. Currently the user-level metalib supports printing, and byte-compiling and linking of code values. Users may add other ways of running the code, for example, compiling it to machine code, C or LLVM -- without any need to hack into (Meta)OCaml or even recompile it.

BER MetaOCaml is almost entirely backwards compatible with the original MetaOCaml. The major user-visible differences are explained in detail below on this page.

Smaller visible differences include better printing of cross-stage persistent values and full support for labeled arguments. A long-standing problem with records with polymorphic fields has resolved itself. BER N101 no longer permits generating ill-formed recursive let. The BER N101 code is now extensively commented and has a regression test suite. MetaOCaml has never supported generating code with local or first-class modules and objects; however, one may use modules and objects in the code generator. Camlp4 is excluded for now and probably forever. Instead of the syntactic extension pa_monad, BER MetaOCaml adds Nicolas Pouillard's lightweight let! notation for monadic programming. Offshoring is temporarily disabled, to be resurrected as a user-level library.

BER N101 is much less invasive into OCaml: compare the size of the patch to the main OCaml type-checker typing/typecore.ml, which contains the bulk of the changes for type checking the staging constructs. In an earlier version BER N004, the patch had 564 lines of additions, deletions and context; now, only 320 lines. (The core MetaOCaml kernel is trx.ml, with 2100 lines.)

The file NOTES.txt in the BER MetaOCaml distribution describes the features of BER MetaOCaml in more detail and outlines directions for further development. The R&D program is extensive, ranging from staging theory to developing translators of the generated code to C, LLVM, and other targets. There are fun tasks for everyone. Hopefully the release of BER MetaOCaml will encourage the use of, and research into, typed meta-programming.

Version
The current version is N101, November 2013.
References
ber-metaocaml-ChangeLog [3K]

The Design and Implementation of BER MetaOCaml

ml2013-talk.pdf [200K]
MetaOCaml lives on: Lessons from implementing a staged dialect of a functional language
The annotated slides of the talk at the ACM SIGPLAN Workshop on ML, September 22, 2013. Boston, MA, USA

< http://www.metaocaml.org >
The web site of the original MetaOCaml. All examples and tutorials on that web site apply.

 

Installation and availability

BER MetaOCaml N101 corresponds to OCaml 4.01. To install BER MetaOCaml through OPAM, please do
     $ opam update
     # temporarily rename ~/.ocamlinit, if exists
     $ opam switch 4.01.0+BER
     # if you renamed ~/.ocamlinit, you may rename it back
     $ eval `opam config env`

One may also install BER MetaOCaml manually. Befitting the separation between the kernel and the user-level, BER MetaOCaml is distributed as a set of patches to OCaml plus a separate library metalib. The installation involves patching the OCaml distribution, building the OCaml system as usual, and compiling metalib and the BER MetaOCaml toplevel. The patched OCaml compiler is fully source- and binary-compatible with OCaml and can process OCaml code, with staging constructs or without. MetaOCaml can also use OCaml-compiled modules and libraries as they are.

 

Introduction to staging and MetaOCaml

The standard example of meta-programming -- the running example of A.P.Ershov's 1977 paper that begat partial evaluation -- is the power function, computing x^n. In OCaml:
     let square x = x * x
     let rec power n x =
       if n = 0 then 1
       else if n mod 2 = 0 then square (power (n/2) x)
       else x * (power (n-1) x)
     (* val power : int -> int -> int = <fun> *)
(The comment below the code shows the reply of the MetaOCaml toplevel.)

Suppose our program has to compute x^7 many times. We may define

     let power7 x = power 7 x
to name and share the partial application.

In MetaOCaml, we may also specialize the power function to a particular value n, obtaining the code which will later receive x and compute x^n. We re-write power n x annotating expressions as computed `now' (when n is known) or `later' (when x is given).

     let rec spower n x =
       if n = 0 then .<1>.
       else if n mod 2 = 0 then .<square .~(spower (n/2) x)>.
       else .<.~x * .~(spower (n-1) x)>.;;
     (* val spower : int -> int code -> int code = <fun> *)
The two annotations, or staging constructs, are brackets .< e >. and escape .~e . Brackets .< e >. `quasi-quote' the expression e, annotating it as computed later. Escape .~e, which must be used within brackets, tells that e is computed now but produces the result for later. That result, the code, is spliced into the containing bracket. The inferred type of spower is different. The result is no longer an int, but int code -- the code of expressions that compute an int. The type of spower spells out which argument is received now, and which later: the future-stage argument has the code type. To specialize spower to 7, we define
     let spower7_code = .<fun x -> .~(spower 7 .<x>.)>.;;
     (*
     val spower7_code : (int -> int) code = .<
       fun x_1 ->
        (x_1 *
          (((* cross-stage persistent value (id: square) *))
            (x_1 * (((* cross-stage persistent value (id: square) *)) (x_1 * 1)))))>.
      *)
and obtain the code of a function that will receive x and return x^7. Code, even of functions, can be printed, which is what MetaOCaml toplevel just did. The print-out contains so-called cross-stage persistent values, or CSP, which `quote' present-stage values such as square to be used later. One may think of CSP as references to `external libraries' -- only in our case the program acts as a library for the code it generates. If square were defined in a separate file, say, sq.ml, then its occurrence in spower7_code would have been printed simply as Sq.square.

To use the thus-specialized x^7 now, in our code, we should `run' spower7_code, applying the function Runcode.run : 'a code -> 'a. The function compiles the code and links it back to our program. For historical reasons, Runcode.run is aliased to the prefix operation (!.):

     open Runcode
     let spower7 = !. spower7_code
     (* val spower7 : int -> int = <fun> *)

The specialized spower7 has the same type as the partially applied power7 above. They behave identically. However, power7 x will do the recursion on n, checking n's parity, etc. In contrast, the specialized spower7 has no recursion (as can be seen from spower7_code). All operations on n have been done when the spower7_code was computed, producing the straight-line code spower7 that only multiplies x.
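
For a quick check at the toplevel (the transcript below is our own, not part of the original examples):
     spower7 2;;
     (* - : int = 128 *)
     power7 2;;
     (* - : int = 128 *)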

MetaOCaml supports an arbitrary number of later stages, letting us write not only code generators but also generators of code generators, etc.
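
For instance (our own small illustration), brackets nest, and each additional level of nesting adds one more code to the type:
     let cc = .< .< 1 + 2 >. >.;;
     (* val cc : int code code *)
     (* running cc once with !. yields the int code value .<1 + 2>.;
        running that result in turn yields 3 *)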

References
Walid Taha: A Gentle Introduction to Multi-stage Programming
< http://www.cs.rice.edu/~taha/publications/journal/dspg04a.pdf >

A.P. Ershov: On the partial computation principle
Information Processing Letters, 1977, v6, N2, pp. 38-41.
This is the English version of the paper that started partial evaluation as a research area. The power example, in Fig. 1 of the paper, can be seen on the following manuscript scan
< http://ershov.iis.nsk.su/files/rukopisi/fold241/241_75.gif >
The Information Processing Letters paper is the English version of
A. P. Ershov: A Theoretical Principle of System Programming
Doklady AN SSSR (Soviet Mathematics Doklady), 1977, v18,N2, pp.312--315.
For space reasons, the power example was omitted from the Doklady AN SSSR paper, but it can be seen in the notes to the November 1976 draft
< http://ershov.iis.nsk.su/files/rukopisi/fold241/241_17.gif >
This is perhaps the first occurrence of the power function specialization.
For other drafts of this paper, see ``The Ershov Archive for the history of computing''
< http://ershov.iis.nsk.su/archive/eaindex.asp?lang=2&gid=1480 >

MetaOCaml comes with a test suite, in the directory metalib/test/ of the BER MetaOCaml distribution, which contains many small and a few large examples.

 

The Design and Implementation of BER MetaOCaml

[The Abstract]
MetaOCaml is a superset of OCaml extending it with the data type for program code and operations for constructing and executing such typed code values. It has been used for compiling domain-specific languages and automating tedious and error-prone specializations of high-performance computational kernels. By statically ensuring that the generated code compiles and letting us quickly run it, MetaOCaml makes writing generators less daunting and more productive.

The current BER MetaOCaml is a complete re-implementation of the original MetaOCaml by Taha, Calcagno and collaborators. Besides the new organization, new algorithms, new code, BER MetaOCaml adds a scope extrusion check superseding environment classifiers. Attempting to build code values with unbound or mistakenly bound variables (liable to occur due to mutation or other effects) is now caught early, raising an exception with good diagnostics. The guarantee that the generated code always compiles becomes unconditional, no matter what effects were used in generating the code.

We describe BER MetaOCaml stressing the design decisions that made the new code modular and maintainable. We explain the implementation of the scope extrusion check.

References
ber-design.pdf [297K]
The full version of the paper presented at FLOPS 2014. A shorter version is published in the FLOPS Proceedings, Springer's LNCS 8475, pp. 86-102
doi:10.1007/978-3-319-07151-0_6

talk-FLOPS.pdf [196K]
The annotated slides of the talk presented at FLOPS 2014 in Kanazawa, Japan on June 4, 2014.

Let-insertion without pain or fear or guilt

loop_motion_gen.ml [7K]
Loop-invariant code motion with the convenient let-insertion (the main example in the FLOPS 2014 talk)

 

Implementing staging

An attractive approach to implementing a staged language is to add staging to an existing language, to take advantage of existing code generators, parsers, libraries and users. This section describes three ways of adding staging to a typed functional language, the last of which is actually implemented in MetaOCaml.

The most straightforward way to extend a functional language with staging is to add staging forms to the abstract syntax tree (AST) and adjust the parser; add staging forms to the typed AST, modifying the type checker accordingly; and add staging forms to the intermediate language, accounting for them in the code generator. We would have to modify everything -- we might as well re-implement the language from scratch. But there is a far simpler way.

The simplest approach is to pre-process away the brackets into expressions built with code combinators. For example, the MetaOCaml expression

     fun x -> .<fun y -> .~x * y + 1>.
is to be translated into the following plain OCaml code
     fun x -> lam "y" (fun y -> (add (mul x y) (int 1)))
which, when evaluated, will generate the code.

The code combinators, or primitive code generators, are typed and have the following signature:

     type 'a cod
     val int: int -> int cod
     val add: int cod -> int cod -> int cod
     val mul: int cod -> int cod -> int cod
     val lam: string -> ('a cod -> 'b cod) -> ('a->'b) cod
     val app: ('a->'b) cod -> ('a cod -> 'b cod)

This translation can be done by camlp4 or a stand-alone pre-processor. The base system (OCaml) can be used as it is without any modifications. Best of all, the translation approach is consistent with typing: if the result of pre-processing, which is plain OCaml, type-checks according to the ordinary OCaml rules, the source MetaOCaml code is well-typed according to the MetaOCaml rules. This simple approach works surprisingly well: Scala's Lightweight Modular Staging (LMS) is based on similar ideas. Scheme's implementation of quasi-quote is also quite similar; only it does not process lambda- or let-forms in any special way.
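
To make the translation concrete, here is a minimal sketch of one possible implementation of these combinators (our own illustration, not part of any MetaOCaml library): code values are represented as strings of OCaml concrete syntax, and lam invents fresh variable names; a real implementation would build ASTs instead.
     type 'a cod = string                  (* the type parameter is phantom *)

     let counter = ref 0
     let gensym base =                     (* fresh names for lam-bound variables *)
       incr counter; Printf.sprintf "%s_%d" base !counter

     let int (n : int) : int cod = string_of_int n
     let add (x : int cod) (y : int cod) : int cod = Printf.sprintf "(%s + %s)" x y
     let mul (x : int cod) (y : int cod) : int cod = Printf.sprintf "(%s * %s)" x y
     let lam name (body : 'a cod -> 'b cod) : ('a -> 'b) cod =
       let v = gensym name in
       Printf.sprintf "(fun %s -> %s)" v (body v)
     let app (f : ('a -> 'b) cod) (x : 'a cod) : 'b cod =
       Printf.sprintf "(%s %s)" f x

     (* The translated example above,
          fun x -> lam "y" (fun y -> add (mul x y) (int 1)),
        applied to (int 3), evaluates to the string
          "(fun y_1 -> ((3 * y_1) + 1))"                      *)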

There are complications, for example, dealing with special forms such as conditionals and loops: we have to introduce thunks. Pattern-matching forms and type annotations pose a nasty problem, which can be hacked away, e.g., by translating to deconstructors and special type annotation functions (like Haskell's asTypeOf). The paper Scala-Virtualized talks about such translations in detail. Alas, the polymorphic let like .<let f = fun x -> x in ...>. is a show-stopper. After the translation, the let-binding within brackets becomes a lambda-binding, which cannot be polymorphic. Although OCaml supports first-class polymorphism, it requires type annotations -- which means the translation cannot be a macro expansion/preprocessing done before the type-checking. Polymorphism in general is problematic in this simple approach: since a code value is translated to an expression, a polymorphic code value (e.g., let f = .<fun x -> x>.) will be translated into a monomorphic expression, due to the value restriction.

The remaining approach is to translate the staging away after type-checking. For example, fun x -> .<fun y -> .~x * y + 1>. is first type-checked, according to the modified OCaml rules to account for brackets and escapes. It is then translated to

     fun x -> build_fun_simple "y" (fun y ->
                mkApp (mkIdent "+")
                   [mkApp (mkIdent "*") [x; y];
                    mkConst 1])
The future-stage code is type-checked properly, including polymorphic constructs. The translation of staging looks like the one before, but it occurs after the type-checking. A future-stage bound variable of type t becomes the present-stage bound variable y, but at the type t code.

This is the approach taken in MetaOCaml. With staging annotations eliminated after type checking, the OCaml back-end -- the optimizer and the code generator -- is used as it is. We must however modify the OCaml type checker to keep track of the stage (bracketing) level and to type check within brackets. In the original MetaOCaml, this modification was significant. Keeping track of the staging level is simple, using a global mutable cell `incremented' and `decremented' when encountering a bracket or escape. Type-checking the future-stage code is almost the same as the present-stage code, with one exception: identifiers bound by future-stage binding forms should be annotated with their stage. The original MetaOCaml added a new field to the data structure describing an identifier in the type environment. This led to a cascade of changes at every place a new identifier is added to the type environment. A new OCaml version typically modifies the type checker quite heavily; integrating all these modifications into MetaOCaml, accounting for the new field, is a hard job. It is avoidable, however: we may associate identifiers with levels differently, by adding a new map to the environment that tells the level of each future-stage identifier. Any identifier not in the domain of that map is assumed present-stage. This alternative helped BER MetaOCaml reduce the amount of changes to the OCaml type checker.
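
To sketch the alternative (this is our own illustration, not the actual type-checker code): keep a separate map from future-stage identifiers to their levels, leaving the identifier's record in the type environment untouched; any identifier absent from the map is present-stage.
     module IdentMap = Map.Make (String)

     type stage_env = int IdentMap.t       (* future-stage identifier -> level *)

     let level_of (env : stage_env) (id : string) : int =
       try IdentMap.find id env with Not_found -> 0   (* absent => present stage *)

     let bind_at_level (id : string) (n : int) (env : stage_env) : stage_env =
       IdentMap.add id n env

     (* e.g., type checking .<fun y -> ...>. records y at level 1:
          level_of (bind_at_level "y" 1 IdentMap.empty) "y"    = 1
          level_of (bind_at_level "y" 1 IdentMap.empty) "succ" = 0  *)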

The result of a code-generating expression is a value of the type 'a cod. A code value could represent bits of low-level code; however, such bits are very difficult to compose. For example, depending on the context, .<1>. may end up being a part of a machine instruction or a part of the data segment, with different representations. Since the generated code is assuredly well-typed, it does not have to be type-checked. Code combinators could therefore produce bits of the post-typechecking intermediate language. This language (type-annotated AST -- in OCaml, Typedtree) is also difficult to compose: maintaining the type annotations and their type environment is a chore. MetaOCaml code can generate arbitrarily many let-expressions. If code combinators produce the type-annotated AST, they have to extend the type environment with freshly bound identifiers and their types. If the generated let expressions are polymorphic, the code combinators have to refresh the type variables. Realizing code values as an untyped AST is the simplest choice: the AST can be pretty-printed to the surface syntax and then compiled as any other source code.

References
simple_staging.ml [5K]
An illustration of the simple implementation of staging: pre-processing to replace brackets and escapes with code combinators.

 

Data constructor restriction

Algebraic data types are one of the salient features of OCaml, which, alas, have not been considered in staged calculi. The theory therefore gives no guidance on staging the code that contains constructors of algebraic, user defined data types, such as the following:
     type foo = Foo | Bar of int
     .<let x = Foo in match x with Bar z -> z>.
The generated code, which can be stored in a file, is let x = Foo in match x with Bar z -> z. Compiling this file will fail since Foo and Bar are not declared. The problem is how to put a data type declaration into the generated code, which is syntactically an expression and hence cannot contain declarations.

The old MetaOCaml dealt with the problem by modifying the AST representing the generated code and adding the field for declarations (actually, the entire type environment). Such a change sent ripples of modifications throughout the type checker, and was one of the main reasons for the divergence of MetaOCaml from OCaml, which contributed to MetaOCaml's demise.

BER MetaOCaml since N100 eschews such AST modification, imposing a constructor restriction instead: all data constructors and record labels used within brackets must come from the types that are declared in separately compiled modules. The restriction follows from the observation that OCaml has no problem compiling code such as Some 1 with constructors of the (standard) data types. The following expressions all satisfy the restriction:

     .<true>.
     .<raise Not_found>.
     .<Some [1]>.
     .<{Complex.re = 1.0; im = 2.0}>.
     let open Complex in .<{re = 1.0; im = 2.0}>.
     .<let open Complex in {re = 1.0; im = 2.0}>.

because the data types bool, option, list, Complex.t are either predefined or defined in the (separately compiled) standard library. External type declarations like those of Complex.t are found in complex.cmi, which can be looked up when the generated code is compiled. On the other hand, the following fail the restriction and are rejected by BER MetaOCaml:

     type foo = Bar
     .<Bar>.
     
     module Foo = struct exception E end
     .<raise Foo.E>.

The type declaration foo or the module declaration Foo must be moved into an interface file and compiled separately. The compiled interface must also be available at run-time: either placed into the same directory as the executable, or somewhere within the OCaml library search path -- as if it were a standard library type.
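
For instance, here is a sketch of the fix for the rejected foo example (the file name foo_defs and the build command are our own illustration):
     (* foo_defs.mli -- the type declaration, moved into its own,
        separately compiled interface:    ocamlc -c foo_defs.mli     *)
     type foo = Foo | Bar of int

     (* the generator, in another file or at the toplevel: the
        constructors now come from a separately compiled module,
        so the restriction is satisfied, as in the Complex examples  *)
     .<let open Foo_defs in let x = Foo in match x with Bar z -> z>.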

However there is a way to include type declarations within the Parsetree.expression tree. For example,

     type foo = Foo | Bar of int
     .<let x = Foo in match x with Bar z -> z>.
could produce the following generated code
     let module M = 
       struct
         type foo = Foo | Bar of int
       end
     in let open M in
     let x = Foo in match x with Bar z -> z
which compiles without problems. Local modules in OCaml do effectively let declarations appear in expressions. The generated code becomes a closure over the data type environment. Therefore, the constructor restriction will be lifted in some future release. On the other hand, from early experience, the restriction does not seem very limiting or hard to satisfy.

 

Moving free variables around but not letting them escape

MetaOCaml lets us manipulate open code. This section describes the complexities and trade-offs in making sure all free variables in such code will eventually be bound, by their intended binders. BER MetaOCaml N101 reverses the choice of its predecessors and trades a type-level check for a stronger and more informative dynamic scope-extrusion check.

In MetaOCaml, program code is first-class: it can be passed as arguments to functions, returned as their result, stored in mutable cells, thrown in exceptions, etc. That code may contain free variables, which must however occur in the typing environment in order for the MetaOCaml program to type check. One may think that free variables will, therefore, eventually be bound and the generated code will be closed; running it should never fail because of an unbound variable. Alas, a well-typed BER MetaOCaml program may still attempt, when executed, to run open code or to construct ill-scoped code -- code with a free variable that has `escaped' its binder and hence will remain unbound or, worse, be bound accidentally. In BER MetaOCaml N101, these attempts are detected early, aborting the execution of the generator with an informative error message. Previous versions of MetaOCaml used the type system (so-called environment classifiers) to prevent some, rarely occurring, attempts to run open code. Alas, other attempts went undetected and ill-scoped code was generated. Not any more.

Manipulating open code is overshadowed by two dangers. First, the mere presence of the run operation in the language brings the risk that it is applied to code that we have not yet finished constructing. Here is the characteristic example:

     .<fun x y -> .~(let z = !. .<x+1>. in .<z>.)>.
Prior to BER N101, MetaOCaml rejected such code at type checking time, with the help of so-called environment classifiers (Taha and Nielsen, POPL 2003). Back then, the type of an int code value was ('c,int) code, where 'c is the classifier. If the code value is closed, the classifier is generalizable. The running example was rejected with the type error:
     .<fun x y -> .~(let z = .! .<x+1>. in .<z>.)>.;;
                                ^^^^^^^
     .! error: 'a not generalizable in ('a, int) code
(prior to BER N101, the operation to run code was spelled .!).

BER N101 type checks the running example. Executing the resulting generator, however, aborts with a run-time exception:

     .<fun x y -> .~(let z = !. .<x+1>. in .<z>.)>.;;
     Exception:
     Failure
      "The code built at Characters 29-32:
       .<fun x y -> .~(let z = !. .<x+1>. in .<z>.)>.;;
                                    ^^^
      is not closed: identifier x_1 bound at Characters 6-7:
       .<fun x y -> .~(let z = !. .<x+1>. in .<z>.)>.;;
             ^
     is free".
The error message is informative: it now tells the name of the free variable and points out, in the generator source code, the binder that should have bound the variable.

Moving open code around may result in ill-scoped code in another way, when the open code is stored in mutable cells, thrown in exceptions or captured in delimited continuations. Alas, environment classifiers cannot detect such errors, and bad things did happen. Here is an example using the old MetaOCaml from 2006 (version 3.09.1 alpha 030). (The problem can be illustrated more simply, but the following example is more realistic and devious.)

     let c =
      let r = ref .<fun z->z>. in 
      let f = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in 
      .<fun x -> .~f (.~(!r) 1)>. ;;
     (*
     val c : ('a, '_b -> int) code =
       .<fun x_4 -> ((fun x_2 -> 0) ((fun y_3 -> x_2) 1))>.
     *)

One must look hard to see that x_2 is actually unbound in the resulting code (shown in comments, as the output from the interpreter). So-called scope extrusion has happened. Only when we attempt to run that code do we see that something went wrong somewhere.

     .! c;;
     (*
     Characters 77-78:
       Unbound value x_2
     
     Exception: Trx.TypeCheckingError.
     *)

Since we get an error anyway (though without much diagnostics), one may discount the severity of the problem. Alas, sometimes scope extrusion results in no error -- just in wrong results. A simple variation plasters over the problem:

     let c1 =
      let r = ref .<fun z->z>. in 
      let _ = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in 
      !r;;
     (*
     val c1 : ('a, '_b -> '_b) code = .<fun y_3 -> x_2>.
     *)
We then use c1 to construct the code c2:
     let c2 = .<fun y -> fun x -> .~c1>.;;
     (*
     val c2 : ('a, 'b -> 'c -> '_d -> '_d) code =
       .<fun y_1 -> fun x_2 -> fun y_3 -> x_2>.
     *)
which contains no unbound variables and can be run without problems.
     (.! c2) 1 2 3;;
     (* - : int = 2 *)

It is most likely that the user did not intend the fun x -> in c2 to bind the x of c1. This is a blatant violation of lexical scope. And yet we get no error or other overt indication that something went wrong.

BER MetaOCaml N101 has none of this. Although the type system still permits code values with escaped variables, attempting to use such code in any way -- splice, print or run -- immediately raises an exception. For example, entering the expression c above at the BER MetaOCaml N101 toplevel gives:

     Exception:
     Failure
      "Scope extrusion detected at Characters 127-137:
            .<fun x -> .~f (.~(!r) 1)>. ;;
                           ^^^^^^^^^^
      for code built at Characters 80-90:
            let f = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in
                                         ^^^^^^^^^^
      for the identifier x_2 bound at Characters 65-66:
            let f = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in
                          ^".
Although the expression c type checked, its evaluation was aborted with an exception and so no code was constructed. Prior to BER N100, c successfully evaluated to code with an unbound variable. Not only is the scope extrusion now detected; it is reported with an informative message, telling which variable got away, which part of the generator was supposed to bind it, and where it eloped. In the second example, c1 evaluates successfully (although printing the result reports scope extrusion), but splicing c1 immediately aborts any further code generation. The exception again tells the name of the escaped variable, x, and the place in the source code of the generator where that variable was supposed to be bound. No ill-scoped code is generated as a result.

Let us review the trade-offs. Prior to N101, MetaOCaml relied on environment classifiers to prevent, at type-checking time, one class of errors: attempts to run not-yet-constructed code. Environment classifiers do not stop scope extrusion and the generation of ill-scoped code. From experience, the error the classifiers do prevent is rare (partly because the complex typing of the run form made it difficult to use except at the top level, at which point the code is always fully constructed). The extra type parameter of the code type proved to be a nuisance: every data structure that may contain code had to have a type with the environment classifier. Generator abstractions were cumbersome to define and use. Environment classifiers often gave false positives, because the value restriction prevented the generalization of the classifier.

The scope-extrusion check in BER N101 is dynamic -- it acts when the generator is run rather than when it is compiled. Replacing a type error with a dynamic error is regrettable. On the up side, the generator stops before producing bad code, throwing an exception with a fairly detailed and helpful error message, pointing out the variable that got away. Since the error is an exception, we can obtain the exception stack backtrace and figure out exactly which part of the generator let that variable leak. Before BER N100, we would discover the scope-extrusion problem, in the best case, only when compiling the generated code and being taken aback by a compilation error; it could be quite a challenge to figure out which part of the generator was to blame. The scope-extrusion check requires no extension to the type system and so is easier to implement and maintain.

Thus BER MetaOCaml N101 removed environment classifiers, since they give good protection (a type error) against only rare errors while being always cumbersome. The scope extrusion check, albeit dynamic, is comprehensive and informative, ensuring well-formedness even in the presence of effects. To repeat: since N100, BER MetaOCaml unconditionally guarantees that if code is successfully generated and can be shown or saved to a file, etc., then the code is well-typed and well-scoped. It shall compile without type errors.

 

Many ways to run the code

BER MetaOCaml is structured as the `kernel' and the `user-level'. The former type-checks and compiles staging annotations. Handling the generated code is up to user-level libraries. For example, metalib in the BER MetaOCaml distribution prints and byte-compiles code values. User-level MetaOCaml code can be changed and extended without recompiling MetaOCaml. Furthermore, changes in OCaml will have less effect on the user-level code.

BER MetaOCaml N101 comes with the following `kernel' interface:

     type 'a code                            (* abstract: possibly open code *)
     type 'a closed_code = private Parsetree.expression
     
     val close_code : 'a code -> 'a closed_code
     val open_code  : 'a closed_code -> 'a code

open_code is total; close_code performs the scope extrusion check. Since closed code is essentially OCaml AST, after closing the generated code the user may examine and `run' it in many ways. One way of running the code is printing it. The user-level print_code library comes with the following interface:

     val print_code         : Format.formatter -> 'a code -> unit
     val print_closed_code  : Format.formatter -> 'a closed_code -> unit
     
     val format_code : Format.formatter -> 'a closed_code -> unit
     
     val print_code_as_ast : 'a closed_code -> unit

The former two functions pretty-print the code; print_code is the composition of close_code and print_closed_code. These two functions are installed as printers of code values by the MetaOCaml toplevel. The function format_code is like print_closed_code, but omits the outer brackets. It comes in handy when saving the generated code into a file, to be compiled later. The last function prints the closed code value as a parse tree, for debugging.

To execute the code, the user-level MetaOCaml currently provides val run_bytecode : 'a closed_code -> 'a, which runs closed code by byte-compiling and then executing it. The function Runcode.run : 'a code -> 'a and its alias, the prefix operation (!.), are the composition of close_code and run_bytecode.

More run-like functions are easy to add: they are regular functions, to be linked with or loaded at the toplevel as any other library functions. There is no need to recompile MetaOCaml any more. The old MetaOCaml had so-called `off-shoring' to translate the generated code to C and Fortran. For example, it was used to generate FFT kernels; the generated C code was good enough to plug as it was into the FFTW benchmarking and testing framework. During the overhaul, offshoring was left aside for clean-up and re-writing. It can now be brought back as another run-like function.
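
For illustration, here is a sketch of one such user-level function (our own; the name save_code_to_file is not part of metalib, and module qualifications are omitted): it stores the generated code into a file, to be compiled later by ocamlc or ocamlopt, using only close_code and format_code from the interfaces quoted above.
     let save_code_to_file (fname : string) (c : 'a code) : unit =
       let cde = close_code c in     (* the scope extrusion check happens here *)
       let oc  = open_out fname in
       let ppf = Format.formatter_of_out_channel oc in
       format_code ppf cde;          (* pretty-print without the outer brackets *)
       Format.pp_print_newline ppf ();
       close_out oc

     (* e.g.:  save_code_to_file "gen.ml" .<fun x -> x * x>. *)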

 

What about the native MetaOCaml

Currently, BER MetaOCaml supports byte-compilation of the generated code and linking it back into the running program -- that is, run-time code specialization. Executing byte-compiled code is relatively slow, especially for numeric code. Performance-sensitive OCaml applications are compiled with ocamlopt, into machine code. It is therefore frequently asked when MetaOCaml will support machine-code compilation of the generated code.

Let us first clear up the misconception that a byte-code MetaOCaml limits the generated code to byte-code compilation. In two largish projects, MetaOCaml was used to generate the optimal code for FFT kernels and for Gaussian Elimination. Gaussian Elimination is a large family of algorithms, parameterized by the domain (int, float, polynomial), by the choice of pivoting, by the need to compute the determinant, rank or the permutation matrix, by the layout of matrices, and by half a dozen more aspects. Because Gaussian Elimination is often used in inner loops, performance matters a great deal. Therefore, all this parameterization has to be done statically. With Jacques Carette, we generated many GE procedures for particular parameter choices. The generated code was stored in files, to be compiled into a library by ocamlopt. Although the code generator was byte-code OCaml, the generated code could be stored and compiled by the optimizing, native OCaml. We used the byte-code run, but only for testing. The FFT project was similar: the generator was byte-code, the byte-code run was used for testing, and the generated code was C, compiled with the Intel C compiler.

To stress, the byte-code generator produces the code to be compiled by any suitable compiler. The mode (byte-code or native) of the generator and of the generated code are not related. Hence the current BER MetaOCaml is already useful in practice -- but more could and should be done.

Thus in principle one can already use ocamlopt to compile the generated code. If the code includes CSP only as literals or external references (and any code can be arranged that way), the code can be stored into a file and then passed to ocamlopt. The result can be linked back using OCaml's own dynlink. What is needed is to make this process automatic and convenient. A future version of MetaOCaml will certainly include more convenient tools for native compilation. Such a tool is `user-level'; developing it could be a good student project.
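
A sketch of what such a tool might do (our own illustration, reusing the hypothetical save_code_to_file from the earlier sketch; it assumes native dynlink is available and that the code contains no CSP other than literals and external references):
     let native_run_effect (c : unit code) : unit =
       let src  = Filename.temp_file "gen" ".ml" in
       save_code_to_file src c;                    (* store the code in a file *)
       let cmxs = Filename.chop_extension src ^ ".cmxs" in
       let cmd  = Printf.sprintf "ocamlopt -shared -o %s %s" cmxs src in
       if Sys.command cmd <> 0 then failwith "native compilation failed";
       Dynlink.loadfile cmxs       (* runs the compiled module's initialization *)
Retrieving a result value, rather than merely running the code for its effect, would additionally require the generated code to register its result somewhere the generator can find it, for example in a global reference, much like the CSP trick described below.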

References
Generating optimal FFT code and relating FFTW and Split-Radix

Multi-stage programming with functors and monads: eliminating abstraction overhead from generic code
Joint work with Jacques Carette on generating many variations of Gaussian Elimination

 

Cross-stage persistence

Cross-stage persistence (CSP) is the ability to use the same expression, value or identifier at several stages: now and later. For example, Pervasives.succ increments integers now: succ 1. The future-stage expression .<succ 1>. -- which will increment 1 later, when this code is compiled and run -- refers to the same succ. Such a reference is a CSP. In this section we describe CSP in the original and BER MetaOCaml, their problems, and the plans for the future. Overcoming the problems helps in the development of the native MetaOCaml. Cross-stage persistent expressions, present in some calculi, can always be emulated in terms of CSP identifiers, such as x in .<.~(let x = e in .<x>.)>.. Therefore, in the following we will be talking only about cross-stage persistent identifiers.

Staged calculi such as lambda^a (Taha and Nielsen, 2003) and lambda_i (Calcagno, Moggi and Taha, 2004) impose no restrictions on CSP. Neither does MetaOCaml, letting us write

     let polylift x = .<x>.;;
     (* val polylift : 'a -> 'a code = <fun> *)
Such polymorphic CSPs are well-defined in staged calculi that formalize a small and pure subset of real-world staged languages. Outside that subset, problems emerge. For example, what is the meaning, if any, of CSP of mutable cells or input channels:
     let r = ref 0 in
     let x = !. .<let () = r := 1 in !r>. in
     (x, !r)
     
     let c = open_in "/etc/motd" in .<c>.
     (* in_channel code = .<(* cross-stage persistent value (id: c) *)>. *)
Is r above shared or copied between the stages? What can a presently-opened file possibly refer to when the code .<c>. is run later? After all, the generated code may be saved into a file and compiled into a library, to be linked into programs that may run on a different computer from the generator's. Generating libraries of specialized computational kernels is an important application of MetaOCaml.

It seems important to separate CSP of global identifiers from CSP of local ones. Incidentally, Template Haskell has such a distinction (Curry, too, treats local and global identifiers differently). The expression .<succ>., referring to the global identifier succ, seems no different from .<1>.: after all, 1 and succ are both constants, of different types. The definition of a calculus or a logic typically begins with the phrase: assume a set Sigma of constants (such as numerals, succ, etc.) of appropriate types and arities. In the code

     let x = 1 in
     let y = .<1>. in ...
1 on the first line refers to the constant of Sigma (``the standard library constant'', so to speak) from the present-stage calculus and 1 on the second line is the element of Sigma at the next stage. The code
     let x = succ in
     let y = .<succ>. in ...
should be interpreted the same way. It is crucial that Sigma at both stages be the same. In other words, the computer that runs the generator and the computer that runs the generated code must have the same standard library, which is broadly understood to include user libraries and .cmi and .cmo files. Thus CSP of global identifiers should be understood as references to the standard library.

CSP of a local identifier should be interpreted as a lift (serialization), valid only at specific types. Hence, let x = 1 in .<x+2>. should be interpreted as let x = 1 in .<.~(lift_int x) +2>.. Here lift_int is an internal function that serializes an integer value. Serialization always copies the value.

To summarize the CSP proposal: CSP of identifiers of base types such as int is always interpreted as lift -- because it is simpler and sharing at these types is not observable anyway. CSP of a global identifier is interpreted as a reference to a library constant. CSP of other identifiers -- whose type is an immutable record, variant, product -- is interpreted as lifting, which always copies the value. CSP at all other types, including polymorphic, reference, abstract with an unknown manifest type, is prohibited. The problematic polylift becomes inexpressible.

What if we really need to cross-stage persist functions, code, file descriptors? First one should decide on the meaning of such CSP. To implement them, one could bind such values to global identifiers, whose CSP is possible at all types. If the value to persist cannot be bound to a global identifier, the following trick may help:

     let glob = ref None
     let res = ... glob := Some v; ...
               .<fromSome !glob>. 
This is how CSP used to be implemented in the native MetaOCaml. Now, this trick has to be made explicit.
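
For concreteness, a possible spelling of this trick (our own sketch; fromSome is written out as a match, and the persisted value is an input channel):
     let glob : in_channel option ref = ref None       (* a global `mailbox' *)

     let persist_channel (v : in_channel) : in_channel code =
       glob := Some v;
       .< match !glob with
          | Some c -> c
          | None   -> failwith "glob is not set" >.
Under the proposal above, glob would have to live in a separately compiled module, so that the generated code can refer to it by name as a `standard library' identifier.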

In short, only global, `standard library' identifiers can be assumed to be defined at the time the generated code is compiled and run. Therefore, when such identifiers occur within brackets, the generated code can refer to them by name. For other identifiers, we do CSP by value, that is, lifting, which works only at specific types. I am very interested to know how much code would have to change if this proposal is implemented. Feedback is greatly appreciated.

 

Further plans

MetaOCaml has an extensive research and development program. On the research front is developing a staged calculus that accounts for objects, modules, and GADTs. Since modules and objects are binding forms, their staging is complex: we must perform appropriate renaming and check that the generated identifiers are not left unbound in the resulting code. Also interesting is the integration of staging and modules (which have recently become first-class). Since both staging and modules are abstractions over code, can staging even be used as a module system? Another item is to prove that the restriction on CSP avoids the unsoundness problems with generalizations.

Currently, cross-stage persistence is implemented in byte-code via a so-called Const_csp_value, a new sort of constant that merely points to a value in the heap. On one hand, it is tempting to eliminate it (CSPs may be compiled as Const_base or Const_block or Const_immstring -- with copying semantics). On the other hand, Const_csp_value could be useful in the core OCaml, for compiling big array constants. Syntactic extensions in particular could use Const_csp_value.

User-land development is most extensive: we need more ways to `run' code values, by translating them to C, Fortran, LLVM, Verilog. MetaOCaml can then be used for generating libraries of specialized C, etc. code. User-land MetaOCaml development has the least cost of entry (no need to know OCaml internals) and is wide open for developers. Everyone can contribute, everyone is welcome.

 

A brief history of (BER) MetaOCaml

BER MetaOCaml sprang out of the original MetaOCaml and owes a great deal to the lessons of its implementation. BER MetaOCaml builds on what has been proven to work. The motivation for it becomes clear from MetaOCaml's history.

A curious file in the original MetaOCaml CVS repository lists several short notes from the meeting in Paris on September 29, 2000 between Xavier Leroy and Walid Taha. One note stands out: ``we (Xavier!) were able to compile and run "run bracket 1"!!'' One may take this moment as the birth of MetaOCaml.

MetaOCaml started as a fork of OCaml along the lines of MetaML. Designed and architected by Walid Taha and developed by Cristiano Calcagno, MetaOCaml had reached its current form by 2003. The environment classifiers appeared at the end of that year. Native code compilation and offshoring (translating the generated code into C and Fortran) were added in 2004.

As MetaOCaml was being developed, new versions of the mainline OCaml were released, sometimes with many fixes and improvements. The MetaOCaml team tracked new OCaml releases and merged the changes into MetaOCaml. (The MetaOCaml version number has as its base OCaml's release version.) The merge was not painless. For example, any new function in the OCaml compiler that dealt with Parsetree (AST) or Typedtree had to be modified to handle MetaOCaml's extensions to these data structures. The merge process became more and more painful as the two languages diverged. For instance, native code compilation, which first appeared in MetaOCaml 3.07, relied on SCaml, a large set of patches to OCaml by malc at pulsesoft.com to support dynamic linking. OCaml 3.08 brought many changes that were incompatible with SCaml. Therefore, in MetaOCaml 3.08 the native compilation mode was broken. The mode was brought back in the summer of 2005, by re-engineering the SCaml patch and implementing the needed parts of dynamic linking without any modification to the OCaml code. The revived native compilation survived through the end.

In 2006, MetaOCaml became the basis for Concoqtion. The increasing divergence between OCaml and MetaOCaml made it harder and harder to merge the changes. With the funds for the project dried up and the daunting prospect of merging the many changes that appeared in OCaml 3.10 and 3.11, the development of MetaOCaml ceased. Its last released version was 3.09.1 alpha 030.

Although the original development team could no longer support MetaOCaml, the users of the language, Jacques Carette in particular, could not bear to see it fall by the wayside. At Jacques Carette's urging, the merge of OCaml 3.11 changes was undertaken in February 2010. No funding and limited resources made it imperative to keep close to the mainline OCaml and avoid divergence at all cost. Everything not directly related to staging was removed, including tag elimination and Concoqtion. 23 fewer files in the OCaml distribution were affected by MetaOCaml. Walid Taha suggested naming the new, bare-minimum line of MetaOCaml BER MetaOCaml. Its first version, N002, was released on March 1, 2010.

BER MetaOCaml established the separation of the kernel from the user-level. As with an OS kernel, changes to the MetaOCaml kernel require `rebooting' (recompiling MetaOCaml) and may render the system unusable. User-level changes affect only particular programs and are easier to develop. The MetaOCaml kernel is responsible for producing and type-checking code values. The user-level deals with processing, or `running', code values: pretty-printing, executing, compiling to byte-code, to native code, to C, etc. Programmers may write new ways of processing code values -- for example, to compile them to LLVM or JavaScript -- without modifying OCaml or BER MetaOCaml. In the BER N002 distribution, the kernel included the 59KB patch to the OCaml distribution and the 54KB-long typing/trx.ml; the user-level consisted of a pretty-printer (59KB of source code) and the support for byte-code execution of the generated code and the custom toplevel (3KB, all in all). There was also 20KB of test code.

The 59KB patch to OCaml seems large. Part of it is necessary. Since BER MetaOCaml extends the AST data type, Parsetree, with the variants Pexp_bracket, Pexp_escape, Pexp_run, Pexp_cspval for the staging constructs, any part of OCaml that processes the AST has to be modified -- and quite a few parts of a compiler deal with the AST. In a multi-stage language, each expression is associated with a stage level: present-stage or one of the progressively future stages. The original MetaOCaml stored the level in an extra field val_level of the value_description record that maintained type and other information for each identifier. Since the map from identifiers to value_description is the typing environment, it is used all throughout the type checker. Therefore, the addition of a new field required a great many changes. A new OCaml version typically modifies the type checker quite heavily; integrating all these modifications into MetaOCaml, accounting for the new field val_level, is a hard job. It is avoidable, however: we may associate identifiers with levels differently, by adding a new map to the environment that tells the level of each future-stage identifier. Any identifier not in the domain of that map is assumed present-stage. The value_description record then remains intact. BER MetaOCaml N100 (see below) takes this route. The original MetaOCaml modified the AST further, to store the information about data constructors used in the generated code (actually, the AST that corresponded to the generated code contained the entire present-stage environment of the type-checker). A great many patches to OCaml came that way. BER N100 imposed the constructor restriction described elsewhere on this page, eliminating the extra AST modifications and the need to patch OCaml further. Thus, if one takes as the goal bringing MetaOCaml closer to OCaml, the amount of modifications to OCaml can indeed be minimized -- as BER N100 has shown.

OCaml release 3.12 brought unexpectedly many new features to the language, for example, first-class modules. OCaml 4.0 added GADTs and dynamic linking. Integrating that many changes into BER MetaOCaml N002 was too depressing to contemplate. BER MetaOCaml N100, developed in January 2013 and released on January 31, is a clean-slate re-implementation of MetaOCaml. Rather than merging OCaml 4.0 into the MetaOCaml branch, BER N100 started anew. It took the OCaml distribution, extended its AST with the staging constructs, and kept adding code until the constructs were fully handled -- until there was no inexhaustive pattern-matching anywhere in the OCaml code. The exhaustiveness check was very helpful. Great care was taken to modify as few OCaml data structures as possible. Here is the breakdown of the BER N100 distribution: the kernel consists of the 49KB patch to OCaml files (modifying 28 OCaml files, 6 of which trivially) and the 77KB-long, completely re-written typing/trx.ml with very many comments. The user-level includes the 58KB pretty-printer (heavily modified by Jacques Carette) and a number of small files to support the toplevel and run, 3KB all in all. There are 54KB of tests, including, for the first time, a regression test suite. BER N100 is much less invasive into OCaml: compare the size of the patch to the main OCaml type-checker typing/typecore.ml, which contains the bulk of the changes for type checking the staging constructs. In the previous version BER N004, the patch had 564 lines of additions, deletions and context; now, only 328 lines.

BER MetaOCaml N101, released in November 2013, cleaned up MetaOCaml further. The positive experience with the scope extrusion check made it possible to remove environment classifiers, and hence notably simplify the type checking of staging constructs. The operation to run the code no longer requires dedicated type checking rules; in BER N101, it became an ordinary function, moved out of the MetaOCaml kernel. Internally, BER N101 improved the scope extrusion check and optimized the translation of staging constructs, especially binding forms. Pretty-printing of the code has become part of OCaml and hence no longer has to be maintained separately.

Developing and especially maintaining a new language is gigantic work. Engineering a new language as a dialect of a well-maintained one is hard but pays in many ways.

References
The MetaOCaml files: Status report and research proposal
The extended abstract published in the informal proceedings of the 2010 ACM SIGPLAN Workshop on ML.

Chung-chieh Shan. Slides of the talk at the ML workshop. Baltimore, MD, September 26, 2010.
< http://www.cs.rutgers.edu/~ccshan/metafx/metaocaml-slides.pdf >

ml2013-talk.pdf [200K]
MetaOCaml lives on: Lessons from implementing a staged dialect of a functional language
The annotated slides of the talk at the ACM SIGPLAN Workshop on ML, September 22, 2013. Boston, MA, USA

The Design and Implementation of BER MetaOCaml

 

Credits

BER MetaOCaml is a complete re-implementation of the original MetaOCaml.

The original MetaOCaml was funded primarily by the NSF project "ITR/SY(CISE): Putting Multi-Stage Annotations to Work" led by Walid Taha. Most of the original development and implementation of staging was done by Cristiano Calcagno. Edward Pizzi implemented the pretty-printing of code, later modified by Jacques Carette. Xavier Leroy, INRIA, helped with the compiler specifics.

I am very grateful to Jacques Carette, Walid Taha, Chung-chieh Shan, Cedric Cellier and Jun Inoue for many helpful discussions and encouragement. Reports and testing by Tran Minh Quang and Mustafa Elsheik are appreciated.



Last updated June 9, 2014

This site's top page is http://okmij.org/ftp/

oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML