Optimizations of tagless-final embedded DSLs are possible after all.
They are typed and patently type-preserving. We present an
optimization framework inspired by normalization-by-evaluation and
demonstrate that tagless-final optimizations are also modular,
convenient and composable. We contrast them with optimizations over
the familiar deep DSL embedding as a data type.

The optimizations presented on this page are essentially term re-writing. The follow-up course develops optimizations in a semantic style, based on calculation rather than re-writing.

- Introduction
- Tagless-final, systematic optimizations in Haskell and OCaml: Tutorial
- Tagless-final primer
- Warm-up: applying algebraic laws
- Optimization framework
- Zero elimination within the framework
- Linearizing additions by re-association
- Extending the language
- Conclusions
- Tagless-final optimizations, algebraically and semantically (the follow-up course)

- The tagless-final embedding of a (domain-specific) language (DSL) in
a host language centers around multiple interpreters. One interpreter
may simply evaluate a DSL expression whereas another may pretty-print
it or do some sort of static analysis. One question inevitably asked
at any presentation of the approach is what to do about operations
that are not thought of as interpreters, which are not (easily)
expressible as folds over the DSL expression. The typical example is
optimizations, which re-organize code in a context-sensitive way.
This article demonstrates that context-sensitive transformations of tagless-final DSLs are not only possible: such optimizations are modular, convenient and composable. They are also typed and type-preserving, which is clear not only to us but also to the compiler. As we extend the DSL with new expression forms (such as first-class functions, conditionals, exceptions, etc.) we may still be able to use the previously written optimizations as they are. Many optimizations, such as arithmetic expression simplifications, deal only with a subset of the language. The tagless-final approach lets us express them succinctly, abstracting from the irrelevant parts of the language. Not only are the optimizations clearer; they are also robust with respect to language extensions, applicable to any extended language.

We will clearly see the advantages of the tagless-final DSL embedding compared to the ``deep embedding'', that is, representing the DSL as a data type (AST) in the host language. Extensibility is not the only advantage. Although optimizations are AST transformations, each optimization prefers its own AST form. (The abstract syntax tree may have several subtly different realizations, as we shall see later.) The tagless-final framework lets each optimization work with its own preferred view of the AST.

This article is written as a tutorial on the tagless-final optimization framework and uses a very simple example. The framework has been successfully used in a complex, practical application: compiling and optimizing language-integrated queries. Although this tutorial uses OCaml, it could just as well be presented in Haskell.

**Version** The current version is September 2014

**References**
- optimization_intro.ml [13K]
  The complete OCaml code for this tutorial
- Tagless-final, systematic optimizations in Haskell and OCaml: Tutorial
  The extended version of the tutorial on optimizations in the tagless-final style, with a different running example: the DSL of logic circuits. The tutorial has been given in Haskell and OCaml.
- Finally, Safely-Extensible and Efficient Language-Integrated Query
  The present framework has been used for real-life projects, such as the compositional and extensible language-integrated query. The user-composed query expression is optimized into a form from which efficient SQL (without nested subqueries) is generated.
- Tagless-final optimizations, algebraically and semantically
  The follow-up tutorial, stressing the semantics of optimizations, their correctness, and calculation rather than re-writing

- This section gives a brief introduction to the tagless-final
approach, explaining how to embed a DSL and write its interpreters
and transformers. The sample DSL is the simplest, to be extended later.
It has only integer literals and the addition operation.
Its abstract syntax tree could be represented by the following data
type:
    type expr = Int of int | Add of expr * expr
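In this deep embedding, every operation on DSL expressions -- evaluation, printing, transformation -- is written as a fold over the data type. For instance, here is a minimal sketch of an evaluator (the name `eval` is ours, for illustration only):

    (* Deep embedding: the evaluator is a fold over expr *)
    let rec eval : expr -> int = function
      | Int n      -> n
      | Add (l, r) -> eval l + eval r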

Such is the ``deep'' embedding of the DSL in OCaml. This is *not* the road we take. Rather, we introduce the signature for the two, so far, basic forms of our language: integer literals and the addition operation:

    module type SYM = sig
      type 'a repr
      val int: int -> int repr
      val add: int repr -> int repr -> int repr
    end

Here, the (for now, abstract) type `'a repr` stands for *some* representation of language expressions, indexed by the expression type. One may think of `SYM` as specifying the DSL syntax as a context-free grammar: just read `int repr` as ``the start symbol'' and `val int` and `val add` as two productions labeled `int` and `add`. We do not need to know what `'a repr` really is to write sample DSL expressions, such as:

    module Ex2(I:SYM) = struct
      open I
      let res = add (add (int 1) (int 2))
                    (add (int 3) (int 4))
    end

(As above, the name `I` will be used for a generic interpreter, an instance of `SYM`.) To evaluate that expression we write an interpreter for it:

    module Ru = struct
      type 'a repr = 'a
      let int x = x
      let add   = (+)
    end

The DSL operations are mapped directly to the corresponding OCaml operations: `Ru` is a meta-circular interpreter for this tiny subset of OCaml. Correspondingly, a DSL expression is represented by the OCaml value it evaluates to. `SYM` hence can also be viewed as the signature of the DSL interpreters -- or, in loftier terms, as the specification of the denotational semantics for the language: `'a repr` defines the domain, and `int` and `add` give the meaning to DSL integer literals and the addition in this domain. The denotation for complex expressions is determined compositionally, from the denotations of their sub-expressions. To evaluate the sample `Ex2` we interpret it with the `Ru` interpreter: `let module M = Ex2(Ru) in M.res`, which gives the OCaml value 10, the denotation of `Ex2` in `Ru` as the OCaml integer.

The interpreter `Ru` is not the only way to give the meaning to DSL expressions. We may also interpret them as strings, so as to display them. The denotation of an expression is then its printed representation:

    module Sh = struct
      type 'a repr = string
      let int = string_of_int
      let add x y = "(" ^ x ^ " + " ^ y ^ ")"
    end

Interpreting the same `Ex2` using `Sh`, as `let module M = Ex2(Sh) in M.res`, now gives the string `"((1 + 2) + (3 + 4))"`.

Besides evaluating DSL expressions we may also want to transform them. A simple example is negating an expression. The negation transformation can also be written as an interpreter (after all, interpreting is the only thing we can do with tagless-final expressions).

    module Neg(From:SYM) = struct
      type 'a repr = 'a From.repr
      let int x     = From.int (-x)
      let add e1 e2 = From.add e1 e2
    end

`Neg` interprets the DSL in terms of another interpreter, `From` (which we will be calling just `F`). On the surface of it, `Neg` is an interpreter transformer. Dually, it can also be regarded as an *expression* transformer:

    module Ex2Neg(F:SYM) = Ex2(Neg(F))

`Ex2Neg` has the same type as the original `Ex2`: given an interpreter `SYM` it computes the denotation of its DSL expression in that `SYM`'s domain. That is, `Ex2Neg` is a tagless-final representation of a DSL expression -- namely, the negated one. It can be interpreted with the existing `Ru` and `Sh` interpreters, or even the `Neg` interpreter (i.e., it can be negated again):

    let module M = Ex2Neg(Ru) in M.res
      -10
    let module M = Ex2Neg(Sh) in M.res
      "((-1 + -2) + (-3 + -4))"
    let module Ex2NegNeg(F:SYM) = Ex2Neg(Neg(F)) in
    let module M = Ex2NegNeg(Ru) in M.res
      10

(The evaluation result is shown, indented, underneath each expression.) In the following we write more such DSL transformers: interpreters that not only evaluate or print an expression but also re-write it, in generally context-sensitive, apparently non-compositional ways.

- Before introducing the optimization framework, let us write
a simple standalone optimization, using the standard technique
of partial evaluation. Although easy to implement,
the optimization is not easy to extend and harder to
combine with others. We later re-implement it within the framework
and see how the drawbacks are rectified.
The sample optimization is simple: apply the algebraic laws of addition `x+0 = 0+x = x` to eliminate additions of zero. We assume for simplicity that all integer literals are non-negative. It is an exercise for the reader to remove that assumption.

The optimization is expressed as an interpretation in the domain of *partially-known values*, which track and propagate the available static information. In our case, this information is whether the value is statically known to be zero.

    module SuppressZeroPE(F:SYM) = struct
      type 'a repr = {dynamic: 'a F.repr; known_zero : bool}
      let int x = {dynamic = F.int x; known_zero = (x = 0)}
      let add e1 e2 = match (e1,e2) with    (* add is not recursive! *)
        | ({known_zero=true},x) -> x
        | (x,{known_zero=true}) -> x
        | ({dynamic=x},{dynamic=y}) ->
            {dynamic = F.add x y; known_zero = false}
    end

As typical of partially-known values, `'a repr` has two components. The `dynamic` part is the (transformed) expression to be interpreted by the source `F` interpreter. Since this interpreter is abstract at this point, nothing is known about `'a F.repr`. The second component of `'a repr` tells if this partially-known value is in fact fully known, as zero. We find this out when interpreting DSL integer literals, which are known literally. The interpretation of the addition propagates this knowledge and applies the algebraic laws in the straightforward way.

Let us take a moment to convince ourselves of the correctness. The inductive structure of interpreters makes structural-induction proofs straightforward. First we need a lemma: if `e` is a value of the type `'a repr` and `e.known_zero` is true, then `e` represents zero. The proof is trivial. The soundness can be stated as the theorem that if `E` is a tagless-final encoding of an expression and `I` is an interpreter in which the monoid law of addition holds, then `E(I).res` and `E(SuppressZeroPE(I)).res.dynamic` have the same value. We can also prove a stronger property: if `e` is a value of the type `'a repr`, it represents zero if and only if `e.known_zero` is true. The theorem clearly holds for the base case `int x`. In the inductive case `add e1 e2` we apply the inductive hypothesis to `e1` and `e2`, and the assumption that literals are non-negative. The stronger property entails the completeness of the optimization: if `e` is a value of the type `'a repr` then `e.dynamic` represents an expression with no additions of zero. The reader is encouraged to write out the detailed proof, perhaps with the help of their favorite proof assistant.

The following example will demonstrate the optimization:

    module Ex3(I:SYM) = struct
      open I
      let res = add (add (int 3) (add (int 5) (int 0)))
                    (add (int 1) (int 0))
    end

Optimizing it and then getting the `Sh` interpreter to print out the result produces:

    let module Ex3NoZerosPE(F:SYM) = Ex3(SuppressZeroPE(F)) in
    let module M = Ex3NoZerosPE(Sh) in M.res
      - : int SuppressZeroPE(Sh).repr =
        {SuppressZeroPE(Sh).dynamic = "((3 + 5) + 1)"; known_zero = false}

The optimization has worked: the zero summands have disappeared. The OCaml output also shows the problem: what is printed is the partially-known value. These values are used during the optimization; at the end we would like just the final result. There should be a way to observe the optimization result and spare the user the internals of the optimizer. The second problem is extensibility: if we extend our DSL with, say, first-class functions, we likewise have to extend `SuppressZeroPE` to interpret the new expression forms (function abstraction and application) in the `'a SuppressZeroPE(Sh).repr` domain. These new interpretations are simple homomorphisms. They are irrelevant to the zero-addition elimination and are annoying boilerplate. More seriously, they break modularity: every time we extend the language we have to change all the optimizations, even those that are not affected by the extension. Our optimization framework overcomes these problems.

- We now introduce the optimization framework for writing modular and
composable optimization passes over an embedded DSL. When the DSL is
extended with forms irrelevant for a given optimization, its code does
not have to be changed or even recompiled. We do not need to keep
writing boilerplate homomorphic interpretations. Once developed,
optimization passes can be assembled into chains.
In this section, we build the framework. The next two sections
show it off on two sample optimizations.
In the previous section we have seen that tagless-final DSL transformations are written as interpreter transformations. The transformed interpreter uses a representation for DSL expressions from which the optimization result can be extracted, or `observed', at the end. It makes sense therefore to add the observation function to the interpreter signature:

    module type SYMOBS = sig
      include SYM
      type 'a obs
      val observe: 'a repr -> 'a obs
    end

In our `Ru` and `Sh` interpreters, the representation type is already suitable for observation and so `observe` is just the identity. The following functor adds such a trivial observation:

    module AddObs(I:SYM) = struct
      include I
      type 'a obs = 'a repr
      let observe x = x
    end

Let us recall the `SuppressZeroPE` optimizer from the warm-up section: `SuppressZeroPE(F)` interprets DSL expressions as values of the data type

    type 'a repr = {dynamic: 'a F.repr; known_zero : bool}

that includes `'a F.repr`, the interpretation of the expression in some `F` interpreter (which evaluates the optimized expression). In addition, `'a repr` contains extra data used during the optimization. The zero-elimination optimization evaluates the DSL expression and hence builds `'a repr` bottom-up. The `dynamic` part is the optimized `F` interpretation and is extracted at the end. The following signature captures this pattern of two expression representations, one of which carries the extra data for the optimization:

    module type Trans = sig
      type 'a from
      type 'a term
      val fwd : 'a from -> 'a term   (* reflection *)
      val bwd : 'a term -> 'a from   (* reification *)
    end

The type `'a from` is the representation of a DSL (sub)expression in the `From` interpreter; one may think of it as the optimization result so far. Since the `From` interpreter is abstract, nothing is known about `'a from` values and they cannot be inspected. The type `'a term` represents the same DSL (sub)expression; it is also the optimization result so far. Unlike `'a from`, the values of `'a term` are known, at least to some extent, and can be inspected. These values contain the working data for the optimization, its state. The optimization process builds the `'a term` values bottom-up, inspecting those of the already optimized sub-expressions. The next two sections show examples.

The two representations are related by the operations `fwd` and `bwd`. The latter is used, in particular, at the very end to extract the optimization result from the working data `'a term`. The operation `fwd` builds the working data. The two operations do not generally define a bijection: usually `fwd` is not surjective and `bwd` is not injective. That is, although the composition `bwd . fwd` should be the identity, `fwd . bwd` is typically not. One may think of `bwd` as an `abstraction' function: it throws away the data used internally during the optimization, the optimization state. Then `fwd` is a concretization function, which initializes the optimization state to some default, empty value. The composition `fwd . bwd` is not the identity because the thrown-away optimization state is typically richer than the default state. (For a concrete instance: in the zero-elimination pass defined below, the two distinct terms `Zero` and `Unk (F.int 0)` both reify via `bwd` to `F.int 0`, whereas `fwd` never produces `Zero`.)

The two types and their relationship remind one of normalization-by-evaluation (NBE). In NBE, `'a from` would be called the opaque object representation; `'a term` is then the meta-language representation, the domain of the (non-standard) evaluation. The function `fwd` is then reflection and `bwd` is reification.

To complete the framework we define a default, generic optimizer: given a particular instance of `Trans` (we reserve the name `X` for such instances), the optimizer builds the `'a term` bottom-up, from which the `'a from` value can be extracted, or observed.

    module SYMT(X:Trans)(F:SYMOBS with type 'a repr = 'a X.from) = struct
      open X
      type 'a repr = 'a term
      type 'a obs  = 'a F.obs
      let int x     = fwd (F.int x)
      let add x y   = fwd (F.add (bwd x) (bwd y))
      let observe x = F.observe (bwd x)
    end

The reader may have noticed that this optimizer is the silly identity transformer: the optimization state initialized by `fwd` is not used for anything and is later thrown away by `bwd`. Therefore, the `SYMT`-optimized `Ex2` is the same as the original `Ex2`: to be precise, `module M = Ex2(SYMT(X)(I)) in M.observe M.res` will always give the same result as `module M = Ex2(I) in M.res`, for any `X` and interpreter `I`.
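To make the claim concrete, here is a minimal sanity check, assuming the definitions of `Ru`, `AddObs`, `SYMT` and `Ex2` above; the trivial `Trans` instance `XId` is ours, introduced only for this illustration:

    (* A Trans instance with no optimization state: fwd and bwd are identities *)
    module XId = struct
      type 'a from = 'a Ru.repr
      type 'a term = 'a Ru.repr
      let fwd x = x
      let bwd x = x
    end
    module IdOpt = SYMT(XId)(AddObs(Ru))
    let _ = let module M = Ex2(IdOpt) in
            assert (IdOpt.observe M.res = 10)   (* the same as Ex2(Ru) *)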

Concrete optimizers are built as `deltas' to `SYMT`, overriding the interpretations of some DSL forms and actually using the optimization state. Interpretations of the other forms, irrelevant for the optimization, remain homomorphisms. The example is next.

**References**
- Peter Dybjer and Andrzej Filinski: Normalization and Partial Evaluation.
  G. Barthe et al. (Eds.): Applied Semantics, LNCS 2395, pp. 137-192, 2002.

- To show off the optimization framework, we re-implement the
optimization from the warm-up section that eliminated additions to
zero.
Defining an optimization pass within the framework means specifying two things. First is the optimization state, the data that govern the optimization; it is defined by writing an instance of `Trans`. Next we need to tell how to use this optimization state and actually do the optimization. It is specified as a `partial interpreter': an interpreter for only those DSL forms that are affected by the optimization. We will be calling this second part `IDelta`. It is convenient to put the two parts inside a functor, since they are both parameterized by the `From` interpreter, which interprets the optimized expression. In the case of the zero-addition elimination, the optimization pass looks as follows:

    module SuppressZeroPass(F:SYM) = struct
      module X = struct
        type 'a from = 'a F.repr
        type 'a term =
          | Unk  : 'a from -> 'a term
          | Zero : int term              (* if the expression is zero *)
        let fwd x = Unk x                (* generic reflection *)
        let bwd : type a. a term -> a from = function
          | Unk x -> x
          | Zero  -> F.int 0
      end
      open X
      module IDelta = struct
        let int n = if n = 0 then Zero else Unk (F.int n)
        let add : int term -> int term -> int term = fun x y ->
          match (x,y) with
          | (Zero,x) -> x
          | (x,Zero) -> x
          | (Unk x, Unk y) -> Unk (F.add x y)
      end
    end
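As a quick illustration of the earlier remark about `fwd` and `bwd`, we may instantiate the pass with `Sh` from the Primer (a sketch of ours; the names `ZSh`, `a` and `b` are for illustration only):

    module ZSh = SuppressZeroPass(Sh)
    let a = ZSh.X.bwd ZSh.X.Zero                (* "0" *)
    let b = ZSh.X.bwd (ZSh.X.fwd (Sh.int 0))    (* also "0": two distinct terms,
                                                   one reification; bwd . fwd = id *)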

The type `'a term` is a GADT with two variants: `Zero` for the literal zero and `Unk` for a DSL expression whose zeroness is statically unknown or forgotten. The GADT hence tells the minimum we need to know to carry out the zero-addition elimination. The operation `fwd` marks its argument as unknown, and `bwd` forgets all static knowledge, producing the opaque `'a from` value. This is the typical pattern. It should be clear that `'a term`, albeit a data type, is not at all the deep DSL embedding (such as the `expr` data type from the Primer section). For one, `'a term` is not recursive. Second, it makes explicit only the bare minimum -- in our case, whether the expression is literally zero or not.

The partial interpreter shows how the literal zeroness is determined and used. The `match` statement in the interpretation of `add` expresses the rules of the optimization clearly and directly. Since our language is so simple, `IDelta` is in fact a full interpreter. Later on, we extend the DSL with more expression forms; the `IDelta`, however, will remain the same.

The complete optimizer is built by combining the optimization pass with the default optimizer `SYMT` and overriding the default interpretations with those in `IDelta`. The result is the full, optimizing interpreter:

    module SuppressZero(F:SYMOBS) = struct
      module OptM = SuppressZeroPass(F)
      include SYMT(OptM.X)(F)
      include OptM.IDelta    (* overriding int and add in SYMT *)
    end

`Ex3` from the previous section demonstrates the optimization. The result is shown underneath the expression, indented.

    module SuppressZeroSh = SuppressZero(AddObs(Sh))
    let module M = Ex3(SuppressZeroSh) in SuppressZeroSh.observe M.res
      "((3 + 5) + 1)"

The next section shows a more interesting optimization and its composition with the present zero-elimination pass.

- More interesting is the linearization of additions by re-associating
them to the right: `(add 1 (add 2 (add 3 ...)))`. In this form, the
first argument of any addition is never itself an addition. Such
expressions can be compiled to a sequence of simple addition
instructions over a single accumulator (think of the Z80). We later
compose this optimization with the zero-elimination of the previous
section.

The implementation follows the established pattern of defining an instance of `Trans`, with the data type `'a term` representing the optimization state, and the partial interpreter `IDelta` that builds and propagates this state.

    module LinearizeAddPass(F:SYM) = struct
      module X = struct
        type 'a from = 'a F.repr
        type 'a term =
          | Unk : 'a from -> 'a term
          (* non-empty difference list *)
          | Add : (int from -> int from) * int from -> int term
        let fwd x = Unk x              (* generic reflection *)
        let bwd : type a. a term -> a from = function
          | Unk x     -> x
          | Add (f,n) -> f n
      end
      open X
      module IDelta = struct
        let int n = Add ((fun x -> x), F.int n)
        let add : int term -> int term -> int term = fun x y ->
          match (x,y) with
          | (Add(fx,nx), Add(fy,ny)) ->
              Add ((fun nil -> fx (F.add nx (fy nil))), ny)
          | (Unk x, Add(fy,ny)) ->
              Add ((fun nil -> F.add x (fy nil)), ny)
          | (Add(fx,nx), Unk y) ->
              Add ((fun nil -> fx (F.add nx nil)), y)
          | _ -> Unk (F.add (bwd x) (bwd y))
      end
    end
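To see the difference-list encoding at work, here is a small trace of ours, instantiating the pass with `Sh` from the Primer (the names `LSh`, `t` and `s` are for illustration only):

    module LSh = LinearizeAddPass(Sh)
    (* the left-leaning (1 + 2) + 3 ... *)
    let t = LSh.IDelta.add
              (LSh.IDelta.add (LSh.IDelta.int 1) (LSh.IDelta.int 2))
              (LSh.IDelta.int 3)
    (* ... reifies right-leaning, the last summand plugging the hole *)
    let s = LSh.X.bwd t    (* "(1 + (2 + 3))" *)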

Again, some may think that the data type `'a term` is the deep DSL embedding. It cannot be: it is not recursive. Further, the addition is represented differently from the AST data type `expr` of the Primer section: `Add (f,n) : int term` is not a node in a tree but a non-empty difference list. The `'a term` here is also different from the `'a term` of the previous section. Unlike typical deep-embedding optimizations, which traverse the *same* AST data type, tagless-final optimizations employ whatever representation is the most convenient for them. Different optimizations will likely use different expression representations.

Right-associated additions are isomorphic to a non-empty list of summands. We encode it as a non-empty difference list and build it bottom-up. Again the reader is encouraged to prove that the code above really does so. The proof is an easy induction: after all, the structural induction is apparent in the `IDelta` code, if only from the fact that `add` is not recursive.

Taking `Ex3` again as the example, we linearize it and pretty-print the result:

    module LinearizeAdd(F:SYMOBS) = struct
      module OptM = LinearizeAddPass(F)
      include SYMT(OptM.X)(F)
      include OptM.IDelta
    end
    module SLin = LinearizeAdd(AddObs(Sh))
    let module M = Ex3(SLin) in SLin.observe M.res
      "(3 + (5 + (0 + (1 + 0))))"

The result is indeed right-associated, but with zero summands. We have a pass for that, which we tack on:

    module SLinZero = SuppressZero(SLin)
    let module M = Ex3(SLinZero) in SLinZero.observe M.res
      "(3 + (5 + 1))"

Composing the `LinearizeAdd` optimization with the earlier-defined `SuppressZero` is as simple as ordinary functional composition of the transformers.

- The moment of truth for the modularity of the optimization framework
comes when we extend the DSL. This section does exactly that. Our
trivial DSL of integer literal additions gains two new forms for
creating and applying first-class functions, becoming
higher-order. Although the addition operations now have to deal with
variables, the old interpreters can be fully reused in extensions.
In particular, the two previously developed optimization
passes are applicable as they are.
In the tagless-final approach, a DSL is defined by the signature of its interpreters, like the `SYM` signature from the Primer. The new DSL of higher-order functions and addition expressions has the following signature:

    module type SYMHO = sig
      include SYMOBS
      val lam: ('a repr -> 'b repr) -> ('a->'b) repr
      val app: ('a->'b) repr -> ('a repr -> 'b repr)
    end

It reuses the already defined simple DSL of arithmetic expressions, and its observations, adding to it two new expression forms. Here is a sample expression of the extended language, with lambda-abstractions and applications:

    module Ex5(I:SYMHO) = struct
      open I
      let res = observe @@ lam @@ fun x ->
        add (add x (add (int 5) (int 0)))
            (add (add (add (int 1) (int 0)) (int 7))
                 (add (app (lam (fun y -> y)) (int 4)) (int 2)))
    end

The inferred signature

    module Ex5 : functor (I : SYMHO) -> sig val res : (int -> int) I.obs end

tells us that our DSL has grown beyond solely `int` expressions: `Ex5` represents an `int->int` function.

We need a `Ru`-like interpreter to evaluate it. We do not write it from scratch: rather, we extend the `Ru` interpreter:

    module RuHO = struct
      include AddObs(Ru)
      let lam f   = f
      let app f x = f x
    end

Being a proper extension, it can handle all the earlier examples in the original, simple DSL; it also interprets the extended language:

    let module M = Ex2(RuHO) in M.res
      10
    let module M = Ex5(RuHO) in M.res 6
      25

`RuHO`, like any other tagless-final interpreter, is typed, evaluating an `'a repr` DSL expression into the OCaml value of the same type. We did not talk about types before, as the DSL had only `int` expressions. It has grown: `Ex5` represents an `int->int` expression, and is evaluated into the corresponding OCaml function; the latter is then applied to `6`. The evaluation result has no tags or other wrappers, justifying the name of the approach.

The `ShHO` interpreter pretty-prints expressions (we re-wrote the `Sh` interpreter with precedence pretty-printing, to omit useless parentheses):

    let module M = Ex5(ShHO) in M.res
      "Lx. (x + (5 + 0)) + (((1 + 0) + 7) + ((Ly. y) 4 + 2))"
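For concreteness, here is a minimal sketch of ours of an `ShHO`-like printer: it extends `Sh` naively, printing full parentheses and drawing fresh variable names from a crude counter (unlike the article's actual `ShHO`, it makes no attempt at precedence printing):

    (* A naive ShHO-like printer: fully parenthesized output, counter-based
       fresh names. A sketch only, not the article's ShHO. *)
    module ShHONaive = struct
      include AddObs(Sh)
      let counter = ref 0
      let lam f =
        incr counter;
        let x = "x" ^ string_of_int !counter in
        "(L" ^ x ^ ". " ^ f x ^ ")"
      let app f x = "(" ^ f ^ " " ^ x ^ ")"
    end

Interpreting `Ex5` with it prints the same expression fully parenthesized, with machine-generated variable names.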

Let us see how to optimize the extended expressions. First, we need a new default interpreter, which is again obtained by extending `SYMT` and reusing all of its code:

    module SYMTHO(X:Trans)(F:SYMHO with type 'a repr = 'a X.from) = struct
      open X
      include SYMT(X)(F)
      let app x y = fwd (F.app (bwd x) (bwd y))
      let lam f   = fwd (F.lam (fun x -> bwd (f (fwd x))))
    end

There is no second step. The old optimization passes `SuppressZeroPass` and `LinearizeAddPass` can be used as they are:

    module SuppressZeroHO(F:SYMHO) = struct
      module OptM = SuppressZeroPass(F)
      include SYMTHO(OptM.X)(F)
      include OptM.IDelta
    end
    module SuppressZeroHOSh = SuppressZeroHO(ShHO)
    let module M = Ex5(SuppressZeroHOSh) in M.res
      "Lx. (x + 5) + ((1 + 7) + ((Ly. y) 4 + 2))"

Adding the next pass, to linearize the additions, is a simple composition, as before:

    module LinearizeAddHO(F:SYMHO) = struct
      module OptM = LinearizeAddPass(F)
      include SYMTHO(OptM.X)(F)
      include OptM.IDelta
    end
    module SLinZeroHO = LinearizeAddHO(SuppressZeroHOSh)
    let module M = Ex5(SLinZeroHO) in M.res
      "Lx. x + (5 + (1 + (7 + ((Ly. y) 4 + 2))))"

The additions are all associated to the right, and there are no zeros in sight. Incidentally, `Ex5(SLinZeroHO)` has the type `sig val res : (int -> int) SLinZeroHO.obs end`. The result, just like the representation of the original `Ex5`, is indexed by the same type `int->int`. The optimization has preserved the type. All tagless-final optimizations have this property by the very construction.
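The preservation can be checked directly by type ascription (a sketch; the binding name `check` is ours):

    (* The optimized Ex5 still represents an int->int expression *)
    let check : (int -> int) SLinZeroHO.obs =
      let module M = Ex5(SLinZeroHO) in M.res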

- We have seen how to optimize tagless-final expressions, how to compose optimizations, and how tagless-final optimizations compare with those over the deep embedding. To summarize: tagless-final optimizations are modular and extensible; when new expression forms are added to the language, all previous optimization passes not affected by the new forms apply as they are. The optimizations are typed. They are type-preserving by construction. They facilitate correctness proofs by structural induction. They are deeply connected to normalization-by-evaluation. They are pipelined: once a sub-expression is optimized, it is passed on to the next interpreter/optimizer right away, without waiting for the whole first optimization to finish. All in all, the compositional interpretation -- as a fold -- does not get in the way of generally context-sensitive, apparently non-compositional optimizations.

- This tutorial introduces, explains and drills optimizations of
domain-specific languages embedded in Haskell or OCaml in the
tagless-final style. The running example is deliberately simple and
well-known: the DSL of logic circuits. The language is initially
first order; it is later extended to higher order, to demonstrate
modularity and extensibility. The tutorial was given on several
occasions, using Haskell and OCaml.
The intended audience is users, researchers and students of functional programming. Experience with typed functional programming and only basic familiarity with Haskell or OCaml are assumed. The tutorial thus should be applicable across modern functional programming languages. Rather than a lecture, the tutorial is designed to be an interactive session, explaining the technique on a series of progressively more complex examples and discussing the implementation points and problems with the audience.

Just as a Math talk cannot avoid looking at proofs, in a programming tutorial we cannot help but stare at the code. Befitting its informal style, the tutorial was presented by showing, describing, explaining and discussing the code. Below are the code files, in Haskell and OCaml, with many comments, questions, remarks and quizzes.

- BasicGates.hs [5K]
  basic_gates.ml [9K]
  A DSL to model AND/OR/NOT gates connected by wires, and its embedding into Haskell or OCaml. We show sample circuits and several sample interpreters.
- NAND.hs [2K]
  basic_gates.ml [9K]
  Realizing logic circuits in terms of NAND gates
- BConstProp.hs [3K]
  bconst_prop.ml [3K]
  How to transform embedded DSL expressions? The first optimization: simplifying gate assemblies by Boolean constant propagation.
- BB.hs [3K]
  A short overview of tagless-final and of Boehm-Berarducci's encoding
- BasicOptNeg.hs [4K]
  bneg_adhoc.ml [4K]
  Pushing negation all the way towards the literals: the first step towards normalizing gate assemblies. This transformation does not at all look like a fold-like interpretation. Here we implement it creatively. Later on, we redo it in a systematic, ``dumb'', mechanical way that no longer requires much creativity. Replacing creativity with mechanical work is one of the main goals of Math and Computer Science.
- RR.hs [<1K]
  rr.ml [2K]
  Capturing the common pattern in writing optimizers: the reflection-reification pair
- B2Neg.hs [3K]
  bneg_double.ml [3K]
  The second optimization: double-negation elimination. We write a transformation that at first glance does not look like a fold at all.
- BNegDown.hs [3K]
  bneg_down.ml [3K]
  An even less fold-like transformer: applying the De Morgan laws to push negation towards the inputs
- BNeg.hs [<1K]
  bneg.ml [<1K]
  Composing the three previous transformations, and repeating them a statically unknown number of times
- HGates.hs [3K]
  hgates.ml [6K]
  Making the language higher-order: adding abstractions and applications. Modeling not just wires and gates but composite circuits
- HConstProp.hs [2K]
  hgates.ml [6K]
  Higher-order ``constant propagation'': extending the partial evaluator to apply statically known functions. Demonstrating how to use the new optimization along with the previously developed ones -- which apply as they are to the new, higher-order circuits
- cnf.ml [2K]
  Conversion to the conjunctive normal form `AND x x -> x`; complete Boolean circuit simplifications; converting to DNF; handling sharing.
**References**
- slides-PPL.pdf [82K]
  A few introductory slides of the Haskell tutorial (in Japanese). The Haskell version of the tutorial was first presented at the 17th Programming and Programming Languages workshop (PPL 2015), Matsuyama, Japan, on Mar 5, 2015.
- T4: Embedding and optimizing domain-specific languages in the typed final style
  The more extensive (3.5 hrs) tutorial, presented at CUFP 2015 in Vancouver, BC, Canada on September 3, 2015
- Functional Thursday #30, October 1, 2015, Taipei, Taiwan
  The recorded lecture at the Functional Thursday Meetup Taipei
- International Summer School on Metaprogramming, Robinson College, Cambridge, 8th to 12th August 2016
  A three-lecture course using Haskell as the meta-language
- Tagless-final optimizations, algebraically and semantically
  The follow-up tutorial, stressing the semantics of optimizations, their correctness, and calculation rather than re-writing