The distilled tutorial on sharing and common subexpression elimination in embedded domain-specific language compilers was presented at DSL 2011: IFIP Working Conference on Domain-Specific Languages, which took place on 6-8 September 2011 in Bordeaux, France. This page collects the tutorial material in the form of the paper published in the DSL 2011 Proceedings, the extensively commented Haskell code, and solutions to selected exercises.
One of the important tasks of a compiler is the so-called ``common subexpression elimination'' -- detecting subexpressions denoting the same computation and arranging for that computation to be performed once and the results shared. This optimization (significantly) improves both the running time and compactness of the code and is particularly important for hardware description and firmware compilers. As DSL implementers, it becomes our duty to detect duplicate subexpressions and have them shared. We call this detection implicit sharing, to contrast with the sharing explicitly declared by users. Imperative languages, where it matters whether an expression or its result are duplicated, have to provide a form (some sort of a local variable introduction) for the programmer to declare the sharing of expression's result. In this tutorial, we limit ourselves to side-effect--free expressions such as arithmetic expressions or combinational circuits. Explicit sharing is still important: sharing declarations can greatly reduce the search space for common subexpressions and prevent exponential explosions. Sharing declarations also help human readers of the program, drawing their attention to the `common' computations.A common pitfall in implementing explicit sharing is attempting to use the
letform of the host language to express sharing in the DSL code. Although the host
let-form may speed up some DSL interpretations, it leads to code duplication rather than sharing in the compilation result. Our technique is a pure veneer over hash-consing, hiding its seemingly imperative nature beneath a simple combinator language. The overall implementation remains pure functional and easy to reason about. Our technique makes adding explicit sharing particularly easy.
A reported unsuccessful attempt to detect and eliminate common subexpressions when implementing a real DSL compiler was the motivation for our library. Show-stopping was the expression comparison, which, in pure language is structural and requires the full traversal of expressions. We distill that attempt, presenting our DSL of arithmetic expressions and its embedding in Haskell as a data type. We introduce the two simple running examples: multiplication by a known integer constant and the optimal running sum computation (sklansky).
Reviewing the main approaches to speeding-up the comparison of DSL expressions, all relying on some sort of `pointer' equality, or, on labeling expressions with unique pieces of data (e.g., integers) that admit efficient comparison. The first approach is manual labeling. The second uses the state monad to automate label assignment. The third, popular but treacherous approach is the so-called observable sharing. The code illustrates its dangers. None of these approaches help with common sub-expressions unrelated to pointer aliasing.
The first part of our solution: introducing the tagless-final embedding of the DSL and presenting two interpreters, to evaluate expressions and to print them. We then add another interpreter, interpreting a DSL expression as a
Node in a
DAG, a hash-consed
data structure. The interpretation detects common subexpressions and
shares them. Rather then converting an abstract syntax tree to a DAG,
we build the DAG from the beginning.
Explicit sharing, the
let-form, is needed already in a
side-effect--free language. The
ExpLet.hs code explains why,
adding explicit sharing to our DSL. The code demonstrates how explicit
sharing greatly helps the implicit one.
Utility code: Establishing a bijection between the values of the type
a and integers, with the operations to retrieve the
value given its key, to find the key for the existing value, and to extend the
bijection with a new association. The type
a of values should at
least permit equality comparison; In the present implementation,
a to be a member of Ord.
There are many ways to implement bi-maps, for example, using hash tables,
or maps. The present implementation relies on
Data.IntMap to record both parts of the association.
The old code illustrating implicit and explicit sharing in embedded DSLs. The code uses a slightly richer object language. It was posted on the Haskell-Cafe list in February 2008 in response to Tom Hawkins' question.
sklanskyin the monadic style. We do have to change the example to insert explicit sharing. But we do not have to do the wholesale re-writing, retaining the list comprehension in
Ershov's short paper fully describes what we now call hash tables, hash functions, the useful properties of hash functions, and hash-consing. The paper analyzes the time complexity of the algorithm. Since the algorithm has two exits, writing programs in the continuation-passing style must have been familiar back then.
The library presented at DSL 2011 unwittingly follows Ershov's algorithm closely. The library is (hopefully) better described: see the preface to the English translation of Ershov's paper. Nowadays, starting a paper with the phrase ``All unexplained terms are those from '' (where  is the single reference) would not be taken kindly by reviewers.
oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML