The distilled tutorial on sharing and common subexpression elimination in embedded domain-specific language compilers was presented at DSL 2011: IFIP Working Conference on Domain-Specific Languages, which took place on 6-8 September 2011 in Bordeaux, France. This page collects the tutorial material in the form of the paper published in the DSL 2011 Proceedings, the extensively commented Haskell code, and solutions to selected exercises.
One of the important tasks of a compiler is the so-called ``common subexpression elimination'' -- detecting subexpressions denoting the same computation and arranging for that computation to be performed once and the results shared. This optimization (significantly) improves both the running time and compactness of the code and is particularly important for hardware description and firmware compilers. As DSL implementers, it becomes our duty to detect duplicate subexpressions and have them shared. We call this detection implicit sharing, to contrast with the sharing explicitly declared by users. Imperative languages, where it matters whether an expression or its result are duplicated, have to provide a form (some sort of a local variable introduction) for the programmer to declare the sharing of expression's result. In this tutorial, we limit ourselves to side-effect--free expressions such as arithmetic expressions or combinational circuits. Explicit sharing is still important: sharing declarations can greatly reduce the search space for common subexpressions and prevent exponential explosions. Sharing declarations also help human readers of the program, drawing their attention to the `common' computations.
A common pitfall in implementing explicit sharing is attempting to use thelet form of the host language to express sharing in the DSL
code. Although the host let-form may speed up some DSL interpretations, it leads to code duplication rather than sharing in
the compilation result.
Our technique is a pure veneer over hash-consing, hiding its seemingly
imperative nature beneath a simple combinator language. The overall
implementation remains pure functional and easy to reason about. Our
technique makes adding explicit sharing particularly easy.ExpI.hs [7K]
A reported unsuccessful attempt to detect and eliminate common
subexpressions when implementing a real DSL compiler was the
motivation for our library. Show-stopping was the expression
comparison, which, in pure language is structural and requires the
full traversal of expressions. We distill that attempt, presenting
our DSL of arithmetic expressions and its embedding in Haskell as a
data type. We introduce the two simple running examples:
multiplication by a known integer constant and
the optimal running sum computation (sklansky).
Ptrs.hs [6K]
Reviewing the main approaches to speeding-up the comparison of DSL
expressions, all relying on some sort of `pointer' equality, or, on
labeling expressions with unique pieces of data (e.g., integers) that
admit efficient comparison. The first approach is manual
labeling. The second uses the state monad to automate label
assignment. The third, popular but treacherous approach is the so-called
observable sharing. The code illustrates its dangers. None of these
approaches help with common sub-expressions unrelated to pointer
aliasing.
ExpF.hs [12K]
The first part of our solution: introducing the tagless-final
embedding of the DSL and presenting two interpreters, to evaluate
expressions and to print them. We then add another interpreter,
interpreting a DSL expression as a Node in a DAG, a hash-consed
data structure. The interpretation detects common subexpressions and
shares them. Rather then converting an abstract syntax tree to a DAG,
we build the DAG from the beginning.
ExpLet.hs [7K]
Explicit sharing, the let-form, is needed already in a
side-effect--free language. The ExpLet.hs code explains why,
adding explicit sharing to our DSL. The code demonstrates how explicit
sharing greatly helps the implicit one.
BiMap.hs [2K]
Utility code: Establishing a bijection between the values of
the type a and integers, with the operations to retrieve the
value given its key, to find the key for the existing value, and to extend the
bijection with a new association. The type a of values should at
least permit equality comparison; In the present implementation,
we require a to be a member of Ord.
There are many ways to implement bi-maps, for example, using hash tables,
or maps. The present implementation relies on Data.Map and
Data.IntMap to record both parts of the association.
DSLSharing.hs [17K]
The old code illustrating implicit and explicit sharing in
embedded DSLs. The code uses a slightly richer object language.
It was posted on the Haskell-Cafe list in February 2008
in response to Tom Hawkins' question.
sklansky in the monadic style. We do have to change the example
to insert explicit sharing. But we do not have to do the
wholesale re-writing, retaining the list comprehension in
sklansky.Ershov's short paper fully describes what we now call hash tables, hash functions, the useful properties of hash functions, and hash-consing. The paper analyzes the time complexity of the algorithm. Since the algorithm has two exits, writing programs in the continuation-passing style must have been familiar back then.
The library presented at DSL 2011 unwittingly follows Ershov's algorithm closely. The library is (hopefully) better described: see the preface to the English translation of Ershov's paper. Nowadays, starting a paper with the phrase ``All unexplained terms are those from [1]'' (where [1] is the single reference) would not be taken kindly by reviewers.
oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML