Sharing and Common Subexpression Elimination in EDSL compilation

Implementing Explicit and Finding Implicit Sharing in Embedded DSLs

The distilled tutorial on sharing and common subexpression elimination in embedded domain-specific language compilers was presented at DSL 2011: IFIP Working Conference on Domain-Specific Languages, which took place on 6-8 September 2011 in Bordeaux, France. This page collects the tutorial material in the form of the paper published in the DSL 2011 Proceedings, the extensively commented Haskell code, and solutions to selected exercises.

Introduction

We present the detection of implicit sharing and the implementation of explicit sharing in the context of embedded domains-specific language (DSL) compilers. The sharing implementation techniques have since found other uses, e.g., writing SAT solvers. Embedded compilers -- typical for circuit description, embedded control systems or GPU programming DSLs -- complement the familiar embeddings of a DSL in a host language as a library or an interpreter. For example, we may build a circuit model in Haskell using gate descriptions and combinators provided by the DSL library. We may test the circuit in Haskell by running the model on sample inputs. Eventually we have to produce Verilog code or a Netlist to manufacture the circuit. Likewise, a control system DSL program should eventually be compiled into machine code and burned in as firmware. An embedded compiler thus is a host language program that turns a DSL program into a (lower-level) code.

One of the important tasks of a compiler is the so-called ``common subexpression elimination'' -- detecting subexpressions denoting the same computation and arranging for that computation to be performed once, sharing the result. This optimization (significantly) improves both the running time and compactness of the code and is particularly important for hardware description and firmware compilers. As DSL implementers, it becomes our duty to detect duplicate subexpressions and have them shared. We call this detection implicit sharing, to contrast with the sharing explicitly declared by users. Imperative languages, where it matters whether it is an expression or its result are duplicated, have to provide a form (some sort of a local variable introduction) for the programmer to declare the sharing of expression's result. In this tutorial, we limit ourselves to side-effect--free expressions such as arithmetic expressions or combinational circuits. Explicit sharing is still important: sharing declarations can greatly reduce the search space for common subexpressions and prevent exponential explosions. Sharing declarations also help human readers of the program, drawing their attention to the `common' computations.

A common pitfall in implementing explicit sharing is attempting to use the let form of the host language to express sharing in the DSL code. Although the host let-form may speed up some DSL interpretations, it leads to code duplication rather than sharing in the compilation result.

Our technique is a pure veneer over hash-consing, hiding its seemingly imperative nature beneath a simple combinator language. The overall implementation remains pure functional and easy to reason about. Our technique makes adding explicit sharing particularly easy.

References

sharing.pdf [291K]
The paper published in the proceedings of DSL 2011 (Olivier Danvy and Chung-chieh Shan, Eds.) EPTCS.66, 2011, pp. 210-225. doi:10.4204/EPTCS.66.11

Tutorial code: the sharing library

The tutorial is the presentation of the library for detecting implicit and implementing explicit sharing. For clarity, the tutorial is centered around a simple (sub)DSL of arithmetic expressions embedded in Haskell. The running examples, albeit simplistic, are abstracted from real-life, reported problems.

ExpI.hs [7K]: This project was prompted by a mailing list message reporting an unsuccessful attempt to detect and eliminate common subexpressions when implementing a real DSL compiler. Show-stopping was the expression comparison, which, in pure language is structural and requires the full traversal of expressions. We distill that attempt, presenting our DSL of arithmetic expressions and its embedding in Haskell as a data type. We introduce the two simple running examples: multiplication by a known integer constant and the optimal running sum computation (sklansky).
Ptrs.hs [6K]: Reviewing the main approaches to speeding-up the comparison of DSL expressions, all relying on some sort of `pointer' equality, or, on labeling expressions with unique pieces of data (e.g., integers) that admit efficient comparison. The first approach is manual labeling. The second uses the state monad to automate label assignment. The third, popular but treacherous approach is the so-called observable sharing. The code illustrates its dangers. None of these approaches help with common sub-expressions unrelated to pointer aliasing.
ExpF.hs [12K]: The first part of our solution: introducing the tagless-final embedding of the DSL and presenting two interpreters, to evaluate expressions and to print them. We then add another interpreter, interpreting a DSL expression as a Node in a DAG, a hash-consed data structure. The interpretation detects common subexpressions and shares them. Rather then converting an abstract syntax tree to a DAG, we build the DAG from the beginning.
ExpLet.hs [7K]: Explicit sharing, the let-form, is needed already in a side-effect--free language. The ExpLet.hs code explains why, adding explicit sharing to our DSL. The code demonstrates how explicit sharing greatly helps the implicit one.
BiMap.hs [2K]: The utility code: Establishing a bijection between the values of the type a and integers, with the operations to retrieve the value given its key, to find the key for the existing value, and to extend the bijection with a new association. The type a of values should at least permit equality comparison; In the present implementation, we require a to be a member of Ord. There are many ways to implement bi-maps, for example, using hash tables, or maps. The present implementation relies on Data.Map and Data.IntMap to record both parts of the association.
DSLSharing.hs [17K]: The old code illustrating implicit and explicit sharing in embedded DSLs. The code uses a slightly richer object language. It was posted on the Haskell-Cafe list in February 2008 in response to Tom Hawkins' question.

Explicit sharing across several expressions

ExpLetList.hs [5K]
This is the solution to exercise 13 in the DSL paper, which was the challenge posed by Matthew Naylor back in 2008. The solution demonstrates explicit sharing in the sklansky example, without adding lists to our DSL. Mainly, we explicitly avoid rewriting sklansky in the monadic style. We do have to change the example to insert explicit sharing. But we do not have to do the wholesale re-writing, retaining the list comprehension in sklansky.

Historical note on hash consing

Despite the Lisp-sounding name, hash-consing was invented before Lisp. It was described, for the English audience, in the first volume of `Communications of the ACM' in 1958, as the translation of the paper by A.P. Ershov that appeared in the Proceedings of the Academy Of Sciences of the USSR earlier that year. The translation is quite accurate, as far as I could see, but drops the flowcharts and the figure depicting hash function distributions. The terminology may be confusing: what is called `conditional number' in the translated article is now called `hash-code'. The standard terminology has not been invented yet.

Ershov's short paper fully describes what we now call hash tables, hash functions, the useful properties of hash functions, and hash-consing. The paper analyzes the time complexity of the algorithm. Since the algorithm has two exits, writing programs in the continuation-passing style must have been familiar back then.

The library presented at DSL 2011 unwittingly follows Ershov's algorithm closely. The library is (hopefully) better described: see the preface to the English translation of Ershov's paper. Nowadays, starting a paper with the phrase ``All unexplained terms are those from [1]'' (where [1] is the single reference) would not be taken kindly by reviewers.

References

A. P. Ershov: On programming of arithmetic operations
Communications of the ACM, 1958, v1, N8, pp. 3-6.

А.П.Ершов. О Программировании Арифметических Операторов
Доклады Академии наук СССР, 1958, т.118, N3, 427-430. (Математика) Представлено академиком С.Л.Соболевым 2 VII 1957.

Implementing Explicit and Finding Implicit Sharing in Embedded DSLs

Introduction

Tutorial code: the sharing library

Explicit sharing across several expressions

Historical note on hash consing

Last updated June 1, 2020