# Normalizing loop bodies by evaluation

## Introduction

We present an application of delimited continuations to optimize MapReduce-like computations. Rather than applying the mapping function to concrete elements of the domain and reducing the results, we apply the mapping function to an abstract element and reify the function into the form of a switch data structure. The data structure describes the cases of the mapping, predicated by a set of constraints on loop variables. We consider loops over individuals and sets of individuals, and equality and set membership constraints. Representing loop bodies in the form of a data structure lets us analyze and optimize the loops. For instance, if the number of the mapping cases is small -- that is, the mapping function treats most of the elements of the domain uniformly, mapping them to the same value -- the reduction with a commutative function can be performed in sublinear time. For example, if the reduction function is addition, we replace the summation of many identical values with the single multiplication.

Probabilistic inference often leads to MapReduce computations of the auspicious form with few distinct cases of mapping. Indeed, we write models that express generic knowledge about elements of the domain, incorporating specific information we happen to have about particular elements.

The problem is how to automatically discover the auspicious form of the mapping function. We would like to write the function as an expression in a general purpose programming language -- the expression that we can evaluate as usual. Our goal is to abstractly interpret the very same expression to find out the particular cases and the default. The problem is trivial for simple loops. It becomes harder if MapReduce computations are nested and the mapping function includes (equality) comparisons of variables of different loops. The problem is really interesting if we want to loop over all subsets of the domain, when the straightforward enumeration becomes intractable even for small domains.

One may regard the switch data structure as a representation of the normal form of a term in the lambda-calculus with many-component sums. Our reification of nested loop bodies is then an instance of normalization-by-evaluation, and a non-trivial application of multi-prompt delimited continuations.

This is joint work with Chung-chieh Shan.

## Lifted inference: normalizing loops by evaluation

[The Abstract of the paper]
Many loops in probabilistic inference map almost every individual in their domain to the same result. Running such loops symbolically takes time sublinear in the domain size. Using normalization by evaluation with first-class delimited continuations, we lift inference procedures to reap this speed-up without interpretive overhead. To express nested loops, we use multiple control delimiters for metacircular interpretation. To express loops over a powerset domain, we convert nested loops over a subset to unnested loops.

This is joint work with Chung-chieh Shan.

Version
The current version is July 2009.
References
lifted.pdf [141K]
In informal proceedings of the Normalization-by-Evaluation workshop, affiliated with LICS 2009, Logic in Computer Science conference

Chung-chieh Shan. Slides of the talk at NBE09. August 15, 2009, Los Angeles.
< http://www.cs.rutgers.edu/~ccshan/rational/lifted-talk.pdf >

lifted_once.ml [6K]
Lifting single loops ranging over a fixed domain of individuals: Section 2.2 of the paper

lifted_nested.ml [6K]
Generalizing to nested loops, revealing the metacircularity of `reify`: Section 3 of the paper

delimcc_simple.ml [2K]
delimcc_simple.mli [<1K]
Emulating the polymorphic shift/reset using multiple prompts and unsafe unions. This emulation is used only in the expository code above, to get the prompts out of the way.

lifted_nested_mp.ml [6K]
Parameterizing over the type of the loop domain, optimizing reification of deeply nested loops: Section 4 of the paper

lifted_tuples.ml [18K]
Looping over all subsets of the domain
First we convert a nested loop to a flat loop over individuals. We then explain the transformation of the loop over all subsets to a loop over individuals. The code corresponds to Section 5 of the paper. The extensive comments in the code explain the transformation, which is only briefly described in the paper due to the lack of space.

check.hs [2K]
Verifying the results of the test example in Section 5.

pMap.ml [6K]
pMap.mli [5K]
Dependency: PMap - Polymorphic maps, by Xavier Leroy, Nicolas Cannasse, and Markus Mottl. The original code was augmented with the function `insert_with`

Dependency: Delimited continuations in OCaml

Makefile [3K]
How to compile the library and run the tests and examples

## Separating constraint inference from constraint solving

We describe an alternative procedure for normalizing nested loop bodies by evaluation. Like the approach described in the paper, it optimizes MapReduce-like computations and achieves the same results. The alternative approach too symbolically evaluates loop bodies, converting the mapping function to a switch data structure. The data structure is different however: it is now a decision tree rather than a list, and represents the constraints on loop variables in the raw form. The reification procedure, quite like `reify0` of HANSEI, merely converts an OCaml function representing the loop body into a decision tree data structure. The reification procedure does not attempt to solve the constraints. The constraints are solved by loop iterators, which use the results to efficiently evaluate the loops. Because of the clean separation between the constraint generation and constraint solving, the alternative approach is easily extensible. We show one extension, the addition of set-membership and cardinality constraints, to efficiently evaluate the loops over the powerset of the domain.
Version
The current version is July 2009.
References
lifted.ml [14K]
The code for the alternative approach, with comments and many tests

lifted_full.ml [19K]
Extending to loops over all subsets of the domain; adding the membership and cardinality constraints

HANSEI: Embedded probabilistic programming
The paper explains the reification of a probabilistic program into a lazy search tree.

### Last updated September 7, 2009

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org