let (rec) insertion: Generating (mutually) recursive definitions

 

 

let (rec) insertion without Effects, Lights or Magic

Let insertion in program generation is producing code with definitions (let-statements). Although definitions precede uses in generated code, during code generation `uses' come first: we might not even know a definition is needed until we encounter a reoccurring expression. Definitions are thus generated `in hindsight', which explains why this process is difficult to understand and implement -- even more so for parameterized, recursive and mutually recursive definitions.

We have earlier presented an interface for let(rec) insertion -- i.e. for generating (mutually recursive) definitions. We demonstrated its expressiveness and applications, but not its implementation, which relied on effects and compiler magic.

We now show how one can understand let insertion, and hence implement it in plain OCaml. We give the first denotational semantics of let(rec)-insertion, which does not rely on any effects at all. The formalization has guided the implementation of let(rec) insertion in the current version of MetaOCaml.

Not only we have shown what let-insertion actually means; Not only we have developed the first formal model that uniformly treats let-, letrec- and mutually-letrec--insertion, without any continuation or state effects. We have also simplified the previously proposed interface for generating mutually recursive definitions. In particular, memoization comes out naturally; the explicit memoizing fixed-point combinator is no longer needed. The simplification has lead to the straightforward implementation, in the current MetaOCaml.

Joint work with Jeremy Yallop.

References
genletrec.pdf [365K]
Short paper presented at PEPM 2022. January 18, 2022 (online) doi:10.48550/arXiv.2201.00495

talk.pdf [158K]
Slides of the talk at PEPM 2022 (recorded)

Embedded two-staged DSL with let-, letrec- and mutual-letrec--insertion in plain, pure OCaml

Generating Mutually Recursive Definitions
Derivation and illustration of the interface for mutually recursive let-insertion (simplified in the present work)

 

Embedded two-staged DSL with let-, letrec- and mutual-letrec--insertion in plain, pure OCaml

As an outcome of formalizing let rec insertion came the implementation of code generation with let rec insertion -- without any coroutines, delimited continuations, state or other run-time or compiler magic. It is written in plain, pure OCaml -- pure as using no effects, and also no libraries, extensions, etc. beyond the standard library. It can serve as a useful prototype for implementing let(rec) insertions in one's own language or framework. In fact, it has been a useful prototype for the MetaOCaml implementation.

The implementation also serves as an executable denotational semantics of two-level staging with (mutually recursive) let-insertion. It has been used in the above paper to verify all semantic calculations, for all the examples.

The implementation takes form of a two-stage DSL, embedded into OCaml in the tagless-final style.

Base calculus: essentially PCF, or the most basic, side-effect--free subset of OCaml. It is the basis of the host language; it is also the target language (the code being generated)
  • base.mli [2K]
    Base calculus represented as OCaml signature
  • sugar.ml [<1K]
    Syntax sugar
Code-generating combinators, added to the base language to make it staged
Examples
  • ex0.ml [2K]
    Initial examples, from Sec 1 and 2 of the genletrec paper
  • ex1.ml [5K]
    Examples with non-recursive genlet
  • exm.ml [3K]
    Memoizing non-recursive genlet
  • exA.ml [3K]
    Memoizing recursive genlet: Ackermann
  • exM.ml [3K]
    Mutually-recursive definitions
Implementations (i.e., denotational semantics) of Base
  • base_r.mli [<1K]
    base_r.ml [2K]
    The R semantics: the standard Scott-Strachey semantics for a typed Church-style lambda-calculus
  • base_p.mli [<1K]
    base_p.ml [4K]
    The S semantics: the semantic domain is a set of strings (of code)
Implementations: (semantics) of code generation
  • codec_r.ml [16K]
    The R semantics of the host calculus (base + code-generating combinators) parameterized by the semantics of the generated code (R or S)
Utilities

 

Generating Mutually Recursive Definitions

Many functional programs -- state machines, top-down and bottom-up parsers, evaluators, GUI initialization graphs, etc. -- are conveniently expressed as groups of mutually recursive bindings. One therefore expects program generators, such as those written in MetaOCaml, to be able to build programs with mutual recursion.

Unfortunately, currently MetaOCaml can only build recursive groups whose size is hard-coded in the generating program. The general case requires something other than quotation, and seemingly weakens static guarantees on the resulting code. We describe the challenges and propose a new language construct for assuredly generating binding groups of arbitrary size -- illustrating with a collection of examples for mutual, n-ary, heterogeneous, value and polymorphic recursion.

Joint work with Jeremy Yallop.

References
pepm2019.pdf [628K]
Short paper published in the proceedings of the 2019 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation (PEPM), January 14-15, 2009. Cascais, Portugal
doi:10.1145/3294032.3294078

simple.ml [10K]
Code for the most of the paper, including the generation of recursive (but not mutually recursive) definitions. Generation of the recursive definitions is already supported by MetaOCaml, as it turns out.

dummy_genletrec.ml [6K]
The prototype of the genletrec primitive proposed in the paper. It is a mere a prototype; the complete implementation requires a revision of MetaOCaml (which has since been carried out).

token.mli [<1K]
The token interface for the example of generating a finite state recognizer as a group of mutually recursive functions, from the FSM description. The example is implemented using the genletrec prototype.

let (rec) insertion without Effects, Lights or Magic
The improved and much simplified interface, implemented in MetaOCaml (BER N111)