How to build `identifiers' with syntax-rules

  1. Motivation
  2. Introduction
  3. Insight and Illustration
    1. Getters and setters are first-class
  4. Explanation
  5. Portable implementation
  6. ``Keywords''
  7. Other approaches
  8. References

  

Motivation

One of the often mentioned complaints against syntax rules is their apparent inability to construct identifiers whose names follow a desired pattern. For example, R5RS macros are believed to be incapable of implementing a define-structure form, such as the one described in the Gambit-C 3.0 manual:

special form: define-structure name field...
The define-structure expands into a set of definitions of the following procedures: For example:
     > (define-structure point x y color)
     > (define p (make-point 3 5 'red))
     > (point-x p)
     3
     > (point-color p)
     red
     > (point-color-set! p 'black)

The expansion of (define-structure point x y color) introduces identifiers such as point-x and point-color-set! These identifiers did not exist before. Furthermore, their names are derived from the arguments of the define-structure form. R5RS macros seemingly cannot synthesize identifiers.


  

Introduction

Sometimes we do want to write a macro whose expansion defines functions with the names based off an argument of the macro. For example, Peter Keller wished precisely that, in the message he posted on comp.lang.scheme on Dec 14, 2002. He wished for a macro define-frobnicate such that the expansion of (define-frobnicate foo) defines functions foo-ref and foo-set!. He received the standard reply: parameterized identifiers are not possible in the R5RS macro system.

Aren't they really? The present pages shows how to write a syntax-rule macro that is almost identical to Gambit's define-structure. The field access expression looks almost the same as (point-x p) -- with the difference of only one character! Furthermore, getting and setting of the fields in our structure is just as efficient as the corresponding operations in Gambit. Our getters and setters are first-class values. Our define-structure is a syntactic sugar over a define-record-type: We rely on SRFI-9 to actually implement record data types.


  

Insight and Illustration

We begin with the eternal question: ``What's in the name?'' The name for a field accessor such as point-x stands for a (procedural) value. It is this value that we apply, store in a data structure, pass as an argument or the result. Surely there are other ways of referring to that value, for example, as the result of a higher-order dispatcher procedure. To access the field x of the structure point, we may write

     ((point-dispatcher 'x) p)
     ((point-dispatcher 'x 'set!) 5.0)

This notation looks a bit ugly, with an extra pair of parentheses and quotation marks. Furthermore, it involves a run-time overhead: the dispatch to an accessor procedure is now done at run time. That is where macros come in.

The macros and the following two examples are completely defined in [def-struct-code]. The first example is:

     (display "Example from Gambit-C") (newline)
     
     (define-structure point x y color)
     
     (let ((p (make point 3 5 'red)))  ; cf. (make-point 3 5 'red)
       (display (point x p))           ; cf. (point-x p)
       (newline)
       (display (point color p))       ; cf. (point-color p)
       (newline)
       (point color set! p 'black)     ; (point-color-set! p 'black)
       (display (point color p))       ; cf. (point-color p)
       (newline)
     )

To access the fields of a point, we write (point x p) and (point color p) rather than (point-x p) and (point-color p) as in Gambit. The difference is only in one character, as promised. We also write make point rather than make-point. Our setter, too, looks similar to the one in Gambit, with spaces replacing the dashes.


  

Getters and setters are first-class

Getters, setters, and the predicate are first-class values, which can be stored in a data structure and passed as arguments to procedures. For example:

     (display "Example with higher-order functions") (newline)
     
     (define-structure wish adj noun punct)
     
     (let ((w (make wish #f #f #f)))
        (for-each
          (lambda (setter value) (setter w value))
          (list (wish adj set!) (wish noun set!) (wish punct set!))
          (list "Happy" "Holidays" "!"))
       (for-each
         (lambda (pred getter)
           (if (pred w) (display (getter w)) (display #\space)))
         `(,(wish ?) ,(lambda (_) #f) ,(wish ?) ,(wish ?))
         `(,(wish adj) #f ,(wish noun) ,(wish punct)))
       (newline)
     )
We should specifically point out the line
     `(,(wish adj) #f ,(wish noun) ,(wish punct))
Here, the accessor (wish adj) is a first-class value, which is stored in a data structure (a list), which is in turn passed to the higher-order combinator for-each. Likewise, we can apply the introduced ``identifiers'' (as Al Petrofsky has pointed out on the discussion thread):
     (define-structure point x y color)
     (make point 3 5 'red)            ; => a point
     (apply (make point) '(3 5 red))  ; => an equivalent point

The examples run with Scheme48. See below about running the examples on other Scheme systems.


  

Explanation

The macro (define-structure point x y color) expands into the definition of a macro point and the declaration of the point SRFI-9 record type:

     (define-record-type rectype 
        (maker~2 x y color) predicate~3
        (x getter~4 setter~4)
        (y getter~5 setter~5)
        (color getter~6 setter~6)
     )
     (define-syntax point
       (syntax-rules (set!~1 color y x ?~3 *make*~2)
         ((point color set!~1) setter~6)
         ((point color set!~1 rec~6 new-val~6) (setter~6 rec~6 new-val~6))
         ((point color) getter~6)
         ((point color arg~6) (getter~6 arg~6))
         ((point y set!~1) setter~5)
         ((point y set!~1 rec~5 new-val~5) (setter~5 rec~5 new-val~5))
         ((point y) getter~5)
         ((point y arg~5) (getter~5 arg~5))
         ((point x set!~1) setter~4)
         ((point x set!~1 rec~4 new-val~4) (setter~4 rec~4 new-val~4))
         ((point x) getter~4)
         ((point x arg~4) (getter~4 arg~4))
         ((point ?~3) predicate~3)
         ((point ?~3 arg~3) (predicate~3 arg~3))
         ((point *make*~2) maker~2)
         ((point *make*~2 . args~2) (maker~2 . args~2))))

Scheme48, among other systems, lets us look at the expansion of a macro:

     > ,expand (point x p)
     '(#{Generated getter 252} p)

that is, (point x p) is expanded into the invocation of the appropriate getter procedure. The dispatch to the getter has happened at the macro-expand time. Therefore, our (point x p) form is just as efficient at run time as (point-x p) is in Gambit.

     > ,expand (point x)
     #{Generated getter 252}

yields an identifier, which is bound to the getter procedure. We store the latter value in data structures and to pass as arguments and the result.


  

Portable implementation

Alas, the example as stated above runs only on Scheme48. It seems we have encountered a dark, under-specified corner of R5RS macro system. See [syntax-rule-dark-corner] for more detail and a simple illustrative test.

If we forgo the treacherous and under-specified top level, we can make the example run on any R5RS system (at least on four R5RS systems installed on my computer). The changes are minor; the complete code is [def-struct1-code]. The comments in that file also show how to load and run the code on Scheme48, SCM, Bigloo, and Chez Scheme. The example now has the form:

     (define-structure wish (adj noun punct)
     (let ((w (make wish #f #f #f)))
        (for-each
          (lambda (setter value) (setter w value))
          (list (wish adj set!) (wish noun set!) (wish punct set!))
          (list "Happy" "Holidays" "!"))
       (for-each
         (lambda (pred getter)
           (if (pred w) (display (getter w)) (display #\space)))
         `(,(wish ?) ,(lambda (_) #f) ,(wish ?) ,(wish ?))
         `(,(wish adj) #f ,(wish noun) ,(wish punct)))
       (newline)
     ))

which differs from the earlier code only in the placement of one parenthesis. It portably prints the desired result.


  

``Keywords''

The define-structure macro works seamlessly with ``keywords'' -- even on a Scheme system that does not support DSSSL.

In the discussion thread of the original article [Usenet-discussion] Al Petrofsky wrote:

I think the biggest drawback to this technique ... is that you can't use the field names as local variables, as in:
     (let ((x (point-x foo)))
       (if (= x (point-x bar))
         <something>))
If macro calls like (point x bar) are going to work, then x must not be shadowed. Thus, your system is not that much better than plain srfi-9, at least in so far as you need to use up one name from the top-level address space for every field of the structure type.

That is where ``keywords'' come in. They -- at least the real DSSSL keywords -- are not supposed to be used as identifiers. Rather, they are meant as symbolic labels, in particular, argument list labels and field labels. By using ``keywords'' then, we do not take away from the supply of identifiers. Furthermore, ``keywords'' as field labels make the code look nicer. Given the same code [def-struct1-code] as before, the following fragment runs and prints the expected result -- on SCM (which does not have DSSSL keywords), Petite Chez Scheme (does not support keywords) and Bigloo (which does support keywords). The similar code runs on Scheme48 (which again does not support keywords).

     (define-structure point (x: y: color:)
     (let ((p (make point 3 5 'red)))   ; cf (make-point 3 5 'red)
       (display (point x: p))           ; cf (point-x p)
       (newline)
       (display (point color: p))       ; cf (point-color p)
       (newline)
       (point color: set! p 'black)     ; (point-color-set! p 'black)
       (display (point color: p))       ; cf (point-color p)
       (newline)
     ))
The best part is: we do not need to do anything to the code or to the top level: we do not need the DSSSL support, we do not need to introduce the identifiers x:, y:, etc. on a system without the keywords. It just works, everywhere.
  

Other approaches

The discussion with Al Petrofsky about CooL symbols introduced another approach to emulating parameterized identifiers with syntax rules [CooL-symbols]. The article [S-exp-as-identifiers] shows how to truly concatenate 'identifiers' with syntax rules. The latter approach poses a question 'what is an identifier' -- the question that was discussed in many papers on hygienic macros. The approach is a logical outgrowth of the Kohlbecker renaming algorithm. The latter relies on a difference between the source and the macro-expanded forms of a program -- the difference that we can turn to our advantage. For example, identifiers in the source program do not even have to be symbols! It is also important to realize that the core evaluator will be just as happy to evaluate this

      ((lambda (voltage resistance) (/ voltage resistance)) 110 10)
as this
       ((lambda (i~124~10 i~123~10) (/ i~124~10 i~123~10)) 110 10)
Indeed, programs as closed terms are invariant with respect to alpha-renaming.
  

References

[def-struct-code] def-struct.scm [5K]
define-struct code, for Scheme48

[def-struct1-code] def-struct1.scm [6K]
Portable define-struct code

[syntax-rule-dark-corner] A dark, under-specified corner of R5RS macros

[CooL-symbols] Portable case-sensitive symbols, and the meaning of identifiers

[S-exp-as-identifiers] Macro-expand-time environments and S-expressions as identifiers

[Usenet-discussion] How to build "identifiers" with syntax-rules: define-structure as a R5RS macro
The original article and its USENET discussion thread, on which the present page is based. The article was posted on comp.lang.scheme on Thu, 19 Dec 2002 13:42:19 -0800.



Last updated October 1, 2004

This site's top page is http://pobox.com/~oleg/ftp/

oleg-at-pobox.com or oleg-at-okmij.org or oleg-at-computer.org
Your comments, problem reports, questions are very welcome!

Converted from SXML by SXML->HTML