The need for portable case-sensitive symbols arose in several real projects. The most natural answer is a conservative lexical extension: a portable notation for case-sensitive symbols. The notation fully preserves the lexical structure of R5RS Scheme and can be used on any R5RS system. Denoted case-sensitive symbols are transcribed into genuine case-sensitive symbols by a portable macro. We consider a low-level-macro implementation and a slightly less general syntax-rule implementation. The discussion of the notation and its transcription on the comp.lang.scheme newsgroup has revealed surprisingly deep insights: into portable lexical extensions of Scheme, into treating code as data, into the capabilities of syntax-rules macros, and into the very meaning of identifiers.
This page has been translated into Spanish by Maria Ramos from Webhostinghub.com.
According to R5RS, "Upper and lower case forms of a letter are never distinguished except within character and string constants." There are legitimate applications, however, that greatly benefit from case-sensitive symbols. One such application is an S-expression-based form of XML [SXML]. PLT XML collections, SXML and all other similar projects map tag names to identifiers. Such a representation is highly appropriate, as tag names are not usually mutable but are heavily used in identity comparisons. The need for case-sensitive identifiers in describing semi-structured data as S-expressions was recognized in DSSSL.
A great number of Scheme systems already offer a case-sensitive reader, which often has to be activated through a compiler option or pragma. A web page [Lisovsky] discusses case sensitivity of various Scheme systems in detail.
'"CooL" notation and its transcription
This section describes the notation and its implementation that were first presented in an article [CooL-symbols].
According to R5RS, symbols created by
     (string->symbol "ASymbol")
retain their case, while symbols read or entered literally
     (with-input-from-string "ASymbol" read)
     'ASymbol
may get their case changed on many Scheme systems. Therefore, the following expression
     (eq? (string->symbol "ASymbol") 'ASymbol)
evaluates to #f on many Scheme systems, e.g., on SCM (which downcases all literal symbols) and Bigloo (which uppercases them).
The SSAX XML parser [SSAX] relies on string->symbol to turn tag and attribute names into case-sensitive symbols. A test suite for the parser, however, needed a way to enter such case-sensitive symbols literally. Test cases are embedded into the SSAX code, and are always enclosed within a special form
     (run-test (test1) (test2) ...)
If a user wants to run self-tests, he defines this form as
     (define-macro run-test
       (lambda body
         `(begin (display "\n-->Test\n") ,@body)))
Otherwise, he defines
     (define-macro run-test (lambda body '(begin #f)))
which effectively switches all the tests off. This fortuitous circumstance suggested that run-test can do a bit more than just expand into a begin form: the run-test form can enable truly portable and truly concise case-sensitive symbols.
We introduce a notation
'"ASymbol" -- a quoted
string -- to stand for a case-sensitive
ASymbol. This notation is valid only within the body of a
run-test or similar form.
The notation is implemented by scanning the body of run-test and
replacing every occurrence of
(quote "str") with the symbol that (string->symbol "str") yields.
To make the implementation more general, we separate the task of
scanning and replacing into a macro sensitize-case:
     (define-macro sensitize-case
       (lambda (body)
         (define (re-write body)
           (cond
             ((vector? body)
              (list->vector (re-write (vector->list body))))
             ((not (pair? body)) body)
             ((and (eq? 'quote (car body))
                   (pair? (cdr body))
                   (string? (cadr body)))
              (string->symbol (cadr body)))
             (else (cons (re-write (car body)) (re-write (cdr body))))))
         (re-write body)))

     (define-macro run-test
       (lambda body
         `(sensitize-case (begin ,@body))))
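The rewriting step does not depend on the macro system, so it can be tried out as an ordinary procedure applied to a form held as data. The following is only a sketch: re-write is copied from the macro above, while sample and rewritten are hypothetical names introduced for illustration. The sample form is built with list and string constants so that the reader never gets a chance to change the case of a symbol.

```scheme
; The rewriting step of sensitize-case, lifted out as a plain procedure.
(define (re-write body)
  (cond
    ((vector? body)
     (list->vector (re-write (vector->list body))))
    ((not (pair? body)) body)
    ((and (eq? 'quote (car body))
          (pair? (cdr body))
          (string? (cadr body)))
     (string->symbol (cadr body)))
    (else (cons (re-write (car body)) (re-write (cdr body))))))

; Hypothetical input, the data equivalent of (test "<BR/>" '(('"BR"))).
(define sample
  (list 'test "<BR/>" (list 'quote (list (list (list 'quote "BR"))))))

; After rewriting, the form is the data equivalent of (test "<BR/>" '((BR))),
; where BR is a symbol with its case preserved.
(define rewritten (re-write sample))

; Dig out the rewritten symbol and compare it with the one string->symbol makes.
(display (eq? (caar (cadr (caddr rewritten))) (string->symbol "BR"))) ; displays #t
(newline)
```

Only (quote "str") sub-forms are replaced; the plain string "<BR/>" is left alone because it is not wrapped in a quote form.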
It must be stressed that
'"ASymbol" behaves truly
like a Scheme symbol with its case preserved: the operation
(string->symbol "ASymbol") is performed at macro-expansion time rather than at run time. An evaluator sees
no quotes or function invocations at the place where
'"ASymbol" used to appear: the evaluator sees a genuine literal symbol. Thus
'"ASymbol" can be used in a
case statement in positions
where only literal values are allowed.
SSAX since version 5.0 implements
run-test as a
portable, R5RS-compliant syntax-rule macro.
The following expression:
     (run-test
       (and (symbol? ''"ASymbol")
            (symbol? (car '('"ASymbol")))
            (eq? (string->symbol "ASymbol") ''"ASymbol")
            (case (string->symbol "ASymbol")
              (('"ASymbol") #t)
              (else #f))))
returns #t on Gambit, SCM, MIT Scheme, and Bigloo, that is, regardless of the case-sensitivity of a Scheme system. Notice a curious notation -- ''"ASymbol" -- a double-quote following double quotes.
The SSAX.scm source code [SSAX] gives many more examples, e.g.,
     (run-test
       ; Definition of
       ; test:: XML-string * doctype-defn * expected-SXML-term -> void
       ; elided
       (test "<BR/>" dummy-doctype-fn '(('"BR")))
       (test "<!DOCTYPE T SYSTEM 'system1' ><!-- comment -->\n<T/>"
         (lambda (elem-gi seed)
           (assert (equal? elem-gi ''"T"))
           (values #f '() '() seed))
         '(('"T")))
     )
At first sight, the transcription of the
''"ASymbol" notation can only be effected by a low-level macro.
High-level (a.k.a. R5RS or syntax-rules) macros cannot express this
transformation. By design, syntax-rules prohibit manufacturing of
symbols and identifiers: otherwise, it would be impossible to guarantee
hygiene.
It is therefore astonishing to realize that a syntax-rule macro can nevertheless carry out a (less general) transcription task. Al Petrofsky had a remarkable insight: the examples in the previous section will still hold if we, rather than replacing a quoted string with a symbol, re-write expressions where the quoted string appears. Al Petrofsky wrote [Petrofsky]:
Although your implementation supports case-sensitive variable names, it appears that you don't really desire them, you just want case-sensitive literals. In r5rs, there are only three expression types in which literals occur: quote, quasiquote, and case. What you need is for the tests to be evaluated in a syntactic environment that has modified versions of these syntaxes that understand the '"ASymbol" notation. The only constraint hygiene imposes is that you must pass in to the macro the names of the keywords that will be rebound (in other words, because run-test is really a binding construction, the identifiers being bound must be lexically visible from the expressions that use them).
Below is an implementation of run-test that takes as extra arguments the identifiers to be bound to the '"ASymbol"-aware versions of quote, quasiquote, and case. It is called like so:
     (run-test '`case
       (and (symbol? ''"ASymbol")
            (symbol? (car '('"ASymbol")))
            (eq? (string->symbol "ASymbol") ''"ASymbol")
            (case (string->symbol "ASymbol")
              (('"ASymbol") #t)
              (else #f))))
     ;=> #t
The syntax-rule implementation of
run-test can be found in
[Petrofsky]. The difference between the sensitize-case and
Petrofsky's approaches is best illustrated by peeking at the expansion
of a sample
run-test expression. Specifically, we examine the
transcription of a literal expression
'('"a"), which is
a literal one-element list containing a case-sensitive symbol.
In Petrofsky's implementation,
     (run-test '`case '('"a"))
expands into an expression
     (cons (if (string? '"a")
               (begin (string->symbol '"a"))
               (begin (cons 'quote (cons '"a" '()))))
           '())
whereas
     (sensitize-case '('"a"))
expands into a literal '(a).
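Petrofsky's expansion quoted above is ordinary R5RS code: outside of run-test, '"a" is just a quoted string, so the expression can be evaluated as it stands. The sketch below does so and compares the list built at run time with a list holding the symbol that string->symbol makes. The names built-at-run-time and expected are hypothetical, introduced only for this illustration.

```scheme
; The expansion from the text, evaluated as is: the list is constructed at run time.
(define built-at-run-time
  (cons (if (string? '"a")
            (begin (string->symbol '"a"))
            (begin (cons 'quote (cons '"a" '()))))
        '()))

; The value a one-element literal list of the case-preserved symbol would denote.
(define expected (list (string->symbol "a")))

(display (equal? built-at-run-time expected)) ; displays #t
(newline)
```

Both approaches thus yield the same value. They differ in when the symbol comes into being: while the code is being expanded, or each time the expression is evaluated.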
Another use case for the case-sensitive symbols was pointed out by Jens Axel Soegaard. He wrote (ref. [case-command]):
I used this construct
     (case command
       ((F !) (draw distance))
       ((G) (move distance))
       ((+) (begin (right (* turns angle)) (set! turns 1)))
and since case uses eqv?, I experienced that none of the cases where fulfilled, where command was the symbol F (originating from a string). In a case clause, one has to use datums, so I can not repair my code writing ((string->symbol "F") !).
Both approaches discussed above can solve this problem. We can
indeed do a case-sensitive
case-match of symbols on any
R5RS Scheme system. We only need to: (i) encode case-sensitive symbols as
'"SymBol" (that is, a quote followed by the string
that spells the symbol), and (ii) enclose such code in a
sensitize-case or Al Petrofsky's run-test form.
For example, the following expression
     (sensitize-case
       (let ((command (string->symbol "Go")))
         (case command
           (('"Go" !) (display "Went!"))
           (('"Move") (display "Moved"))
           (else (display "stuck!")))))
prints Went! when evaluated with Gambit (a case-sensitive Scheme system) and with case-insensitive SCM and MIT Scheme.
The article [S-exp-as-identifiers] shows how to truly concatenate 'identifiers' with syntax rules.
Ray Dillinger [Dillinger] wondered about using
"non-classical" symbols (created by
string->symbol and perhaps
containing spaces and other bad characters) as identifiers.
The sensitize-case macro truly replaces quoted
strings with the corresponding symbols -- even in binding positions of
special forms. Therefore, the macro can be used to create the most
unusual identifiers:
     (sensitize-case
       (define (foo)
         (let (('"1" 5) ('"" 7) ('"(" 25))
           (display (+ '"1" '"" '"1" '"(")))))
No matter the looks,
foo is a correct procedure. The call
(foo) indeed prints the number 42 on
Gambit-C, Bigloo, SCM and MIT Scheme. This example looks especially
spectacular in MIT Scheme, which can print out a closure. If you enter
the above code, evaluate
(foo) to check that the code
runs, and then ask MIT Scheme to show the body of
foo, you will see:
     1 ]=> (pp foo)
     (named-lambda (foo)
       (let ((1 5) ( 7) (( 25))
         (display (+ 1 1 ())))
Numbers, empty strings and even parentheses can be legitimate Scheme identifiers! I like
     (let ((1 5)) (+ 1 ...))
the most. What a nice illustration of a difference between notation and denotation!
[Lisovsky] Kirill Lisovsky: Case sensitivity of Scheme systems.
[SXML] SXML Specification. Section 6. Case-sensitivity of SXML names.
[SSAX] Functional XML parsing framework: SAX/DOM and
SXML parsers with support for XML Namespaces and validation.
[CooL-symbols] About ''"CooL": low-level macros considered useful
A message on the comp.lang.scheme newsgroup, posted on Thu, 29 Mar 2001 00:32:29 +0000 (UTC)
[Petrofsky] Al Petrofsky: About '`case [was About ''"CooL"]
A message on the comp.lang.scheme newsgroup, posted on 14 Apr 2001 02:44:34 -0700
[case-command] Portable case-sensitive and insensible identifiers [Was: Symbols in DrScheme - bug?]
A message on the comp.lang.scheme newsgroup, posted on Mon, 5 Nov 2001 15:03:54 -0800
[Dillinger] Ray Dillinger: Re: Symbols
A message on the comp.lang.scheme newsgroup, posted on Fri, 04 Jan 2002 03:44:25 GMT
[S-exp-as-identifiers] Macro-expand-time environments and S-expressions as identifiers