previous   next   up   top

Type-safe functional formatted IO

 


 

Introduction

Formatted input and output facilities -- epitomized by printf and scanf of C -- are present in many languages. With the exception of Danvy's functional unparsing and its implementation in SML/NJ, formatted input/output facilities are either not type-safe or not applicative. Even in Haskell the type checker does not stop the programmer from passing to printf more or fewer arguments than required by the formatting string. OCaml supports typed sprintf and sscanf but relies on a particular weird typing of these functions and on ad hoc overloading of string literals.

Our goal is type-safe formatted i/o as an ordinary library of formatters, format parsers and format descriptors. Our library interface is essentially that of OCaml formatted i/o, without the weird typing. For concreteness, we will be discussing sprintf and sscanf . Other formatted i/o functions are similar. The function sprintf takes as arguments a format descriptor and the values to format, returning the formatted string as the result. The types and the number of the other arguments of the function depend on the format descriptor. Conversely, sscanf receives as arguments the input string, a format descriptor and a consumer function. The function sscanf parses the received string according to the format descriptor and passes the extracted data to the consumer function, whose result is returned. Again, the number and the types of the arguments to the consumer function depend on the format descriptor. The function sscanf is partial because parsing may fail. We further require format descriptors to be first-class: we should be able to pass them as arguments, return as results, and build incrementally.

The function printf is a polyvariadic function: it accepts an arbitrary number of arbitrarily typed arguments. Furthermore, the first argument to printf determines the number and the types of the other arguments. It seems writing such a function requires dependent types. Olivier Danvy's surprising discovery was type-safe sprintf written as an ordinary SML function. Hinze and Asai demonstrated more type-safe applicative formatters. Although sscanf takes the fixed number of arguments, the consumer function passed as the last argument does not have the fixed type. The number and the types of its arguments are determined by the format descriptor. Again dependent types seem to be called for. The fact that one can write type-safe sscanf in any mainstream functional language is less known. Apparently it has been unknown if type-safe sprintf and sscanf could share the same format descriptor.

References
Olivier Danvy: Functional Unparsing
J. Functional Programming, 1998, v8, N6, pp. 621-625.

Ralf Hinze: Formatting: A Class Act
J. Functional Programming, 2003, v13, N5, pp. 935-944.

Kenichi Asai: On Typing Delimited Continuations: Three New Solutions to the Printf Problem
Ochanomizu U. Technical Report OCHA-IS 08-2, September 2008, 17 pp.
Higher-Order and Symbolic Computation, 2009, to appear.
< http://pllab.is.ocha.ac.jp/~asai/papers/tr08-2.pdf >

 

Formatted IO as an embedded DSL: the initial view

We demonstrate type-safe sprintf and sscanf sharing the same format descriptor. The gist of our surprisingly trivial solution is a simple embedded domain-specific language (DSL) of formatting patterns. The functions sprintf and sscanf are two interpreters of the language, building or parsing a string according to the given pattern. The format descriptor, a term in our DSL, can be interpreted in far more than two ways. For example, one may write an interpreter that counts the number of integer descriptors or converts a format descriptor to a C format string. One can write sscanf-like functions whose input comes from a file, a buffer, etc.

This article uses initial tagless typed embedding of the formatting DSL into Haskell. The terms of the language are represented as GADTs. We also demonstrate that lambda-abstractions at the type level are expressible already in the Hindley-Milner type system; GADT with the inherent polymorphic recursion help us use these abstractions.

Here is the typical example of formatting and unformatting:

     fmt3 = lit "The value of " ^ char ^ lit " is " ^ int
     tp3 = sprintf fmt3 'x' 3
     -- "The value of x is 3"
     ts3 = sscanf "The value of x is 3" fmt3 (\c i -> (c,i))
     -- Just ('x',3)
A format descriptor is built by connecting primitive descriptors (such as lit "string" , int , char) with (^) . Whereas sprintf $ lit "Hello world" has the type String , sprintf $ fmt3 has the type Char -> Int -> String . Likewise, the type of the consumer of the values parsed by sscanf varies with the format descriptor. We see that sprintf and sscanf indeed use exactly the same format descriptor, which is a first-class value.

Our implementation is based on the observation of Hinze (`Formatting: a class act' JFP 2003) that the format descriptor is a `type transformer', a functor. For example, the format descriptor int is associated with a functor transforming a type t to a type Int -> t . Hinze represented functors in Curry style; the functor associated with int is then (Int ->) . We use a more direct, Church-style representation. Let us consider a type F a b where F is a binary type constructor, a is a type variable, and b is a type that may include one or more occurrences of a . Such F a b may be regarded a type abstraction, a type-level lambda-function. To apply the function to an argument, we unify a with the argument; the second parameter of F is the result of the application.

Version
The current version is 1.1, Aug 31, 2008.
References
PrintScan.hs [4K]
The complete code with many examples

GADT.ml [15K]
The same code in OCaml (see Example 1 in the file)

PrintScanI.txt [6K]
The initial view on typed sprintf and sscanf
The message with more explanations, posted on the Haskell mailing list on Sun, 31 Aug 2008 19:40:41 -0700 (PDT)

Relating Final and Initial typed tagless representations

 

Formatted IO as an embedded DSL: the final view

We present a dual solution to the problem of type-safe sprintf and sscanf sharing the same format descriptor. Previously, we defined the embedded domain-specific language (DSL) of formatting patterns in the initial style, as a data type. The language can also be defined in the final style. To the end user, the difference is hardly noticeable: all the examples of the initial solution work as they are (modulo a few adjustments caused by the monomorphism restriction). However, whereas the initial style required GADT, the final solution is entirely in Haskell98. Despite often-heard disparagement, Haskell98 is powerful enough for type-indexed terms, thought to require GADTs or similar dependent-type-like extensions.

The transformation from the initial to the final style follows the correspondence referenced below. In fact, the final implementation was `derived' from the initial implementation by Emacs `code movements'. Once the syntax errors have been fixed, the code worked on the first try.

Version
The current version is 1.1, Sep 2, 2008.
References
PrintScanF.hs [8K]
The complete Haskell98 code with many examples
The code has been used in the course on typed tagless-final interpreters and so has been embellished with many comments.

PrintScanF.txt [4K]
The final view on typed sprintf and sscanf
The message posted on the Haskell mailing list on Tue, 2 Sep 2008 00:57:18 -0700 (PDT)

Relating Final and Initial typed tagless representations

 

Generic polyvariadic printf in Haskell98

This generalization of Text.Printf.printf is inspired by the message of Evan Klitzke, who wrote on Haskell-Cafe about frequent occurrences in his code of the lines like
     infoM $ printf "%s saw %s with %s" (show x) (show y) (show z)
Writing show on and on quickly becomes tiresome. It turns out, we can avoid these repeating show , still conforming to Haskell98.

Our polyvariadic generic printf is like polyvariadic show with the printf-like format string. Our printf handles values of any present and future type for which there is a Show instance. For example:

     t1 = unR $ printf "Hi there"
     -- "Hi there"
     
     t2 = unR $ printf "Hi %s!" "there"
     -- "Hi there!"
     
     t3 = unR $ printf "The value of %s is %s" "x" 3
     -- "The value of x is 3"
     
     t4 = unR $ printf "The value of %s is %s" "x" [5]
     -- "The value of x is [5]"
The unsightly unR appears solely for Haskell98 compatibility: flexible instances remove the need for it. On the other hand, Evan Klitzke's code post-processes the result of formatting with infoM , which can subsume unR .

A bigger problem with our generic printf , shared with the original Text.Printf.printf , is partiality: The errors like passing too many or too few arguments to printf are caught only at run-time. We can certainly do better.

Version
The current version is 1.1, June 5, 2009.
References
GenPrintF.hs [2K]
The complete source code with the tests. It was published in the message posted on the Haskell-Cafe mailing list on Fri, 5 Jun 2009 00:57:00 -0700 (PDT)

Safe and generic printf/scanf with C-like format string

 

Safe and generic printf/scanf with C-like format string

We implement printf that takes a C-like format string and the variable number of other arguments. Unlike C of Haskell's printf , ours is total: if the types or the number of the other arguments, the values to format, do not match the format string, a type error is reported at compile time. To the familiar format descriptors %s and %d we add %a to format any showable value. The latter is like the format descriptor ~a of Common Lisp. Likewise, we build scanf that takes a C-like format string and the consumer function with the variable number of arguments. The types and the number of the arguments must match the format string; a type error is reported otherwise.

Our approach is a variation of the safe printf and scanf described elsewhere on this page. We use Template Haskell to translate the format string to a phrase in the DSL of format descriptors. We use the final approach to embed that DSL into Haskell.

Unlike the safe printf explained in the Template Haskell documentation, in our implementation, format descriptors are first class. They can be built incrementally. The same descriptor can be used both for printing and for parsing. Our printf and scanf are user-extensible: library users can write functions to direct format output to any suitable data sink, or to read parsed data from any suitable data source such as string or a file. Finally, what is formatted can be parsed back using the same format descriptor.

Here are some of the tests from the test collection referenced below. The evaluation result is given in the comments below each binding. Example t31 shows that format descriptors are indeed first-class. The definition t32 , when uncommented, raises the shown type error because the format descriptor does not match the type of the corresponding argument.

     t4 = let x = [9,16,25] 
              i = 2 
          in sprintf $(spec "The element number %d of %a is %a") i x (x !! i)
     -- "The element number 2 of [9,16,25] is 25"
     
     t4s = sscanf "The element number 2 of [9,16,25] is 25"
             $(spec "The element number %d of %a is %a")
             (\i a e -> (i,a::[Int],e::Int))
     -- Just (2,[9,16,25],25)
     
     -- What we print we can parse, using the same descriptor
     t31 = let printed = sprintf desc "x" 3
               parsed  = sscanf printed desc (\s i -> (s,i))
           in (printed, parsed)
       where desc = $(spec "The value of %s is %d")
     -- ("The value of x is 3",Just ("x",3))
     
     -- t32 = sprintf $(spec "The value of %s is %d") "x" True
     --     Couldn't match expected type `Bool' against inferred type `Int'
Version
The current version is 1.2, July 4, 2009.
References
TFTest.hs [4K]
Tests of printing, parsing, parsing the printed data, and of extending the library to format to the standard output.

TotalPrintF.hs [4K]
The code to convert format strings to DSL phrases

Formatted IO as an embedded DSL: the final view

 

Functional un|unparsing

[The Abstract of the paper]
Danvy's functional unparsing problem (JFP 1998) is to implement a type-safe printf function, which converts a sequence of heterogeneous arguments to a string according to a given format. The dual problem is to implement a type-safe scanf function, which extracts a sequence of heterogeneous arguments from a string by interpreting (Friedman & Wand LFP 1984, Friedman & Wand 2008) the same format as an equally heterogeneous sequence of patterns that binds zero or more variables. We derive multiple solutions to both problems from their formal specifications (Wand JACM 1980, TCS 1982).

On one hand, our solutions show how the Hindley-Milner type system, unextended, permits accessing heterogeneous sequences with the static assurance of type safety. On the other hand, our solutions demonstrate the use of control operators (Felleisen et al. LFP 1988, Meyer & Wand 1985) to communicate with formats as coroutines (Wand LFP 1980, Haynes et al. LFP 1984).

This is joint work with Kenichi Asai and Chung-chieh Shan.

Version
The current version is August 2009.
References
Chung-chieh Shan. Slides of the talk presented at the Symposium in honor of Mitchell Wand. August 23, 2009, Boston.
< http://www.cs.rutgers.edu/~ccshan/binding/printf-scanf-slides.pdf >

derive5.ml [26K]
An extensively commented OCaml code explaining the step-wise derivation of typed printf and scanf . The code derives several novel implementations of printf that rely on multi-prompt delimited continuations.

Derive5.hs [4K]
Deriving type-safe printf in direct style, using shift/reset with effectful typing that supports the answer-type modification and polymorphism. The code rigorously derives printf of Asai (2009) along with an implementation that encloses in reset the whole application of printf to values to format.

derive6.ml [8K]
Deriving a context-passing implementation of printf



Last updated April 4, 2010

This site's top page is http://okmij.org/ftp/

oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML