From oleg@pobox.com Sun Dec 26 22:48:42 GMT 1999
Message-ID: <8465sg$2bq$1@nnrp1.deja.com>
From: oleg@pobox.com
Subject: Tcl, Scheme, Web, robot [Re: Why Scheme?]
Date: Sun, 26 Dec 1999 22:50:47 GMT
Reply-To: oleg@pobox.com
Keywords: HTTP, robot, get-url, parsing, Scheme, Tcl
Newsgroups: comp.lang.scheme
Summary: a web robot in Tcl vs. Scheme
X-Article-Creation-Date: Sun Dec 26 22:50:47 1999 GMT

The "Tclsh spot" column in the December 1999 issue of USENIX ;login: presented a simple web robot that fetches the last trade price of a specific stock. The robot makes an HTTP request and extracts two nuggets of information from the reply, a web page filled with advertisements and other HTML excesses. This elegant piece of Tcl code is quoted in Listing 1 (below). I couldn't help but wonder how this problem could be solved in Scheme. Listing 2 shows the result. The comparison seems relevant to the ongoing discussion of what Scheme is or is not good for, and why Scheme isn't as popular as it merits.

When comparing Listings 1 and 2, one has to keep in mind the differences in self-documentation and error handling. The Tcl code basically has none. When called as "robot.tcl sunw xxx" it writes:

    SUNW last traded at 74 9/16
    xxx last traded at 74 9/16

This gives a _false_ impression that xxx is a valid ticker symbol: for xxx the regular expression fails to match, so the variable price silently keeps its value from the previous iteration. Imagine this were a trading agent; it might then tell you it had bought you shares of a company that does not actually exist... I couldn't bring myself to make the Scheme code mask the error and lie as blatantly, even for the sake of illustration.

The Scheme code does not require any HTTP package; in fact, the "get-url" function is spelled out explicitly in Listing 2. The code is compact yet powerful. It handles direct connections as well as connections via an HTTP proxy. The function can also be used on its own, to fetch a page or a file from a web server. Speaking of code compactness, it is instructive to compare the 466 lines of ...tcl8.0/http2.0/http.tcl with get-url from Listing 2.
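
To make the get-url trick concrete: do-fetch receives the four pieces of (string-split url '(#\/) 4), so a URL such as http://www.newsalert.com/free/stocknews?Symbol=sunw arrives as schema "http:", dummy "" (the gap between the two slashes), host "www.newsalert.com" and resource "free/stocknews?Symbol=sunw". The sketch below assumes an R7RS Scheme (for the library import and the \r\n string escapes). It mimics that split with a hypothetical split-at-most, which is a stand-in and not the actual string-split, and builds the same HTTP/1.0 request text with a hypothetical make-get-request that returns a string instead of writing to a socket.

```scheme
(import (scheme base))

;; Hypothetical stand-in for (string-split url '(#\/) 4); not the real
;; string-split.  Splits STR at each DELIM character into at most LIMIT
;; pieces; the last piece keeps the unsplit remainder.
(define (split-at-most str delim limit)
  (let ((len (string-length str)))
    (let loop ((start 0) (i 0) (count 1) (pieces '()))
      (cond ((or (= count limit) (= i len))
             ;; Last allowed piece, or end of string: keep the rest.
             (reverse (cons (substring str start len) pieces)))
            ((char=? (string-ref str i) delim)
             ;; Cut before the delimiter, then skip over it.
             (loop (+ i 1) (+ i 1) (+ count 1)
                   (cons (substring str start i) pieces)))
            (else (loop start (+ i 1) count pieces))))))

;; Hypothetical helper: the request text get-url writes, as a string.
;; PROXY is #f for a direct connection, or any true value (e.g. the
;; "host:port" taken from http_proxy) when going through a proxy.
(define (make-get-request url host resource proxy)
  (string-append
   "GET " (if proxy url (string-append "/" resource)) " HTTP/1.0\r\n"
   "Host: " host "\r\n"
   "User-agent: Scheming-puppy/1.1\r\n"
   "\r\n"))                            ; empty line ends the request

;; Hypothetical glue, analogous to get-url's (apply do-fetch ...).
(define (request-for url proxy)
  (apply (lambda (schema dummy host resource)
           (make-get-request url host resource proxy))
         (split-at-most url #\/ 4)))
```

With a proxy, the request line carries the full URL, as an HTTP/1.0 proxy expects; without one, it carries only the path. As in the original, a URL with nothing after the host (not even a slash) yields only three pieces, so the apply fails.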
Granted, get-url is less generic than http.tcl, but _obviously_ not by as much as the line-count ratio may imply.

There are other important differences between Listings 1 and 2. The Tcl code first loads the entire web page into a string and then applies a regular expression to it. The Scheme code does no such thing. Listing 2 deals with the server's reply as it comes, character by character, and reads it only once. This follows a functional, referentially transparent style of input processing (where the input port is the only stateful argument, used in a linear-logic fashion). Furthermore, the Scheme code does not employ regular expressions; simple parsing primitives suffice handsomely.

Listing 2 also illustrates the and-let* form, which executes a series of steps, each of which may fail. If a step does not succeed, the rest of the computation is aborted. The and-let* form neatly expresses such a guarded computation, *without* resorting to call/cc, exceptions and similar expensive means.

I have run the Scheme code in Listing 2 both inside and outside a firewall. It seems to work well.

This code appears somewhat similar to 'Sherlock', which Apple distressingly hypes as the best thing since Finder. Sherlock's pattern matching of web pages is just as unsophisticated as that of Listing 2.

It appears that disturbingly many people possess the following predefined set of associations:

    '((CGI Perl ASP JSP) (ASP VB) (DHTML JavaScript VB) (Tk Tcl) (Expect Tcl)
      (Hot "J.*" "V.*" "X.*" ".*X$"))

As this article strove to point out, the reality is _far_ bigger, and arguably better.

PS. The same issue of ;login: has another thought-provoking piece of code. Page 57 shows a C program, written by a _smart_ person, which uses "memcpy" to store _one_ byte into memory.

PPS. string-split, used in Listing 2, has a slim chance of becoming part of SRFI-13 if it gets enough votes. So far it has only one. BTW, SRFI-13 is still under discussion; Olin will certainly appreciate comments, votes and suggestions.
Such feedback can make a difference: to paraphrase Hegel (?), people program in the language they deserve.

Listing 1.

    package require http
    foreach symbol $argv {
        set url "http://www.newsalert.com/free/stocknews?Symbol=$symbol"
        set id [::http::geturl $url]
        set data [::http::data $id]
        regexp {arts\?Symbol=(.*?)">(.*?)<} $data match symbol price
        puts "$symbol last traded at $price"
    }

Quoted from Clif Flynt, "The tclsh spot", ";login:", vol. 24, N6, pp. 68-73. All comments are properly directed to the article's author, clint@cflynt.com, or to the ;login: editors, login@usenix.org. The code above constitutes a very small part of the ;login: article and is quoted here for the purpose of comparison, critique and discussion; I hope this fits fair use.

Listing 2.

    #! /usr/local/bin/gsi -f

    (define docstrings '(
    ""
    " This is a Scheme robot to fetch the current trading price"
    " for one or several stocks. The ticker symbols for these stocks"
    " should be specified as command-line arguments for this script."
    " If you had to configure your web browser to use a proxy, you need"
    " to set an env variable http_proxy to proxyhost or proxyhost:proxyport"
    " prior to running this script. proxyport 80 is assumed by default."
    ""
    " Example:"
    " fetch-quote.scm sunw APPL"
    ""
    "$Id: fetch-quote.scm,v 1.1 1999/12/26 21:46:14 oleg Exp oleg $"
    ""))

    (include "myenv.scm")
    ;(include "and-let.scm")  ; SRFI-2

    (define service-url "http://www.newsalert.com/free/stocknews?Symbol=")

    ; Given a URL, fetch it using the GET method. Return a port from
    ; which to read the reply. If http_proxy env variable is set, we will
    ; use that proxy
    (define (get-url url)
      (define (do-fetch schema dummy host resource)
        (let* ((proxy (OS:getenv "http_proxy"))
               (target-host (or proxy host))
               (http-port
                (##open-input-output-file
                 (string-append "tcp://" target-host
                                (if (string-index target-host #\:) "" ":80")))))
          (for-each (lambda (str) (display str http-port))
                    `("GET " ,@(if proxy (list url) (list "/" resource))
                      " HTTP/1.0\r\n"
                      "Host: " ,host "\r\n"
                      "User-agent: Scheming-puppy/1.1\r\n"
                      "\r\n"      ; Empty line finishes the request
                      ))
          (flush-output http-port)
          http-port))
      (apply do-fetch (string-split url '(#\/) 4)))

    ; The root function...
    (let ((args (argv)))
      (cond
       ((or (null? args) (null? (cdr args)))  ; No arguments are given at all
        (for-each (lambda (docstring) (cerr docstring nl)) docstrings)
        (exit 4))
       (else
        (for-each
         (lambda (ticker-symbol)
           (or
            (and-let*
             ((reply-port (get-url (string-append service-url ticker-symbol)))
              ;((port-copy reply-port (current-output-port)))
              ((find-string-from-port? "arts?Symbol=" reply-port))
              (found-ticker-symbol (next-token '() '(#\" *eof*) "" reply-port))
              ((eq? #\" (read-char reply-port)))
              ((eq? #\> (read-char reply-port)))
              (price (next-token '(#\space) '(#\< *eof*) "" reply-port))
              )
             (cout found-ticker-symbol " last traded at " price nl)
             #t)  ; End-of-and-let*
            (cout "Failed to fetch a stock quote for " ticker-symbol nl)))
         (cdr args)))))

and-let* is SRFI-2. The other input parsing primitives are explained in
http://pobox.com/~oleg/ftp/Scheme/parsing.html

The only far-flung extension is the handling of "tcp://" file names. As a matter of fact, the Scheme system takes such a string as a regular file name; the magic happens thanks to an "extended" version of open(2), which recognizes such names and opens a TCP connection instead of a file:
http://pobox.com/~oleg/ftp/syscall-interpose.html
It is possible to imbue any language system or application with such extended file-opening powers. No recompilation is necessary (and in some cases no relinking is required either, courtesy of LD_PRELOAD).
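
The parsing primitives and and-let* that Listing 2 relies on live in external files (myenv.scm, SRFI-2, and the parsing library referenced above). For readers who want to try the idea without them, here is a minimal sketch assuming an R7RS Scheme. All names are hypothetical: skip-past plays the role of find-string-from-port?, read-token that of next-token, and and-let*/lite is a cut-down and-let* supporting only (var expr) and (expr) clauses. Each helper consumes the port once, character by character, and the guarded chain returns #f as soon as a step fails.

```scheme
(import (scheme base))

;; Hypothetical stand-in for find-string-from-port?: consume characters
;; from PORT until PATTERN has just been read.  Returns #t on success,
;; #f at end of input.  Comparing a sliding window of the last
;; (string-length PATTERN) characters handles overlapping partial matches.
(define (skip-past pattern port)
  (let ((n (string-length pattern)))
    (let loop ((window ""))
      (if (string=? window pattern)
          #t
          (let ((c (read-char port)))
            (if (eof-object? c)
                #f
                (let ((w (string-append window (string c))))
                  (loop (if (> (string-length w) n)
                            (substring w 1 (string-length w))
                            w)))))))))

;; Hypothetical stand-in for next-token: skip leading SKIP-CHARS, then
;; collect characters up to (not including) one in STOP-CHARS, or EOF.
(define (read-token skip-chars stop-chars port)
  (let skip ()
    (let ((c (peek-char port)))
      (when (and (char? c) (memv c skip-chars))
        (read-char port)
        (skip))))
  (let loop ((acc '()))
    (let ((c (peek-char port)))
      (if (or (eof-object? c) (memv c stop-chars))
          (list->string (reverse acc))
          (begin (read-char port) (loop (cons c acc)))))))

;; Cut-down and-let* (SRFI-2 allows more clause forms): evaluate clauses
;; in order, binding (var expr) results; yield #f at the first false one.
(define-syntax and-let*/lite
  (syntax-rules ()
    ((_ () body ...)
     (let () #t body ...))
    ((_ ((var expr) clause ...) body ...)
     (let ((var expr))
       (if var (and-let*/lite (clause ...) body ...) #f)))
    ((_ ((expr) clause ...) body ...)
     (if expr (and-let*/lite (clause ...) body ...) #f))))

;; The same steps as the and-let* in Listing 2: returns (ticker . price),
;; or #f if any step fails.
(define (extract-quote port)
  (and-let*/lite (((skip-past "arts?Symbol=" port))
                  (ticker (read-token '() '(#\") port))
                  ((eqv? #\" (read-char port)))
                  ((eqv? #\> (read-char port)))
                  (price (read-token '(#\space) '(#\<) port)))
    (cons ticker price)))
```

In Listing 2 the same chain runs directly on the socket port returned by get-url; in a test, a string port from open-input-string can stand in for it.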