;; -*-mode: Outline; -*-

TOC

General
Awards
Linux: A strategic disruptive force
Myths, missteps, and folklore in protocol design (the best talk)
A toolkit for user-level file systems
LOMAC -- Mandatory Access Control (MAC) you can live with
TrustedBSD: adding trusted operating system features to FreeBSD
Security enhanced Linux
Plan9/Inferno
Scripting for PalmOS
Nickle: Language principles and pragmatics
The design and implementation of the NetBSD rc.d system
User-level checkpointing for LinuxThreads programs
Are mallocs free of fragmentation?
Super-BSD BOF
Handwriting recognition on WinCE vs. Palm and Newton
Sandboxing applications
Building a secure web browser
Citrus project: true multilingual support for BSD operating systems
Reverse-engineering instruction encodings
An embedded error recovery and debugging mechanism for scripting
  language extensions
Interactive simultaneous editing of multiple text regions
High-performance memory-based web servers: kernel and user-space
  performance
Web server acceleration via the inverse cache
Storage management for web proxies
Active Content: really neat technology or impending disaster?
The future of virtual machines: A VMware Perspective

* General

June 28-30, 2001, Boston, MA (Marriott Copley Place)

Attendance: ca. 1500 people, down from last year's 1700. Last year was
the 25th anniversary of USENIX. Furthermore, the economy is slower
this year, and the travel budgets of many companies, including Lucent
and AT&T, are smaller.

The conference is selective: in the general track, 24 of the 92
submitted papers were accepted. In the FREENIX track, 27 of the 58
submitted papers were accepted.

* Awards

The GNU project received the USENIX Lifetime Achievement award (aka
the "Flame" award). The Kerberos project received the Software Tools
User Group award.

* Linux: A strategic disruptive force

Daniel D. Frye, Director of the IBM Linux Technology Center
Keynote address

Dr. Frye received his PhD in theoretical atomic physics from Johns
Hopkins U. The most remarkable facet of his presentation is that he is
an IBM officer who can speak for IBM's Linux policy. In his address,
Dr. Frye repeatedly stated that IBM is committed to Linux, and gave
reasons for that. His statements are _official_.

Daniel Frye seems a rather formidable proponent for Linux. Part of his
appeal is that he speaks for IBM. The other part is that he speaks the
language of the boardroom; he makes arguments that carry weight among
corporate executives. If an organization or company seeks arguments
for using Linux, they should invite Dr. Frye to speak.

What follows is a close transcript of his keynote address.

IBM is committed to Linux because doing so *is good business*. IBM has
recognized the commercial potential of Open Source. IBM thinks that
skills, desire and open culture, combined, are a long-term,
sustainable force. IBM believes not necessarily in Open Source, but
definitely in Open Standards (TCP/IP, HTTP, XML) and open culture.
"IBM thinks that Linux is a high-value proposition for IBM customers
and shareholders."

Linux value for IBM:
 - growing marketplace acceptance
 - no customer lock-in. The end user is in charge. For IBM as a
   _service_ provider, this is a good thing. No single vendor can lock
   out IBM, can prevent IBM from marketing its services to a client.
 - many people are comfortable with Linux (especially recent college
   graduates)
 - industry-wide initiative
 - multi-platform (x86, PowerPC, SPARC, HP/PA, mainframes, ARM, etc.)
 - basis of innovation
IBM thinks Linux is the reference platform today for new, innovative
applications. IBM noted there are 23,000 _business_ applications on
Linux (not games or development tools). "IBM believes people with
mission-critical databases will run Linux" [sic!]. Not now, but very
soon. "IBM sees Linux as a key enabling technology for the next
generation of e-business."

Linux development _myths_:
 - Open Source is an undisciplined process
 - Linux is less secure
 - no acceptance of Enterprise features
 - Linux will fragment
 - traditional vendors can't participate

The IBM Linux Technology Center was established with a mission to
accelerate the maturation of Linux into the Enterprise. The center
employs approx. 230 people. IBM pays them to work with the Linux
community. IBM has recently announced:
 - a port of its Journaled File System to Linux. JFS gives improved
   reliability and performance compared to the native ext2fs Linux
   file system, which offers no reliability guarantees.
 - a customer workload test project (joint with SGI). An 8-way SMP
   Linux system ran IBM DB2 at a 95% load level for 96 hours.

IBM came to realize that it _can_ work with Open Source -- and make
money off it:
 - IBM sells hardware underneath Linux
 - IBM sells business applications on top of Linux
 - IBM sells services all over Linux
Therefore, the fact that the OS is free is irrelevant to IBM's
business. IBM made this transition to Linux and Open Source because of
its customers: customers were asking for Linux, and that brought about
the change in IBM's attitude.

I think there is another reason IBM is so hot on Linux. Daniel Frye
emphasized that Open Source, and Linux in particular, is a
"disruptive, creative force." This force can dislodge the existing
players in the Enterprise market -- Microsoft, and to a lesser extent
Sun and HP. That benefits IBM. IBM wants to sell services and
applications -- not OSes. Closed operating systems -- Windows or
Solaris -- make it difficult for IBM to reach many customers and to
port applications. Open Source OSes make the Enterprise market open,
free from lock-in by OS vendors. This directly benefits IBM.

* Myths, missteps, and folklore in protocol design

Radia Perlman, Sun Microsystems Eastern Labs.
Invited talk, June 30.

This was the best presentation of the whole conference. Perhaps the
slides will be available at usenix.org.

The talk was an enumeration of mistakes and disasters in Internet
protocol design (some disasters have happened, some are waiting to
happen). The motivation of the talk was a plea to learn from mistakes
and be less religious about protocols and the Internet. Let's design
the protocol that has the most merit (rather than the one that is
"truer" IP).

The talk made me think far less of the IETF. They made a few good
decisions early on. The early success seems to have gone to their
heads -- they are making worse and worse decisions.

Example 1: routers, bridges, and switches. As it turns out, bridges
were introduced _after_ routers. Radia Perlman can authoritatively say
that, as she designed the first bridge. Before that she was a chief
designer of DECNet. After Ethernet was introduced and became popular,
people flocked to Ethernet, to her chagrin. DECNet had a level 3
protocol (and could relay packets from one net to another), but
Ethernet was a _local_ area network. Ethernet proponents claimed that
it was enough.
Then one day Radia Perlman's manager came to her and said that he
wanted her to design a way to relay packets between two Ethernet
segments: Ethernet had inherent size limitations. Routers could
connect two networks -- but routers were single-protocol and
expensive. The managers wanted a cheaper, multi-protocol way. Thus the
bridge was born. Unlike DECNet, Ethernet frames had no hop count.
Therefore, Radia Perlman had to invent a spanning tree algorithm to
prevent network "divergence" in the presence of loops.

Bridges emulate CLNP -- an ISO version of IP. CLNP (ISO 8473) was
rather close to IP, but better in many ways. CLNP had hierarchical
addressing and offered mobility within a campus: a user could move his
computer from one router (connectivity point on campus) to another
without any need to change addresses and other protocol settings. CLNP
integrated the telephone network -- regular telephone numbers could
serve as (a part of) a node address. When the limitations of IP became
apparent, CLNP looked like a deserving successor. Unfortunately, CLNP
was developed by ISO -- an anathema to the IETF. Therefore, CLNP had
no chance within the IETF. This is very sad.

Example 2: IP multicast -- 12 years of design and no result. The
obvious idea (implemented in ATM) is to make a multicast group
resemble a tree. A multicast group id should include the address of
the group's root node. Then any node wishing to join a group sends a
message to its gateway; the gateway figures out the address of the
root from the multicast group id and forwards the request to the root
-- remembering the path the request came from. Alas, this simple idea
was abandoned for reasons nobody can remember. Far more complex
schemes have been designed. The IETF has adopted, as an article of
faith, the design axiom that multicast at level 3 should look the same
as broadcast over the Ethernet. All the IETF schemes use flat
multicast group ids. All of the schemes require flooding the whole
Internet with join or group discovery messages. No wonder that after
12 years of deliberation multicast is hardly deployed anywhere. MBone
just doesn't scale.

Example 3: the great ARPAnet flood. LSP -- an IMP routing
advertisement protocol -- was unstable. One day that became apparent
to everybody: ARPAnet stopped functioning because the routers kept
endlessly circulating a growing sequence of advertisements, eventually
saturating all their queues and links. Fortunately, the network was
small at that time, and by sheer luck, the people who maintained the
network were the same people who had designed it.

Example 4: interdomain routing. Somehow it was thought that
interdomain routing must be different from intra-domain routing. First
came RIP, which was an obvious failure. Then came EGP -- which had no
metric and so could not handle loops. The latest incarnation is BGP.
BGP is incredibly complex -- furthermore, it is unstable. Every node
can set its own policies; alas, there is no mechanism to guarantee
that the policies are globally consistent. Furthermore, it is known
that policies can diverge: a site A may receive a BGP advertisement
from site B and change the routes it advertises. That change may cause
site B to alter its advertisement -- and so on (a toy simulation of
such an oscillation appears at the end of this section). The great
ARPAnet flood will inevitably repeat -- with catastrophic results this
time.

Example 5: IPSec. Photuris was a great key exchange protocol --
resistant to DoS attacks. Somehow the IETF killed it and instead
adopted the DoS-prone ISAKMP -- which was designed at NSA!
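To make the divergence claim concrete, here is a tiny simulation -- my
own sketch, not from the talk -- of the classic "bad gadget"
configuration: three ASes, each preferring the route through its
neighbor over its own direct route to the destination. Every ranking
is locally sensible, yet no stable global routing exists; under
synchronous updates the system flip-flops forever. The AS numbers and
preference tables are, of course, made up.

    # Three ASes (1, 2, 3) route to destination 0.  Each AS ranks its
    # permitted paths, best first: the path through a neighbor beats
    # the direct one.
    PREFS = {
        1: [(1, 2, 0), (1, 0)],
        2: [(2, 3, 0), (2, 0)],
        3: [(3, 1, 0), (3, 0)],
    }

    def step(routes):
        """One synchronous round of BGP-like route selection: each AS
        picks its most preferred path whose tail a neighbor currently
        advertises (the direct path is always available)."""
        return {n: next(p for p in ranked
                        if len(p) == 2 or routes[p[1]] == p[1:])
                for n, ranked in PREFS.items()}

    routes = {n: (n, 0) for n in PREFS}   # everybody starts direct
    for rnd in range(5):
        print(rnd, routes)
        routes = step(routes)             # ... and oscillates forever

Running it shows the network alternating between "everybody routes
directly" and "everybody routes through a neighbor", round after
round -- divergence with no misbehaving node.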
* A toolkit for user-level file systems

David Mazieres, NYU.
FREENIX session, June 29.

The author received the best FREENIX paper award -- and this time, at
least, it was deserved.

Custom, virtual filesystems (FS) such as ftpfs, zipfs, or an
encrypting FS increasingly rely on NFS, because NFS, unlike the kernel
v-node interface, is portable and standard. A user needs to build an
NFS server -- which listens to NFS requests and implements them in
terms of the user-defined FS. Because such an NFS server often sits on
the same computer that uses the custom FS, the server is called a
loopback server. One example of a loopback server is Alex, which
permits "mounting" of an anonymous FTP server. After a client
NFS-mounts such a server, the client can browse the remote FTP site as
if it were a local FS.

However, all existing loopback NFS servers have drawbacks. For one,
they can deadlock when the OS buffer cache is full. Scenario: a client
sends a 'read' request to the loopback server. The loopback server
accesses a page of code or data that is currently paged out. As the
buffer cache is full, to bring a new page into physical memory the
system has to page out some currently mapped pages. If such pages are
dirty, the OS has to write out their content to the backing store
before freeing them. Suppose the dirty page to be evicted belongs to
an NFS-mounted FS. The OS therefore asks the NFS server to write out
the dirty page. But the NFS server is blocked on the page fault: hence
the deadlock of the whole OS. The other problem with a loopback server
is performance: a single slow file operation holds up the whole
server.

The SFS toolkit -- the topic of the talk -- helps _construct_ loopback
NFS servers, taking care of threading and other boring and
difficult-to-get-right details. The toolkit includes its own RPC
compiler for all NFS-related RPC messages. The compiler can generate
stubs/skeletons that permit convenient tracing of NFS messages, which
greatly helps in debugging the server and analyzing the load. Another
part of the toolkit is libasync -- an asynchronous RPC library
(traditional RPCs are synchronous).

The toolkit and the libasync library are written in C++. They use a
lot of C++ hacks to, in essence, create closures (partially evaluate a
function), capture "continuations", and do reference-counting
automatic memory management. Indeed, asynchronous RPC programming is
largely continuation-passing style (CPS); see the sketch at the end of
this section.

During the discussion, a person asked why such a helpful and needed
toolkit as SFS hadn't been implemented before. David Mazieres replied
that he first tried to implement SFS in plain C. The complexity of
memory management for closures and continuations was staggering. He
gave up. It's only now, David Mazieres said, that C++ provides the
tools (templates and partial template specialization) that he could
use to write his hacks to manage memory and create continuations. Of
course, the question is why not use a better language, where garbage
collection, closures and continuations are built in. A note of caution
though: during some of the processing, the loopback server cannot
afford to take a page fault. Therefore, David Mazieres locks the whole
stack and the needed heap areas during such processing. Any other
implementation language must likewise let the programmer lock areas of
memory and guarantee page-fault-free processing.

See www.fs.net for code and more details.
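To give a flavor of that continuation-passing style, here is a sketch
in Python (mine, not the toolkit's C++; the nfs_* stand-ins are
fabricated): each asynchronous operation takes an explicit callback,
and the "rest of the computation" lives in a closure. With
garbage-collected closures this is a few lines; in plain C every such
closure must be built, threaded through, and freed by hand -- the
complexity that made Mazieres give up.

    # Stand-ins for asynchronous NFS RPC operations: a real loopback
    # server would queue the request and invoke the callback k from
    # its event loop once the reply arrives.
    def nfs_lookup(name, k):
        k({"fh": hash(name)})            # fabricated file handle

    def nfs_read(fh, off, n, k):
        k(b"x" * n)                      # fabricated file data

    def serve_read(path, off, n, reply):
        # The continuation of the lookup: a closure capturing off, n
        # and reply, invoked whenever the lookup completes.
        def got_handle(attrs):
            nfs_read(attrs["fh"], off, n,
                     lambda data: reply({"status": "OK", "data": data}))
        nfs_lookup(path, got_handle)

    serve_read("/tmp/f", 0, 16, print)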
* LOMAC -- Mandatory Access Control (MAC) you can live with

Timothy Fraser, NAI Labs.
FREENIX session, June 28.

LOMAC is a Mandatory Access Control (MAC) system with only two levels,
lo and hi. LOMAC's distinguishing feature is that it works with
existing Linux kernels and applications without any recompilation, and
it is largely invisible to traditional users. LOMAC requires no
patches to the kernel and _no site-specific configuration_. Thus LOMAC
places little burden on the sysadm. LOMAC is a loadable kernel module:
you install it, load it in -- and LOMAC instantly starts to work.

LOMAC is a MAC with two levels. The hi level is reserved for system
applications: daemons, system services, files in /etc and /bin.
Everything else is of lo security.

LOMAC rules:
 - a lo-level process can't write to hi-level files or alter hi-level
   directories
 - if a hi-level process accesses a lo-level file or directory, the
   process is demoted to the lo level

LOMAC assigns _files_ to levels statically. LOMAC starts all processes
at the hi level. Once a process accesses a lo-level file (including
its own executable file, or a network device), the process is demoted.
Thus every process that reads off the network is automatically
demoted. Slick! The protection is indeed transparent. LOMAC guards
against all existing and _future_ Trojan horses and buffer overflow
attacks. A buffer overflow attack still works -- but it becomes
useless to the attacker. As soon as the privileged process accesses
the intruder's files or the network, the process is automatically
demoted (and cannot modify any system configuration afterwards). A
sketch of the demotion rule follows at the end of this section.

Implementation: system call interposition within the kernel.
Overhead:
  micro-benchmarks:
    4K file copy:        2.8%
    256-byte file copy:  9.5%
  macro-benchmark (Linux kernel compile): 3.1%

LOMAC has a trusted file update program. SSH is exempt from the
demotion rule; this makes it possible to administer the system
remotely. Note, su does _not_ work: a non-privileged process cannot
modify privileged data -- period.

www.nailabs.com, click on Open Source. See also SourceForge.
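The demotion logic itself is tiny. Here is my reading of the rules as
a sketch (not LOMAC's actual kernel code, which enforces this by
interposing on system calls):

    HI, LO = 2, 1            # the only two levels LOMAC has

    def may_write(proc, obj_level):
        """A process may never write up to an object above its level."""
        return proc["level"] >= obj_level

    def on_read(proc, obj_level):
        """Low-water-mark rule: reading down demotes the process for
        the rest of its life."""
        proc["level"] = min(proc["level"], obj_level)

    daemon = {"level": HI}
    on_read(daemon, LO)                 # e.g., the daemon read from the
    assert not may_write(daemon, HI)    # network; now it can't touch /etc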
* TrustedBSD: adding trusted operating system features to FreeBSD

Robert N.M. Watson, NAI Labs/FreeBSD project.
FREENIX session, June 28.

TrustedBSD is an OS extension to FreeBSD. The project operates a code
tree separate from that of FreeBSD; the code will be gradually merged
into FreeBSD. The first trusted features (ACLs and extended attributes
for MAC labels) will appear in FreeBSD 5.0.

TrustedBSD started with a *rigorous* specification for a FreeBSD
kernel. The right approach! The TrustedBSD project also emphasizes
extensive regression tests.

More information at trustedbsd.org

* Security enhanced Linux

Stephen Smalley, NAI Labs
Peter Loscocco, NSA.
FREENIX session, June 28.

The talk described a security-enhanced Linux, SELinux, being developed
at NSA (and now at NAI Labs as well). One of the authors still works
for NSA. SELinux is a comprehensive mandatory access control (MAC)
system, which controls access to files, sockets, IPC channels, etc.
Unlike traditional trusted systems, it's highly flexible and
configurable. SELinux lets most applications run unchanged.

Out of the box, SELinux supports two particular security models: type
enforcement and role-based access control (RBAC). Other models can be
configured if desired. The type enforcement model attaches attributes
to various domains (e.g., the init domain, getty domain, user domain).
Each domain has its own set of permissions and a set of processes.
Domain labels of processes are inherited. Every file in the system is
given a type. A policy is an association between domain labels (of
processes) and type labels (of files).

Overhead: 10-33% on microbenchmarks, 10% in network latency. A
macro-benchmark (WebStone) shows _no_ perceptible overhead in either
latency or throughput.

Policies can be fairly complex. SELinux has its own version of tar,
which allows for extended file attributes carrying MAC labels.

* Plan9/Inferno

This year, Vita Nuova, LLC had a booth at the USENIX expo. The booth
was busy: it appears Plan9 and Inferno elicit interest. Inferno runs
on PalmOS and iPAQ!

I met and talked with Roger Peppe, with whom I had previously
communicated via e-mail. He also showed me a bit of Inferno, which was
running on his laptop. Inferno has security built into the OS network
protocol stack. When you initiate or listen to a connection, you
specify which security protocol to use: none, rc4, etc. Thus support
for SSL, authentication and encryption is transparent. Authentication
and authorization of applications is built into the OS. The management
of permissions is very easy -- just like the traditional management of
file permissions. In Inferno, everything is a "file" -- or at least
looks like a file. The ability to write network servers and clients in
Limbo (which functions as a command-line shell) is fascinating.

* Scripting for PalmOS

Brian Ward, U. Chicago.
FREENIX session, June 28.

Programming PalmOS in C is highly cumbersome. BTW, the PalmOS toolbox
is rather similar to the MacOS toolbox. Once you're done writing all
the handlers, you don't have any energy left to program the meat of
your application. OTOH, PalmOS applications look rather similar to
db-backed dynamic web applications. Hence HLL -- a scripting language
for Palm, based on PHP, with "actions" (callbacks). The idea is very
similar to the Wireless Markup Language, WML. However, HLL is very
immature. Why didn't he implement WML or Tcl? I wonder why this paper
was accepted in the first place -- it is _very_ preliminary.

* Nickle: Language principles and pragmatics

Bart Massey, Portland State U.
Keith Packard, XFree86 Project and SuSE Inc.
FREENIX session, June 28.

The talk described Nickle -- a scripting language 15 years in the
making. The idea is to develop a language like Maple -- for numerical
modeling and prototyping. Nickle supports unlimited-precision integers
and rationals. It has a byte-compiler and byte-interpreter,
mark-and-sweep GC, and full call/cc. When I enquired whether call/cc
is indeed re-enterable and whether they had considered all the
ramifications, e.g., for building an argument list, Bart Massey
hesitated. The design goals of Nickle would easily be satisfied by ML;
Bart Massey asserted that they wanted a more imperative language.

* The design and implementation of the NetBSD rc.d system

Luke Mewburn, Wasabi Systems.
FREENIX session, June 28.

The new rc.d system in NetBSD is similar to that of SysV: every rc.d/*
script starts or stops a service. The ordering of execution, however,
is far superior. Every rc.d/* script must contain specialized
comments: 'provide', 'require', 'before', 'keyword'. The init process,
upon start-up, runs rcorder. The latter reads all rc.d/* scripts,
resolves the dependencies, computes the global order, and runs the
scripts accordingly (a toy reimplementation follows at the end of this
section). The keyword pseudo-comment lets us associate arbitrary
labels with a script, so that rcorder can select -- or skip -- just
the scripts that carry a specific keyword/label.

The rc.d system is an interesting application of modules in
programming languages.
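As an illustration of the mechanism, here is a toy rcorder (my sketch;
the real one is a C program and also handles BEFORE, KEYWORD, cycle
detection, etc.): parse the PROVIDE/REQUIRE pseudo-comments out of the
scripts and topologically sort. The sample script headers are made up.

    import re
    from graphlib import TopologicalSorter   # Python 3.9+

    SCRIPTS = {                 # file name -> its header comments
        "network": "# PROVIDE: NETWORKING\n",
        "syslogd": "# PROVIDE: syslogd\n# REQUIRE: NETWORKING\n",
        "sshd":    "# PROVIDE: sshd\n# REQUIRE: NETWORKING syslogd\n",
    }

    def parse(text, key):
        """Extract the words of a '# KEY: ...' pseudo-comment."""
        m = re.search(rf"^# {key}:\s*(.*)$", text, re.M)
        return m.group(1).split() if m else []

    # Map each provided label to the script providing it, then build
    # the dependency graph: script -> set of scripts it requires.
    provides = {p: name for name, text in SCRIPTS.items()
                for p in parse(text, "PROVIDE")}
    graph = {name: {provides[r] for r in parse(text, "REQUIRE")}
             for name, text in SCRIPTS.items()}

    print(list(TopologicalSorter(graph).static_order()))
    # -> ['network', 'syslogd', 'sshd']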
* User-level checkpointing for LinuxThreads programs

William R. Dieter and James E. Lumpp, Jr., U. Kentucky
FREENIX session, June 28.

The goal is to record the state of a running application in a file --
so that the application can be restarted from the remembered state.
Checkpointing lets us run a computationally-intensive application
piecewise -- when resources are available, perhaps moving from one
computer to another. Previously, no checkpointing system supported
multi-threaded applications.

The authors' system is smart enough to save only the part of the
application's virtual address space that is actually mapped. To find
the mapped ranges, the system examines /proc/pid/maps, as the sketch
below illustrates.
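Here is that discovery step as a sketch (mine, not the authors' code;
Linux-only, with the field layout per proc(5)):

    def mapped_ranges(pid="self"):
        """Parse /proc/<pid>/maps and return the writable mapped
        ranges -- the part of the address space a checkpoint must
        save."""
        ranges = []
        with open(f"/proc/{pid}/maps") as f:
            for line in f:
                addrs, perms = line.split()[:2]
                lo, hi = (int(x, 16) for x in addrs.split("-"))
                if "w" in perms:
                    ranges.append((lo, hi))
        return ranges

    for lo, hi in mapped_ranges()[:8]:
        print(f"{lo:#x}-{hi:#x}  {hi - lo:>10} bytes")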
* Are mallocs free of fragmentation?

Aniruddha Bohra, Rutgers U. (summer student)
Eran Gabber, Bell Labs.
FREENIX session, June 28.

The paper shows an experimental result that contradicts the well-known
paper by Paul Wilson, "The memory fragmentation problem: solved?" It
turns out the memory fragmentation problem is not solved. Different
implementations of malloc (even those based on the same algorithm)
vary in quality. Some of them -- in particular, malloc(3X) of Solaris,
which is billed as a "memory-efficient malloc" -- fragment memory.
This fragmentation manifests itself as a memory leak. As the authors
showed, a particular long-running application eventually crashed
because it ran out of memory. When the authors recompiled the
application to use a different version of malloc() -- phkmalloc from
FreeBSD -- the application ran in constant memory, with fragmentation
of only 30.5%.

The authors observed no correlation between the speed of malloc() and
the fragmentation it causes. The Solaris "space-efficient" malloc(3X)
is the slowest and causes the largest fragmentation. Doug Lea's malloc
and phkmalloc are among the best.

* Super-BSD BOF

Just like last year, there was a (long) BoF for all flavors of BSD. A
large conference room was mostly filled -- which shows the interest in
the BSDs.

** NetBSD

NetBSD maintains binary compatibility with FreeBSD, SCO and Linux on
x86, with Solaris on SPARC, etc.

NetBSD project growth:
  1997:  73 developers, 13 different platforms,  7 different architectures
  2001: 252 developers, 44 different platforms, 16 different architectures

The current version is 1.5.1; version 1.6 is planned for the end of
the year. That version will have kernel event queues, and faster pipes
and IPC. The slides of this good, 1-hr BoF NetBSD talk will be
available at www.netbsd.org

** OpenBSD

OpenBSD received a DARPA grant to continue their security work -- in
particular, support for cryptographic accelerators and smart cards. As
an example, they cited 2.5 MB/s throughput using 3DES with only 1% CPU
utilization. Cryptographic services are integrated into the _kernel_,
and provided to user applications via /dev/crypto. OpenBSD has started
work on public-key cryptographic support.

** FreeBSD

As of June 2001, the project had 274 committers (compared to 216 in
June of last year). Both AMD and Intel have stepped up their FreeBSD
efforts (giving documentation, donating hardware, donating software
emulators of forthcoming chips). Microsoft is porting C# to FreeBSD.

The current version is 4.3; version 4.4 is coming on Aug 20; 4.5 will
be released in Dec. The stable version 5.1 is planned for July next
year. It will have SMP/NG with a fully preemptible kernel, and the
ability to take a snapshot of a file system. OS X (Darwin) will always
stay synchronized with FreeBSD -- there will be no fragmentation
between FreeBSD and Mac OS X.

** BSDI talk

BSDI has been fully absorbed into Wind River. BSDI is transitioning
from Colorado to Alameda and Minnesota. The coming version of BSD/OS
will feature a fully preemptible kernel and fixed-priority, real-time
scheduling. It seems BSD/OS is catching up with Solaris -- it's the
second UNIX system that can be called hard real-time.

* Handwriting recognition on WinCE vs. Palm and Newton

A talk with an attendee.

Handwriting recognition on WinCE is based on the same ParaGraph
software that drove the first Newtons. However, on the Newton,
handwriting recognition was well integrated into the OS: the system
would try to recognize a word right after the word was scribbled, and
the user could correct (or at least mark) the error. WinCE is
basically a Windows platform: the recognition starts only after the
entire page has been written.

Palm's portable, folding keyboard is really slick! I've seen several
people using it to take notes during presentations.

* Sandboxing applications

Vassilis Prevelakis, U. Penn.
Diomidis Spinellis, Athens U.
FREENIX session, June 29.

The system controls all file accesses by an application via a
combination of chroot and a user-level NFS, or perlfs. An application
is launched into a separate filesystem tree, which is perlfs-mounted
into the main tree. The separate tree is transparently mapped into the
main tree; however, all file accesses are intercepted and examined for
compliance with a policy. The system is transparent to all existing
applications, requires no changes to the kernel, and is portable
across UNIX platforms.

To establish a policy, the system can be switched into a learning
phase. In that phase, the system simply monitors all file accesses by
an application -- and derives a profile. At the end of the learning
phase this profile becomes the security policy.

But learning access patterns is very difficult. It's hard to make sure
that the learned application profile is comprehensive. Just as it's
hard to make sure that a test of an application covers all
(significant) code flow paths, it's difficult to exercise an
application so that it accesses all the files it may normally access.
It's even harder to do that without the source code of the
application. A regular user or an overworked sysadm will find the task
impossible even with access to the source code.

* Building a secure web browser

Sotiris Ioannidis, U. Penn
Steven M. Bellovin, AT&T Research.
FREENIX session, June 29.

The talk was about a more general system -- a SubOS. A secure web
browser was a good case study. As it is, the work is nothing more than
sandboxing JavaScript. It's a very preliminary work.

* Citrus project: true multilingual support for BSD operating systems

Jun-ichiro Hagino, Internet Initiative Japan, Inc.
FREENIX session, June 29.

The Citrus project is an interface to provide wide-character I/O. The
project wrote several (dynamically loaded) encoders and decoders,
which support Unicode and ISO 2022 for a great variety of stateless
(e.g., 16-bit Unicode, UTF-8) and stateful (e.g., ISO 2022) encodings.
The state of the encoder/decoder is visible to the user (and can be
checkpointed), as the sketch below tries to illustrate.
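A small illustration of the stateless/stateful distinction (mine,
using Python's codec machinery rather than Citrus): an ISO 2022
decoder carries shift state that depends on escape sequences seen
arbitrarily far back, and that state is exactly what must be exposed
to checkpoint a conversion mid-stream.

    import codecs

    data = "こんにちは".encode("iso2022_jp")  # begins with ESC $ B:
                                              # a shift into JIS X 0208
    dec = codecs.getincrementaldecoder("iso2022_jp")()

    out = dec.decode(data[:5], final=False)   # stop mid-stream
    print(out, dec.getstate())  # (pending bytes, shift state): the
                                # checkpointable decoder state
    print(out + dec.decode(data[5:], final=True))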
* Reverse-engineering instruction encodings

Wilson Hsieh, U. Utah; Dawson Engler, Stanford U.; Godmar Back,
U. Utah.
General refereed session, June 29.

The goal is to reverse-engineer an assembler to understand the packing
of opcode and operand bits within an instruction word. This knowledge
can then be used to automatically generate back-ends (assembling
macros, to be precise) for just-in-time compilers. Another instance of
template-based programming.

The talk appears misguided. They can't actually reverse-engineer an
assembler from scratch: the user still has to provide the system with
an abstract description of the instruction set. If the user has to
read the CPU documentation anyway, he can just as well write down the
binary instruction encoding. I guess other people had a similar
impression; the presenter was not asked a single question.

* An embedded error recovery and debugging mechanism for scripting
  language extensions

David M. Beazley, U. Chicago.
General refereed session, June 29.

The goal is to catch exceptions (segmentation faults, division by
zero, etc.) that may occur during the execution of scripting language
extensions. The extensions are typically written in C, and are
therefore unsafe. Because an extension executes within the context of
a scripting language, debugging it is very difficult. The goal of the
project is to catch an exception, print a stack trace of the error
within the extension, and propagate the error back to the scripting
host (where it can be handled -- or at least intelligently reported).
The system works without any modification or recompilation of the
extension.

The error recovery and debugging system clearly involved a number of
clever hacks. I got the impression, however, that the hacks were the
end rather than the means. The author obviously enjoyed the complexity
of his solution. It appears his goal could be reached far more simply
-- by wrapping an extension and interposing on the entry points of its
shared library.

* Interactive simultaneous editing of multiple text regions

Robert C. Miller and Brad A. Myers, CMU
General refereed session, June 29.

The authors received the best paper award.

Another case of an over-engineered solution. The presenter
demonstrated his system, and it was indeed impressive. He selected
several regions of text (which can be discontinuous). He then selected
words or groups of words within one phrase -- and the system
_inferred_ the corresponding words in the other phrases. When he
started editing the first selected phrase, the other selected phrases
were edited synchronously (a toy version of the replay appears at the
end of this section). It was fascinating to watch.

The crux of the problem is inferring the meaning of the user's
selection within one phrase and generalizing it to the other phrases.
It's an instance of programming by demonstration. The system
pre-processes the text heavily to make on-line work easier. A feature
is a pattern or a literal string that occurs several times. Features
are stored as region sets: sequences of begin/end offsets.

The author cited the results of a usability study they performed on
two groups of CMU undergrads. Given a large repetitive task -- editing
a large bibliography and changing its formatting -- the system indeed
helped accomplish the work faster. However, as the presenter admitted
in response to a question, the error rate for simultaneous editing was
about the same as the error rate for people who used a traditional
editor -- about 30%. When the system discovers and generalizes the
user's selections, the user can correct the system and guide it to the
selection he wants. Alas, understanding how the system generalizes
proved to be difficult: users tend to overlook subtle errors in the
generalization.
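The replay half of the system is easy to mimic; the hard part, as the
talk made clear, is inferring the region set in the first place. A toy
of the replay (my sketch), with regions represented as begin/end
offsets as in the paper:

    text = "foo(1); foo(2); foo(3);"
    regions = [(0, 3), (8, 11), (16, 19)]   # an (assumed) inferred
                                            # region set: the three foos

    def edit_all(text, regions, replacement):
        """Replay one edit on every region.  Going right to left keeps
        the earlier offsets valid as the text shrinks or grows."""
        for lo, hi in sorted(regions, reverse=True):
            text = text[:lo] + replacement + text[hi:]
        return text

    print(edit_all(text, regions, "bar"))   # -> bar(1); bar(2); bar(3);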
* High-performance memory-based web servers: kernel and user-space
  performance

Philippe Joubert, ReefEdge, Inc.; Robert B. King, IBM Research;
Richard Neves, ReefEdge, Inc.; Mark Russinovich, Winternals Software;
John M. Tracey, IBM Research.
General refereed session, June 29.

A well-researched, _extensive_ paper that identifies all the
significant bottlenecks in serving static web pages. As proof, the
authors developed a kernel web server that performs three times faster
than the best user-land web server. Their server has been tested on
Linux and Win2k.

The authors note that simply moving the web server into the kernel is
not enough. A high-performance server must be implemented as a finite
state machine with efficient event notification (kqueues or Linux RT
signals), avoid scheduling on each request ("cheap interrupts"), use
zero-copy networking (share buffers between the filesystem and
networking), and avoid the socket interface.

IBM invests heavily in web server acceleration via kernel extensions.
The Linux version of the kernel accelerator (a kernel loadable module)
will be released as Open Source.

* Web server acceleration via the inverse cache

The raw speed of serving pages is not a very important feature of a
web server: most web servers, except really inefficient ones, can push
data to a client faster than a typical remote client can accept it.
Most connections to a web server are slow. Such connections stay open
for a long time, taking up significant resources. The problem is
exacerbated if the served content is dynamic: a slow client ties up
not only OS resources but expensive database connections as well. If
the connection rate is high, the number of outstanding connections
grows and the system eventually runs out of resources (kernel memory,
system memory, the database connection pool).

An inverse cache is a cheap computer that sits immediately in front of
the main web/database/application server. The cache accepts the web
server's reply and slowly feeds it to the client. Because the cache
quickly accepts the whole reply, the main web server and the database
can close the connection and free their resources. Serving stored
content to slow clients scales very well. The caches are stateless and
independent -- if one cache becomes overloaded, we can easily bring up
another. A sketch of the idea follows at the end of this section.

I have been pushing this idea all around. At the USENIX expo, I came
across a vendor, RedLine Networks, which seems to have carried it out
in a commercial product: www.redlinenetworks.com. It appears that IBM
uses a similar technique in its Netfinity line of web accelerators.
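Here is the idea reduced to a toy (my sketch, not RedLine's or IBM's
product; the addresses and the thread-per-client structure are
assumptions made for brevity -- a real box would use an event loop):

    import socket, threading

    BACKEND = ("127.0.0.1", 8080)   # the expensive main server (assumed)

    def handle(client):
        # 1. Slurp the backend's whole reply at LAN speed, so the
        #    backend (and its database connection) is freed at once.
        with socket.create_connection(BACKEND) as b:
            b.sendall(b"GET / HTTP/1.0\r\n\r\n")
            reply = b""
            while chunk := b.recv(65536):
                reply += chunk
        # 2. Drip the buffered reply to the slow client from cheap
        #    local memory, however long that takes.
        client.sendall(reply)
        client.close()

    srv = socket.socket()
    srv.bind(("", 8000))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=handle, args=(conn,)).start()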
* Storage management for web proxies

Elizabeth Shriver, Bell Labs; Lan Huang, SUNY Stony Brook; Eran
Gabber, Bell Labs; Christopher A. Stein, Harvard U.
General refereed session, June 29.

A web proxy cache uses a filesystem as its backing store. The cache,
however, does not need many of the traditional features of a
filesystem: permission checking, random access. Even persistence and
recoverability are not strictly necessary -- it's perfectly
permissible to lose (cached) data. The authors have written a
_toolkit_ for building specialized, high-performance filesystems. One
such filesystem is Hummingbird -- an FS for a web cache. It's an
entirely user-level (hence, portable) FS that features zero-copy
operations, explicit co-location of data on disk, etc. For caching web
content, Hummingbird is 4-8 _times_ more efficient than the
conventional FS, UFS.

* Active Content: really neat technology or impending disaster?

Charlie Kaufman, Iris Associates.
Invited talk, June 29.

The speaker is a security architect for Lotus Notes. The talk was
rather entertaining -- but rather uninsightful and short on answers.
An interesting quote: "The Internet is getting bigger, users are
getting dumber, and dumb users' computer capacity is getting larger."

* The future of virtual machines: A VMware Perspective

Ed Bugnion, co-founder, VMware, Inc.
Invited talk, June 30.

It all started with a project called "Disco" at Stanford. Stanford had
designed the "Stanford Flash" machine -- a NUMA computer (which later
became the SGI Origin). The programmers needed an OS -- a very
expensive undertaking. Instead, the Flash group wrote a monitor (a
microkernel, so to speak) that "virtualized" the NUMA hardware and
gave the appearance of a uniform memory architecture. The Stanford
group could then run the existing SGI IRIX OS on its machine.

VMware provides near raw-machine performance. For a good introduction
to VMware and its hosted and hostless (for the ESX server) modes, see
the paper "Virtualizing I/O devices on VMware Workstation's hosted
virtual machine monitor" by Jeremy Sugerman and Beng-Hong Lim, VMware.
This is the first paper in the USENIX'01 proceedings, general refereed
track.

VMware, jointly with NSA, is developing NetTop -- to access
NIPRNet/SIPRNet from the same computer (but within totally isolated
virtual environments). SELinux acts as the host OS (see above for more
details about SELinux).

Note the ease of administering multiple guest OSes on the same box:
cloning from a "master copy" of an OS, installed, fully configured,
and with all needed applications.