;; -*-mode: Outline; -*-

TOC

General
Awards
Linux: A strategic disruptive force
Myths, missteps, and folklore in protocol design (the best talk)
A toolkit for user-level file systems
LOMAC -- Mandatory Access Control (MAC) you can live with
TrustedBSD: adding trusted operating system features to FreeBSD
Security enhanced Linux
Plan9/Inferno
Scripting for PalmOS
Nickle: Language principles and pragmatics
The design and implementation of the NetBSD rc.d system
User-level checkpointing for LinuxThreads programs
Are mallocs free of fragmentation?
Super-BSD BOF
Handwriting recognition on WinCE vs. Palm and Newton
Sandboxing applications
Building a secure web browser
Citrus project: true multilingual support for BSD operating systems
Reverse-engineering instruction encodings
An embedded error recovery and debugging mechanism for scripting
  language extensions
Interactive simultaneous editing of multiple text regions
High-performance memory-based web servers: kernel and user-space
  performance
Web server acceleration via the inverse cache
Storage management for web proxies
Active Content: really neat technology or impending disaster?
The future of virtual machines: A VMware Perspective

* General

June 28-30, 2001, Boston, MA (Marriott Copley Place)

Attendance: ca. 1500 people, down from last year's 1700. Last year was
the 25th anniversary of USENIX. Furthermore, the economy is slower
this year, and the travel budgets of many companies, including Lucent
and AT&T, are smaller.

The conference is selective: in the general track, 24 of the 92
submitted papers were accepted. In the FREENIX track, 27 of the 58
submitted papers were accepted.

* Awards

The GNU project received the USENIX Lifetime Achievement award (aka
the "Flame" award). The Kerberos project received the Software Tools
User Group award.

* Linux: A strategic disruptive force

Daniel D. Frye, Director of the IBM Linux Technology Center
Keynote address

Dr. Frye received his PhD in theoretical atomic physics from Johns
Hopkins U. The most remarkable facet of his presentation is that he is
an IBM officer who can speak for IBM's Linux policy. In his address,
Dr. Frye repeatedly stated that IBM is committed to Linux, and gave
reasons for that. His statements are _official_.

Daniel Frye seems a rather formidable proponent for Linux. Part of his
appeal is that he speaks for IBM. The other part is that he speaks the
language of the boardroom; he makes arguments that carry weight among
corporate executives. If an organization or company seeks arguments
for using Linux, they should invite Dr. Frye to speak.

What follows is a close transcript of his keynote address.

IBM is committed to Linux because doing so *is good business*. IBM has
recognized the commercial potential of Open Source. IBM thinks that
skills, desire and open culture, combined, are a long-term,
sustainable force. IBM believes not necessarily in Open Source, but
definitely in Open Standards (TCP/IP, HTTP, XML) and open culture.
"IBM thinks that Linux is a high-value proposition for IBM customers
and shareholders."

Linux value for IBM:
 - growing marketplace acceptance
 - no customer lock-in. The end user is in charge. For IBM as a
   _service_ provider, this is a good thing. No single vendor can lock
   out IBM, can prevent IBM from marketing its services to a client.
 - many people are comfortable with Linux (especially recent college
   graduates)
 - industry-wide initiative
 - multi-platform (x86, PowerPC, SPARC, HP/PA, mainframes, ARM, etc.)
 - basis of innovation
IBM thinks Linux is the reference platform today for new, innovative
applications. IBM noted there are 23,000 _business_ applications on
Linux (not games or development tools). "IBM believes people with
mission-critical databases will run Linux" [sic!]. Not now, but very
soon. "IBM sees Linux as a key enabling technology for the next
generation of e-business."

Linux development _myths_:
 - Open Source is an undisciplined process
 - Linux is less secure
 - no acceptance of Enterprise features
 - Linux will fragment
 - traditional vendors can't participate

The IBM Linux Technology Center was established with a mission to
accelerate the maturation of Linux into the Enterprise. The center
employs approx. 230 people. IBM pays them to work with the Linux
community. IBM has recently announced:
 - a port of its Journaled File System to Linux. JFS gives improved
   reliability and performance compared to the native ext2fs Linux
   file system, which offers no reliability guarantees.
 - a customer workload test project (joint with SGI). An 8-way SMP
   Linux system ran IBM DB2 at a 95% load level for 96 hours.

IBM came to realize that it _can_ work with Open Source -- and make
money off it:
 - IBM sells hardware underneath Linux
 - IBM sells business applications on top of Linux
 - IBM sells services all over Linux
Therefore, the fact that the OS is free is irrelevant to IBM's
business. IBM made this transition to Linux and Open Source because of
its customers: customers were asking for Linux, and that brought about
the change in IBM's attitude.

I think there is another reason IBM is so hot on Linux. Daniel Frye
emphasized that Open Source, and Linux in particular, is a
"disruptive, creative force." This force can dislodge the existing
players in the Enterprise market -- Microsoft, and to a lesser extent
Sun and HP. That benefits IBM. IBM wants to sell services and
applications -- not OSes. Closed operating systems -- Windows or
Solaris -- make it difficult for IBM to reach many customers and to
port applications. Open Source OSes make the Enterprise market open,
free from lock-in by OS vendors. This directly benefits IBM.

* Myths, missteps, and folklore in protocol design

Radia Perlman, Sun Microsystems Eastern Labs.
Invited talk, June 30.

This was the best presentation of the whole conference. Perhaps the
slides will be available at usenix.org.

The talk was an enumeration of mistakes and disasters in Internet
protocol design (some disasters have happened, some are waiting to
happen). The motivation of the talk was a plea to learn from mistakes
and be less religious about protocols and the Internet. Let's design
the protocol that has the most merit (rather than the one that is
"truer" IP).

The talk made me think far less of the IETF. They made a few good
decisions early on. The early success seems to have gone to their
heads -- they are making worse and worse decisions.

Example 1: routers, bridges, and switches. As it turns out, bridges
were introduced _after_ routers. Radia Perlman can authoritatively say
that, as she designed the first bridge. Before that she was a chief
designer of DECNet. After Ethernet was introduced and became popular,
people flocked to Ethernet, to her chagrin. DECNet had a level 3
protocol (and could relay packets from one net to another), but
Ethernet was a _local_ area network. Ethernet proponents claimed that
it was enough.
Then one day Radia Perlman's manager came to her and said that he
wanted her to design a way to relay packets between two Ethernet
segments: Ethernet had inherent size limitations. Routers could
connect two networks -- but routers were single-protocol and
expensive. The managers wanted a cheaper, multi-protocol way. Thus the
bridge was born. Unlike DECNet, Ethernet frames had no hop count.
Therefore, Radia Perlman had to invent a spanning tree algorithm to
prevent network "divergence" in the presence of loops.

Bridges emulate CLNP -- an ISO version of IP. CLNP (ISO 8473) was
rather close to IP, but better in many ways. CLNP had hierarchical
addressing and offered mobility within a campus: a user could move his
computer from one router (connectivity point on campus) to another
without any need to change addresses and other protocol settings. CLNP
integrated the telephone network -- regular telephone numbers could
serve as (a part of) a node address. When the limitations of IP became
apparent, CLNP looked like a deserving successor. Unfortunately, CLNP
was developed by ISO -- an anathema to the IETF. Therefore, CLNP had
no chance within the IETF. This is very sad.

Example 2: IP multicast -- 12 years of design and no result. The
obvious idea (implemented in ATM) is to make a multicast group
resemble a tree. A multicast group id should include the address of
the group's root node. Then any node wishing to join a group sends a
message to its gateway; the gateway figures out the address of the
root from the multicast group id and forwards the request to the root
-- remembering the path the request came from. Alas, this simple idea
was abandoned for reasons nobody can remember. Far more complex
schemes have been designed. The IETF has adopted, as an article of
faith, the design axiom that multicast at level 3 should look the same
as broadcast over the Ethernet. All the IETF schemes use flat
multicast group ids. All of the schemes require flooding the whole
Internet with join or group discovery messages. No wonder that after
12 years of deliberation multicast is hardly deployed anywhere. MBone
just doesn't scale.

Example 3: the great ARPAnet flood. LSP -- an IMP routing
advertisement protocol -- was unstable. One day that became apparent
to everybody: ARPAnet stopped functioning because the routers kept
endlessly circulating a growing sequence of advertisements, eventually
saturating all their queues and links. Fortunately, the network was
small at that time, and by sheer luck, the people who maintained the
network were the same people who had designed it.

Example 4: interdomain routing. Somehow it was thought that
interdomain routing must be different from intra-domain routing. First
came RIP, which was an obvious failure. Then came EGP -- which had no
metric and so could not handle loops. The latest incarnation is BGP.
BGP is incredibly complex -- furthermore, it is unstable. Every node
can set its own policies; alas, there is no mechanism to guarantee
that the policies are globally consistent. Furthermore, it is known
that policies can diverge: a site A may receive a BGP advertisement
from site B and change the routes it advertises. That change may cause
site B to alter its advertisement -- and so on (a toy simulation of
such an oscillation appears at the end of this section). The great
ARPAnet flood will inevitably repeat -- with catastrophic results this
time.

Example 5: IPSec. Photuris was a great key exchange protocol --
resistant to DoS attacks. Somehow the IETF killed it and instead
adopted the DoS-prone ISAKMP -- which was designed at NSA!
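To make the divergence claim concrete, here is a tiny simulation -- my
own sketch, not from the talk -- of the classic "bad gadget"
configuration: three ASes, each preferring the route through its
neighbor over its own direct route to the destination. Every ranking
is locally sensible, yet no stable global routing exists; under
synchronous updates the system flip-flops forever. The AS numbers and
preference tables are, of course, made up.

    # Three ASes (1, 2, 3) route to destination 0.  Each AS ranks its
    # permitted paths, best first: the path through a neighbor beats
    # the direct one.
    PREFS = {
        1: [(1, 2, 0), (1, 0)],
        2: [(2, 3, 0), (2, 0)],
        3: [(3, 1, 0), (3, 0)],
    }

    def step(routes):
        """One synchronous round of BGP-like route selection: each AS
        picks its most preferred path whose tail a neighbor currently
        advertises (the direct path is always available)."""
        return {n: next(p for p in ranked
                        if len(p) == 2 or routes[p[1]] == p[1:])
                for n, ranked in PREFS.items()}

    routes = {n: (n, 0) for n in PREFS}   # everybody starts direct
    for rnd in range(5):
        print(rnd, routes)
        routes = step(routes)             # ... and oscillates forever

Running it shows the network alternating between "everybody routes
directly" and "everybody routes through a neighbor", round after
round -- divergence with no misbehaving node.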
* A toolkit for user-level file systems

David Mazieres, NYU.
FREENIX session, June 29.

The author received the best FREENIX paper award -- and this time, at
least, it was deserved.

Custom, virtual filesystems (FS) such as ftpfs, zipfs, or an
encrypting FS increasingly rely on NFS, because NFS, unlike the kernel
v-node interface, is portable and standard. A user needs to build an
NFS server -- which listens to NFS requests and implements them in
terms of the user-defined FS. Because such an NFS server often sits on
the same computer that uses the custom FS, the server is called a
loopback server. One example of a loopback server is Alex, which
permits "mounting" of an anonymous FTP server. After a client
NFS-mounts such a server, the client can browse the remote FTP site as
if it were a local FS.

However, all existing loopback NFS servers have drawbacks. For one,
they can deadlock when the OS buffer cache is full. Scenario: a client
sends a 'read' request to the loopback server. The loopback server
accesses a page of code or data that is currently paged out. As the
buffer cache is full, to bring a new page into physical memory the
system has to page out some currently mapped pages. If such pages are
dirty, the OS has to write out their content to the backing store
before freeing them. Suppose the dirty page to be evicted belongs to
an NFS-mounted FS. The OS therefore asks the NFS server to write out
the dirty page. But the NFS server is blocked on the page fault: hence
the deadlock of the whole OS. The other problem with a loopback server
is performance: a single slow file operation holds up the whole
server.

The SFS toolkit -- the topic of the talk -- helps _construct_ loopback
NFS servers, taking care of threading and other boring and
difficult-to-get-right details. The toolkit includes its own RPC
compiler for all NFS-related RPC messages. The compiler can generate
stubs/skeletons that permit convenient tracing of NFS messages, which
greatly helps in debugging the server and analyzing the load. Another
part of the toolkit is libasync -- an asynchronous RPC library
(traditional RPCs are synchronous).

The toolkit and the libasync library are written in C++. They use a
lot of C++ hacks to, in essence, create closures (partially evaluate a
function), capture "continuations", and do reference-counting
automatic memory management. Indeed, asynchronous RPC programming is
largely continuation-passing style (CPS); see the sketch at the end of
this section.

During the discussion, a person asked why such a helpful and needed
toolkit as SFS hadn't been implemented before. David Mazieres replied
that he first tried to implement SFS in plain C. The complexity of
memory management for closures and continuations was staggering. He
gave up. It's only now, David Mazieres said, that C++ provides the
tools (templates and partial template specialization) that he could
use to write his hacks to manage memory and create continuations. Of
course, the question is why not use a better language, where garbage
collection, closures and continuations are built in. A note of caution
though: during some of the processing, the loopback server cannot
afford to take a page fault. Therefore, David Mazieres locks the whole
stack and the needed heap areas during such processing. Any other
implementation language must likewise let the programmer lock areas of
memory and guarantee page-fault-free processing.

See www.fs.net for code and more details.
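To give a flavor of that continuation-passing style, here is a sketch
in Python (mine, not the toolkit's C++; the nfs_* stand-ins are
fabricated): each asynchronous operation takes an explicit callback,
and the "rest of the computation" lives in a closure. With
garbage-collected closures this is a few lines; in plain C every such
closure must be built, threaded through, and freed by hand -- the
complexity that made Mazieres give up.

    # Stand-ins for asynchronous NFS RPC operations: a real loopback
    # server would queue the request and invoke the callback k from
    # its event loop once the reply arrives.
    def nfs_lookup(name, k):
        k({"fh": hash(name)})            # fabricated file handle

    def nfs_read(fh, off, n, k):
        k(b"x" * n)                      # fabricated file data

    def serve_read(path, off, n, reply):
        # The continuation of the lookup: a closure capturing off, n
        # and reply, invoked whenever the lookup completes.
        def got_handle(attrs):
            nfs_read(attrs["fh"], off, n,
                     lambda data: reply({"status": "OK", "data": data}))
        nfs_lookup(path, got_handle)

    serve_read("/tmp/f", 0, 16, print)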
* LOMAC -- Mandatory Access Control (MAC) you can live with

Timothy Fraser, NAI Labs.
FREENIX session, June 28.

LOMAC is a Mandatory Access Control (MAC) system with only two levels,
lo and hi. LOMAC's distinguishing feature is that it works with
existing Linux kernels and applications without any recompilation, and
it is largely invisible to traditional users. LOMAC requires no
patches to the kernel and _no site-specific configuration_. Thus LOMAC
places little burden on the sysadm. LOMAC is a loadable kernel module:
you install it, load it in -- and LOMAC instantly starts to work.

LOMAC is a MAC with two levels. The hi level is reserved for system
applications: daemons, system services, files in /etc and /bin.
Everything else is of lo security.

LOMAC rules:
 - a lo-level process can't write to hi-level files or alter hi-level
   directories
 - if a hi-level process accesses a lo-level file or directory, the
   process is demoted to the lo level

LOMAC assigns _files_ to levels statically. LOMAC starts all processes
at the hi level. Once a process accesses a lo-level file (including
its own executable file, or a network device), the process is demoted.
Thus every process that reads off the network is automatically
demoted. Slick! The protection is indeed transparent. LOMAC guards
against all existing and _future_ Trojan horses and buffer overflow
attacks. A buffer overflow attack still works -- but it becomes
useless to the attacker. As soon as the privileged process accesses
the intruder's files or the network, the process is automatically
demoted (and cannot modify any system configuration afterwards). A
sketch of the demotion rule follows at the end of this section.

Implementation: system call interposition within the kernel.
Overhead:
  micro-benchmarks:
    4K file copy:        2.8%
    256-byte file copy:  9.5%
  macro-benchmark (Linux kernel compile): 3.1%

LOMAC has a trusted file update program. SSH is exempt from the
demotion rule; this makes it possible to administer the system
remotely. Note, su does _not_ work: a non-privileged process cannot
modify privileged data -- period.

www.nailabs.com, click on Open Source. See also SourceForge.
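The demotion logic itself is tiny. Here is my reading of the rules as
a sketch (not LOMAC's actual kernel code, which enforces this by
interposing on system calls):

    HI, LO = 2, 1            # the only two levels LOMAC has

    def may_write(proc, obj_level):
        """A process may never write up to an object above its level."""
        return proc["level"] >= obj_level

    def on_read(proc, obj_level):
        """Low-water-mark rule: reading down demotes the process for
        the rest of its life."""
        proc["level"] = min(proc["level"], obj_level)

    daemon = {"level": HI}
    on_read(daemon, LO)                 # e.g., the daemon read from the
    assert not may_write(daemon, HI)    # network; now it can't touch /etc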
* TrustedBSD: adding trusted operating system features to FreeBSD

Robert N.M. Watson, NAI Labs/FreeBSD project.
FREENIX session, June 28.

TrustedBSD is an OS extension to FreeBSD. The project operates a code
tree separate from that of FreeBSD; the code will be gradually merged
into FreeBSD. The first trusted features (ACLs and extended attributes
for MAC labels) will appear in FreeBSD 5.0.

TrustedBSD started with a *rigorous* specification for a FreeBSD
kernel. The right approach! The TrustedBSD project also emphasizes
extensive regression tests.

More information at trustedbsd.org

* Security enhanced Linux

Stephen Smalley, NAI Labs
Peter Loscocco, NSA.
FREENIX session, June 28.

The talk described a security-enhanced Linux, SELinux, being developed
at NSA (and now at NAI Labs as well). One of the authors still works
for NSA. SELinux is a comprehensive mandatory access control (MAC)
system, which controls access to files, sockets, IPC channels, etc.
Unlike traditional trusted systems, it's highly flexible and
configurable. SELinux lets most applications run unchanged.

Out of the box, SELinux supports two particular security models: type
enforcement and role-based access control (RBAC). Other models can be
configured if desired. The type enforcement model attaches attributes
to various domains (e.g., the init domain, getty domain, user domain).
Each domain has its own set of permissions and a set of processes.
Domain labels of processes are inherited. Every file in the system is
given a type. A policy is an association between domain labels (of
processes) and type labels (of files).

Overhead: 10-33% on microbenchmarks, 10% in network latency. A
macro-benchmark (WebStone) shows _no_ perceptible overhead in either
latency or throughput.

Policies can be fairly complex. SELinux has its own version of tar,
which allows for extended file attributes carrying MAC labels.

* Plan9/Inferno

This year, Vita Nuova, LLC had a booth at the USENIX expo. The booth
was busy: it appears Plan9 and Inferno elicit interest. Inferno runs
on PalmOS and iPAQ!

I met and talked with Roger Peppe, with whom I had previously
communicated via e-mail. He also showed me a bit of Inferno, which was
running on his laptop. Inferno has security built into the OS network
protocol stack. When you initiate or listen to a connection, you
specify which security protocol to use: none, rc4, etc. Thus support
for SSL, authentication and encryption is transparent. Authentication
and authorization of applications is built into the OS. The management
of permissions is very easy -- just like the traditional management of
file permissions. In Inferno, everything is a "file" -- or at least
looks like a file. The ability to write network servers and clients in
Limbo (which functions as a command-line shell) is fascinating.

* Scripting for PalmOS

Brian Ward, U. Chicago.
FREENIX session, June 28.

Programming PalmOS in C is highly cumbersome. BTW, the PalmOS toolbox
is rather similar to the MacOS toolbox. Once you're done writing all
the handlers, you don't have any energy left to program the meat of
your application. OTOH, PalmOS applications look rather similar to
db-backed dynamic web applications. Hence HLL -- a scripting language
for Palm, based on PHP, with "actions" (callbacks). The idea is very
similar to the Wireless Markup Language, WML. However, HLL is very
immature. Why didn't he implement WML or Tcl? I wonder why this paper
was accepted in the first place -- it is _very_ preliminary.

* Nickle: Language principles and pragmatics

Bart Massey, Portland State U.
Keith Packard, XFree86 Project and SuSE Inc.
FREENIX session, June 28.

The talk described Nickle -- a scripting language 15 years in the
making. The idea is to develop a language like Maple -- for numerical
modeling and prototyping. Nickle supports unlimited-precision integers
and rationals. It has a byte-compiler and byte-interpreter,
mark-and-sweep GC, and full call/cc. When I enquired whether call/cc
is indeed re-enterable and whether they had considered all the
ramifications, e.g., for building an argument list, Bart Massey
hesitated. The design goals of Nickle would easily be satisfied by ML;
Bart Massey asserted that they wanted a more imperative language.

* The design and implementation of the NetBSD rc.d system

Luke Mewburn, Wasabi Systems.
FREENIX session, June 28.

The new rc.d system in NetBSD is similar to that of SysV: every rc.d/*
script starts or stops a service. The ordering of execution, however,
is far superior. Every rc.d/* script must contain specialized
comments: 'provide', 'require', 'before', 'keyword'. The init process,
upon start-up, runs rcorder. The latter reads all rc.d/* scripts,
resolves the dependencies, computes the global order, and runs the
scripts accordingly (a toy reimplementation follows at the end of this
section). The keyword pseudo-comment lets us associate arbitrary
labels with a script, so that rcorder can select -- or skip -- just
the scripts that carry a specific keyword/label.

The rc.d system is an interesting application of modules in
programming languages.
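As an illustration of the mechanism, here is a toy rcorder (my sketch;
the real one is a C program and also handles BEFORE, KEYWORD, cycle
detection, etc.): parse the PROVIDE/REQUIRE pseudo-comments out of the
scripts and topologically sort. The sample script headers are made up.

    import re
    from graphlib import TopologicalSorter   # Python 3.9+

    SCRIPTS = {                 # file name -> its header comments
        "network": "# PROVIDE: NETWORKING\n",
        "syslogd": "# PROVIDE: syslogd\n# REQUIRE: NETWORKING\n",
        "sshd":    "# PROVIDE: sshd\n# REQUIRE: NETWORKING syslogd\n",
    }

    def parse(text, key):
        """Extract the words of a '# KEY: ...' pseudo-comment."""
        m = re.search(rf"^# {key}:\s*(.*)$", text, re.M)
        return m.group(1).split() if m else []

    # Map each provided label to the script providing it, then build
    # the dependency graph: script -> set of scripts it requires.
    provides = {p: name for name, text in SCRIPTS.items()
                for p in parse(text, "PROVIDE")}
    graph = {name: {provides[r] for r in parse(text, "REQUIRE")}
             for name, text in SCRIPTS.items()}

    print(list(TopologicalSorter(graph).static_order()))
    # -> ['network', 'syslogd', 'sshd']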
* User-level checkpointing for LinuxThreads programs

William R. Dieter and James E. Lumpp, Jr., U. Kentucky
FREENIX session, June 28.

The goal is to record the state of a running application in a file --
so that the application can be restarted from the remembered state.
Checkpointing lets us run a computationally-intensive application
piecewise -- when resources are available, perhaps moving from one
computer to another. Previously, no checkpointing system supported
multi-threaded applications.

The authors' system is smart enough to save only the part of the
application's virtual address space that is actually mapped. To find
the mapped ranges, the system examines /proc/pid/maps, as the sketch
below illustrates.
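Here is that discovery step as a sketch (mine, not the authors' code;
Linux-only, with the field layout per proc(5)):

    def mapped_ranges(pid="self"):
        """Parse /proc/<pid>/maps and return the writable mapped
        ranges -- the part of the address space a checkpoint must
        save."""
        ranges = []
        with open(f"/proc/{pid}/maps") as f:
            for line in f:
                addrs, perms = line.split()[:2]
                lo, hi = (int(x, 16) for x in addrs.split("-"))
                if "w" in perms:
                    ranges.append((lo, hi))
        return ranges

    for lo, hi in mapped_ranges()[:8]:
        print(f"{lo:#x}-{hi:#x}  {hi - lo:>10} bytes")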
* Are mallocs free of fragmentation?

Aniruddha Bohra, Rutgers U. (summer student)
Eran Gabber, Bell Labs.
FREENIX session, June 28.

The paper shows an experimental result that contradicts the well-known
paper by Paul Wilson, "The memory fragmentation problem: solved?" It
turns out the memory fragmentation problem is not solved. Different
implementations of malloc (even those based on the same algorithm)
vary in quality. Some of them -- in particular, malloc(3X) of Solaris,
which is billed as a "memory-efficient malloc" -- fragment memory.
This fragmentation manifests itself as a memory leak. As the authors
showed, a particular long-running application eventually crashed
because it ran out of memory. When the authors recompiled the
application to use a different version of malloc() -- phkmalloc from
FreeBSD -- the application ran in constant memory, with fragmentation
of only 30.5%.

The authors observed no correlation between the speed of malloc() and
the fragmentation it causes. The Solaris "space-efficient" malloc(3X)
is the slowest and causes the largest fragmentation. Doug Lea's malloc
and phkmalloc are among the best.

* Super-BSD BOF

Just like last year, there was a (long) BoF for all flavors of BSD. A
large conference room was mostly filled -- which shows the interest in
the BSDs.

** NetBSD

NetBSD maintains binary compatibility with FreeBSD, SCO and Linux on
x86, with Solaris on SPARC, etc.

NetBSD project growth:
  1997:  73 developers, 13 different platforms,  7 different architectures
  2001: 252 developers, 44 different platforms, 16 different architectures

The current version is 1.5.1; version 1.6 is planned for the end of
the year. That version will have kernel event queues, and faster pipes
and IPC. The slides of this good, 1-hr BoF NetBSD talk will be
available at www.netbsd.org

** OpenBSD

OpenBSD received a DARPA grant to continue their security work -- in
particular, support for cryptographic accelerators and smart cards. As
an example, they cited 2.5 MB/s throughput using 3DES with only 1% CPU
utilization. Cryptographic services are integrated into the _kernel_,
and provided to user applications via /dev/crypto. OpenBSD has started
work on public-key cryptographic support.

** FreeBSD

As of June 2001, the project had 274 committers (compared to 216 in
June of last year). Both AMD and Intel have stepped up their FreeBSD
efforts (giving documentation, donating hardware, donating software
emulators of forthcoming chips). Microsoft is porting C# to FreeBSD.

The current version is 4.3; version 4.4 is coming on Aug 20; 4.5 will
be released in Dec. The stable version 5.1 is planned for July next
year. It will have SMP/NG with a fully preemptible kernel, and the
ability to take a snapshot of a file system. OS X (Darwin) will always
stay synchronized with FreeBSD -- there will be no fragmentation
between FreeBSD and Mac OS X.

** BSDI talk

BSDI has been fully absorbed into Wind River. BSDI is transitioning
from Colorado to Alameda and Minnesota. The coming version of BSD/OS
will feature a fully preemptible kernel and fixed-priority, real-time
scheduling. It seems BSD/OS is catching up with Solaris -- it's the
second UNIX system that can be called hard real-time.

* Handwriting recognition on WinCE vs. Palm and Newton

A talk with an attendee.

Handwriting recognition on WinCE is based on the same ParaGraph
software that drove the first Newtons. However, on the Newton,
handwriting recognition was well integrated into the OS: the system
would try to recognize a word right after the word was scribbled, and
the user could correct (or at least mark) the error. WinCE is
basically a Windows platform: the recognition starts only after the
entire page has been written.

Palm's portable, folding keyboard is really slick! I've seen several
people using it to take notes during presentations.

* Sandboxing applications

Vassilis Prevelakis, U. Penn.
Diomidis Spinellis, Athens U.
FREENIX session, June 29.

The system controls all file accesses by an application via a
combination of chroot and a user-level NFS, or perlfs. An application
is launched into a separate filesystem tree, which is perlfs-mounted
into the main tree. The separate tree is transparently mapped into the
main tree; however, all file accesses are intercepted and examined for
compliance with a policy. The system is transparent to all existing
applications, requires no changes to the kernel, and is portable
across UNIX platforms.

To establish a policy, the system can be switched into a learning
phase. In that phase, the system simply monitors all file accesses by
an application -- and derives a profile. At the end of the learning
phase this profile becomes the security policy.

But learning access patterns is very difficult. It's hard to make sure
that the learned application profile is comprehensive. Just as it's
hard to make sure that a test of an application covers all
(significant) code flow paths, it's difficult to exercise an
application so that it accesses all the files it may normally access.
It's even harder to do that without the source code of the
application. A regular user or an overworked sysadm will find the task
impossible even with access to the source code.

* Building a secure web browser

Sotiris Ioannidis, U. Penn
Steven M. Bellovin, AT&T Research.
FREENIX session, June 29.

The talk was about a more general system -- a SubOS. A secure web
browser was a good case study. As it is, the work is nothing more than
sandboxing JavaScript. It's a very preliminary work.

* Citrus project: true multilingual support for BSD operating systems

Jun-ichiro Hagino, Internet Initiative Japan, Inc.
FREENIX session, June 29.

The Citrus project is an interface to provide wide-character I/O. The
project wrote several (dynamically loaded) encoders and decoders,
which support Unicode and ISO 2022 for a great variety of stateless
(e.g., 16-bit Unicode, UTF-8) and stateful (e.g., ISO 2022) encodings.
The state of the encoder/decoder is visible to the user (and can be
checkpointed), as the sketch below tries to illustrate.
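A small illustration of the stateless/stateful distinction (mine,
using Python's codec machinery rather than Citrus): an ISO 2022
decoder carries shift state that depends on escape sequences seen
arbitrarily far back, and that state is exactly what must be exposed
to checkpoint a conversion mid-stream.

    import codecs

    data = "こんにちは".encode("iso2022_jp")  # begins with ESC $ B:
                                              # a shift into JIS X 0208
    dec = codecs.getincrementaldecoder("iso2022_jp")()

    out = dec.decode(data[:5], final=False)   # stop mid-stream
    print(out, dec.getstate())  # (pending bytes, shift state): the
                                # checkpointable decoder state
    print(out + dec.decode(data[5:], final=True))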
* Reverse-engineering instruction encodings

Wilson Hsieh, U. Utah; Dawson Engler, Stanford U.; Godmar Back,
U. Utah.
General refereed session, June 29.

The goal is to reverse-engineer an assembler to understand the packing
of opcode and operand bits within an instruction word. This knowledge
can then be used to automatically generate back-ends (assembling
macros, to be precise) for just-in-time compilers. Another instance of
template-based programming.

The talk appears misguided. They can't actually reverse-engineer an
assembler from scratch: the user still has to provide the system with
an abstract description of the instruction set. If the user has to
read the CPU documentation anyway, he can just as well write down the
binary instruction encoding. I guess other people had a similar
impression; the presenter was not asked a single question.

* An embedded error recovery and debugging mechanism for scripting
  language extensions

David M. Beazley, U. Chicago.
General refereed session, June 29.

The goal is to catch exceptions (segmentation faults, division by
zero, etc.) that may occur during the execution of scripting language
extensions. The extensions are typically written in C, and are
therefore unsafe. Because an extension executes within the context of
a scripting language, debugging it is very difficult. The goal of the
project is to catch an exception, print a stack trace of the error
within the extension, and propagate the error back to the scripting
host (where it can be handled -- or at least intelligently reported).
The system works without any modification or recompilation of the
extension.

The error recovery and debugging system clearly involved a number of
clever hacks. I got the impression, however, that the hacks were the
end rather than the means. The author obviously enjoyed the complexity
of his solution. It appears his goal could be reached far more simply
-- by wrapping an extension and interposing on the entry points of its
shared library.

* Interactive simultaneous editing of multiple text regions

Robert C. Miller and Brad A. Myers, CMU
General refereed session, June 29.

The authors received the best paper award.

Another case of an over-engineered solution. The presenter
demonstrated his system, and it was indeed impressive. He selected
several regions of text (which can be discontinuous). He then selected
words or groups of words within one phrase -- and the system
_inferred_ the corresponding words in the other phrases. When he
started editing the first selected phrase, the other selected phrases
were edited synchronously (a toy version of the replay appears at the
end of this section). It was fascinating to watch.

The crux of the problem is inferring the meaning of the user's
selection within one phrase and generalizing it to the other phrases.
It's an instance of programming by demonstration. The system
pre-processes the text heavily to make on-line work easier. A feature
is a pattern or a literal string that occurs several times. Features
are stored as region sets: sequences of begin/end offsets.

The author cited the results of a usability study they performed on
two groups of CMU undergrads. Given a large repetitive task -- editing
a large bibliography and changing its formatting -- the system indeed
helped accomplish the work faster. However, as the presenter admitted
in response to a question, the error rate for simultaneous editing was
about the same as the error rate for people who used a traditional
editor -- about 30%. When the system discovers and generalizes the
user's selections, the user can correct the system and guide it to the
selection he wants. Alas, understanding how the system generalizes
proved to be difficult: users tend to overlook subtle errors in the
generalization.
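The replay half of the system is easy to mimic; the hard part, as the
talk made clear, is inferring the region set in the first place. A toy
of the replay (my sketch), with regions represented as begin/end
offsets as in the paper:

    text = "foo(1); foo(2); foo(3);"
    regions = [(0, 3), (8, 11), (16, 19)]   # an (assumed) inferred
                                            # region set: the three foos

    def edit_all(text, regions, replacement):
        """Replay one edit on every region.  Going right to left keeps
        the earlier offsets valid as the text shrinks or grows."""
        for lo, hi in sorted(regions, reverse=True):
            text = text[:lo] + replacement + text[hi:]
        return text

    print(edit_all(text, regions, "bar"))   # -> bar(1); bar(2); bar(3);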
* High-performance memory-based web servers: kernel and user-space
  performance

Philippe Joubert, ReefEdge, Inc.; Robert B. King, IBM Research;
Richard Neves, ReefEdge, Inc.; Mark Russinovich, Winternals Software;
John M. Tracey, IBM Research.
General refereed session, June 29.

A well-researched, _extensive_ paper that identifies all the
significant bottlenecks in serving static web pages. As proof, the
authors developed a kernel web server that performs three times faster
than the best user-land web server. Their server has been tested on
Linux and Win2k.

The authors note that simply moving the web server into the kernel is
not enough. A high-performance server must be implemented as a finite
state machine with efficient event notification (kqueues or Linux RT
signals), avoid scheduling on each request ("cheap interrupts"), use
zero-copy networking (share buffers between the filesystem and
networking), and avoid the socket interface.

IBM invests heavily in web server acceleration via kernel extensions.
The Linux version of the kernel accelerator (a kernel loadable module)
will be released as Open Source.

* Web server acceleration via the inverse cache

The raw speed of serving pages is not a very important feature of a
web server: most web servers, except really inefficient ones, can push
data to a client faster than a typical remote client can accept it.
Most connections to a web server are slow. Such connections stay open
for a long time, taking up significant resources. The problem is
exacerbated if the served content is dynamic: a slow client ties up
not only OS resources but expensive database connections as well. If
the connection rate is high, the number of outstanding connections
grows and the system eventually runs out of resources (kernel memory,
system memory, the database connection pool).

An inverse cache is a cheap computer that sits immediately in front of
the main web/database/application server. The cache accepts the web
server's reply and slowly feeds it to the client. Because the cache
quickly accepts the whole reply, the main web server and the database
can close the connection and free their resources. Serving stored
content to slow clients scales very well. The caches are stateless and
independent -- if one cache becomes overloaded, we can easily bring up
another. A sketch of the idea follows at the end of this section.

I have been pushing this idea all around. At the USENIX expo, I came
across a vendor, RedLine Networks, which seems to have carried it out
in a commercial product: www.redlinenetworks.com. It appears that IBM
uses a similar technique in its Netfinity line of web accelerators.
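Here is the idea reduced to a toy (my sketch, not RedLine's or IBM's
product; the addresses and the thread-per-client structure are
assumptions made for brevity -- a real box would use an event loop):

    import socket, threading

    BACKEND = ("127.0.0.1", 8080)   # the expensive main server (assumed)

    def handle(client):
        # 1. Slurp the backend's whole reply at LAN speed, so the
        #    backend (and its database connection) is freed at once.
        with socket.create_connection(BACKEND) as b:
            b.sendall(b"GET / HTTP/1.0\r\n\r\n")
            reply = b""
            while chunk := b.recv(65536):
                reply += chunk
        # 2. Drip the buffered reply to the slow client from cheap
        #    local memory, however long that takes.
        client.sendall(reply)
        client.close()

    srv = socket.socket()
    srv.bind(("", 8000))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=handle, args=(conn,)).start()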
* Storage management for web proxies

Elizabeth Shriver, Bell Labs; Lan Huang, SUNY Stony Brook; Eran
Gabber, Bell Labs; Christopher A. Stein, Harvard U.
General refereed session, June 29.

A web proxy cache uses a filesystem as its backing store. The cache,
however, does not need many of the traditional features of a
filesystem: permission checking, random access. Even persistence and
recoverability are not strictly necessary -- it's perfectly
permissible to lose (cached) data. The authors have written a
_toolkit_ for building specialized, high-performance filesystems. One
such filesystem is Hummingbird -- an FS for a web cache. It's an
entirely user-level (hence, portable) FS that features zero-copy
operations, explicit co-location of data on disk, etc. For caching web
content, Hummingbird is 4-8 _times_ more efficient than the
conventional FS, UFS.

* Active Content: really neat technology or impending disaster?

Charlie Kaufman, Iris Associates.
Invited talk, June 29.

The speaker is a security architect for Lotus Notes. The talk was
rather entertaining -- but rather uninsightful and short on answers.
An interesting quote: "The Internet is getting bigger, users are
getting dumber, and dumb users' computer capacity is getting larger."

* The future of virtual machines: A VMware Perspective

Ed Bugnion, co-founder, VMware, Inc.
Invited talk, June 30.

It all started with a project called "Disco" at Stanford. Stanford had
designed the "Stanford Flash" machine -- a NUMA computer (which later
became the SGI Origin). The programmers needed an OS -- a very
expensive undertaking. Instead, the Flash group wrote a monitor (a
microkernel, so to speak) that "virtualized" the NUMA hardware and
gave the appearance of a uniform memory architecture. The Stanford
group could then run the existing SGI IRIX OS on its machine.

VMware provides near raw-machine performance. For a good introduction
to VMware and its hosted and hostless (for the ESX server) modes, see
the paper "Virtualizing I/O devices on VMware Workstation's hosted
virtual machine monitor" by Jeremy Sugerman and Beng-Hong Lim, VMware.
This is the first paper in the USENIX'01 proceedings, general refereed
track.

VMware, jointly with NSA, is developing NetTop -- to access
NIPRNet/SIPRNet from the same computer (but within totally isolated
virtual environments). SELinux acts as the host OS (see above for more
details about SELinux).

Note the ease of administering multiple guest OSes on the same box:
cloning from a "master copy" of an OS, installed, fully configured,
and with all needed applications.