Semi-implicit batched remote code execution as staging

Introduction
Refresher: call-by-name, call-by-value, call-by-need
Chourai by examples: remote applications
Remote conditional execution: the album example
The implementation of Chourai
Brief comparison with related work

Introduction

Batching several remote-procedure or remote-object operations into one request decreases the number of network client/server round-trips, reduces the communication overhead and indeed significantly improves performance of distributed applications. The benefits are offset by the cost of restructuring the code to incite large batches; and by the increase in the difficulty of reasoning about the code, predicting its performance let alone establishing correctness. The overall research goal is to reduce the downside.

We describe a semi-implicit batching mechanism that makes the points of remote server communication explicit yet conceals the proxies, saving the trouble of writing them. The changes to the client code are minimal: mainly, adding calls to force. The type-checker will not let the programmer forget to call force. The remote batch server is simple and generic, with no need to optimize for specific clients.

Our mechanism batches both independent and data-dependent remote calls. Our mechanism is compositional, letting the programmer build nested applications and conditional (and, potentially, iterative) statements using composition, application and naming. Writing a remote program is exactly like writing a typed local program, which is type-checked locally, and can even be executed locally (for debugging).

The key insights are treating remote execution as a form of staging (meta-programming), generalizing mere remote function calls to remote applicative and conditional expressions, and introducing an embedded domain-specific language, Chourai, for such expressions. A batch of dependent remote function calls can then be regarded as a complex applicative expression in the A-normal form. Another key insight is that emulating call-by-value via call-by-need surprisingly makes sense.

Version

The current version is December 2010.

References

Staged futures were first presented at the Ninth Meeting of the WG 2.11 IFIP working group. Waterloo, ON, Canada. December 2, 2010

Brief comparison with related work

Our approach is particularly close to BRMI

Eli Tilevich, William R. Cook, Yang Jiao: Explicit Batching for Distributed Objects
Proc. IEEE International Conference on Distributed Computing Systems (ICDCS 2009)
< http://research.cs.vt.edu/vtspaces/brmi/ >

BRMI is a pure Java library that implements proxy objects that, unlike Java RMI objects, do not immediately contact a remote server whenever their operations are invoked. The BRMI proxies record the desired operations and return `futures'. When the client calls BRMI's flush operation, the accumulated object operations are sent to the server in one batch; server replies are distributed to BRMI futures, which can then be queried by the client.

We too use futures to represent the results that might yet to be computed. Unlike BRMI, we do not throw an exception when the value of the future is requested. Rather, the request for the value of a future initiates the communication with the server, for all outstanding futures. Therefore, we do not need flush. Unlike BRMI, and like call-by-need and or declaratively parallel languages, we treat futures strictly as single-assignment entities. The most visible difference (apart from shunning Java or any OOP) is regarding the language of remotely executed expressions as a typed embedded domain-specific language (EDSL).

Somewhat related to BRMI is BEST:

Ali Ibrahim, Yang Jiao, Eli Tilevich, and William R. Cook: Remote Batch Invocation for Compositional Object Services
Proc. European Conference on Object-Oriented Programming (ECOOP 2009).

BEST is notably less explicit than BRMI; BEST is essentially a new Java-like language, with its own compiler (into regular Java).

The idea of futures for batching of remote calls is quite old:

P. Bogle and B. Liskov. Reducing cross domain call overhead using batched futures. OOPSLA'94

Unlike batch futures, we support batching of both related and unrelated function calls and calls that return primitive values. We further support conditional and other control expressions.

Other related work is well reviewed in the papers cited above.

Chourai by examples: remote applications

We describe Chourai on a series of examples. The first, introductory example deals with remote applications, both simple and nested. For concreteness we take the remotely applied function to be the factorial. The second series of album examples, below, is more realistic.

Remotely executed computations are described in a domain-specific language Chourai embedded in OCaml. The signature CHR defines the syntax of the language. Here is a part of CHR needed for the present introductory example. Chourai is typed; a Chourai expression of the type t is represented as an OCaml value of the type t chr; the type constructor chr is abstract. Our current subset of the language includes integer, boolean and string literals, and applications:

     module type CHR = sig
      type 'a chr
      
      val unit    : unit chr
      val int     : int    -> int chr
      val bool    : bool   -> bool chr
      val string  : string -> string chr
     
      val app   : ('a -> 'b) chr -> 'a chr -> 'b chr
     
      val force : 'a chr -> 'a
     end

The signature also has the operation force to obtain the OCaml value represented by a Chourai computation. Invoking force forces the evaluation of the Chourai expression, unless it has been evaluated already. In the latter case, force immediately returns the computed value or an exception.

Eddy Westbrook has pointed out that 'a chr seems to be a co-monad: one may extract the value from the any 'a chr expression (by using the polymorphic function force, aka extract). There is, however, no polymorphic injection function 'a -> 'a chr -- there is no return.

The language Chourai has constants for (remotely applied) functions; our introductory example needs the factorial:

     module type CHRFact = sig
      include CHR
      val fact : (int -> int) chr
     end

Different implementations of the Chourai signature interpret Chourai expressions in different ways. For example, one may implement Chourai to interpret the expressions locally and eagerly, which is useful for debugging. We demonstrate here an implementation that interprets Chourai remotely, on a server accessible through a network socket. The implementation is also a bit lazy. We develop the examples interactively, by submitting expressions to the top-level of the OCaml interpreter after loading the Chourai library. In the transcript below, the OCaml top-level responses are indented. We also show the corresponding trace of the server execution; each line of the trace is indented and prefix with server:

     let t1 = app fact (int 7);;
         val t1 : int Rpc_dsl.BatchFact.chr = <abstr>
     let t2 = app fact (int 4);;
         val t2 : int Rpc_dsl.BatchFact.chr = <abstr>
     
     let t12 = (force t1 - force t2);;
       server: New connection
       server: Evaluating: App Tag:Factorial 7
       server: Evaluating: App Tag:Factorial 4
       server: Responses computed successfully. Replying...
         Waiting for a response
         val t12 : int = 5016

We start by computing the difference 7! - 4!. The OCaml variable t1 is bound to the code of the Chourai expression fact 7. The trace shows no factorial evaluation. One may treat t1 as the future of 7!. Since we do not need the value 7! right away, the execution continues without looking at the future. After entering the future of 4!, we turn to computing the difference, and do need the values of the factorials. We force the futures to extract their values. The first use of force sends all the outstanding computations to the server. The server trace shows the arrival and the execution of both factorial computations. We see the result; the single occurrence of the phrase ``Waiting for a response'' tells of the single communication with the server. One may wonder which of the two forced expressions, force t1 or force t2, was evaluated first. We shall see that it does not matter. Indeed, if we change the example as follows:

     let t1' = app fact (int 7);;
         val t1' : int Rpc_dsl.BatchFact.chr = <abstr>
     let t2' = app fact (int 4);;
         val t2' : int Rpc_dsl.BatchFact.chr = <abstr>
     
     let t12' = (force t2' - force t1');;
       server: New connection
       server: Evaluating: App Tag:Factorial 7
       server: Evaluating: App Tag:Factorial 4
       server: Responses computed successfully. Replying...
         Waiting for a response
         val t12' : int = -5016

We see the same server trace. The server has evaluated 7! and 4! in the order they have been created by the client. One of these expressions was evaluated speculatively -- that is, not forced. The programmer does not need to know which. Despite the lazy-like evaluation (the client may continue without waiting for the result of the remote call, if the result is not needed immediately), Chourai is essentially a strict language. All Chourai expressions shall be executed, in the order they are entered on the client side (assuming that the client uses force at the end). In other words, both client and server computations, by themselves, are executed in the call-by-value order; the order of server computations with respect to client ones may be indeterminate -- which enables for batching.

So far we have batched independent function calls. We now evaluate (3!)!, which requires two remote function calls, with the result of the first one being the argument of the second one.

     let t3 = app fact (int 3);;
         val t3 : int Rpc_dsl.BatchFact.chr = <abstr>
     let t4 = app fact t3;;
         val t4 : int Rpc_dsl.BatchFact.chr = <abstr>
     
     let t4v = force t4;;
       server: New connection
       server: Evaluating: App Tag:Factorial 3
       server: Evaluating: App Tag:Factorial Var5
       server: Responses computed successfully. Replying...
         Waiting for a response
         val t4v : int = 720
     
     let t5 = force t3;;
         val t5 : int = 6

One may view t4 is the code for the nested Chourai application fact (fact 3). The transcript again shows the single communication with the server. Evaluation of t5 gave the result without any communication, because the value of the future t3 is already known.

We now demonstrate exception handling. Our factorial server throws an exception if the argument of fact is negative.

     let t3'  = app fact (int (-1));;
         val t3' : int Rpc_dsl.BatchFact.chr = <abstr>
     let t4'  = app fact t3';;
         val t4' : int Rpc_dsl.BatchFact.chr = <abstr>
     
     force t4';;
       server: New connection
       server: Evaluating: App Tag:Factorial -1
       server: Responses computed partly or not at all. Replying...
         Waiting for a response
         Exception:
         Failure "Skipped due to the earlier: Failure(\"fact on a negative number\")".
     
     force t3';;
         Exception: Failure "Failure(\"fact on a negative number\")".

The server receives two factorial applications to evaluate; the first one throws an exception. In the current policy, the server terminates the batch and sets the result of all expressions in the batch to be exceptions. Whenever the value of the future is demanded on the client, the exception is thrown. The thrown exception may be be the result of an exception in an earlier computation, as is the case of t4'.

We emphasize the difference in the treatment of remote computations between Chourai and BRMI, or Java RMI in general. Remote object invocation (RMI) is typically explained in terms of stubs and proxies: ``A stub implements the remote interface and serves as the remote object's client-side proxy; operations on the stub are forwarded to the remote object'' (Tilevich et al., ICDCS 2009). The functional programming approach is again seems more perspicuous: a remote procedure call is just an application in an embedded DSL. A batch of related calls is the code of a nested application. One may even regard the type constructor chr as code of MetaOCaml. The nested Chourai application fact (fact 3), represented in OCaml as app fact (app fact (int 3)) will result in the following batch when forced:

     [(v1, App Factorial 3)
      (v2, App Factorial v1)]

That batch is the A-normal form of the nested application.

Version

The current version is December 2010.

References

fact_client.ml [5K]
The factorial example: the client code and many regression tests

fact.ml [<1K]
The code for the factorial executed by the server

Remote conditional execution: the album example

Our second series of album examples is taken from the Java RMI literature. One of the examples is the remote linked list traversal, which is a Java RMI micro-benchmark. The album examples are very close to the the remote file server, the running example of Tilevich et al., ICDCS 2009.

The album example shows off boolean and conditional operations of Chourai, encoded in OCaml using the following, previously elided, members of the CHR signature:

     module type CHR = sig
      type 'a chr
      ...         (* see above *)
     
      val app2  : ('a -> 'b -> 'c) chr -> 'a chr -> 'b chr -> 'c chr
     
      val lt    : (int -> int -> bool) chr
     
      val guard : bool chr -> (unit -> 'a chr) -> bool chr
     end

The operation app2 encodes two-argument applications; lt represents a two-argument function testing if the first integer argument is less than the second one. The special form guard is a control structure in Chourai: it takes a test and a (generally, side-effecting) Chourai expression, and evaluates the expression only if the test evaluates to true. The form guard returns the result of the test. Since both OCaml and Chourai are call-by-value, we have to use a thunk to encode the second argument of guard as an unevaluated expression.

The album example is traversing a remote album collection, querying titles and ratings of albums, and deleting albums with low rating. The album example extends Chourai with a number of function constants for the album operations. The most notable change is a custom data type album_descr describing one album. Since albums are managed by the server, album_descr is an abstract data type to the client, representing a reference, or a handle, of one remote album. The album interface is thus:

     val get_album    : (string -> album_descr) chr
     val next_album   : (album_descr -> album_descr) chr
     val get_title    : (album_descr -> string) chr
     val get_rating   : (album_descr -> int) chr
     val delete_album : (album_descr -> string) chr

The function get_album gives the handle of the first album of the collection identified by the string argument. The function next_album, the analogue of list tail, gives the handle of the next album, or raises an exception if the collection is exhausted. The functions get_title and get_rating return the attributes of a remote album, and delete_album asks the server to delete the album represented by album_descr.

The first test asks for the title and the rating of the first album of the large collection. As before, we enter expressions at the OCaml top-level prompt and observe top-level's responses (shown indented in the transcript below). We also show the server trace. The trace for the first test tells of the single client-server roundtrip: the tree remote applications have been batched into one request:

     let t11 = app get_album (string "large");;
         val t11 : Album.album_descr Album.chr
     let t1v = let title = app get_title t11 and rating = app get_rating t11
               in (force title, force rating);;
       server: New connection
       server: Evaluating: App Tag:get_album "large"
       server: Evaluating: App Tag:get_title Var8
       server: Evaluating: App Tag:get_rating Var8
       server: Responses computed successfully. Replying...
         Waiting for a response
         val t1v : string * int = ("Title 100", 1)

Our next test gets the title and the rating of the n-th album in the collection. Since the server supports only the list-like interface of the collection, we have to execute the next_album operation n-1 times. To make the traversal of the remote list convenient, we program an operation to skip n albums.

     let rec skip_albums n album =
      if n <= 0 then album
      else skip_albums (pred n) (app next_album album);;
         val skip_albums : int -> Album.album_descr Album.chr -> Album.album_descr Album.chr

We stress that skip_album is a client function, implemented using the remote function next_album. We obtain the title and the rating of the 4th album without further ado:

     let t31 = skip_albums 4 (app get_album (string "large"));;
         val t31 : Album.album_descr Album.chr
     
     let t3v = let title = app get_title t31 and rating = app get_rating t31
               in (force title, force rating);;
       server: New connection
       server: Evaluating: App Tag:get_album "large"
       server: Evaluating: App Tag:next_album Var11
       server: Evaluating: App Tag:next_album Var12
       server: Evaluating: App Tag:next_album Var13
       server: Evaluating: App Tag:next_album Var14
       server: Evaluating: App Tag:get_title Var15
       server: Evaluating: App Tag:get_rating Var15
       server: Responses computed successfully. Replying...
         Waiting for a response
         val t3v : string * int = ("Title 104", 1)

There has been the single round-trip: all remote calls, to traverse the list and to query the final element have been batched into one request. The server trace may remind one of the standard `power' example in partial evaluation: the function power specialized to the statically known exponent. Indeed, like power, skip_album builds a Chourai expression whose traversal loop is fully unrolled since the iteration count is known locally, on the client side. The album handle however is `dynamic' (in partial evaluation terminology).

Our final test is deleting low-rated albums among the first n albums of the collection. The rating threshold and the iteration count are known on the client side (that is, statically). The ratings however are known only by the server (that is, `dynamic'). Since the result of the rating comparison with the threshold cannot be known in advance, we cannot, it seems, execute the whole operation in one batch. As the server traverses the list, it has to ship the rating back to the client; the client would do the comparison and tell the server if the album should be deleted. However, we can instruct the server to do the comparison: Chourai provides the conditional statement guard exactly for that purpose. In Partial Evaluation lingo, since the test is dynamic, conditional expression must be dynamic, or future-stage.

     let t41 = (app get_album (string "large"));;
         val t41 : Album.album_descr Album.chr
     
     let delete_low_rating n =
      let rec loop album i =
        let t = guard (app2 lt (app get_rating album) (int 5)) 
                      (fun () -> app delete_album album) in
        if i >= n then force t else
        loop (app next_album album) (succ i)
      in loop (app get_album (string "large")) 0;;
         val delete_low_rating : int -> bool = <fun>
     
     delete_low_rating 4;;
       server: New connection
       server: Evaluating: App Tag:get_album "large"
       server: Evaluating: App Tag:get_album "large"
       server: Evaluating: App Tag:get_rating Var19
       server: Evaluating: App2 Tag:< Var20 5
       server: Evaluating: Guard Var21 in Let 22 = App Tag:delete_album Var19
       server: Evaluating: App Tag:delete_album Var19
       server: Evaluating: App Tag:next_album Var19
       server: Evaluating: App Tag:get_rating Var24
       server: Evaluating: App2 Tag:< Var25 5
       server: Evaluating: Guard Var26 in Let 27 = App Tag:delete_album Var24
       server: Evaluating: App Tag:next_album Var24
       server: Evaluating: App Tag:get_rating Var29
       server: Evaluating: App2 Tag:< Var30 5
       server: Evaluating: Guard Var31 in Let 32 = App Tag:delete_album Var29
       server: Evaluating: App Tag:delete_album Var29
       server: Evaluating: App Tag:next_album Var29
       server: Evaluating: App Tag:get_rating Var34
       server: Evaluating: App2 Tag:< Var35 5
       server: Evaluating: Guard Var36 in Let 37 = App Tag:delete_album Var34
       server: Evaluating: App Tag:next_album Var34
       server: Evaluating: App Tag:get_rating Var39
       server: Evaluating: App2 Tag:< Var40 5
       server: Evaluating: Guard Var41 in Let 42 = App Tag:delete_album Var39
       server: Evaluating: App Tag:delete_album Var39
       server: Responses computed successfully. Replying...
         Waiting for a response
         - : bool = true

We program delete_low_rating in the most straightforward way. It is again a client function that builds a server computation. In other words, it is a Chourai macro, a Chourai generator. The appearance of app and app2 marks Chourai expressions; the other forms are OCaml. The OCaml type checker makes sure the whole operation is well-typed; the type checker also watches for stage confusion. We can tell which expression belongs to which stage (whether it is a Chourai or a client expression) just by looking at the type. In our test, all even-numbered albums have low rating. The server trace shows that indeed every other album is deleted. The whole operation is executed in one batch.

The DSL perspective on remote procedure invocation helped us to see mobile code behind RPC. We can execute more than just one simple remote call; a simple interpreter gives us a language of `mobile' code, of distributed computations. The language is lightweight; the language is typed and bi-call-by-value, helping programmers reason about correctness and resource usage.

Version

The current version is December 2010.

References

album_client.ml [6K]
The album example: the album interface and the client code, along with many tests.

album.ml [4K]
Client and server functions for the album example. The code demonstrates extending Chourai with a custom, serializable data type.

The implementation of Chourai

Semi-implicit batched remote code execution is implemented as an embedded DSL (Chourai) on the client side, and as a server-side interpreter. Here is the implementation, as an OCaml library.

Version

The current version is December 2010.

References

rpc_dsl.mli [2K]
The syntax of the typed domain-specific language Chourai of remote computation: the signature CHR

rpc_dsl.ml [6K]
The implementation of the signature CHR to remotely and perhaps speculatively execute sequences of EDSL expressions compiled into the A-normal form

rpc_t.mli [<1K]
The representation for Chourai expressions and values, communicated between a client and a server

future.mli [<1K]
future.ml [2K]
Futures: the representation for values that are possibly not yet known

server.ml [5K]
The Chourai server
The main part of the server is the implementation of the function eval to evaluate a primitive application, and of the function execute to evaluate a sequence of (A-normal) expressions and bind their results to variables.

wire.mli [3K]
wire.ml [5K]
The wire format; serializers and deserializers for basic datatypes; generic, typed serialization and de-serialization based on type representations

server_comm.mli [<1K]
server_comm.ml [2K]
Communication routines of the server

config.ml [<1K]
Configuration data such as the parameters of the client-server connection

Makefile [2K]
How to compile the library and run the tests and examples

Refresher: call-by-name, call-by-value, call-by-need

We illustrate call-by-name, -value, and -need evaluation orders using three typed tagless-final OCaml embeddings of the same simple typed higher-order language. The three implementations are three OCaml structures implementing the same signature. We thus recap two of the key approaches of our project -- the tagless-final embedding and the implementation of call-by-need.

Version

The current version is December 2010.

References

Parametrizing expressions by the evaluation order
Detailed explanation of the underlying `tagless final' approach

call_by_any.ml [6K]
Commented OCaml code with explanations

CBAny.hs [7K]
The exemplar Haskell code

Last updated December 7, 2010

This site's top page is http://okmij.org/ftp/

oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!

Converted from HSXML by HSXML->HTML