HANSEI is an ordinary OCaml library, with probability distributions represented as ordinary OCaml programs. Delimited continuations let us reify non-deterministic programs as lazy search trees, which we may then traverse, explore, or sample. Thus an inference procedure and a model invoke each other as co-routines. Thanks to the delimited control, deterministic expressions look exactly like ordinary OCaml expressions, and are evaluated as such, without any overhead.
Inference procedures and probabilistic models are both ordinary OCaml functions. Both may be defined and extended by library users; both may appear in each other's code. Performing inference on a model that includes calls to inference procedures lets us parameterize distributions by distributions, and lets inference procedures measure their own accuracy. One application is modeling agents reasoning about each other's limited reasoning.
This is joint work with Chung-chieh Shan.
Abstract of the poster at the NIPS2008 workshop ``Probabilistic Programming: Universal Languages, Systems and Applications''
The paper, in particular, presents a step-by-step refining of the finally-tagless to a very shallow embedding of the probabilistic DSL. That is, probabilistic functions are embedded as host-language functions, calls to them are embedded as host function calls, and random integers can be added using the addition operation of the host language. This approach lets us take advantage of the existing infrastructure of the host language while also reducing notational and interpretive overhead.
This is joint work with Chung-chieh Shan.
Delimited control and breadth-first, depth-first, and iterative deepening search
Another illustration, only in Haskell rather than OCaml, and without probabilities
This is joint work with Chung-chieh Shan.
This is joint work with Chung-chieh Shan.
The interface of the library: the `syntax' of HANSEI
The implementation of the library core: The
dist function, evidence assertion, reification and reflection, call-by-need evaluation, sharing and memoization, reification and memoization in nested scopes
Exact and approximate inference procedures: the explorers of the lazy search tree resulting from reifying probabilistic programs
Dependency: PMap - Polymorphic maps, by Xavier Leroy, Nicolas Cannasse, and Markus Mottl. The original code was augmented with the function
Dependency: Delimited continuations in OCaml
How to compile the library and run the tests and examples
Exact inference, variable elimination, likening to IBAL
Importance sampling with look-ahead: a new, general algorithm for approximate inference; contrast with rejection sampling
Lazy evaluation (or, evidence pulling) and sharing. Probabilistic context-free grammars. Memoization of stochastic functions
Many examples and regression tests of nested inference and nested call-by-need probabilistic computations
``An urn contains an unknown number of balls--say, a number chosen from a Poisson or a uniform distributions. Balls are equally likely to be blue or green. We draw some balls from the urn, observing the color of each and replacing it. We cannot tell two identically colored balls apart; furthermore, observed colors are wrong with probability 0.2. How many balls are in the urn? Was the same ball drawn twice?''
The implementation of the model and sample inferences: e.g., of the posterior distribution of the number of balls in the urn given that we drew ten balls and all appeared blue. Our results seem in excellent agreement with those by BLOG.
The well-commented code of the model and sample inferences
The well-commented code of the model and the record of time complexity investigations and optimization
We write the model to be independent from the structure of the graph. Typical queries ask for the (distribution of the) values of the attribute at a particular node. Therefore, we express the attribute propagation backwards: rather the specifying how the attribute of a node propagates to its descendants, we describe how a node obtains the attribute from its ancestors. The process is trivially declared in OCaml, as a recursive function that, given a non-root node, finds its ancestors and combines their attributes using noisy-or. We easily extend this function with assertions of available evidence, to support conditional queries.
Although the graph does not have directed cycles, it does have many undirected ones. As we work backwards, we may encounter a given node several times. Each time the value of node's random attribute must be the same. We have to use memoization.
The well-commented code of several variations of the model. All in all, our exact inference scales to the large version of the benchmark, with 19 nodes.
The very large search space and the very low probability of evidence make exact inference or rejection sampling hopeless. IBAL was able to handle the model using a combination of very clever and extensive analyses of model's code. Since HANSEI is embedded in OCaml and is a regular OCaml library, it cannot obtain the code of the whole program. It turns out that with the combination of reification and lazy evaluation we are able to handle this model with accuracy roughly comparable to that of IBAL.
Warm-up: splitting a note sequence in two parts at a random place, and applying a randomly chosen operation (deletion, identity, or a note transposition by various amounts) to both parts of the sequence. Crucially this simplified model does not recursively subdivide and transform the sequence. Therefore, the state space remains sufficiently small even for exact inference. The lazy evaluation was essential.
The full model, with recursive subdivision. The state space is very large, and the probability of evidence is very low, of the order 10^(-7). Only importance sampling is feasible.
A flat region of space is observed by a radar on a 10x10 screen. A number of aircrafts fly through the region. The state (the location and the velocity) of an aircraft at each time step is a stochastic function of its state at the previous time step. With certain probability, the radar detects an aircraft, displaying a blip on the the screen. Due to the limited resolution of the radar, several aircrafts can give rise to the same blip. A number of false alarms may occur, each producing a blip. We observe the screen at several time moments and note the blips. The problem is to estimate the number of aircrafts. We may also want to identify the trajectory of each aircraft. A more general model allows aircrafts to disappear and new aircrafts to appear at each time step.
We found it crucial to sample the state of an aircraft and the location of a false alarm lazily. If the x-coordinate of the detected aircraft does not match the x-coordinate of any observed blip, we immediately report the contradiction (of the possible world with the evidence). We do not need to compute the y-coordinate of the aircraft, let alone the locations of other aircrafts and false alarms. Because of laziness, extending our model to 3D would increase the computation time by 50% (rather than ten-fold, for the 10x10x10 grid).
Sequential Monte Carlo was another necessary technique. We compute the complete distribution for the state of all the aircrafts at each time step. We can use exact inference or our importance sampling. For the ideal case without noise, we did both and observed good agreement, even with only 1500 samples. Computing the state of the system requires roughly the same amount of work per time step. This fact lets us trace the evolution of the system for as long as needed. As the noise decreases, the sampling results seem to converge to the `ground truth' (which can be inferred exactly). For simple cases, the computed results match calculations by hand.
The very end of the code file shows the results for the general case, of three blips moving horizontally. As the time progresses, the observed evidence favors more and more there being three aircrafts.
Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects.
In: Lise Getoor and Ben Taskar, eds. Introduction to Statistical Relational Learning. MIT Press, 2007 -- pp. 373--398
< http://people.csail.mit.edu/milch/papers/blog-chapter.pdf >
David Poole: the AILog2 web page
< http://www.cs.ubc.ca/spider/poole/aibook/code/ailog/ailog2.html >
The web page refers to the complete AILog2 code for the airplane-blip example. The code simplifies the problem: first, it restricts aircraft movements to a plane. Then it assumes that observations of the radar screen are incomplete: the screen may contain other blips. The latter assumption is an over-simplification. For our model we used the observation procedure as given in Example 1.3 of Milch's et al. BLOG chapter. Alas, that BLOG code did not specify various priors; we picked what we considered reasonable priors, taking a hint from the AILog model whenever possible.
The re-implementation of the AILog code and the naive but ultimately unsuccessful attempt at the full model.
Nindependent random samples from the uniform distribution of integers within the interval
0..M-1. What is the probability that the sequence of samples is sorted?
Mis large (and so duplicates are unlikely), the probability of the drawn samples being ordered is
1/N!. The task is to estimate that probability in one's favorite probabilistic system, setting
Mto be 1 million and
Nat least 10. With these parameters, the size of the search space is
10^60. Clearly exact inference and rejection sampling are preposterous to consider.
Harik's problem is ideal for a benchmark: it is easy to explain and to model. The expected result is also known. Getting any useful estimate of the result is rather difficult, especially without extensive fine-tuning of the inference algorithm.
Here are the results of solving the problem in HANSEI, using the importance sampling procedure from HANSEI library. In the model, the numbers are represented as lazy lists of bits, which proved crucial. Run times are in seconds on EeePC 701 (633 MHz CPU).
M = 2^20, N = 10 1000 samples runtime 8 secs estimated prob is 1.69e-07 2000 samples runtime 16 secs estimated prob is 1.83e-07 4000 samples runtime 32 secs estimated prob is 1.83e-07 8000 samples runtime 64 secs estimated prob is 2.23e-07 1/10! is 2.76e-07 M = 2^20, N = 20 24000 samples runtime 468 secs estimated prob is 3.64e-20 1/20! is 4.11e-19The convergence is a bit slow. We stress the use of a general-purpose importance sampling procedure, which knows nothing about numbers and numeric order, and which has not been tuned for the problem at hand.
This is joint work with Chung-chieh Shan.
The code for the model and for several experiments with it
Our goal thus is to build a sequence of
N all-different random
variables with the same range: a sample from the sequence should have
no duplicates. If the value of each variable is an integer from
N-1, the problem is then the building of a stochastic
[0..N-1] -> [0..N-1]. Defining a stochastic
relation that is a function demands care: the naively
[0,1] -> [0,1] stochastic relation
f1 below is not a function, which is
clearly seen from the probability table for the tuple
(f1 0, f1 1, f1
0, f1 1) computed by Hansei. (In the OCaml session
transcript below, the responses of the OCaml top-level are
open ProbM;; let uniformly_list lst = List.nth lst (uniform (List.length lst));; val uniformly_list : 'a list -> 'a = <fun> exact_reify (fun () -> let f1 = function | 0 -> uniformly_list [0;1] | 1 -> uniformly_list [0;1] in (f1 0, f1 1, f1 0, f1 1)) - : (int * int * int * int) Ptypes.pV = [(0.0625, Ptypes.V (1, 1, 1, 1)); (0.0625, Ptypes.V (1, 1, 1, 0)); (0.0625, Ptypes.V (1, 1, 0, 1)); (0.0625, Ptypes.V (1, 1, 0, 0)); (0.0625, Ptypes.V (1, 0, 1, 1)); (0.0625, Ptypes.V (1, 0, 1, 0)); (0.0625, Ptypes.V (1, 0, 0, 1)); (0.0625, Ptypes.V (1, 0, 0, 0)); (0.0625, Ptypes.V (0, 1, 1, 1)); (0.0625, Ptypes.V (0, 1, 1, 0)); (0.0625, Ptypes.V (0, 1, 0, 1)); (0.0625, Ptypes.V (0, 1, 0, 0)); (0.0625, Ptypes.V (0, 0, 1, 1)); (0.0625, Ptypes.V (0, 0, 1, 0)); (0.0625, Ptypes.V (0, 0, 0, 1)); (0.0625, Ptypes.V (0, 0, 0, 0))]The transcript shows importing the Hansei library and defining a procedure to uniformly sample from a list. The probability table computed next makes it patent that repeated evaluations of
f1 0may give different results:
f1is not a function. We have to use the
memofacility of Hansei to define stochastic functions, such as
exact_reify (fun () -> let f2 = memo (function | 0 -> uniformly_list [0;1] | 1 -> uniformly_list [0;1]) in (f2 0, f2 1, f2 0, f2 1));; - : (int * int * int * int) Ptypes.pV = [(0.25, Ptypes.V (1, 1, 1, 1)); (0.25, Ptypes.V (1, 0, 1, 0)); (0.25, Ptypes.V (0, 1, 0, 1)); (0.25, Ptypes.V (0, 0, 0, 0))]The value of
f2 0is non-deterministic: it can be
0in one possible world and
1in another. Once the value of
f2 0is chosen, however, it stays the same. Therefore,
f2is a function: repeated applications of
f2to the same argument give the same value. The stochastic function
f2however is not injective: there are possible worlds (the first and the last one, in the above table) in which
f2 1have the same value.
The simplest way to turn
f2 into an injective function is to reject
the worlds in which it turns out not injective. We define
asserting that applying it to different arguments must give different
exact_reify (fun () -> let f2 = memo (function | 0 -> uniformly_list [0;1] | 1 -> uniformly_list [0;1]) in let f3 x = if f2 0 == f2 1 then fail (); f2 x in (f3 0, f3 1, f3 0, f3 1));; - : (int * int * int * int) Ptypes.pV = [(0.25, Ptypes.V (1, 0, 1, 0)); (0.25, Ptypes.V (0, 1, 0, 1))]We obtained
f3that is stochastic, is a function (applying it to the same arguments gives the same values) and is injective (applying it to different arguments gives different values). Alas, it is not efficient. The weights in the resulting probability table sum to
0.5rather than to one, betraying the waste of the half of the choice space. We would like to avoid making choices that we come to regret later. It behooves us to generate only the `right' worlds.
An injective and surjective function
[0..N-1] -> [0..N-1] is a permutation of the sequence
Hence we merely need to compute all permutations of the sequence, and sample
from it (using the already defined
there are only two permutations, leading us to
is a version of
memo specialized to thunks):
exact_reify (fun () -> let perms = letlazy (fun () -> uniformly_list [ [0;1]; [1;0] ]) in let f4 = List.nth (perms ()) in (f4 0, f4 1, f4 0, f4 1));; - : (int * int * int * int) Ptypes.pV = [(0.5, Ptypes.V (1, 0, 1, 0)); (0.5, Ptypes.V (0, 1, 0, 1))]
f4 is stochastic, is a function, is injective, and is
defined without wasting any choices. Yet we are still not satisfied.
For the general
N, we will have to generate all permutations of
[0..N-1] so to sample from them. The first
f4 x to any argument
x involves sampling from
(Further applications of
f4 use already made choices.) If the value
f4 x turns out inconsistent with further constraints on the
world, we have to reject not only the choice that has lead to
(N-1)! choices that have lead to
f4 y for all
y. We will have wasted a lot of choice space.
The waste can be reduced by computing the permutations on demand. We introduce
lazy_permutation : 'a list -> (unit -> 'a) list
that maps a list
lst of values to a list of random variables.
The variables are correlated.
Sampling the first random variable gives a value uniformly chosen
lst. Sampling the second variable gives an element of
that is different from the one chosen by the first variable, etc. Sampling
all the random variables gives a permutation of
if we reject the sample of one random variable, all further random
variables are left unsampled: The permutation is computed
to the needed extent. If we handle the result of
lazy_permutation as a list, in sequence, we can save quite many choices.
The source code below demonstrates that
samples from all permutations, uniformly. The code defines
the lazy stochastic injective function
f5 such that determining
f5 0 requires only
f5 1 -- N*(N-1) choices, etc.
The code for
lazy_permutation relies on the function
uniformly_uncons : 'a list -> 'a * 'a list that
uniformly selects one element from a list, returning the element
and the remainder of the list. The code thus illustrates an acute
observation by Dylan Thurston and Chung-chieh Shan that a sorting
algorithm corresponds to a permutation algorithm. To further
illustrate the relationship, the code implements a lazy permutation
function that is based on insertion sort, non-deterministically
inserting elements one-by-one into the initially empty list.
It turns out slower than the selection-sort--based permutation.
Purely Functional Lazy Non-deterministic Programming
The cited paper describes an alternative implementation of lazy permutations, which corresponds to insertion sort rather than selection sort. This alternative method requires defining a data structure for lists with a non-deterministic tail. The alternative method is more general since it represents lists whose size is random as well. On the other hand, the selection-sort--based method here works with the ordinary lists of OCaml, and is faster.
Solving the zebra puzzle with a sequence of all-different random variables.
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML