OR
HANSEI is an ordinary OCaml library, with probability distributions represented as ordinary OCaml programs. Delimited continuations let us reify non-deterministic programs as lazy search trees, which we may then traverse, explore, or sample. Thus an inference procedure and a model invoke each other as co-routines. Thanks to the delimited control, deterministic expressions look exactly like ordinary OCaml expressions, and are evaluated as such, without any overhead.
Inference procedures and probabilistic models are both ordinary OCaml functions. Both may be defined and extended by library users; both may appear in each other's code. Performing inference on a model that includes calls to inference procedures lets us parameterize distributions by distributions, and lets inference procedures measure their own accuracy. One application is modeling agents reasoning about each other's limited reasoning.
This is joint work with Chung-chieh Shan.
nips2008.pdf [266K]
Abstract of the poster at the NIPS2008 workshop ``Probabilistic Programming: Universal Languages, Systems and Applications''
Chung-chieh Shan: Self-applicable probabilistic inference without interpretive overhead.
Talk at the meeting of the IFIP Working Group 2.11 (program generation), April 2009.
< http://www.cs.rutgers.edu/~ccshan/rational/bay-slides.pdf >
The paper, in particular, presents a step-by-step refining of the finally-tagless to a very shallow embedding of the probabilistic DSL. That is, probabilistic functions are embedded as host-language functions, calls to them are embedded as host function calls, and random integers can be added using the addition operation of the host language. This approach lets us take advantage of the existing infrastructure of the host language while also reducing notational and interpretive overhead.
This is joint work with Chung-chieh Shan.
Chung-chieh Shan. Slides of the talk at DSL-WC. July 17, 2009, Oxford, UK.
< http://www.cs.rutgers.edu/~ccshan/rational/dsl-slides.pdf >
Delimited control and breadth-first, depth-first, and iterative deepening search
Another illustration, only in Haskell rather than OCaml, and without probabilities
This is joint work with Chung-chieh Shan.
Chung-chieh Shan. Slides of the talk at UAI. June 19, 2009, Montreal, Canada.
< http://www.cs.rutgers.edu/~ccshan/rational/embedpp-slides.pdf >
This is joint work with Chung-chieh Shan.
Chung-chieh Shan. Slides of the talk at the ML workshop. Baltimore, MD, September 26, 2010.
< http://www.cs.rutgers.edu/~ccshan/rational/ml-slides.pdf >
probM.mli [4K]
The interface of the library: the `syntax' of HANSEI
probM.ml [18K]
The implementation of the library core: The dist
function, evidence assertion, reification and reflection, call-by-need evaluation, sharing and memoization, reification and memoization in nested scopes
inference.mli [3K]
inference.ml [13K]
Exact and approximate inference procedures: the explorers of the lazy search tree resulting from reifying probabilistic programs
pMap.ml [6K]
pMap.mli [5K]
Dependency: PMap - Polymorphic maps, by Xavier Leroy, Nicolas Cannasse, and Markus Mottl. The original code was augmented with the function insert_with
Dependency: Delimited continuations in OCaml
Makefile [3K]
How to compile the library and run the tests and examples
exactInfM.ml [11K]
Exact inference, variable elimination, likening to IBAL
samplingM.ml [16K]
Importance sampling with look-ahead: a new, general algorithm for approximate inference; contrast with rejection sampling
slazy.ml [27K]
Lazy evaluation (or, evidence pulling) and sharing. Probabilistic context-free grammars. Memoization of stochastic functions
nested.ml [16K]
Many examples and regression tests of nested inference and nested call-by-need probabilistic computations
``An urn contains an unknown number of balls--say, a number chosen from a Poisson or a uniform distributions. Balls are equally likely to be blue or green. We draw some balls from the urn, observing the color of each and replacing it. We cannot tell two identically colored balls apart; furthermore, observed colors are wrong with probability 0.2. How many balls are in the urn? Was the same ball drawn twice?''
colored_balls.ml [5K]
The implementation of the model and sample inferences: e.g., of the posterior distribution of the number of balls in the urn given that we drew ten balls and all appeared blue. Our results seem in excellent agreement with those by BLOG.
bloodtype.ml [9K]
The well-commented code of the model and sample inferences
hmm.ml [11K]
The well-commented code of the model and the record of time complexity investigations and optimization
OR
We write the model to be independent from the structure of the graph. Typical queries ask for the (distribution of the) values of the attribute at a particular node. Therefore, we express the attribute propagation backwards: rather the specifying how the attribute of a node propagates to its descendants, we describe how a node obtains the attribute from its ancestors. The process is trivially declared in OCaml, as a recursive function that, given a non-root node, finds its ancestors and combines their attributes using noisy-or. We easily extend this function with assertions of available evidence, to support conditional queries.
Although the graph does not have directed cycles, it does have many undirected ones. As we work backwards, we may encounter a given node several times. Each time the value of node's random attribute must be the same. We have to use memoization.
noisy_or.ml [8K]
The well-commented code of several variations of the model. All in all, our exact inference scales to the large version of the benchmark, with 19 nodes.
The very large search space and the very low probability of evidence make exact inference or rejection sampling hopeless. IBAL was able to handle the model using a combination of very clever and extensive analyses of model's code. Since HANSEI is embedded in OCaml and is a regular OCaml library, it cannot obtain the code of the whole program. It turns out that with the combination of reification and lazy evaluation we are able to handle this model with accuracy roughly comparable to that of IBAL.
music1a.ml [6K]
Warm-up: splitting a note sequence in two parts at a random place, and applying a randomly chosen operation (deletion, identity, or a note transposition by various amounts) to both parts of the sequence. Crucially this simplified model does not recursively subdivide and transform the sequence. Therefore, the state space remains sufficiently small even for exact inference. The lazy evaluation was essential.
music2.ml [16K]
The full model, with recursive subdivision. The state space is very large, and the probability of evidence is very low, of the order 10^(-7). Only importance sampling is feasible.
A flat region of space is observed by a radar on a 10x10 screen. A number of aircrafts fly through the region. The state (the location and the velocity) of an aircraft at each time step is a stochastic function of its state at the previous time step. With certain probability, the radar detects an aircraft, displaying a blip on the the screen. Due to the limited resolution of the radar, several aircrafts can give rise to the same blip. A number of false alarms may occur, each producing a blip. We observe the screen at several time moments and note the blips. The problem is to estimate the number of aircrafts. We may also want to identify the trajectory of each aircraft. A more general model allows aircrafts to disappear and new aircrafts to appear at each time step.
We found it crucial to sample the state of an aircraft and the location of a false alarm lazily. If the x-coordinate of the detected aircraft does not match the x-coordinate of any observed blip, we immediately report the contradiction (of the possible world with the evidence). We do not need to compute the y-coordinate of the aircraft, let alone the locations of other aircrafts and false alarms. Because of laziness, extending our model to 3D would increase the computation time by 50% (rather than ten-fold, for the 10x10x10 grid).
Sequential Monte Carlo was another necessary technique. We compute the complete distribution for the state of all the aircrafts at each time step. We can use exact inference or our importance sampling. For the ideal case without noise, we did both and observed good agreement, even with only 1500 samples. Computing the state of the system requires roughly the same amount of work per time step. This fact lets us trace the evolution of the system for as long as needed. As the noise decreases, the sampling results seem to converge to the `ground truth' (which can be inferred exactly). For simple cases, the computed results match calculations by hand.
The very end of the code file shows the results for the general case, of three blips moving horizontally. As the time progresses, the observed evidence favors more and more there being three aircrafts.
Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong, Andrey Kolobov. BLOG: Probabilistic Models with Unknown Objects.
In: Lise Getoor and Ben Taskar, eds. Introduction to Statistical Relational Learning. MIT Press, 2007 -- pp. 373--398
< http://people.csail.mit.edu/milch/papers/blog-chapter.pdf >
David Poole: the AILog2 web page
< http://www.cs.ubc.ca/spider/poole/aibook/code/ailog/ailog2.html >
The web page refers to the complete AILog2 code for the airplane-blip example. The code simplifies the problem: first, it restricts aircraft movements to a plane. Then it assumes that observations of the radar screen are incomplete: the screen may contain other blips. The latter assumption is an over-simplification. For our model we used the observation procedure as given in Example 1.3 of Milch's et al. BLOG chapter. Alas, that BLOG code did not specify various priors; we picked what we considered reasonable priors, taking a hint from the AILog model whenever possible.
blip_ailog.ml [9K]
blip_naive.ml [13K]
The re-implementation of the AILog code and the naive but ultimately unsuccessful attempt at the full model.
Theoretically, whenDraw
N
independent random samples from the uniform distribution of integers within the interval0..M-1
. What is the probability that the sequence of samples is sorted?
M
is large (and so duplicates are unlikely),
the probability of the drawn samples being ordered is
1/N!
. The task is to estimate that probability in
one's favorite probabilistic system, setting M
to be 1 million and
N
at least 10. With these parameters, the size of the search space is
10^60
. Clearly exact inference and rejection sampling
are preposterous to consider.Harik's problem is ideal for a benchmark: it is easy to explain and to model. The expected result is also known. Getting any useful estimate of the result is rather difficult, especially without extensive fine-tuning of the inference algorithm.
Here are the results of solving the problem in HANSEI, using the importance sampling procedure from HANSEI library. In the model, the numbers are represented as lazy lists of bits, which proved crucial. Run times are in seconds on EeePC 701 (633 MHz CPU).
M = 2^20, N = 10 1000 samples runtime 8 secs estimated prob is 1.69e-07 2000 samples runtime 16 secs estimated prob is 1.83e-07 4000 samples runtime 32 secs estimated prob is 1.83e-07 8000 samples runtime 64 secs estimated prob is 2.23e-07 1/10! is 2.76e-07 M = 2^20, N = 20 24000 samples runtime 468 secs estimated prob is 3.64e-20 1/20! is 4.11e-19The convergence is a bit slow. We stress the use of a general-purpose importance sampling procedure, which knows nothing about numbers and numeric order, and which has not been tuned for the problem at hand.
This is joint work with Chung-chieh Shan.
Chung-chieh Shan. Slides of the talk at the Texas Linguistics Society (UT Austin) and the workshop on Logical Methods for Discourse (LORIA), 2009.
< http://www.cs.rutgers.edu/~ccshan/rational/discourse-slides.pdf >
incite.ml [17K]
The code for the model and for several experiments with it
Our goal thus is to build a sequence of N
all-different random
variables with the same range: a sample from the sequence should have
no duplicates. If the value of each variable is an integer from 0
through N-1
, the problem is then the building of a stochastic
bijective function[0..N-1] -> [0..N-1]
. Defining a stochastic
relation that is a function demands care: the naively
[0,1] -> [0,1]
stochastic relation f1
below is not a function, which is
clearly seen from the probability table for the tuple (f1 0, f1 1, f1
0, f1 1)
computed by Hansei. (In the OCaml session
transcript below, the responses of the OCaml top-level are
indented.)
open ProbM;; let uniformly_list lst = List.nth lst (uniform (List.length lst));; val uniformly_list : 'a list -> 'a = <fun> exact_reify (fun () -> let f1 = function | 0 -> uniformly_list [0;1] | 1 -> uniformly_list [0;1] in (f1 0, f1 1, f1 0, f1 1)) - : (int * int * int * int) Ptypes.pV = [(0.0625, Ptypes.V (1, 1, 1, 1)); (0.0625, Ptypes.V (1, 1, 1, 0)); (0.0625, Ptypes.V (1, 1, 0, 1)); (0.0625, Ptypes.V (1, 1, 0, 0)); (0.0625, Ptypes.V (1, 0, 1, 1)); (0.0625, Ptypes.V (1, 0, 1, 0)); (0.0625, Ptypes.V (1, 0, 0, 1)); (0.0625, Ptypes.V (1, 0, 0, 0)); (0.0625, Ptypes.V (0, 1, 1, 1)); (0.0625, Ptypes.V (0, 1, 1, 0)); (0.0625, Ptypes.V (0, 1, 0, 1)); (0.0625, Ptypes.V (0, 1, 0, 0)); (0.0625, Ptypes.V (0, 0, 1, 1)); (0.0625, Ptypes.V (0, 0, 1, 0)); (0.0625, Ptypes.V (0, 0, 0, 1)); (0.0625, Ptypes.V (0, 0, 0, 0))]The transcript shows importing the Hansei library and defining a procedure to uniformly sample from a list. The probability table computed next makes it patent that repeated evaluations of
f1 0
may
give different results: f1
is not a function. We have to use
the memo
facility of Hansei to define stochastic functions,
such as f2
:exact_reify (fun () -> let f2 = memo (function | 0 -> uniformly_list [0;1] | 1 -> uniformly_list [0;1]) in (f2 0, f2 1, f2 0, f2 1));; - : (int * int * int * int) Ptypes.pV = [(0.25, Ptypes.V (1, 1, 1, 1)); (0.25, Ptypes.V (1, 0, 1, 0)); (0.25, Ptypes.V (0, 1, 0, 1)); (0.25, Ptypes.V (0, 0, 0, 0))]The value of
f2 0
is non-deterministic: it can be 0
in one
possible world and 1
in another. Once the value of f2 0
is chosen,
however, it stays the same. Therefore, f2
is a function: repeated
applications of f2
to the same argument give the same value.
The stochastic function f2
however is not injective: there are
possible worlds (the first and the last one, in the above table)
in which f2 0
and f2 1
have the same value.
The simplest way to turn f2
into an injective function is to reject
the worlds in which it turns out not injective. We define f3
by
asserting that applying it to different arguments must give different
values:
exact_reify (fun () -> let f2 = memo (function | 0 -> uniformly_list [0;1] | 1 -> uniformly_list [0;1]) in let f3 x = if f2 0 == f2 1 then fail (); f2 x in (f3 0, f3 1, f3 0, f3 1));; - : (int * int * int * int) Ptypes.pV = [(0.25, Ptypes.V (1, 0, 1, 0)); (0.25, Ptypes.V (0, 1, 0, 1))]We obtained
f3
that is stochastic, is a function (applying it to the same
arguments gives the same values) and is injective (applying it
to different arguments gives different values). Alas, it is not
efficient. The weights in the resulting probability table sum to 0.5
rather than to one, betraying the waste of the half of
the choice space. We would like to avoid making choices that we
come to regret later. It behooves us to generate only the `right'
worlds.
An injective and surjective function
[0..N-1] -> [0..N-1]
is a permutation of the sequence [0..N-1]
.
Hence we merely need to compute all permutations of the sequence, and sample
from it (using the already defined uniformly_list
). For N=2
,
there are only two permutations, leading us to f4
(here letlazy
is a version of memo
specialized to thunks):
exact_reify (fun () -> let perms = letlazy (fun () -> uniformly_list [ [0;1]; [1;0] ]) in let f4 = List.nth (perms ()) in (f4 0, f4 1, f4 0, f4 1));; - : (int * int * int * int) Ptypes.pV = [(0.5, Ptypes.V (1, 0, 1, 0)); (0.5, Ptypes.V (0, 1, 0, 1))]
The function f4
is stochastic, is a function, is injective, and is
defined without wasting any choices. Yet we are still not satisfied.
For the general N
, we will have to generate all permutations of
the sequence [0..N-1]
so to sample from them. The first
application f4 x
to any argument x
involves sampling from N!
choices.
(Further applications of f4
use already made choices.) If the value
of f4 x
turns out inconsistent with further constraints on the
world, we have to reject not only the choice that has lead to f4 x
but also (N-1)!
choices that have lead to f4 y
for all
other arguments y
. We will have wasted a lot of choice space.
The waste can be reduced by computing the permutations on demand. We introduce
a function lazy_permutation : 'a list -> (unit -> 'a) list
that maps a list lst
of values to a list of random variables.
The variables are correlated.
Sampling the first random variable gives a value uniformly chosen
from lst
. Sampling the second variable gives an element of lst
that is different from the one chosen by the first variable, etc. Sampling
all the random variables gives a permutation of lst
. Crucially,
if we reject the sample of one random variable, all further random
variables are left unsampled: The permutation is computed
to the needed extent. If we handle the result of
lazy_permutation
as a list, in sequence, we can save quite many choices.
The source code below demonstrates that lazy_permutation
indeed
samples from all permutations, uniformly. The code defines
the lazy stochastic injective function f5
such that determining
f5 0
requires only N
choices, f5 1
-- N*(N-1) choices, etc.
The code for lazy_permutation
relies on the function
uniformly_uncons : 'a list -> 'a * 'a list
that
uniformly selects one element from a list, returning the element
and the remainder of the list. The code thus illustrates an acute
observation by Dylan Thurston and Chung-chieh Shan that a sorting
algorithm corresponds to a permutation algorithm. To further
illustrate the relationship, the code implements a lazy permutation
function that is based on insertion sort, non-deterministically
inserting elements one-by-one into the initially empty list.
It turns out slower than the selection-sort--based permutation.
Purely Functional Lazy Non-deterministic Programming
The cited paper describes an alternative implementation of lazy
permutations, which corresponds to insertion sort rather than
selection sort. This alternative method requires defining a data
structure for lists with a non-deterministic tail. The alternative
method is more general since it represents lists whose size is
random as well. On the other hand, the selection-sort--based method
here works with the ordinary lists of OCaml, and is faster.
Solving the zebra puzzle with a sequence of all-different random variables.
oleg-at-pobox.com or oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML