From posting-system@google.com Mon Jul  7 20:14:32 2003
Date: Mon, 7 Jul 2003 13:14:28 -0700
From: oleg@pobox.com (oleg@pobox.com)
Newsgroups: comp.lang.scheme
Subject: [ANN] Efficient (cyclic) unification
Message-ID: <7eb8ac3e.0307071214.131e456b@posting.google.com>
Status: OR

This is to announce an efficient Scheme implementation of cyclic and
acyclic unification of terms (ordered trees):
	http://pobox.com/~oleg/ftp/Scheme/index.html#Unification

The corresponding procedures can determine if two terms can be unified,
what is the unifier, and if it is cyclic. For example, the algorithm
unifies
	(M _a _b) with (M _b _a) (the unifier is _b -> _a)

        (_x 4 3 _w) with (3 _y _x _z) (the unifier is
					((_w . _z) (_y . 4) (_x . 3))))

	(p (g _x) (f a)) with (p _y _x) 
			the unifier: ((_x . (f a)) (_y . (g (f a))))

	(p _x _x) with (p _y (f _y))
		the unifier:
			((_g1164 . (f _g1164))
			 (_y . (f _g1164))
			 (_x . (f _g1164)))

The latter is a cyclic unification. Term variables are
underscored. You can change the style of pattern variables if you
re-define a predicate pattern-var? and a constructor genvar.

Many Prolog systems use unification without the occurs check. The
SICTus Prolog Manual states that unification without the occurs check
is wrong for logical programs, but practical. Our unification
algorithm however has exactly the same complexity both for cyclic
(without the occurs check) and acyclic unifications.

Unlike the classical text-book algorithm of unification (due to
Robinson, 1965), the present unification algorithm does NOT suffer the
exponential blow-out in time and in space when unifying pathological
terms such as

	 (h _x1 _x2 _x3 _x4 
	     (f _y0 _y0) (f _y1 _y1) (f _y2 _y2) (f _y3 _y3) _y4)
and
	 (h (f _x0 _x0)(f _x1 _x1)(f _x2 _x2)(f _x3 _x3)
	     _y1 _y2 _y3 _y4 _x4)


In fact, as we increase the size of the term, space requirements grow
linearly. And so does the run time. The linear time complexity in the
above case is just a fortunate accident. In general, one can't expect
a linear time complexity because the algorithm uses associative lists
to store bindings to pattern variables, and assq to locate a
binding. If bindings are stored in a hash table or a vector, the
algorithmic complexity improves. In practice, the current
implementation seems to work well. The algorithm can be seen as
related to the linear unification algorithm due to Gerard
Huet. However, our algorithm obviously has a quite different
intuition, as the source code shows.

The corresponding source code is extensively commented and comes with
a large test suite. It is tested on Gambit and Bigloo and should work
with any other R5RS system (one might need to adjust SRFI-2 and my
prelude).