From posting-system@google.com Mon Jul 7 20:14:32 2003 Date: Mon, 7 Jul 2003 13:14:28 -0700 From: oleg@pobox.com (oleg@pobox.com) Newsgroups: comp.lang.scheme Subject: [ANN] Efficient (cyclic) unification Message-ID: <7eb8ac3e.0307071214.131e456b@posting.google.com> Status: OR This is to announce an efficient Scheme implementation of cyclic and acyclic unification of terms (ordered trees): http://pobox.com/~oleg/ftp/Scheme/index.html#Unification The corresponding procedures can determine if two terms can be unified, what is the unifier, and if it is cyclic. For example, the algorithm unifies (M _a _b) with (M _b _a) (the unifier is _b -> _a) (_x 4 3 _w) with (3 _y _x _z) (the unifier is ((_w . _z) (_y . 4) (_x . 3)))) (p (g _x) (f a)) with (p _y _x) the unifier: ((_x . (f a)) (_y . (g (f a)))) (p _x _x) with (p _y (f _y)) the unifier: ((_g1164 . (f _g1164)) (_y . (f _g1164)) (_x . (f _g1164))) The latter is a cyclic unification. Term variables are underscored. You can change the style of pattern variables if you re-define a predicate pattern-var? and a constructor genvar. Many Prolog systems use unification without the occurs check. The SICTus Prolog Manual states that unification without the occurs check is wrong for logical programs, but practical. Our unification algorithm however has exactly the same complexity both for cyclic (without the occurs check) and acyclic unifications. Unlike the classical text-book algorithm of unification (due to Robinson, 1965), the present unification algorithm does NOT suffer the exponential blow-out in time and in space when unifying pathological terms such as (h _x1 _x2 _x3 _x4 (f _y0 _y0) (f _y1 _y1) (f _y2 _y2) (f _y3 _y3) _y4) and (h (f _x0 _x0)(f _x1 _x1)(f _x2 _x2)(f _x3 _x3) _y1 _y2 _y3 _y4 _x4) In fact, as we increase the size of the term, space requirements grow linearly. And so does the run time. The linear time complexity in the above case is just a fortunate accident. In general, one can't expect a linear time complexity because the algorithm uses associative lists to store bindings to pattern variables, and assq to locate a binding. If bindings are stored in a hash table or a vector, the algorithmic complexity improves. In practice, the current implementation seems to work well. The algorithm can be seen as related to the linear unification algorithm due to Gerard Huet. However, our algorithm obviously has a quite different intuition, as the source code shows. The corresponding source code is extensively commented and comes with a large test suite. It is tested on Gambit and Bigloo and should work with any other R5RS system (one might need to adjust SRFI-2 and my prelude).