General-purpose algorithms and data structures, illustrated in Haskell. Part II
We show a simple example of achieving all the benefits of an imperative data structure -- including sharing and the efficiency of updates -- in a pure functional program. Our data structure is a doubly-linked, possibly cyclic list, with the standard operations of adding, deleting and updating elements; traversing the list in both directions; iterating over the list, with cycle detection. The code:
IORef
, STRef
, TVars
, or any other destructive updates;The algorithm is essentially imperative -- hence supporting the identity check and the in-place `update' -- but implemented purely functionally. Although the code relies on many local, type safe `heaps', there is emphatically no global heap and no global state.
It is not for nothing that Haskell has been called the best imperative language. Imperative algorithms are implementable as they are -- yet genuinely functionally, without resorting to the monadic sub-language but taking the full advantage of clausal definitions, pattern guards, laziness.
Haskell-Cafe discussion ``Updating doubly linked lists''. January 2009.
[0,1]
. Mart'n Escardo's technique can tell, in finite
time, if a given total computable predicate is satisfied
over all possible infinite bit strings. Furthermore, for
so-called sparse predicates, Marti'n Escardo's technique is very
fast.We re-formulate the problem in terms of streams and depth-limited depth-first search, and thus cast off the mystery of deciding the satisfiability of a total computable predicate over the set of all Cantor numerals, which are uncountable.
As an additional contribution, we show how to write functions over Cantor numerals in a `natural' monadic style so that those functions become self-partially evaluating. The instantiation of the functions in an appropriate pure monad gives us transparent memoization, without any changes to the functions themselves. The monad in question is pure and involves no reference cells.
On `dense' functions on numerals (i.e., those that need to examine most of the bits of their argument, up to a limit), our technique performs about 9 times faster than the most sophisticated one by Marti'n Escardo'.
StreamPEval.hs [12K]
Extensively commented Haskell98 code
Marti'n Escardo': Seemingly impossible functional programs.
< http://math.andrej.com/2007/09/28/seemingly-impossible-functional-programs/ >
Prelude.foldr
subsume the left fold even in practice.
Right fold, such the foldr
on lists
foldr :: (a -> b -> b) -> b -> [a] -> b foldr f z [] = z foldr f z (x:xs) = f x (foldr f z xs)is one of the most common and fundamental patterns of processing a data structure. In fact, a data structure can be faithfully represented by -- is isomorphic to -- its right fold. For example, the entire list processing library, from
head
and tail
to drop
and take
and even zipWith
can be written entirely in terms of foldr
.The left fold, often called `reduce',
foldl :: (b -> a -> b) -> b -> [a] -> b foldl f z [] = z foldl f z (x:xs) = foldl f (f z x) xscan likewise be written as the right fold:
foldl_via_foldr f z l = foldr (\e a z -> a (f z e)) id l zTherefore, providing both folds within the same library seems redundant. Let us however take a second look at
foldl_via_foldr
: the foldr
in that expression is a so-called `second order fold',
building a closure which is then applied to z
. To appreciate
the costs, let us consider the evaluation of a simple example foldl_via_foldr f z [e1,e2,e3]
for some f
, z
and the three list
elements, in call-by-name/need and call-by-value.In call-by-name (call-by-need is the same), our example evaluates as follows:
foldl_via_foldr f z [e1,e2,e3] === foldr (\e a z -> a (f z e)) id [e1,e2,e3] z === -- constructing thunk (foldr (\e a z -> a (f z e)) id [e2,e3]) (\a z -> a (f z e1)) (foldr (\e a z -> a (f z e)) id [e2,e3]) z === -- and a thunk (f z e1) foldr (\e a z -> a (f z e)) id [e2,e3] (f z e1)We have constructed two thunks and about to force the first one, unrolling
foldr
once more. A few more steps lead to=== -- a big thunk which duplicates the original list f (f (f z e1) e2) e3a thunk whose structure mirrors the original list
[e1,e2,e3]
. We have
essentially duplicated the list -- worse than duplicated since a thunk
(closure) typically takes more space than a cons cell. We are not done
yet: we have to reduce the thunk to the weak-head-normal form. If f
is
strict in its left argument (such as addition), we have to reduce (f z e1)
, then (f (f z e1) e2)
, etc. -- effectively traversing the
original list once again, this time to compute the result of f
reductions.
For comparison we evaluate the same example with the native left fold,
actually, using its strict version foldl'
:
foldl' :: (b -> a -> b) -> b -> [a] -> b foldl' f !z [] = z foldl' f !z (x:xs) = foldl' f (f z x) xsThe reductions, again in call-by-name, look as follows:
foldl' f z [e1,e2,e3] === foldl' f (f z e1) [e2,e3] === -- forcing f z x let !v1 = f z e1 in foldl' f v1 [e2,e3] === {- ... -} let !v1 = f z e1 in let !v2 = f v1 e2 in let !v3 = f v2 e3 in v3No closures or thunks are constructed; we first compute
f z e1
as v1
, then f v1 e2
, etc. -- reducing the list element-by-element in
constant space. In contrast, foldl_via_foldr
had built two sets of
thunks (whose size is proportional to the length of the list) and
effectively traversed the list twice. One can express foldl'
via foldr
by adding a strictness annotation to foldl_via_foldr
.
The applications like f z e1
are then reduced immediately; the thunks foldr (\e a z -> a (f z e)) id [e2,e3]
, etc. still have to be
built. The benchmarks in the accompanying code bear the conclusions
out. The strict left fold foldl'
is the fastest and the most
space-economical way to reduce a list with a strict function: when
interpreting the Haskell code and when compiling with all
optimizations. In the latter case, foldl'
reductions are more than
the order-of-magnitude faster than foldr
and foldl_via_foldr
reductions and three times as faster then foldl'_via_foldr
. Furthermore, summing an integer list with foldl'
runs truly in constant space, with no allocations. In contrast,
emulating foldl
or foldl'
via foldr
looks, from the amount of
allocated memory, like duplicating the original list. (See however
below for Prelude.foldr
.) The case for providing foldl'
as
a primitive in a data traversal library is compelling.The case for the left fold is even more compelling in strict languages. Our example of reducing a three-element list runs under call-by-value thusly:
foldl_via_foldr f z [e1,e2,e3] === foldr (\e a z -> a (f z e)) id [e1,e2,e3] z === (\a z -> a (f z e1)) (foldr (\e a z -> a (f z e)) id [e2,e3]) zBefore the application of
(\a z -> a (f z e1))
can proceed the argument (foldr (\e a z -> a (f z e)) id [e2,e3])
has to be evaluated first,
pushing the pending application onto stack:=== -- evaluating the argument, application on stack let f23 = foldr (\e a z -> a (f z e)) id [e2,e3] in (\a z -> a (f z e1)) f23 z === {- ... -} let f0 = foldr (\e a z -> a (f z e)) id [] in let f3 = (\a z -> a (f z e3)) f0 in let f23 = (\a z -> a (f z e2)) f3 in (\a z -> a (f z e1)) f23 zAt this point the stack looks quite like the original list itself, with list elements in closures rather than cons cells. We have in effect copied the whole list onto stack. For long lists, stack overflow is imminent. Now we have to unwind the stack:
=== let f0 = id in let f3 = (\a z -> a (f z e3)) f0 in let f23 = (\a z -> a (f z e2)) f3 in (\a z -> a (f z e1)) f23 z === let f3 = (\z -> id (f z e3)) in -- constructing closure, a value let f23 = (\a z -> a (f z e2)) f3 in (\a z -> a (f z e1)) f23 z === let f3 = (\z -> id (f z e3)) in -- constructing closure, a value let f23 = (\z -> f3 (f z e2)) in -- constructing closure, a value (\a z -> a (f z e1)) f23 zUnwinding the stack has built the closures
f3
and f23
. The list got
copied again, from the stack to the heap, as closures. The application (\a z -> a (f z e1))
can proceed, and the list is traversed once more,
this time computing the f
reductions:=== let f3 = (\z -> id (f z e3)) in let f23 = (\z -> f3 (f z e2)) in f23 (f z e1) -- evaluating (f z e1) === let f3 = (\z -> id (f z e3)) in let f23 = (\z -> f3 (f z e2)) in let v1 = f z e1 in f23 v1 === {- ... -} let f3 = (\z -> id (f z e3)) in let f23 = (\z -> f3 (f z e2)) in let v1 = f z e1 in let v2 = f v1 e2 in let v3 = f v2 e3 in v3The total cost: copying the list onto the stack and then to the heap, requiring O(N) stack and heap space, and effectively three traversals. On the other hand, the primitive left fold accomplishes the job in constant space, with a single pass through the list. Thus emulating left fold via right fold is a bad idea in practice.
An interesting variation is folding with a non-strict function. For example, indexing of a list
index :: Int -> [a] -> a index _ [] = error $ "No such index" index 0 (x:_) = x index n (_:xs) = index (n-1) xsis also theoretically expressible through the right fold:
index_foldr :: Int -> [a] -> a index_foldr n xs = foldr (\x r n -> if n == 0 then x else r (n - 1)) (const (error $ "No such index")) xs nAn argument similar to the one for the left fold may convince us that writing
index
via foldr
is not good in practice, due to the cost
of closure construction and extra traversal. Writing index
in terms
of foldl
is not a good idea either since the left fold always
traverses the list through the very end. Query functions like index
should stop scanning the data structure as soon as the desired element
is found. Therefore, we need something like foldl
with early
termination -- which is what iteratees are. In strict languages, whose
folds are generally effectful, that is perhaps the only choice.
However, in Haskell there is another, surprising choice.
We have seen that foldl_via_foldr
is not practically viable and
hence foldl'
is better be provided by a data processing library.
And yet the theory that neglects memory consumption is not
practically futile. In GHC, compiling foldl_via_foldr
using
specifically Prelude's foldr
and full optimizations produces the
same code as the hand-written foldl
with the explicit recursion.
GHC, implementing theoretically justified transformations, can
afford to drop foldl
and foldl'
from the standard library without
any loss of performance.
We have to, however, re-write foldr
as follows, which is how
it is programmed in the standard Prelude:
foldr_Prelude :: (a -> b -> b) -> b -> [a] -> b foldr_Prelude f z = go where go [] = z go (x:xs) = f x (go xs)We have applied the equivalence-preserving transformation called `lambda-dropping', the opposite of `lambda-lifting'. The correctness of lambda-dropping is shown theoretically. Lambda-lifting is pervasive in compiling higher-order languages; lambda-dropping is of limited applicability and so is not done automatically. GHC authors had to write
foldr_Prelude
by hand.
When compiling foldl_via_foldr
, GHC first inlines foldr_Prelude
,
obtaining:
foldl_via_foldr f z l = go l z where go [] = id go (x:xs) = (\e a z -> a (f z e)) x (go xs)The next step are two beta-reductions, the second of which substitutes
go xs
. Generally substituting an expression
is only sound in call-by-name/need, or when we can prove the expression is
effect-free and terminating. In this particular case, the substitution is
clearly always valid. (It is not clear how smart a call-by-value
compiler has to be to see that.) The result:foldl_via_foldr f z l = go l z where go [] = id go (x:xs) = \z -> (go xs) (f z x)or, equivalently
foldl_via_foldr f z l = go l z where go [] z = z go (x:xs) z = (go xs) (f z x)is literally the left fold. A look at the Core code produced by GHC shows that
foldl_via_foldr
and foldl
compile to the same
code. After inlining and code substitutions GHC has derived the left
fold. Likewise, GHC derives the optimal, hand-written index
from index_foldr
.Applying theory to practice is indeed a messy business. One has to know the assumptions of the theory and use one's judgment. The theoretical expressivity of the left fold in terms of the right fold does not usually apply to practice, where we care about running out of memory and k-times slowdowns. Therefore, the left fold, albeit theoretically redundant, should be provided in a data structure library along with the right fold, especially in call-by-value languages. In the latter case, the left fold with an early termination makes for a better primitive.
We have also seen that a theory that abstract away memory consumption
and constant factors is not at all practically useless. The
abstraction uncovers the essentials of foldr
and help derive very
powerful optimizations, which do make, in some cases, the left fold
redundant after all, even in practice. It seems befitting to end with
two quotes from Kant: ``[T]he general rule must be supplemented by an
act of judgment whereby the practitioner distinguishes instances
where the rule applies from those where it does not. ... No-one can
pretend to be practically versed in a branch of knowledge and yet
treat theory with scorn, without exposing the fact that he is an
ignoramus in his subject.''
Tim Sheard and Leonidas Fegaras: A Fold for All Seasons
FPCA 1993
< http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.2756 >
Beyond Church encoding: Boehm-Berarducci isomorphism of algebraic data types and polymorphic lambda-terms
Boehm and Berarducci's motivation was to translate programs with data structures into pure System F programs, with just functions.
Jeremy Gibbons and Geraint Jones: The Under-Appreciated Unfold
ICFP 1998
< http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.1735 >
One of the points of the paper is the argument against using foldr as a big hammer. Looking at other combinators, such as unfold, leads to the insightful and efficient code.
Olivier Danvy and Ulrik P. Schultz. Lambda-Dropping: Transforming Recursive Equations into Programs with Block Structure
BRICS Report Series 1999, RS-99-27
< http://www.brics.dk/RS/99/27/BRICS-RS-99-27.pdf >
How to zip folds: A library of fold transformers (streams)
The list processing library implemented in terms of the right fold
From enumerators to cursors: turning the left fold inside out
An early argument for the left fold with early termination as a traversal primitive
Immanuel Kant: Theory and Practice
(The full title, literally from German: ``On the Proverb: That May be True in Theory but Is of No Practical Use'')
A few salient quotes
Quoted from Jeffrie G. Murphy: Kant on Theory and PracticeAn aggregation of rules, even of practical rules, is called a theory, as long as these rules are thought of as principles possessing a certain generality and, consequently, as being abstracted from a multitude of conditions that nonetheless necessarily influence their application. Conversely, not every undertaking [Hantierung] is a practice [Praxis]; rather, only such ends as are thought of as being brought about in consequence of certain generally conceived [vorgestellten] principles of procedure [Verfahrens] are designated practices.
For to the concept of the understanding that contains the rule must be added an act of judgment by means of which the practitioner decides whether or not something is an instance of the rule. And since further rules cannot always be added to guide judgment in its subsumptions (for that could go on infinitely), there can be theoreticians who, lacking judgment, can never be practical in their lives.
Quoted from James Rachels: Theory and Practicethe general rule must be supplemented by an act of judgement whereby the practitioner distinguishes instances where the rule applies from those where it does not.
No-one can pretend to be practically versed in a branch of knowledge and yet treat theory with scorn, without exposing the fact that he is an ignoramus in his subject.
We limit attention to closed terms since all interesting terms mentioned earlier are closed. De Bruijn indices let us avoid wasting time on building alpha-equivalent terms. (The terms will be pretty-printed in the conventional, named-variable notation, for clarity.) Thus we will be generating terms described by the following grammar (data type declaration), with the side condition that the terms are closed.
data Exp = V Int | A Exp Exp | L Exp
The generator is a truly straightforward application of the simplest
non-deterministic operations in the MonadPlus
interface.
a_term_naive :: MonadPlus m => m Exp a_term_naive = L `liftM` go 0 where -- go n: generate a lambda term with free variables 0..n go n = choose_var n `mplus` choose_lam n `mplus` choose_app n choose_var 0 = return (V 0) choose_var n = return (V n) `mplus` choose_var (n-1) choose_app n = liftM2 A (go n) (go n) choose_lam n = L `liftM` go (n+1)
Absent constants, a closed lambda-term must start with a lambda-binder. The rest could be either a variable, an application of two non-deterministically chosen terms, or an abstraction with a random body.
The equally straightforward implementation of MonadPlus
, the List monad,
fails to generate anything interesting within the first 10 000 terms. It is
not difficult to see why. Here are the first 8 generated terms:
Lx.x, Lx.Ly.x, Lx.Ly.y, Lx.Ly.Lz.x, Lx.Ly.Lz.y, Lx.Ly.Lz.z, Lx.Ly.Lz.Lu.x, Lx.Ly.Lz.Lu.y
The List monad realizes an incomplete search strategy: there is no guarantee that any given term will ever come. No applications are indeed forthcoming with the List monad. A better implementation of non-determinism is needed, with a complete search strategy: for example, iterative deepening or FBackTrack. With iterative deepening (which in our implementation produces the same sequence as breadth-first search but without taking Gbytes of memory), the first 10 generated terms
Lx.x, Lx.Ly.x, Lx.Ly.y, Lx.x x, Lx.Ly.Lz.x, Lx.Ly.Lz.y, Lx.Ly.Lz.z, Lx.x (Ly.x), Lx.x (Ly.y), Lx.x (x x)
look quite hopeful. Within the first 10 000 generated terms,
the self-application Lx.x x
, or delta
, comes 4th; the third Church
numeral comes 716th and omega
comes 3344th. Alas, there is no
successor or the S combinator, let alone the Y combinator. With
the FBackTrack implementation, the first few terms may look even more hopeful
Lx.x, Lx.x x, Lx.Ly.x, Lx.x x x, Lx.Ly.x x, Lx.x (x x), Lx.Ly.Lz.x, Lx.(Ly.x) x, Lx.x (Ly.x)The term
delta
comes the second, the third Church numeral comes
695th. Alas, within the first 99 977 terms nothing else interesting
comes. Generating interesting terms is not at all as straightforward
as it seems. Even sophisticated MonadPlus implementations did not
help.
It is crucial to recognize that the straightforward search for lambda-terms
is biased, and that bias is not in favor of interesting terms. Generally,
the fewer
non-deterministic choices it takes to produce a term, the sooner the term
comes. The straightforward generator clearly
favors selectors (terms of the form Lx.Ly...Lu.x
) and simple
applications such as Lx. (x x) (x x)
. We should put the brake on
generating abstractions: each new L
adds a new variable and hence
dramatically many more possible terms. We should encourage
the generator to play more with the variables it already has.
To prevent the string of applications of a variable to itself,
we should produce terms in the general order of their size. Again the goal
is to explore the search space more uniformly. It is tempting to generate
only normal forms. Alas, the Y combinator and omega
do not have normal
forms. Therefore, we do generate redices, but only of interesting kinds
(whose argument is an abstraction), and only occasionally. The following code
incorporates these ideas:
a_term :: MonadPlus m => m Exp a_term = L `liftM` go 0 where -- go n: generate a lambda term with free variables 0..n go n = do size <- iota 1 gen n True size -- gen n l s: generate a lambda term with free variables 0..n and -- of the size exactly s. The second argument tells if to generate -- abstractions gen _ _ s | s <= 0 = mzero gen n _ 1 = choose_var n gen n True 2 = choose_lam n 2 gen n False 2 = mzero gen n True s = choose_app n s `mplus` choose_lam n s gen n False s = choose_app n s choose_var n = msum . map (return . V) $ [0..n] choose_lam n s = penalty (40*n) $ L `liftM` gen (n+1) True (s-1) choose_app n s = do let s' = s - 1 -- Account for the 'A' constructor let gen_redex = do lefts <- range 4 (s' - 3) liftM2 A (choose_lam n lefts) (choose_lam n (s' - lefts)) let gen_noredex = do lefts <- range 1 (s' - 1) liftM2 A (gen n False lefts) (gen n True (s'-lefts)) gen_noredex `mplus` penalty 4 gen_redex
The auxiliary iota i
produces an integer greater or equal to i
; range i j
chooses an integer between i
and j
, inclusive. The
operation yield
-- Lowers the priority of m, so choices of m will be tried less often yield :: MonadPlus m => m a -> m a yield m = mzero `mplus` mand its n-th iterate
penalty n
produce a string of failures before trying
the argument computation m
. When a complete search strategy sees
many failures it tends to turn away and to pay more attention to other parts
of the search space.Here are the first 10 terms produced by the sophisticated generator:
Lx.x, Lx.Ly.y, Lx.Ly.x, Lx.x x, Lx.x (Ly.y), Lx.Ly.y y, Lx.Ly.x y, Lx.x (x x), Lx.Ly.y x, Lx.x (Ly.x)The term
delta
comes fourth, omega
comes 54th, the Y combinator 303d,
the third Church numeral 393d, the S combinator 1763d and the successor
4865th.The conclusion, although obvious in hindsight, is still thought-provoking: interesting lambda-terms are really hard to encounter by accident. They are exquisitely rare in the space of possible lambda-terms and distributed non-uniformly. A monkey banging on even the sophisticated lambda-typewriter may have printed the Y combinator, but would unlikely to print even the addition combinator within its lifetime.
FBackTrack.hs [3K]
Simple fair and terminating backtracking monad
Searches.hs [9K]
Iterative deepening search
The sample problem comes from a programming contest; it may just as
well be given at an interview. Its core -- modular arithmetic with
large numbers -- is at the heart of cryptographic libraries. The
problem is: given a natural number n
that can be as big as 10^6
,
print the nine rightmost decimal digits of n!
disregarding the
trailing zeros. Although the final code is in C, we will be using
Haskell for all development.
We chose Haskell because it lets us easily write down the well-defined, unambiguous specification:
spec :: Int -> Int spec = fromIntegral . lastnz9 . fact . fromIntegral fact :: Integer -> Integer -- n! fact n = product [1..n] lastnz9 :: Integral a => a -> a -- Last 9 decimal digits of a number, not counting trailing zeros lastnz9 n = drop0s n `mod` 10^9 drop0s n | (q,0) <- quotRem n 10 = drop0s q | otherwise = nHere
fact
computes the factorial, drop0s
drops the trailing zeroes
from the decimal representation of the number (that is, keeps dividing
the number by 10 until it no longer divides) and the mod
operation gives
the last nine decimal digits of the result, if there are that many.
This is our ideal program. Another benefit of Haskell is that
the specification is executable -- at least in some cases. We can, therefore,
use it as the `ground truth' in unit tests.
The specification spec
is perfect as the ideal program, clearly
describing the intended result. It also runs. However, it is not
something we want to run in real life: spec 10^6
will take a lot of time,
and a whole lot of space. Indeed, 10^6!
is about (2*10^5)^(10^6)
,
with at least 5 million decimal digits.
As the first step
and with an eye towards for-loops in the final C code, we rewrite fact
in the accumulator-passing style (the threaded accumulator corresponds
to a mutable variable in C):
fact_accum' :: Int -> Int -> Integer -> Integer fact_accum' n i acc | i > n = acc | otherwise = fact_accum' n (i+1) (acc * fromIntegral i) fact_accum n = fact_accum' n 1 1Since we only need the nine rightmost digits not counting trailing zeros, let's accumulate only those:
fact_accum_9' :: Int -> Int -> Int -> Int fact_accum_9' n i acc | i > n = acc | otherwise = fact_accum_9' n (i+1) (lastnz9 (acc * i)) fact_accum_9 n = fact_accum_9' n 1 1To see that
fact_accum_9
conforms to the ideal spec
, we test it
on a few sample numbers: 37, 53, 100. We get the correct result. Moreover,
we do the `smallcheck': exhaustively check for small values of n
:all (\n -> spec n == fact_accum_9 n) [0..21]The idea of the Small Check is that errors, if any, tend to show up for small arguments. The above test returns
True
. It seems we are done:
we can re-write our solution in C, and submit it as the answer.
The contest-running software rejects our solution as erroneous, without
any details. After more and more testing, luckily
we eventually stumble on the problem: although fact_accum_9 n
gives
the same answers as spec n
for very many n
, it deviates from the ideal for
some special, isolated n
, for example, 25
:
*FactLimit> spec 25 330985984 *FactLimit> fact_accum_9 25 80985984
What went wrong? Let us look at fact_accum_9
carefully. Clearly it
is derived from spec
by substituting-in the accumulator-passing implementation fact_accum
and pushing the operation lastnz9
`inside'. We have thus
assumed that lastnz9
distributes through multiplication:
lastnz9 (i*y) === lastnz9 (i * lastnz9 y)where
i
is smaller than 10^9
. That seems correct: the mod
that
underlies lastnz9
indeed
has the required property. If i
has any factors of 10
, they will be removed
by the drop0s
in lastnz9
. Let us, however, take i
to be 25
and y
to be divisible by 4
and without trailing zeros,
that is, y
represented as k*10^9 + 4r
for some k>=0
and 0 < 4r < 10^9
. Thenlastnz9 (25 * (k*10^9 + 4r)) === lastnz9 (100 * (k*10^9/4 + r)) === lastnz9 (k*10^9/4 + r) | === r if k is divisible by 4 | === lastnz9 (k*25*10^7 + r) otherwise lastnz9 (25 * lastnz9 (k*10^9 + 4r)) === lastnz9 (25 * 4r) === r
The assumption is hence violated if k
is not divisible by 4
. In our case, drop0s (fact 24) `div` 10^9
is 62044840173
and it is indeed not
divisible by four. Thus the property we took for granted does not
actually holds. What is worse, it fails in rather special circumstances,
which took some Math to find out.
Let us make a few observations:
mod
operation distributes over multiplication:
(x*y) `mod` p === ((x `mod` p) * (y `mod` p)) `mod` p
.
If p=10^9
, we merely have to deal with the numbers up to 10^9
, which
comfortably fit within 32-bit signed integers, with a 64-bit accumulator
for the intermediate result of the multiplication. The problem in the
earlier program was the subtle interaction of mod
and drop0s
.n!
has any factors of 5
, it has even more factors of 2
. That is,
drop0s n!
is divisible by 2
but not divisible by 5
.n!
come from the factors of 5
contributed by the multiplicands
n
, n-1
, n-2
,... If we suppress the factors 5
during the accumulation,
drop0s
becomes redundant. We can take the full advantage of point 1.drop0s n!
. By point 2, it is representable in the form 2^k * r
,
for some even r
not divisible by 5
. Suppose n+1
factors as
2^i * 5^j * s
where s
is likewise odd and not divisible by 5
.
Then drop (n+1)!
is 2^(k+i-j) * (r*s)
. We are sure, by point 2, that
k+i>j
. It is enough to compute r
, s
and their product mod 10^9
,
using the property of mod
from point 1.-- return (n',k) such that n == n'*f^k factors n f = go 0 n where go k n | (q,0) <- quotRem n f = go (k+1) q | otherwise = (n,k) modulus = 10^9 factl :: Int -> Int factl n = go 2 0 1 where go i twos acc | i > n = shifts acc twos go i twos acc = let (i1,k1) = factors i 2 (i2,k2) = factors i1 5 in go (i+1) (twos + k1 - k2) (acc * i2 `mod` modulus) -- compute (acc * 2^twos) `mod` modulus shifts acc 0 = acc shifts acc twos | twos >= 4 = shifts ((acc `shift` 4) `mod` modulus) (twos - 4) | otherwise = (shift acc twos) `mod` modulusThe function
shifts
performs the delayed multiplication by 2^twos
,
by a sequence of shifts. We again rely on the property of mod
to
limit the precision of all operations to nine decimal digits. For speed,
we shift in batches of four.It is rather easy to transcribe the above algorithm in C (see below). The C code worked on first try and was accepted as the answer to the challenge.
Thus, tests -- and types -- are necessary and should be used, as a spot-check for corner cases and silly mistakes made in transcribing the idea into code. Useful the tests and the types as they are, for verification, they are not sufficient. They have to be complemented with other ways to be sure in the code, such as thinking about the problem mathematically. Proofs are not sufficient either and have to be complemented with tests to spot-check for realism of their assumptions and adequacy of their conclusions. All in all, there is no royal road -- or any road, it seems -- to the correct code, as there are none to an invention or discovery.
factlimit.c [<1K]
The C code, based on the correct version of the optimal Haskell algorithm.
Robert P. Colwell. The Pentium Chronicles: The people, passion and politics behind Intel's landmark chips
IEEE Society Press, 2006. Published by John Wiley & Sons, Inc. 187pp.
Bob Colwell was the first manager within Intel who learned about the
FDIV problem, when a member of his team demonstrated the bug in
Pentium (but not in his P6). From talking to Pentium developers Bob Colwell
found out how the bug came about. It turns out that the change to FDIV
was introduced very late in the Pentium design process, in response to a
call to save the die space. The changed algorithm indeed fit on a smaller
die (which was however irrelevant since that saved space
was inside an already laid out block). The management was at first
reluctant to put the change in so late in the process. But it was
convinced by the mathematical proof of correctness.
The proof also made the validators test the change less
stressfully. Alas, the proof turned out faulty.
It is probably not coincidental that after the FDIV problem Intel has developed keen interest in automated formal theorem proving.
We were lucky to find the problematic case without too many tries.
For floating-point computations (used in Intel FPU) the numbers that
may cause problems are rare and special. Finding them is a mathematical
problem, like the one addressed in the paper:
John Harrison: Isolating critical cases for reciprocals using integer factorization.
Proc. 16th IEEE Symposium on Computer Arithmetic, Santiago de Compostela, Spain 2003, IEEE Computer Society Press, pp. 148-157 2003.
n
in base b
is the integer l
such that b^l <= n < b^(l+1)
. The number one greater than the integer
logarithm base b
is the size of n
, that is, the number of digits,
in its base-b
representation. We present
the Haskell98 code that is just as efficient as the internal GHC.Integer.Logarithms.integerLogBase#
function but uses no unboxed
data, no optimizations, and is not even compiled.
Naively, the integer logarithm is computed by repeated divisions of n
by b
, until the result reaches 1. This procedure requires l
divisions, where l
is the logarithm. We present two efficient algorithms:
the first uses only multiplications, no more than 2*log_2 l
of them, and (log_2 l) * sizeof(n)
extra memory for the powers of b
. The second
algorithm does log_2 l
multiplications and no more than log_2 l
integer divisions (and the same amount of extra memory) to compute l
.
Here is the multiplication-only procedure that returns l+1
where l
is the integer logarithm of n
in base b
. It is a composition
of two functions, which together compute the bits of l
: major_bit
determines
the upper bound and other_bits
improves it.
data BaseP = BaseP !Integer -- b^k !Int -- k mul_bp :: BaseP -> BaseP -> BaseP mul_bp (BaseP bk1 k1) (BaseP bk2 k2) = BaseP (bk1*bk2) (k1+k2) basewidth :: Integer -> Integer -> Int basewidth b _ | b < 1 = error "basewidth: base must be greater than 1" basewidth b n | n < b = 1 basewidth b n | n == b = 2 basewidth b n = major_bit [BaseP b 1] where major_bit :: [BaseP] -> Int major_bit bases@(bp:bps) = let bpnext@(BaseP bk2 k2) = bp `mul_bp` bp in case compare bk2 n of EQ -> k2 + 1 -- n == b^(2k) GT -> other_bits bp bps -- b^(2k) > n LT -> major_bit (bpnext : bases) -- b^(2k) < n other_bits (BaseP _ i) [] = i+1 -- b^i < n < b^(i+1) other_bits bp (bphalf:bps) = let bpup@(BaseP bik ik) = bp `mul_bp` bphalf in case compare bik n of EQ -> ik + 1 -- n == b^(i+k) GT -> other_bits bp bps -- b^i < n < b^(i+k) LT -> other_bits bpup bps -- b^(i+k) < n < b^(i+2k)The correctness and the complexity analysis follow from the invariants of the two auxiliary functions. In any call
major_bit bases
, the list bases
is [BaseP b^k k | j <- [d,d-1..0], k=2^j]
for some d
and n > b^(2^d)
. In the first
invocation, d
is 0 and progressively increases until such d>0
is
found that b^(2^(d-1)) < n <= b^(2^d)
. In any call other_bits
(BaseP bi i) bases
, the list bases
is [BaseP b^k k | j <- [d,d-1..0], k=2^j]
for some d
, k=2^d
and b^i < n <
b^(i+2k)
. Each invocation of other_bits
halfs k
until it reaches
one. The other, more efficient as it turns out, algorithm modifies other_bits
to divide n
by the candidate lower bound b^i
.
That integer logarithm function computes that the 48th Mersenne prime 2^57885161-1
has 17425170 digits in its decimal representation --
in 1.2
seconds.
r^n
for all r>=2
and n>1
.
Thanks to lazy evaluation, the table is automatically sized as needed.
There is no need to guess the maximal size of the table so to allocate it.powers = [2..] : map (zipWith (*) (head powers)) powersThe rest of the algorithm exhibits similar stream-wise processing and computations of tables in terms of themselves.
Messages speedup help by Damien R. Sullivan, Bill Wood, Andrew J Bromage, Mark P Jones, and many others posted on the Haskell-Cafe mailing list on March 6-8, 2003
< http://www.haskell.org/pipermail/haskell-cafe/2003-March/004065.html >
< http://www.haskell.org/pipermail/haskell-cafe/2003-March/004075.html >
s f g x = f x (g x) puzzle = (!!) $ iterate (s (lzw (+)) (0:)) [1] where lzw op [] ys = ys lzw op (x:xs) (y:ys) = op x y : lzw op xs ys
Incidentally, a small change gives a different series:
puzzle1 = (!!) $ iterate (s ((lzw (+)).(0:)) (1:)) [] where lzw op [] ys = ys lzw op (x:xs) (y:ys) = op x y : lzw op xs ys
Finally, how can we possibly live without the following:
puzzle2 = (!!) $ iterate (s ((lzw (+)).(1:).(0:)) (0:)) [1,1] where lzw op xs [] = [] lzw op (x:xs) (y:ys) = op x y : lzw op xs ys
Hints:
*Main> puzzle 5 [1,5,10,10,5,1] *Main> puzzle1 5 [1,2,4,8,16] *Main> puzzle2 5 [1,1,2,3,5,8,13]
Discussion thread, started on Haskell-Cafe by Thomas Bevan on Aug 22, 2003.
< http://www.haskell.org/pipermail/haskell-cafe/2003-August/004977.html >
oleg-at-okmij.org
Your comments, problem reports, questions are very welcome!
Converted from HSXML by HSXML->HTML