LinAlg
: Basic Linear Algebra and Optimization classlib
LinAlg
: Basic Linear Algebra and Optimization classlibThis C++ class library introduces Matrix
, Vector
, subMatrices
, and LAStream
s over
the real domain. The library contains efficient and fool-proof
implementations of level 1 and 2 BLAS (element-wise operations and
various multiplications), transposition, determinant evaluation and
matrix inverse. There are operations on a single row/col/diagonal of a
matrix. Distinct features of the package are Matrix views,
Matrix streams, and LazyMatrices. Lazy
construction allows us to write matrix expressions in a natural way
without introducing any hidden temporaries, deep copying, and any
reference counting.
LinAlg stresses Matrix streams, which provide a sequential
access to a matrix or its parts. LABlockStream
s may span over an arbitrary rectangular block of a matrix, including
the whole matrix, a single matrix element, and all other block sizes
in between. Matrix streams are seek
-able and
subrange-able. A stream or a substream are always created
in-line. They are safe and allocate no heap storage.
LinAlg demonstrates that many of Linear Algebra algorithms indeed require only a sequential access to a matrix or its sub-blocks. The sequential access is simpler and faster than a random one. Notable examples of such ``serial algorithms'' are matrix inverse and Householder bidiagonalizations. Every feature of the package is extensively tested by included validation code.
Other features:
Ax=b
, where the matrix
A
does not have to be square, and b
does not
have to be a vector.Matrix A = inverse(D); C = A*B;
assert(
of_every(ConstMatrixRow(m,i)).max_abs(of_every(vect))
== 2.0 );
The README file [plain text file]
LinAlg.tar.gz [115K]
Complete source code, validation code, sample code, and test
run outputs
LinAlg.h [58K]
LinAlg interface definition
LAStreams.h [28K]
LAStreams interface definition
linalg-4.4_vc7.zip [377K]
VC 7 Project filewhich was very kindly contributed by Markus Bader
Functional Style in C++: Closures, Late Binding, and
Lambda Abstractions
borrows many code snippets from LinAlg
(lazy) SVD promising a matrix in extended LinAlg.shar [plain text file]
An old but still relevant article posted on
comp.c++.moderated, comp.sys.mac.programmer.tools,
comp.sys.mac.scitech, sci.math.num-analysis newsgroups on Tue Feb 27
11:31:08 CST 1996
LinAlg is also available from netlib
<http://netlib.bell-labs.com/netlib/c++/linalg.tgz>
FreeBSD port and a package
This C++ class library performs radix-2 Discerete Fourier
transform of a real or complex sequence, and sin
, cos
, and complex Fourier integrals of an evenly tabulated
function.
The input can be either real
or complex
with/without zero padding; you can request either the full complex
transform, or only real
, im
, or abs
part of it.
fft.tar.gz [36K]
Commented C++ source code, verification code and test run outputs.
The verification code demonstrates many possible usage scenarious
LinAlg
: Basic Linear Algebra and Optimization classlib is required as a dependency
[The Abstract of the paper]
Recent work showed that staging and abstract interpretation can be
used to derive correct families of combinatorial circuits, and
illustrated this technique with an in-depth analysis of the Fast
Fourier Transform (FFT) for sizes 2^n. While the quality of the
generated code was promising, it used more floating-point operations
than the well-known FFTW codelets and split-radix algorithm.
This paper shows that staging and abstract interpretation can in fact be used to produce circuits with the same number of floating-point operations as each of split-radix and FFTW. In addition, choosing between two standard implementations of complex multiplication produces results that match each of the two algorithms. Thus, we provide a constructive method for deriving the two distinct algorithms.
Generating radix-2 FFT kernels with the operation count
exactly matching that of FFTW is much harder than one may think.
It is emphatically not enough to specialize the textbook FFT code
to a particular sample size N, and apply a partial evaluator
or a smart compiler to eliminate trivial multiplications (by one and zero)
and trivial additions (to zero), and to lift common factors.
First of all, the specialized FFT code will contain factors
like-2.44921270764475452e-16
(that is the computed
IEEE double value of sin((pi/. 2.0) *. 4.0)
). That factor
is not zero and the corresponding multiplication is not trivial,
although it should be.
A frequently occurring expression (sin(pi/4)*x + cos(pi/4)*y)
presents a subtler problem: although the ideal sin(pi/4)
and cos(pi/4)
are equal, their computed values are not.
Therefore, the partial evaluator or a compiler, which see
the specialized code, hence, only the numeric values of the
FFT factors, are not able
to convert the expression to sin(pi/4)*(x + y)
, saving one multiplication. Thus the necessary condition for
generating the optimal FFT kernels is the use of the exact
trigonometry. The roots of unity exp(-I j*2pi/N)
must be
manipulated symbolically rather than numerically. The condition is not
sufficient: an additional insight -- the use of symmetry -- was
required to match the operation count of FFTW (see the
function add_sub_aa
in the generator code below).
fft.pdf [69K]
Relating FFTW and Split-Radix. Joint work with Walid Taha.
Proc. of ICESS'04, the First International Conference on
Embedded Software and System, December 9-10 2004,
Hangzhou (Zhejiang), China.
fft-neo.ml [58K]
The self-contained commented MetaOCaml code
The code derives radix-2 FFT kernels with the number of
additions and multiplications exactly matching those in
FFTW kernels. A slight change in the generation algorithm
produces mixed-radix FFT kernels.
FFTW: Fastest Fourier Transform in the West
<http://www.fftw.org>
Generating optimal code with confidence
The explanation of the general framework of staging and
abstract interpretation
This site's top page is http://okmij.org/ftp/
Converted from SXML by SXML->HTML