Numerical Math and Scientific Computing


LinAlg: Basic Linear Algebra and Optimization classlib

This C++ class library introduces Matrix, Vector, subMatrices, and LAStreams over the real domain. The library contains efficient and fool-proof implementations of level 1 and 2 BLAS (element-wise operations and various multiplications), transposition, determinant evaluation and matrix inverse. There are operations on a single row/col/diagonal of a matrix. Distinct features of the package are Matrix views, Matrix streams, and LazyMatrices. Lazy construction allows us to write matrix expressions in a natural way without introducing any hidden temporaries, deep copying, and any reference counting.

LinAlg stresses Matrix streams, which provide a sequential access to a matrix or its parts. LABlockStreams may span over an arbitrary rectangular block of a matrix, including the whole matrix, a single matrix element, and all other block sizes in between. Matrix streams are seek-able and subrange-able. A stream or a substream are always created in-line. They are safe and allocate no heap storage.

LinAlg demonstrates that many of Linear Algebra algorithms indeed require only a sequential access to a matrix or its sub-blocks. The sequential access is simpler and faster than a random one. Notable examples of such ``serial algorithms'' are matrix inverse and Householder bidiagonalizations. Every feature of the package is extensively tested by included validation code.

Other features:

The README file tells far more about the features and how to use them. The file contains many commented code snippets.
FreeBSD 4.6.2 (a port and a package) with gcc 3.2 and gcc 2.95.x, gcc 2.96 Linux 2.4.9 i686; gcc 2.8.1 (optimization off) on HP-PA, Solaris/Sparc; SGI's native compiler (optimization off); Visual C++ 5.0 WinNT 4.0; Metrowerk's CodeWarrior C++, BeOS Release 4.
Public Domain
The current version is 4.4, November 29, 2002.

The README file [plain text file]

LinAlg.tar.gz [115K]
Complete source code, validation code, sample code, and test run outputs

LinAlg.h [58K]
LinAlg interface definition

LAStreams.h [28K]
LAStreams interface definition [377K]
VC 7 Project filewhich was very kindly contributed by Markus Bader

Functional Style in C++: Closures, Late Binding, and Lambda Abstractions
borrows many code snippets from LinAlg

(lazy) SVD promising a matrix in extended LinAlg.shar [plain text file]
An old but still relevant article posted on comp.c++.moderated,, comp.sys.mac.scitech, sci.math.num-analysis newsgroups on Tue Feb 27 11:31:08 CST 1996

LinAlg is also available fromnetlib
<>, the technical computing portal for all scientific and engineering needs
FreeBSD port and a package


Fourier integrals and Discrete/Fast Fourier transform

This C++ class library performs radix-2 Discerete Fourier transform of a real or complex sequence, and sin, cos, and complex Fourier integrals of an evenly tabulated function.

The input can be either real or complex with/without zero padding; you can request either the full complex transform, or only real, im, or abs part of it.

Public Domain
The current version is 1.2, December 25, 1998.

fft.tar.gz [36K]
Commented C++ source code, verification code and test run outputs. The verification code demonstrates many possible usage scenarious

LinAlg: Basic Linear Algebra and Optimization classlib is required as a dependency


Generating optimal FFT code and relating FFTW and Split-Radix

[The Abstract of the paper]
Recent work showed that staging and abstract interpretation can be used to derive correct families of combinatorial circuits, and illustrated this technique with an in-depth analysis of the Fast Fourier Transform (FFT) for sizes 2^n. While the quality of the generated code was promising, it used more floating-point operations than the well-known FFTW codelets and split-radix algorithm.

This paper shows that staging and abstract interpretation can in fact be used to produce circuits with the same number of floating-point operations as each of split-radix and FFTW. In addition, choosing between two standard implementations of complex multiplication produces results that match each of the two algorithms. Thus, we provide a constructive method for deriving the two distinct algorithms.

Generating radix-2 FFT kernels with the operation count exactly matching that of FFTW is much harder than one may think. It is emphatically not enough to specialize the textbook FFT code to a particular sample size N, and apply a partial evaluator or a smart compiler to eliminate trivial multiplications (by one and zero) and trivial additions (to zero), and to lift common factors. First of all, the specialized FFT code will contain factors like-2.44921270764475452e-16 (that is the computed IEEE double value of sin((pi/. 2.0) *. 4.0)). That factor is not zero and the corresponding multiplication is not trivial, although it should be. A frequently occurring expression (sin(pi/4)*x + cos(pi/4)*y) presents a subtler problem: although the ideal sin(pi/4) and cos(pi/4) are equal, their computed values are not. Therefore, the partial evaluator or a compiler, which see the specialized code, hence, only the numeric values of the FFT factors, are not able to convert the expression to sin(pi/4)*(x + y), saving one multiplication. Thus the necessary condition for generating the optimal FFT kernels is the use of the exact trigonometry. The roots of unity exp(-I j*2pi/N) must be manipulated symbolically rather than numerically. The condition is not sufficient: an additional insight -- the use of symmetry -- was required to match the operation count of FFTW (see the function add_sub_aa in the generator code below).

The current version is December 2004.

Relating FFTW and Split-Radix.
Joint work with Walid Taha.
Proc. of ICESS'04, the First International Conference on Embedded Software and System, December 9-10 2004, Hangzhou (Zhejiang), China.
<> [58K]
The self-contained commented MetaOCaml code
The code derives radix-2 FFT kernels with the number of additions and multiplications exactly matching those in FFTW kernels. A slight change in the generation algorithm produces mixed-radix FFT kernels.

FFTW: Fastest Fourier Transform in the West

Generating optimal code with confidence
The explanation of the general framework of staging and abstract interpretation

Last updated January 1, 2011

This site's top page is or
Your comments, problem reports, questions are very welcome!

Converted from SXML by SXML->HTML