Dope FFT Methods

[nmethodology]

Methods Employed

Seven different DFT/FFT algorithms were implemented, and their computational
complexity was measured. The program was written in C (since that's what the
author knew best, and because it is fast). The runtime behavior of the seven
algorithms was analyzed using performance counters inserted in the algorithm
bodies. We incremented counters each time an addition, multiplication or
trigonometric function was used. These were all real multiplications and
additions, since we were counting the actual computations the computer was
doing. Several different sizes were run (all powers of four, since two
radix-4 Cooley-Tukey algorithms were being tested), and their performance
measured. Obviously relatively heavy weight must be given to trigonometric
functions, since they take significantly longer than additions or
multiplications. Similarly, multiplications should be weighted more heavily
on most modern computers. It should be noted that only true multiplications were
counted as such; modern compilers attempt to improve performance by converting
simple operations like division by two into arithmetic shifts. Since this
only happens with static divisions by powers of two, it was easy to determine
at what points in the code this transformation would take place. We simply counted
those transformable multiplications as additions, since most modern processors
complete all simple arithmetic operations (including additions) in the same
time, generally one cycle.

We did not attempt to count any other factors that might affect performance
(such as memory access patterns, interactions with the cache organization,
etc), since these factors would be much more difficult to account for without
either more sophisticated profiling software or a simulator of some sort. We
also felt that the computations would be most significant to the runtime
performance, since most of the working sets used would easily fit into a
modestly sized cache.

The correctness of the algorithms was tested based on visual inspection and
through comparison with FORTRAN implementations found in "DFT/FFT and
Convolution Algorithms" by our own C.S. Burrus. All input data for the actual
testing was generated using the random() interface in the Solaris C library.
We restricted the range to [0,0xfff] or [0,4095].

[/methodology]