Dope History

[history]

Theoretical Background

In its simplest and most naive form, the DFT is an extremely slow operation.
It requires N^2 complex operations, where N is the size of the DFT being
performed. However, a myriad of more efficient algorithms have been developed
to exploit the inherent symmetry in the DFT. The discovery of these
algorithms has been largely responsible for the explosion in popularity of
digital signal processing, and for a huge number of devices many of us use
every day. A large subset of these algorithms depend on N being a power of 2,
4 or 8, though the radix-2 and radix-4 are more efficient than the radix-8
(since a length 8 DFT requires multiplications). These radix-based algorithms
break the larger DFT into a sequence of radix-length DFTs. They also take
advantage of symmetry in the algorithm to save computations. Since length 2
and 4 DFTs require no multiplications, this alone results in a huge
performance benefit.

Several other optimizations were explored on top of simple radix-2 and radix-4
algorithms. We implemented versions of the radix-2 and radix-4 Cooley-Tukey
algorithm that save computational cycles by removing all multiplications by
numbers like 1 or j. These resulted in significant additional savings. We
also evaluated versions that pre-compute all necessary trigonometric functions
and use table lookups during the actual algorithmic computations. While these
obviously did not reduce the total number of trigonometric functions that
needed computing, they are useful for times when the cost of an initialization
phase can be amortized with other initialization, and when real-time results
are necessary (since table lookups, which might generally be serviced by the
cache, are much quicker than inline computation of a sine or cosine).

[/history]