[main] . [history] . [methods] . [results] . [future work] . [files] . [who is dope] . [links]

-------------------------------------

DOPE FFT RESULTS

-------------------------------------

[results]

 

Results

Our results correspond quite well to our expectations. Figures 1-3 show
the number of additions, multiplications and trig functions used per
algorithm, respectively (note that the graphs are semilog plots).

 

 

 

 

The names of the algorithms implemented are:

DFTSimple - A naive N^2 algorithm with no optimizations.
DFTFaster - A reorganized version of DFTSimple that considerably
reduces the amount of integer arithmetic and
trigonometric calculation necessary.
DFTTable - DFTFaster with table lookup of trig functions. This is
necessary because DFTFaster still computes 2N^2
trig functions, when only N (or fewer) values are needed.
DFTTukey2 - A simple Cooley-Tukey radix-2 algorithm. No further
optimizations are made.
DFTTukey2MultTable - DFTTukey2 with multiplications by 1,j,etc removed, and
table lookup of trig functions.
DFTTukey4 - A straightforward radix-4 Cooley-Tukey algorithm that
uses table lookup of trig functions.
DFTTukey4Mult - DFTTukey4 with multiplications by 1,j,etc removed.

As you can see, DFTFaster significantly reduces the number of trig functions
and additions needed by DFTSimple, though it slightly increases the number of
multiplications needed. This would still be a significant performance win,
since the trig functions are extremely computationally intensive.

DFTTable performs significantly better than DFTFaster, since it drastically
reduces the number of trig functions that need to be computed.

However, the real performance gains are not seen until radix-2 or radix-4
algorithms are used. DFTTukey2, though a simplistic, straightforward
implementation, sees massive reduction in all areas of computation other than
trig functions (since there is a bottom limit on the number of trig functions
that have to be computed).

DFTTukey2MultTable yields a modest reduction in multiplications as compared to
DFTTukey2, as expected. This shows that there are indeed a lot of useless
multiplications being performed in the straightforward algorithm.

Further large performance gains are realized upon moving to a radix-4
implementation. DFTTukey4 performs significantly fewer additions and
multiplications than even DFTTukey2MultTable, an optimized radix-2 version.
This shows that the increase in the complexity of the butterfly necessary for
radix-4 computations (which yields more complex array accesses) is by far
overwhelmed by the reduction in computation. This should be clear, since a
length 4 DFT also requires no multiplications.

Finally, extremely significant additional speedup is obtained by removing
useless multiplications by 1, j, etc. This shows that there is even more
useless computation inherent in the simple radix-4 algorithm than in the
simple radix-2 algorithm. Again, trig functions are not reduced, for the same
reason.

Thus, we have seen that the use of complicated but efficient radix-2 and
radix-4 algorithms truly do provide massive speedups. It is amazing how much
redundancy is present in the simplistic DFT, and how much the symmetry of that
algorithm can by exploited. As we were running the test program, we noticed
that while DFTSimple, DFTFaster and DFTTable were taking several minutes to
complete on the larger (4096, 16384) values of N, the radix-2 and radix-4
versions never took more than a couple of seconds.

[/results]