
Beat Detection Algorithm
We adapted a beat detection algorithm from the MIT Media Lab to Matlab
for our project. Although the algorithm is quite mathematically
involved, it basically amounts to emphasizing the sudden impulses of
sound in the song and then finding the fundamental period at which
these impulses appear. This is done by breaking the signal into
frequency bands, extracting the envelope of each frequency-banded
signal, differentiating the envelopes to emphasize sudden changes in
sound, running the signals through a comb filterbank, and choosing the
highest-energy result as our tempo.
Our algorithm does not account for the possibility of a varying tempo
in a piece of music. It extracts 2.2 seconds of music from the middle
of a song, tempo-analyzes it, and assumes that the tempo found applies
to the entire piece. The sample length, 2.2 seconds, is the minimum
amount of information we can work with and still capture at least two
beats at the slowest tempo we allow, 60 bpm. We chose 2.2 seconds
rather than 2 seconds to ensure that the tempo-choosing process does
not exaggerate the energy at 60 bpm: a two-second sample could have
its beginning and end perceived as beats by a 60 bpm search
process. Since 60 bpm is our lowest search tempo, any sample longer
than two seconds works fine.
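The sample-length arithmetic above can be checked with a short sketch. (The project code is in Matlab; this Python fragment is purely illustrative, and the variable names are ours.)

```python
# At the slowest search tempo, 60 bpm, beats arrive one per second.
slowest_bpm = 60
beat_period = 60.0 / slowest_bpm       # seconds between beats at 60 bpm
min_length = 2 * beat_period           # a 2 s clip holds two beats only at its edges
sample_length = 2.2                    # extra margin so the clip's edges are not
                                       # mistaken for beats by the 60 bpm search
assert sample_length > min_length
```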
Step 1: Filterbank
The signal is divided into six separate signals, each consisting of
the frequency content of the original signal within a certain
range. This has the general effect of separating "notes" from
different instrument groups and allowing them to be analyzed
separately. Tempo-analyzing the original signal could be error-prone
due to conflicting downbeats of different instruments. This separation
is performed by taking the FFT of the signal and then assigning
appropriately sized chunks of the FFT to their frequency
bands. (Chunk length depends on both the sampling frequency and the
signal length.) We chose to follow the method outlined in Scheirer,
1998 and broke our signal into the bands 0-200 Hz, 200-400 Hz,
400-800 Hz, 800-1600 Hz, 1600-3200 Hz, and finally 3200 Hz up to the
sampling frequency. This gave us a total of six bands to work
with. The inverse FFT of these signals is taken and their time-domain
representation is sent to the smoothing function.
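The band-splitting step can be sketched as follows. This is a Python/NumPy illustration of the FFT-masking idea, not the project's filterbank.m; the function name and band edges here follow the description above.

```python
import numpy as np

def filterbank(signal, fs, edges=(0, 200, 400, 800, 1600, 3200)):
    """Split a signal into frequency bands by zeroing FFT bins outside
    each band and inverse-transforming. Band edges follow Scheirer
    (1998); the last band runs from 3200 Hz up to the Nyquist limit."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    upper = list(edges[1:]) + [fs / 2]
    bands = []
    for lo, hi in zip(edges, upper):
        # Keep only the FFT "chunk" belonging to this band.
        masked = np.where((freqs >= lo) & (freqs < hi), spectrum, 0)
        bands.append(np.fft.irfft(masked, n))
    return bands
```

A 440 Hz tone, for instance, should land almost entirely in the third band (400-800 Hz).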
Step 2: Smoothing
Since we are only looking for the tempo of our signal, we need to
reduce it to a form where we can see sudden changes in sound. This is
done by reducing the signal down to its envelope, which can be thought
of as the overall trend in sound amplitude, not the frequencies it
carries. Essentially, we take each of our six frequency-banded
signals and lowpass filter them. To accomplish this we first full-wave
rectified our signals, in order to decrease high-frequency content and
so that we would only have to deal with the positive side of the
envelope we were searching for. We then convolved each of our signals
with the right half of a Hanning window with a length of 0.4
seconds. Again, we chose to perform this operation by transforming
to the frequency domain, multiplying, and inverse transforming to
decrease computation time. The resulting plots of the frequency-banded
signals definitely correspond to the envelopes of the original
signals.
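The smoothing step can be sketched as below. Again this is an illustrative Python version of the idea, not the project's hwindow.m; the 0.4 s half-Hanning window matches the description above, and the FFT padding makes the frequency-domain product an exact linear convolution.

```python
import numpy as np

def smooth(band, fs, window_sec=0.4):
    """Envelope extraction: full-wave rectify, then convolve with the
    right (decaying) half of a Hanning window, done in the frequency
    domain for speed."""
    rectified = np.abs(band)                   # full-wave rectification
    n_win = int(window_sec * fs)
    half_hann = np.hanning(2 * n_win)[n_win:]  # decaying right half
    n = len(rectified) + len(half_hann) - 1    # pad so FFT product = linear convolution
    env = np.fft.irfft(np.fft.rfft(rectified, n) * np.fft.rfft(half_hann, n), n)
    return env[:len(band)]
```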
Step 3: DiffRect
Now that we have the signals in an enveloped form, we can simply
differentiate them to accentuate changes in sound amplitude. The
largest changes should correspond to beats, since the beat is just a
periodic emphasis of sound. The six frequency-banded signals are
differentiated in time and then half-wave rectified so that we see
only increases in sound. The signals are now ready to be
tempo-analyzed.
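This step is short enough to show in full; here is a hedged Python sketch of the differentiate-and-rectify idea (the project's own version is diffrect.m in Matlab):

```python
import numpy as np

def diffrect(envelope):
    """Differentiate the envelope in time, then half-wave rectify so
    that only increases in sound amplitude (onset candidates) remain."""
    d = np.diff(envelope, prepend=envelope[0])  # first difference in time
    return np.maximum(d, 0)                     # half-wave rectification
```

For example, a rise-and-fall envelope [0, 1, 2, 1, 0] keeps only its rising part: the result is [0, 1, 1, 0, 0].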
Step 4: Comb Filter
This is the most computationally intensive step. We need to convolve
the differentiated frequency-banded signals with various comb filters
to determine which yields the highest energy. A comb filter is
basically a series of impulses that occur periodically, at the tempo
you specify. Convolving our signal with a comb filter containing a
total of three impulses should give an output with higher energy when
the tempo of the comb filter is close to a multiple of that of the
song. This is because convolving with the three-impulse comb filter
just produces an output vector made up of echoed versions of our
original signal. This echoed output will have a higher energy if the
tempos of the signal and comb filter match, because the echoes
overlap to form higher peaks in the output, which when squared give a
higher energy.
We implemented the comb filter step by specifying a range of tempos we
wanted to search and the resolution, or spacing, between them. This
established a set of comb filters to convolve with our signal. The FFT
of each frequency-banded signal was multiplied by the FFT of each comb
filter, and the energy of each product was taken. The energies of the
frequency bands were then summed, leaving us with a vector of energies
indexed by tempo. This output takes the form of a peak at the
fundamental tempo of the song followed by smaller, wider peaks at
multiples of this tempo. We then choose the tempo with the maximum
energy as the fundamental tempo of our piece.
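The tempo search can be sketched as below. This is an illustrative Python version of the idea behind timecomb.m, not the project code itself; the tempo range, step size, and function names are our assumptions. As noted earlier, the signal must be longer than two beat periods at the slowest tempo for the three-impulse comb to fit.

```python
import numpy as np

def comb_tempo(bands, fs, bpm_range=(60, 180), step=1):
    """For each candidate tempo, build a three-impulse comb filter,
    convolve it (via FFT multiplication) with each differentiated band,
    sum the output energies across bands, and return the tempo whose
    comb produced the highest total energy."""
    n = len(bands[0])
    band_ffts = [np.fft.rfft(b, n) for b in bands]
    tempos = np.arange(bpm_range[0], bpm_range[1] + 1, step)
    energies = []
    for bpm in tempos:
        spacing = int(fs * 60.0 / bpm)        # samples between impulses
        comb = np.zeros(n)                    # needs n > 2 * spacing
        comb[[0, spacing, 2 * spacing]] = 1   # three periodic impulses
        comb_fft = np.fft.rfft(comb)
        # Energy of each filtered band, summed over all bands.
        e = sum(np.sum(np.abs(bf * comb_fft) ** 2) for bf in band_ffts)
        energies.append(e)
    return int(tempos[int(np.argmax(energies))])
```

Fed an impulse train spaced half a second apart (and treated as a single band), this picks 120 bpm, since the echoes of the matching comb pile up into the tallest peaks.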
Beat Detection Block Diagram
Matlab Code
filterbank.m - divides a time domain signal into individual frequency
bands.
hwindow.m - rectifies a signal, then convolves it with a half Hanning
window.
diffrect.m - differentiates a signal, then half-wave rectifies the
result.
timecomb.m - finds the tempo of a musical signal, divided into
frequency bands.
control.m - takes in the names of two .wav files, and outputs their
combination, beat-matched and phase-aligned.
Matlab Plots
Step 1: Filterbank (output of 1 frequency band)
Step 2: Smoothing
Step 3: DiffRect
Step 4: Comb Filter
