Average Energy Algorithm
We came up with a novel way of doing compression. We had no idea how it would turn out, but this is what the algorithm does:
First, we performed a Discrete Cosine Transform (DCT) of the signal. Now in the frequency domain, we calculated the energy at each frequency, then found the mean and the standard deviation of the energy across the spectrum. The first thing we tried was to keep all frequencies with energies within 1 standard deviation of the mean and to zero out the frequencies with energies outside this range. Then we tried keeping the frequencies with energies within 2 standard deviations of the mean, and also within 3 standard deviations of the mean. The more frequencies we keep, the less compressed the signal is going to be. After that, we performed the Inverse Discrete Cosine Transform and listened to the compressed version.
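The steps above can be sketched roughly as follows. This is not the original Matlab code; it is a minimal Python illustration under stated assumptions: the "energy" of a frequency is taken to be the squared DCT coefficient, the standard deviation is the sample (N-1) form that Matlab's `std` uses by default, and the names `dct`, `idct` and `keep_within_std` are hypothetical helpers written for this example.

```python
import math
import statistics


def dct(x):
    """Naive O(N^2) orthonormal DCT-II (fine for short demo signals)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    signal = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
        signal.append(total)
    return signal


def keep_within_std(signal, n_std):
    """Zero every DCT coefficient whose energy lies more than
    n_std standard deviations from the mean energy, then invert."""
    coeffs = dct(signal)
    energy = [c * c for c in coeffs]   # assumption: energy = squared coefficient
    mean = statistics.mean(energy)
    std = statistics.stdev(energy)     # assumption: sample std (N-1 normalization)
    kept = [c if abs(e - mean) <= n_std * std else 0.0
            for c, e in zip(coeffs, energy)]
    return idct(kept), kept


# Toy signal, made up for illustration only.
signal = [0.0, 1.0, 0.5, -0.25, 0.75, -1.0, 0.3, 0.1]
for n_std in (1, 2, 3):
    _, kept = keep_within_std(signal, n_std)
    zeroed = sum(1 for c in kept if c == 0.0)
    print(f"within {n_std} std: {zeroed} of {len(kept)} coefficients zeroed")
```

A wider band can only keep more coefficients, so the number of zeroed coefficients never increases when going from 1 to 2 to 3 standard deviations.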
Here is the original Matlab code:
keep_1std.m
(Keep the frequencies with energies within 1 standard deviation from the mean)
keep_2std.m
(Keep the frequencies with energies within 2 standard deviations from the mean)
keep_3std.m
(Keep the frequencies with energies within 3 standard deviations from the mean)
Results:
To our dismay, the results of this algorithm were somewhat discouraging. The overall quality of the compressed signals was poor, and for most files the amount of compression was insignificant. The amount of compression also varied widely between audio signals. For the harder files (TADA.wav, for example), the quality of the compressed version was okay only when we kept the frequencies with energies within 3 standard deviations of the mean.
We deduced from these experiments that the algorithm works well only if the signal is monotonous with fairly flat tones, has no prolonged period of silence, and has little noise. For example, a sudden, very high-pitched tone whose energy deviates a lot from the mean of the energy spectrum might be zeroed out. If the signal is not monotonous, it is difficult to obtain a decent representation of the original audio file because the energy spectrum of the signal could be skewed. Applying the algorithm to such a signal could very well discard the frequencies whose energies are much higher or lower than the mean. Silent periods have zero energy, so a prolonged period of silence gives us many frequencies with zero energy. This pulls the mean of the energy spectrum below what it should be. Similarly, a lot of low-energy noise makes the calculated mean lower than the intended mean of the energy spectrum of the pure audio signal.
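To make the failure mode concrete, here is a small Python illustration with a made-up energy spectrum (not data from our audio files): fifteen frequencies with zero energy and one strong tone. The helper name `kept_mask` and the use of the sample standard deviation are assumptions for this sketch.

```python
import statistics


def kept_mask(energy, n_std):
    """True where an energy lies within n_std standard deviations of the mean."""
    mean = statistics.mean(energy)
    std = statistics.stdev(energy)  # assumption: sample std (N-1 normalization)
    return [abs(e - mean) <= n_std * std for e in energy]


# Hypothetical spectrum: 15 silent frequencies and one strong tone.
energy = [0.0] * 15 + [8.0]

print(statistics.mean(energy))   # 0.5: the zeros drag the mean far below the tone
print(statistics.stdev(energy))  # 2.0

for n_std in (1, 2, 3):
    mask = kept_mask(energy, n_std)
    print(n_std, "std -> strong tone kept?", mask[-1])  # False in all three cases
```

In this toy case the only component that carries the sound sits 3.75 standard deviations above the mean, so it is zeroed out even with the 3 standard deviation rule, while the fifteen empty frequencies are all "kept".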
All in all, it was interesting to try this algorithm. We think that if the energy spectrum of the signal more closely resembled a normal distribution, the results would turn out a bit better.
Audio File | Original size | Keep 1 std from mean: size after compression | Keep 1 std from mean: % compressed | Keep 1 std from mean: quality | Keep 2 stds from mean: size after compression | Keep 2 stds from mean: % compressed | Keep 2 stds from mean: quality | Keep 3 stds from mean: size after compression | Keep 3 stds from mean: % compressed | Keep 3 stds from mean: quality
DING.wav (fairly flat tones) | 20191 | 19915 | 1.37 | Good | 20061 | 0.64 | Good | 20088 | 0.51 | Good
NOTIFY.wav | 29823 | 29196 | 2.1 | Okay | 29469 | 1.19 | Good | 29574 | 0.83 | Good
Musica Close.wav | 43777 | 37773 | 13.71 | Okay | 40905 | 6.56 | Good | 41963 | 4.14 | Good
Musica Asterisk.wav | 26304 | 25513 | 3.01 | Okay | 25941 | 1.38 | Good | 26087 | 0.82 | Good
Utopia Open.wav | 4330 | 48 | 98.89 | Can't hear anything | 99 | 97.71 | Can't hear anything | 149 | 96.56 | Can't hear anything
EE_REV.wav | 117600 | 67303 | 42.77 | Bad | 83121 | 29.32 | Okay | 91762 | 21.97 | Okay
TADA.wav | 42752 | 29583 | 30.8 | Bad | 34563 | 19.15 | Bad | 36830 | 13.85 | Okay
RECYCLE.wav | 12671 | 57 | 99.55 | Can't hear anything | 109 | 99.14 | Can't hear anything | 168 | 98.67 | Can't hear anything
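The "% compressed" figures in the table are consistent with the relative size reduction, (original size - size after compression) / original size x 100. A minimal Python check, where the function name `percent_compressed` is a hypothetical helper for this example:

```python
def percent_compressed(original_size, compressed_size):
    """Relative size reduction, in percent of the original size."""
    return 100.0 * (original_size - compressed_size) / original_size


# DING.wav, keeping energies within 1 std of the mean.
print(round(percent_compressed(20191, 19915), 2))  # 1.37

# Utopia Open.wav, keeping energies within 1 std of the mean.
print(round(percent_compressed(4330, 48), 2))      # 98.89
```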
Listen to what this algorithm does:
Original | keep_1std | keep_2std | keep_3std
DING.wav | | |
NOTIFY.wav | | |
Musica Close.wav | | |
Musica Asterisk.wav | | |
Utopia Open.wav | | |
EE_REV.wav | | |
TADA.wav | | |
RECYCLE.wav | | |
Spectrograms:
DING.wav
NOTIFY.wav
Musica Close.wav
Musica Asterisk.wav
Utopia Open.wav
EE_REV.wav
TADA.wav
RECYCLE.wav