Average Energy Algorithm

We came up with a novel approach to compression. We did not know in advance how it would turn out, but this is how the algorithm works:

First, we performed a Discrete Cosine Transform (DCT) of the signal. Once in the frequency domain, we calculated the energy at each frequency, then found the mean and the standard deviation of the energy across the spectrum. The first thing we tried was to keep all frequencies with energies within 1 standard deviation of the mean and to zero out those with energies outside this range. Then we tried keeping frequencies with energies within 2 standard deviations of the mean, and within 3 standard deviations of the mean. Naturally, the more frequencies we keep, the less compressed the signal is going to be. Finally, we performed the Inverse Discrete Cosine Transform and listened to the compressed version.
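The steps above can be sketched in a few lines of Python. This is only an illustration, not the original Matlab code: the function names (`dct`, `idct`, `keep_within_std`) are hypothetical, energy is assumed to be the squared DCT coefficient, and the sample standard deviation is assumed (the N-1 normalization that Matlab's `std` uses by default). The DCT is written out directly in O(N^2) form so the sketch needs nothing beyond the standard library.

```python
import math
from statistics import mean, stdev


def dct(x):
    """Orthonormal DCT-II of a real sequence (O(N^2), fine for short demos)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def keep_within_std(signal, num_std=1.0):
    """Zero every DCT coefficient whose energy lies more than num_std
    standard deviations from the mean energy, then transform back.

    Returns (reconstructed_signal, thresholded_coefficients).
    """
    coeffs = dct(signal)
    energy = [c * c for c in coeffs]  # assumption: energy = squared coefficient
    mu = mean(energy)
    sigma = stdev(energy)             # assumption: sample standard deviation
    kept = [c if abs(e - mu) <= num_std * sigma else 0.0
            for c, e in zip(coeffs, energy)]
    return idct(kept), kept


# Usage on a short hypothetical test signal (two mixed sinusoids).
signal = [math.sin(0.3 * i) + 0.5 * math.sin(1.7 * i) for i in range(64)]
reconstructed, kept = keep_within_std(signal, num_std=1.0)
nonzero = sum(1 for c in kept if c != 0.0)
print(f"kept {nonzero} of {len(kept)} coefficients")
```

Calling `keep_within_std` with `num_std` set to 1, 2, or 3 corresponds to the three variants described above. Note that the band is two-sided, so coefficients with unusually high energy are discarded as well as those with unusually low energy.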

Here is the original Matlab code:
keep_1std.m
(Keep the frequencies with energies within 1 standard deviation from the mean)
keep_2std.m
(Keep the frequencies with energies within 2 standard deviations from the mean)
keep_3std.m
(Keep the frequencies with energies within 3 standard deviations from the mean)

Results:
To our dismay, the results were discouraging. The overall quality of the compressed signals was poor, and for the files that remained listenable the amount of compression was small. The amount of compression also varied widely between audio signals. Only when we kept the frequencies with energies within 3 standard deviations of the mean was the quality of the compressed version generally okay.

We deduced from this experiment that the algorithm only works well if the signal is monotonous with fairly flat tones, has no prolonged periods of silence, and has little noise. For example, if there is suddenly a very high-pitched tone whose energy deviates a lot from the mean of the energy spectrum, it may be zeroed out. If the signal is not monotonous, it is difficult to obtain a decent representation of the original audio file because the energy spectrum of the signal could be skewed. Applying the algorithm to such signals may well discard the frequencies whose energies are much higher or lower than the mean.

Silent periods have zero energy. Thus, if there is a prolonged period of silence, we would have many frequencies with zero energy. This pulls the mean of the energy spectrum down, so that it is lower than it should be. Similarly, if there is a lot of low-energy noise, the calculated mean of the energy spectrum would be lower than the mean of the energy spectrum of the pure audio signal.
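The two failure modes described above can be illustrated on a made-up energy spectrum. The numbers below are hypothetical and are not taken from any of the test files; the helper name `kept_mask` and the use of the sample standard deviation are likewise assumptions.

```python
from statistics import mean, stdev


def kept_mask(energy, num_std):
    """True where an energy lies within num_std standard deviations of the mean."""
    mu = mean(energy)
    sigma = stdev(energy)  # assumption: sample standard deviation
    return [abs(e - mu) <= num_std * sigma for e in energy]


# Hypothetical energy spectrum: a fairly flat background...
flat = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.0, 1.1, 0.9, 1.0]
# ...plus one component with much higher energy (e.g. a sudden strong tone).
with_peak = flat + [50.0]

mask = kept_mask(with_peak, num_std=1.0)
print("background kept:      ", all(mask[:-1]))
print("strong component kept:", mask[-1])

# Many zero-energy entries drag the mean of the spectrum down.
padded = flat + [0.0] * 10
print("mean without zeros:", mean(flat))
print("mean with zeros:   ", mean(padded))
```

In this toy spectrum the strong component is the one that gets zeroed at 1 standard deviation, while the flat background survives. It is likely the most audible part of the signal, which may explain why quality suffers. Padding the spectrum with zero-energy entries roughly halves the mean here, which shifts the whole keep/discard band.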

All in all, it was interesting to try this algorithm. We think that if the energy spectrum of the signal more closely resembled a normal distribution, the results would turn out a bit better.

 
| Audio file | Original size | Size (keep 1 std) | % compressed (keep 1 std) | Quality (keep 1 std) | Size (keep 2 stds) | % compressed (keep 2 stds) | Quality (keep 2 stds) | Size (keep 3 stds) | % compressed (keep 3 stds) | Quality (keep 3 stds) |
|---|---|---|---|---|---|---|---|---|---|---|
| DING.wav (fairly flat tones) | 20191 | 19915 | 1.37 | Good | 20061 | 0.64 | Good | 20088 | 0.51 | Good |
| NOTIFY.wav | 29823 | 29196 | 2.1 | Okay | 29469 | 1.19 | Good | 29574 | 0.83 | Good |
| Musica Close.wav | 43777 | 37773 | 13.71 | Okay | 40905 | 6.56 | Good | 41963 | 4.14 | Good |
| Musica Asterisk.wav | 26304 | 25513 | 3.01 | Okay | 25941 | 1.38 | Good | 26087 | 0.82 | Good |
| Utopia Open.wav | 4330 | 48 | 98.89 | Can't hear anything | 99 | 97.71 | Can't hear anything | 149 | 96.56 | Can't hear anything |
| EE_REV.wav | 117600 | 67303 | 42.77 | Bad | 83121 | 29.32 | Okay | 91762 | 21.97 | Okay |
| TADA.wav | 42752 | 29583 | 30.8 | Bad | 34563 | 19.15 | Bad | 36830 | 13.85 | Okay |
| RECYCLE.wav | 12671 | 57 | 99.55 | Can't hear anything | 109 | 99.14 | Can't hear anything | 168 | 98.67 | Can't hear anything |
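The "% compressed" figures in the table are consistent with the relative size reduction, (original size - size after compression) / original size × 100. A minimal check in Python, where the function name `percent_compressed` is our own:

```python
def percent_compressed(original_size, compressed_size):
    """Relative size reduction, in percent."""
    return 100.0 * (original_size - compressed_size) / original_size


# DING.wav, keep 1 std: 20191 -> 19915
print(round(percent_compressed(20191, 19915), 2))   # 1.37
# EE_REV.wav, keep 1 std: 117600 -> 67303
print(round(percent_compressed(117600, 67303), 2))  # 42.77
```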

Listen to what this algorithm does. Each file is available in four versions (Original, keep_1std, keep_2std, keep_3std):

DING.wav
NOTIFY.wav
Musica Close.wav
Musica Asterisk.wav
Utopia Open.wav
EE_REV.wav
TADA.wav
RECYCLE.wav

 

Spectrograms:
DING.wav
NOTIFY.wav
Musica Close.wav
Musica Asterisk.wav
Utopia Open.wav
EE_REV.wav
TADA.wav
RECYCLE.wav