Compress This: Lossy Compression



Lossy compression (quantization) is feasible because of the limits of the human auditory system. The human ear responds to frequencies from about 20 Hz to 20,000 Hz, so any frequency components outside that range (below 20 Hz or above 20,000 Hz) can be removed without affecting listening quality. Two other properties of the auditory system, frequency masking and temporal masking, also make lossy compression possible. Frequency masking occurs when a frequency we can normally hear is masked by a stronger nearby frequency. The masked frequency can be removed because the ear cannot hear it. Temporal masking occurs when a weak sound is preceded by a strong one. If the time interval between the two is short, the weaker sound may not be heard.

The human brain can be tricked audibly in the sense that we "hear" sounds that do not really exist. We subconsciously hear what we expect to hear. Our brains subconsciously attempt to fill in the blanks of all the songs, sounds, verbal phrases, etc., that we hear every day. The same concept also applies to our visual abilities. For example, the cool special effect that is appearing in new action movies is the 360-camera pan. You get a circular 360-degree view of the heroes fighting the villains (remember those parts in The Matrix and Charlie's Angels). The effect is created by filming the scene with several cameras set up around it. In the editing room, shots from the different cameras are strung together. Of course, this technique is not seamless (you cannot film each and every angle around the circle). However, when you watch these strung-together camera shots, your brain automatically fills in the blanks. You see a seamless circular pan!

When quantizing the signals, our goals were to reduce the size of the file and to minimize the noise added to it. We also had to take the lossless compression step into account, because one file could be bigger than another before lossless compression yet smaller than the other once both had been compressed.

We experimented with four different types of quantization so that we could try to find an acceptable balance between size reduction and noise minimization. The basic process was the same in each case: normalize the signal to [-1, 1] and then round each sample to the nearest quantization level. The main difference between the various types is how these quantization levels are chosen. The first type, linear quantization, has equally spaced levels along the interval. The second type is based on a sinh function in order to give more levels near zero and fewer near 1 and -1. Sinh quantization tries to preserve the quieter parts of the signal by giving them higher resolution. A more or less opposite type is tangential quantization, which uses the tangent function to provide more levels at the extremes and fewer near zero. Tan quantization gives preference to the louder parts of the signal.
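As a rough illustration, the three level layouts described above can be sketched in Python. The text gives no exact formulas, so everything here is an assumption made for the sketch: the function names, the warp strength `k`, and in particular the way the sinh and tangent functions are applied. The sinh levels are obtained by warping a uniform grid with sinh, and the tan levels are chosen to be equally spaced after a tangent warp, which places them closer together near -1 and 1.

```python
import math


def make_levels(kind, bits, k=1.2):
    """Return 2**bits quantization levels in [-1, 1], sorted ascending.

    k is a hypothetical warp-strength parameter (0 < k < pi/2 for "tan").
    """
    n = 2 ** bits
    # Uniform grid on [-1, 1]; the warp below decides where the levels land.
    u = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    if kind == "linear":
        return u
    if kind == "sinh":
        # sinh is flattest near 0, so the warped levels crowd around zero.
        return [math.sinh(k * x) / math.sinh(k) for x in u]
    if kind == "tan":
        # Equal steps in the tan-warped domain: levels crowd near -1 and 1.
        return [math.atan(x * math.tan(k)) / k for x in u]
    raise ValueError(f"unknown kind: {kind}")


def normalize(signal):
    """Scale the signal so its largest magnitude is 1."""
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal] if peak > 0 else list(signal)


def quantize(signal, levels):
    """Round each sample to the nearest quantization level."""
    return [min(levels, key=lambda lv: abs(lv - s)) for s in signal]


# Usage: quantize a tiny made-up signal with each layout at 4 bits.
signal = [0.0, 0.05, -0.3, 0.8, -1.6]
x = normalize(signal)
for kind in ("linear", "sinh", "tan"):
    print(kind, quantize(x, make_levels(kind, 4)))
```

The nearest-level search is a plain linear scan, which keeps the sketch short; a real implementation would more likely use a binary search or a closed-form index.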

The last method used is a variant of tan quantization. Block-tangential quantization splits the signal into blocks prior to normalization. These blocks are then normalized individually, and the whole resulting signal is quantized using tangential quantization. Block-tangential quantization tries to exploit the fact that the large values in a block will probably mask the smaller values. More resolution should therefore be reserved for the larger amplitudes, hence the use of tan quantization. And because each block is normalized on its own, a chunk out of a quiet part of a song will not suffer nearly as much from the tangential quantization.
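The block-wise idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact method used in the project: the block size, the warp strength `k`, the tangent level formula, and the choice to keep one peak value per block for reconstruction are all hypothetical details that the text does not specify.

```python
import math


def tan_levels(bits, k=1.2):
    """Levels on [-1, 1] that are denser near the extremes (assumed formula)."""
    n = 2 ** bits
    return [math.atan((-1.0 + 2.0 * i / (n - 1)) * math.tan(k)) / k
            for i in range(n)]


def block_tan_quantize(signal, bits, block_size=1024, k=1.2):
    """Normalize each block by its own peak, then quantize with tan levels.

    Returns the level index of every sample and the peak of every block.
    """
    levels = tan_levels(bits, k)
    indices, peaks = [], []
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        # Fall back to 1.0 on an all-zero block to avoid dividing by zero.
        peak = max(abs(s) for s in block) or 1.0
        peaks.append(peak)
        for s in block:
            x = s / peak
            indices.append(min(range(len(levels)),
                               key=lambda i: abs(levels[i] - x)))
    return indices, peaks


def block_tan_reconstruct(indices, peaks, bits, block_size=1024, k=1.2):
    """Map indices back to levels and undo the per-block normalization."""
    levels = tan_levels(bits, k)
    return [levels[idx] * peaks[n // block_size]
            for n, idx in enumerate(indices)]


# Usage: a quiet block followed by a loud block, 4 samples each.
signal = [0.01, -0.01, 0.005, 0.0, 1.0, -0.5, 0.25, 0.9]
idx, peaks = block_tan_quantize(signal, bits=4, block_size=4)
print(block_tan_reconstruct(idx, peaks, bits=4, block_size=4))
```

Because the quiet block is scaled by its own peak, its largest samples land on the finely spaced outer levels instead of being crushed toward zero by a global normalization.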

The test signals were quantized with each method using 4, 6, and 8 bits per sample. Fewer than 4 bits introduced too much quantization noise. We determined that using 6 bits was pointless in most situations, because the lossless compression system performed much worse on the 6-bit quantized signals. As a result, in many cases the 8-bit quantization produced a smaller file after the lossless step, and the 8-bit quantized signals always had a smaller mean squared error than their 6-bit counterparts.
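A comparison of this kind can be set up with a small harness that reports both the mean squared error and the size after a lossless pass. The sketch below is only an illustration: it uses linear quantization, a synthetic two-tone test signal, one quantization index per byte, and Python's `zlib` as a stand-in for the lossless compression system, which is not described here. It is not expected to reproduce the results reported above.

```python
import math
import zlib


def linear_quantize(signal, bits):
    """Quantize a signal already in [-1, 1]; return (indices, reconstruction)."""
    n = 2 ** bits
    step = 2.0 / (n - 1)
    indices = [min(n - 1, max(0, round((s + 1.0) / step))) for s in signal]
    recon = [-1.0 + i * step for i in indices]
    return indices, recon


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def compressed_size(indices):
    """Bytes after zlib, storing one index per byte (valid up to 8 bits)."""
    return len(zlib.compress(bytes(indices), 9))


def evaluate(signal, bits):
    """Return (mean squared error, compressed size in bytes) for one setting."""
    indices, recon = linear_quantize(signal, bits)
    return mse(signal, recon), compressed_size(indices)


# Synthetic test signal: two sine tones sampled at an assumed 8000 Hz.
signal = [0.6 * math.sin(2 * math.pi * 440 * t / 8000)
          + 0.3 * math.sin(2 * math.pi * 1000 * t / 8000)
          for t in range(8000)]

for bits in (4, 6, 8):
    error, size = evaluate(signal, bits)
    print(f"{bits} bits: mse={error:.2e}, compressed bytes={size}")
```

How the indices are packed before the lossless step is a design choice that can change the compressed size considerably, so any comparison between bit depths should keep the packing scheme in mind.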