White Noise Excited LPC

Here we simply work out the AR coefficients and pass white noise through the all-pole synthesis filter (the inverse of the prediction error filter). As you can hear by playing the sound file below, the quality of the speech is not very good. Although we can more or less make out what the person is saying, it is not clear who is actually speaking.
Because speech is not exactly an AR process, the coefficients alone are not enough to represent it. We need some more information, and this is the motivation for the next idea.
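
As a rough illustration of the idea (not the code used to produce the sound file), the sketch below estimates the AR coefficients of one frame with the autocorrelation method and the Levinson-Durbin recursion, then drives the all-pole filter with white noise. The function names, the model order, and the choice of noise gain (the residual energy divided by the frame length) are assumptions made for this example; each frame is also synthesized from zero filter state, which a real coder would avoid.

```python
import math
import random

def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for the prediction error filter.

    Returns (a, err): a[0] == 1 and A(z) = sum_k a[k] z^-k, while err is
    the energy left in the residual after prediction.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted frame: stop early
            break
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err  # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a, err

def synthesize(a, excitation):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def white_noise_lpc(frame, order, rng):
    """Re-synthesize one frame from its AR coefficients and white noise."""
    r = autocorrelation(frame, order)
    a, err = levinson_durbin(r, order)
    # Assumed gain: per-sample standard deviation of the residual.
    sigma = math.sqrt(max(err, 0.0) / len(frame))
    noise = [rng.gauss(0.0, sigma) for _ in frame]
    return synthesize(a, noise)
```

For example, `white_noise_lpc(frame, 10, random.Random(0))` returns a frame of the same length whose spectral envelope follows that of `frame` but whose fine structure (and hence the speaker's pitch) is lost.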

White Noise Excited Speech

Pitch Excited LPC

In this technique we make a distinction between frames: each frame is classified as either 'voiced' or 'unvoiced'. 'Unvoiced' frames continue to be excited with white noise. 'Voiced' frames, on the other hand, are excited with a periodic pulse train whose pulse rate corresponds to the fundamental frequency of the frame. The fundamental frequency is calculated using pitch detection. Estimating the pitch period is the most important and most vulnerable aspect of pitch excited LPC. A poor pitch estimate can make the result worse than the simplest LPC method, which uses only white noise as excitation, while a good pitch detection algorithm improves the speech quality considerably and makes it possible to identify the speaker. We found that there are no set rules or exact science behind pitch detection, so we tried several ad hoc techniques. Pitch detection algorithms can be divided into:

(a) those that use the frequency domain properties of the speech;

(b) those that use the time domain properties of the speech;

(c) those that use both the time and frequency domain properties of the speech.

We briefly studied the Gold-Rabiner algorithm and implemented the homomorphic (cepstral) and autocorrelation pitch trackers.
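
To make the time domain approach concrete, here is a minimal sketch of an autocorrelation pitch tracker together with the voiced/unvoiced excitation it drives. It is an assumed, simplified version and not the tracker used for the results here: the search range of 60 to 400 Hz, the voicing threshold of 0.3 on the normalized autocorrelation peak, and the function names are all choices made for this example.

```python
import math
import random

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0, threshold=0.3):
    """Return the pitch period in samples, or 0 if the frame looks unvoiced.

    The period is the lag (within the allowed pitch range) at which the
    autocorrelation peaks; the frame is called voiced only if that peak
    is at least `threshold` times the zero-lag value.
    """
    n = len(frame)
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), n - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag if best_val / r0 >= threshold else 0

def make_excitation(n_samples, period, rng):
    """Pulse train for voiced frames, white noise for unvoiced frames."""
    if period > 0:
        # Pulses of height sqrt(period) give roughly unit average power.
        amp = math.sqrt(period)
        return [amp if i % period == 0 else 0.0 for i in range(n_samples)]
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
```

With an 8 kHz sampling rate, a 30 ms frame of a 100 Hz tone gives a period of 80 samples, so `make_excitation` places a pulse every 80 samples; a frame that fails the voicing test falls back to white noise. The excitation is then scaled by the frame gain and passed through the all-pole synthesis filter.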

Figure 4

If you play the sound file below, it is clear that the quality of the speech is much higher than with white noise alone. Pitch excited LPC works well at low bit rates; however, it can never produce toll quality speech because of the uncertainty of pitch detection.

Pitch Excited Speech

Residual Excited LPC

This technique does away with pitch detection and instead concentrates on the residual, which is approximately white. Essentially, you just perform some kind of compression on the residual. The nice thing about this is that you get a direct tradeoff between the compression ratio and the quality of the speech: the less the residual is compressed, the higher the quality of the speech. In our scheme we simply stored the sign of the residual and used that as the excitation for the all-pole synthesis filter (the inverse of the linear prediction error filter). If you play the speech file below, you will hear that this simple compression yields very impressive results. It is clear what the speaker is saying and also who is speaking. This is an improvement over pitch excited LPC (PELPC).
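
The sign-only scheme can be sketched as follows. This is an illustration under assumptions rather than the original implementation: the text does not say how the excitation was scaled, so the example stores one gain per frame equal to the mean absolute value of the residual (the value of g that minimizes the squared error between the residual and g times its sign). The function names are hypothetical.

```python
def prediction_error(a, x):
    """FIR prediction error filter A(z): e[n] = sum_k a[k] * x[n-k]."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def sign_excitation(residual):
    """Keep one bit per sample (the sign) plus one assumed gain per frame."""
    gain = sum(abs(e) for e in residual) / len(residual)
    return [gain if e >= 0.0 else -gain for e in residual]

def synthesize(a, excitation):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def sign_relp_frame(a, frame):
    """Analyze a frame with A(z), quantize the residual to its sign,
    and re-synthesize with 1/A(z)."""
    residual = prediction_error(a, frame)
    return synthesize(a, sign_excitation(residual))
```

Because the sign sequence keeps the timing of the pitch pulses in the residual, the pitch information survives without any explicit pitch detector, at a cost of one bit per sample plus the gain and the AR coefficients.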

Typically, residual excited LPC (RELP) involves low pass filtering and downsampling the residual, so high frequency regeneration is needed on the synthesis side, which can be problematic.

Residual Excited Speech

Code Book Excited LPC

This technique falls under the more general idea of analysis by synthesis. In a nutshell, this involves performing an exhaustive search of all the codewords in a certain codebook to find the 'best' excitation. More specifically, after working out the AR coefficients for a speech frame, the synthesis filter is excited with each of the codewords in the codebook in turn, and the one that minimizes some perceptual error measure is selected. The index of this 'best' codeword, along with the normalized energy of the frame, is sent or stored along with the AR coefficients.
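
The exhaustive search can be sketched as below. Note the simplifications, which are assumptions of this example: the error measure is a plain squared error rather than a perceptually weighted one, a least-squares gain per codeword stands in for the normalized frame energy, and the codebook contents and function names are hypothetical.

```python
def synthesize(a, excitation):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{k>=1} a[k] * y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def search_codebook(a, target, codebook):
    """Try every codeword and return (index, gain, error) of the best one.

    Each codeword is passed through the synthesis filter, scaled by the
    gain that best matches the target frame, and scored by squared error.
    """
    best_index, best_gain, best_err = None, 0.0, float("inf")
    for index, codeword in enumerate(codebook):
        synth = synthesize(a, codeword)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero codeword cannot match anything
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        err = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if err < best_err:
            best_index, best_gain, best_err = index, gain, err
    return best_index, best_gain, best_err
```

Only the winning index and gain need to be transmitted with the AR coefficients, since the decoder holds the same codebook. The cost is one full synthesis per codeword per frame, which is where the heavy computational load of this method comes from.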

In general, this is the best technique known for high quality, low bit rate speech compression; however, it has the disadvantage of being the most computationally demanding.