White Noise Excited LPC
Here we simply work out the AR coefficients and pass white noise through the
all-pole synthesis filter (the inverse of the prediction error filter). As
you can hear by playing the sound file below, the quality of the speech is
not very good. Although we can more or less make out what the person is
saying, it is not clear who is actually speaking. Because speech is not
exactly AR, the coefficients alone are not enough to represent the speech.
We need some more information, and this is the motivation for the next idea.
White Noise Excited Speech
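The white noise excited scheme can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function names (`autocorr`, `levinson`, `synthesize`, `white_noise_lpc`), the use of the autocorrelation method with the Levinson-Durbin recursion, the per-frame gain, and the toy AR(2) "speech" frame are all assumptions made for the example.

```python
import math
import random

def autocorr(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion.
    Returns (a, err) with A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error energy
    return a, err

def synthesize(a, excitation):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_k a[k] * y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def white_noise_lpc(frame, order, rng):
    """Estimate AR coefficients, then resynthesize the frame from white noise."""
    a, err = levinson(autocorr(frame, order), order)
    gain = math.sqrt(err / len(frame))      # assumed: match residual power
    noise = [gain * rng.gauss(0.0, 1.0) for _ in frame]
    return a, synthesize(a, noise)

# Toy stand-in for a speech frame (assumption): a known AR(2) process.
rng = random.Random(0)
true_a = [1.0, -1.3, 0.8]
frame = synthesize(true_a, [rng.gauss(0.0, 1.0) for _ in range(2000)])
a_hat, resynth = white_noise_lpc(frame, order=2, rng=rng)
```

Only the coefficients and a gain describe each frame here, which is why the spectral envelope survives but the speaker's pitch does not.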
Pitch Excited LPC
In this technique we make a distinction between frames: each frame is
classified as either 'voiced' or 'unvoiced'. 'Unvoiced' frames are still
excited with white noise. 'Voiced' frames, on the other hand, are excited
with a periodic pulse train whose pulse rate corresponds to the fundamental
frequency of the frame. The fundamental frequency is calculated using pitch
detection.
The estimation of the pitch period is the most important and most vulnerable
aspect of pitch excited LPC. Bad pitch estimates can make the result worse
than the simplest LPC method of using just white noise as excitation, while
a good pitch detection algorithm improves the speech quality considerably
and allows speaker identification. We found that there are no set rules or
exact science behind pitch detection, so we tried several ad hoc techniques.
Pitch detection algorithms can be divided into:
(a) those that use the frequency domain properties of the speech signal,
(b) those that use the time domain properties of the speech signal,
(c) those that use both time and frequency domain properties of the speech
signal.
We briefly studied the Gold-Rabiner algorithm and implemented the
homomorphic (cepstral) and the autocorrelation pitch tracking methods.
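As an illustration of the time domain approach, here is a minimal sketch of an autocorrelation pitch tracker together with the pulse train used to excite voiced frames. It is not the original implementation: the function names, the 60 to 400 Hz search range, the voicing threshold of 0.3, and the synthetic test frame are assumptions chosen for the example.

```python
import math

def estimate_pitch(frame, fs, fmin=60.0, fmax=400.0, voicing_threshold=0.3):
    """Return (f0_hz, voiced); f0_hz is None when the frame is judged unvoiced.
    The search range and threshold are illustrative assumptions."""
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    r0 = sum(v * v for v in x)
    if r0 == 0.0:
        return None, False                  # silent frame
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), n - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Normalized autocorrelation at this candidate pitch period.
        val = sum(x[i] * x[i + lag] for i in range(n - lag)) / r0
        if val > best_val:
            best_lag, best_val = lag, val
    if best_val < voicing_threshold:
        return None, False                  # no clear periodicity
    return fs / best_lag, True

def pulse_train(n, period):
    """Unit pulses every `period` samples: the excitation for a voiced frame."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

# Synthetic voiced frame (assumption): 200 Hz fundamental plus one harmonic,
# 30 ms at an 8 kHz sampling rate.
fs = 8000
voiced = [math.sin(2 * math.pi * 200 * i / fs)
          + 0.5 * math.sin(2 * math.pi * 400 * i / fs) for i in range(240)]
f0, is_voiced = estimate_pitch(voiced, fs)
excitation = pulse_train(len(voiced), round(fs / f0))
```

A frame whose best normalized autocorrelation peak falls below the threshold is treated as unvoiced and would be excited with white noise instead.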
Figure 4
If you play the sound file below, it is clear that the quality of the speech
is much higher than with white noise alone. Pitch excited LPC works well
at low bit rates; however, it can never produce toll quality speech because
of the uncertainty of pitch detection.
Pitch Excited Speech
Residual Excited LPC
This technique gets rid of pitch detection and instead concentrates on the
residual, which is approximately white. Essentially, you just perform some
kind of compression on the residual. The nice thing about this is that you
have a direct tradeoff between the compression ratio and the quality of the
speech: the less the residual is compressed, the higher the quality of the
speech. In our scheme we simply stored the sign of the residual and used
that as the excitation for the all-pole synthesis filter (the inverse of the
linear prediction error filter). If you play the speech file below, you will
hear that this simple compression yields very impressive results. It is
clear what the speaker is saying and also who is speaking. This is an
improvement over pitch excited LPC.
Typically, residual excited LPC (RELP) involves low pass filtering and
downsampling the residual, so high frequency regeneration must be performed
on the synthesis side, which can be problematic.
Residual Excited Speech
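The sign-only residual coding described above can be sketched as follows. This is an illustrative sketch rather than the original code: the AR coefficients are fixed by hand instead of being estimated, the per-frame gain (mean absolute residual) is an added assumption, since the text only states that the sign was stored, and all function names are hypothetical.

```python
import random

def prediction_error(a, x):
    """FIR filter A(z): e[n] = x[n] + sum_k a[k] * x[n-k]."""
    return [x[n] + sum(a[k] * x[n - k] for k in range(1, len(a)) if n - k >= 0)
            for n in range(len(x))]

def synthesize(a, excitation):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_k a[k] * y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def sign_code(residual):
    """Keep one bit per residual sample plus one gain per frame (assumed)."""
    gain = sum(abs(v) for v in residual) / len(residual)
    bits = [1 if v >= 0 else 0 for v in residual]
    return bits, gain

def sign_decode(bits, gain):
    """Rebuild a two-level excitation from the stored signs."""
    return [gain if b else -gain for b in bits]

# Assumed AR coefficients and a toy frame generated from them.
rng = random.Random(1)
a = [1.0, -1.3, 0.8]
frame = synthesize(a, [rng.gauss(0.0, 1.0) for _ in range(400)])

residual = prediction_error(a, frame)       # analysis side
bits, gain = sign_code(residual)            # 1 bit per sample
decoded = synthesize(a, sign_decode(bits, gain))  # synthesis side
```

Passing the uncompressed residual through `synthesize` recovers the frame exactly, which is the lossless end of the tradeoff between compression ratio and quality; the sign-only version keeps the timing of the excitation while discarding its amplitude detail.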
Code Book Excited LPC
This technique falls under the more general idea of analysis by synthesis.
In a nutshell, this involves performing an exhaustive search of all the
codewords in a given codebook to find the 'best' excitation. More
specifically, after working out the AR coefficients for a speech frame, the
synthesis filter is excited with each of the codewords in the codebook in
turn until the one that minimizes some perceptual error is found. The index
of this 'best' codeword, along with the normalized energy of the frame, is
sent or stored along with the AR coefficients.
In general this is the best technique known for high quality, low bit rate
speech compression; however, it has the disadvantage of being the most
computationally demanding.
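The exhaustive analysis-by-synthesis search can be sketched as below. This is a simplified illustration under stated assumptions: plain squared error stands in for the perceptual error measure, a least-squares gain stands in for the normalized frame energy, the codebook is random Gaussian, the AR coefficients are fixed by hand, and the function names are hypothetical.

```python
import random

def synthesize(a, excitation):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_k a[k] * y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def search_codebook(a, target, codebook):
    """Try every codeword; return (index, gain, error) of the best match.
    Squared error is used here in place of a perceptual measure (assumption)."""
    best = (None, 0.0, float("inf"))
    for index, codeword in enumerate(codebook):
        synth = synthesize(a, codeword)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue
        # Least-squares gain that best scales this candidate to the target.
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if error < best[2]:
            best = (index, gain, error)
    return best

# Assumed setup: 64 random codewords of 40 samples, and a target frame
# built from codeword 7 so that the search result can be checked.
rng = random.Random(2)
a = [1.0, -1.3, 0.8]
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(64)]
target = synthesize(a, [2.5 * v for v in codebook[7]])
index, gain, error = search_codebook(a, target, codebook)
```

Every codeword is filtered once per frame, so the cost grows with the codebook size times the frame length times the filter order, which illustrates why this approach is the most computationally demanding of the four.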