Determining Pitch Period
For each frame, we must determine if the speech is voiced or unvoiced. We do this by searching for periodicities in the residual (prediction error) signal.
In the figures below, we see the residuals for two typical frames, one voiced and one unvoiced. Clearly, the unvoiced frame is very noise-like, but the periodicities in the voiced residual are not easy to see. Therefore, we compute the autocorrelation of both residuals. In the unvoiced frame, the autocorrelation is near zero except for the spike at Rx(0), as we expect for white noise. However, the autocorrelation for the voiced frame clearly displays the periodicities.
[Figures: error residual for an unvoiced frame; error residual for a voiced frame; autocorrelation for an unvoiced frame; autocorrelation for a voiced frame.]
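As a rough illustration of the autocorrelation step, the sketch below computes the autocorrelation of two synthetic frames with NumPy. The frames are stand-ins, not real LPC residuals: the "unvoiced" frame is white noise, and the "voiced" frame is an impulse train with an assumed period of 60 samples plus a little noise. The function name, frame length, and period are all assumptions made for this example.

```python
import numpy as np

def autocorrelation(x):
    """Autocorrelation Rx(k) for lags k = 0 .. len(x)-1 (unnormalized sums)."""
    x = np.asarray(x, dtype=float)
    full = np.correlate(x, x, mode="full")  # lags -(N-1) .. (N-1)
    return full[len(x) - 1:]                # keep the non-negative lags

rng = np.random.default_rng(0)
n = 240        # assumed frame length in samples
period = 60    # assumed pitch period in samples

# Synthetic "unvoiced" residual: white noise.
unvoiced = rng.standard_normal(n)

# Synthetic "voiced" residual: one impulse per pitch period, plus weak noise.
voiced = np.zeros(n)
voiced[::period] = 1.0
voiced += 0.05 * rng.standard_normal(n)

r_unvoiced = autocorrelation(unvoiced)
r_voiced = autocorrelation(voiced)

# Lag of the largest value away from lag 0 in the voiced autocorrelation.
strongest_lag = int(np.argmax(r_voiced[1:])) + 1
print("strongest nonzero lag in voiced frame:", strongest_lag)
```

For the noise frame, the value at lag 0 should dominate every other lag. For the impulse-train frame, the largest value away from lag 0 should fall at the assumed period.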
To determine if the frame is voiced or unvoiced, we apply a threshold to the autocorrelation (shown below). Typically, this threshold is set at 0.3 * Rx(0). If no values of the autocorrelation sequence at nonzero lags exceed this threshold, then we declare the frame unvoiced. If the data are periodic (as in the second figure), then we should see spikes that do exceed the threshold; in this case we declare the frame voiced. Notice that the distance between spikes in the autocorrelation function equals the pitch period of the original signal.
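The threshold test can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive implementation. The function name `classify_frame` and the `min_lag` parameter are hypothetical. `min_lag` skips the lags closest to zero, where the autocorrelation can stay high even without pitch periodicity. The pitch period is taken as the lag of the largest peak above the threshold, which matches the spacing between spikes only for a clean periodic frame.

```python
import numpy as np

def classify_frame(residual, threshold_ratio=0.3, min_lag=20):
    """Return (is_voiced, pitch_period); pitch_period is None if unvoiced.

    threshold_ratio = 0.3 follows the text; min_lag is an assumed guard
    that keeps the search away from the main lobe around lag 0.
    """
    x = np.asarray(residual, dtype=float)
    r = np.correlate(x, x, mode="full")[len(x) - 1:]  # Rx(0), Rx(1), ...
    if r[0] <= 0.0 or len(r) <= min_lag:
        return False, None            # silent or too-short frame
    threshold = threshold_ratio * r[0]
    search = r[min_lag:]              # lags min_lag .. N-1
    peak = int(np.argmax(search))
    if search[peak] <= threshold:
        return False, None            # nothing exceeds the threshold
    return True, peak + min_lag       # lag of the strongest spike

# Synthetic stand-ins for residual frames (not real speech data).
pulse_frame = np.zeros(240)
pulse_frame[::60] = 1.0               # impulse every 60 samples
noise_frame = np.random.default_rng(0).standard_normal(1000)

print(classify_frame(pulse_frame))    # (True, 60)
print(classify_frame(noise_frame))
```

With real residuals the strongest peak may sit at a multiple of the true period, so a practical detector would usually restrict the search to a plausible pitch range as well.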