ECE532 Project

Determining Pitch Period
For each frame, we must determine if the speech is voiced or unvoiced. We do this by searching for periodicities in the residual (prediction error) signal.

In the figures below, we see the residuals for two typical frames, one voiced and one unvoiced. Clearly, the unvoiced frame is very noise-like, but the periodicities in the voiced residual are not easy to see. Therefore, we compute the autocorrelation of both residuals. In the unvoiced frame, the autocorrelation is near zero except for the spike at R_x(0), as we expect for white noise. However, the autocorrelation for the voiced frame clearly displays the periodicities.

[Figure panels: Error Residual (Unvoiced), Error Residual (Voiced), Autocorrelation (Unvoiced), Autocorrelation (Voiced)]
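The autocorrelation step can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the function name `autocorrelation`, the 240-sample frame length, the 40-sample period, and the synthetic test signals (Gaussian noise standing in for an unvoiced residual, a pulse train standing in for a voiced one) are all assumptions made for the example. The estimate is the plain unnormalized sum over the frame.

```python
import random


def autocorrelation(x):
    """Unnormalized autocorrelation estimate of one frame.

    Returns r where r[k] = sum over n of x[n] * x[n + k], for k = 0 .. len(x) - 1.
    """
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(n)]


# Hypothetical stand-ins for the two kinds of residual (not real speech data).
random.seed(0)
frame_len = 240  # assumed frame length in samples
period = 40      # assumed pitch period in samples

# Noise-like residual, as for an unvoiced frame.
noise = [random.gauss(0.0, 1.0) for _ in range(frame_len)]

# Pulse-train residual, as an idealized voiced frame.
pulses = [1.0 if i % period == 0 else 0.0 for i in range(frame_len)]

r_noise = autocorrelation(noise)
r_pulses = autocorrelation(pulses)

# The pulse train produces spikes at multiples of the period.
print(r_pulses[0], r_pulses[period], r_pulses[2 * period])  # 6.0 5.0 4.0
```

For the pulse train the spikes sit exactly at multiples of the period and decay slowly because fewer pulse pairs overlap at larger lags. For the noise frame, R_x(0) is the largest value and the remaining lags are small by comparison.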

To determine if the frame is voiced or unvoiced, we apply a threshold to the autocorrelation (shown below). Typically, this threshold is set at 0.3 * R_x(0). If no value of the autocorrelation sequence at a nonzero lag exceeds this threshold, we declare the frame unvoiced (the value at lag zero, R_x(0) itself, always exceeds the threshold and is ignored). If the data contain periodicities (as in the voiced frame), we should see spikes that do exceed the threshold; in this case we declare the frame voiced. Notice that the spacing between spikes in the autocorrelation function equals the pitch period of the original signal, measured in samples.
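The threshold test described above could look roughly like the following sketch. The function name `classify_frame`, the `min_lag` parameter, and the choice to report the lag of the largest above-threshold value as the pitch period are assumptions made for illustration; the text only specifies the 0.3 * R_x(0) threshold. In practice `min_lag` would probably be set to the shortest plausible pitch period so that values right next to lag zero are skipped.

```python
def classify_frame(r, threshold_ratio=0.3, min_lag=1):
    """Voiced/unvoiced decision from an autocorrelation sequence r.

    r[0] is R_x(0). Returns (voiced, pitch_period), where pitch_period is
    a lag in samples, or None when the frame is declared unvoiced.
    """
    threshold = threshold_ratio * r[0]
    best_lag = None
    # Lag 0 is skipped: R_x(0) always exceeds its own scaled threshold.
    for lag in range(min_lag, len(r)):
        if r[lag] > threshold and (best_lag is None or r[lag] > r[best_lag]):
            best_lag = lag
    if best_lag is None:
        return False, None
    return True, best_lag


# Hand-made autocorrelation sequences (hypothetical values, not measured data).
r_voiced = [6.0, 0.0, 0.0, 5.0, 0.0, 0.0, 4.0]  # spikes every 3 samples
r_unvoiced = [10.0, 0.5, -0.3, 1.0, -0.8]       # nothing above 0.3 * 10.0

print(classify_frame(r_voiced))    # (True, 3)
print(classify_frame(r_unvoiced))  # (False, None)
```

The returned lag is in samples. Dividing the sampling rate by it gives the pitch frequency: at an assumed 8000 Hz sampling rate, for example, a lag of 40 samples corresponds to 200 Hz.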
