Speech production is commonly described by a source-filter model. The source is the airflow supplied by the lungs. The filter is the spectral shaping performed by the vocal tract. Convolving the source with the filter's impulse response in the time domain produces the desired utterance. Because these are two separate processes, the source excitation and the filter can be analyzed separately. Optimizing both the source sub-model and the filter sub-model can improve the resulting speech.
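
As a minimal sketch of the convolution step, the Python snippet below convolves a toy excitation with a toy impulse response. The function name `synthesize` and both signals are hypothetical illustrations, not measured speech data; NumPy is assumed to be available.

```python
import numpy as np

def synthesize(source, impulse_response):
    """Convolve the excitation with the filter's impulse response (time domain)."""
    return np.convolve(source, impulse_response)

# Hypothetical toy excitation: 16 samples with an impulse every 8 samples.
source = np.zeros(16)
source[::8] = 1.0

# Hypothetical toy impulse response standing in for the vocal-tract filter.
h = np.array([1.0, 0.5, 0.25])

# Each impulse in the source produces one copy of h in the output.
y = synthesize(source, h)

# The same result via multiplication of the two spectra in the frequency domain.
n = len(source) + len(h) - 1
y_freq = np.fft.irfft(np.fft.rfft(source, n) * np.fft.rfft(h, n), n)
```

Time-domain convolution corresponds to multiplying the two spectra, which is why the filter can be described as spectral shaping of the source.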

Source Excitation

The source excitation can be one of two types. For voiced speech, the vocal folds open and close rhythmically, turning the air from the lungs into an "impulse train." The repetition rate of this impulse train sets the pitch of a sustained voiced sound. For unvoiced speech, the source is modeled as white noise. In this case, the air that flows out of the lungs and through a narrow gap in the mouth, for example between the tongue and the roof of the mouth, produces a relatively random sound.
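
The two excitation types can be sketched in Python as follows. The function names, the 8 kHz sampling rate, and the 100 Hz pitch are assumptions chosen for illustration. An ideal unit impulse train and Gaussian white noise are idealizations of the real glottal and turbulent sources.

```python
import numpy as np

def voiced_source(f0_hz, duration_s, fs_hz):
    """Idealized voiced excitation: one unit impulse per pitch period."""
    n = int(round(duration_s * fs_hz))
    period = int(round(fs_hz / f0_hz))  # pitch period in samples
    x = np.zeros(n)
    x[::period] = 1.0
    return x

def unvoiced_source(duration_s, fs_hz, seed=0):
    """Idealized unvoiced excitation: Gaussian white noise."""
    rng = np.random.default_rng(seed)
    n = int(round(duration_s * fs_hz))
    return rng.standard_normal(n)

# 50 ms of each source at an assumed 8 kHz sampling rate.
voiced = voiced_source(100.0, 0.05, 8000)   # assumed 100 Hz pitch
unvoiced = unvoiced_source(0.05, 8000)
```

Raising `f0_hz` shortens the spacing between impulses, which corresponds to a higher perceived pitch.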

Acoustic Tube and Transmission Line Model

In its simplest form, the vocal tract can be modeled as a lossless acoustic tube. The cross-sectional area of the tube and the speed of the air determine the sound pressure and volume velocity, which in turn determine the output speech. The vocal tract can also be modeled as a transmission line: the acoustical resistance, mass, and compliance are distributed along the tube in the same manner as the resistance, inductance, and capacitance along a transmission line.
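
To make the single-tube model concrete, the sketch below computes the resonances of a uniform lossless tube. It rests on assumptions the text does not state: the tube is closed at the glottis and open at the lips (a quarter-wavelength resonator), its length is 17 cm, and the speed of sound is 340 m/s. The function name is hypothetical.

```python
def uniform_tube_resonances(length_m, speed_of_sound_m_s=340.0, count=3):
    """Resonances of a uniform lossless tube, closed at one end and open at the other.

    Assumes the quarter-wavelength relation F_n = (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * speed_of_sound_m_s / (4.0 * length_m)
        for n in range(1, count + 1)
    ]

# Assumed 17 cm tube length; gives roughly 500, 1500, and 2500 Hz.
formants = uniform_tube_resonances(0.17)
```

Under these assumptions the resonances are evenly spaced odd multiples of the lowest one, which is close to the pattern of a neutral vowel.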

A single acoustic tube or transmission line is not adequate to model the wide range of sounds we create. Since speech production is characterized by a changing vocal-tract shape, it is more appropriate to model the tract as multiple concatenated acoustic tubes or cascaded transmission lines. The specific shape of the vocal tract and how it changes over time determine the utterances we perceive as speech.
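
One common way to work with a multi-tube model is to compute a reflection coefficient at each junction between adjacent sections. The sketch below assumes lossless sections of equal length and the sign convention k = (A_next - A_current) / (A_next + A_current), which varies between references. The function name and the area values are hypothetical.

```python
import numpy as np

def reflection_coefficients(areas):
    """Reflection coefficient at each junction of a concatenated-tube model.

    `areas` lists the cross-sectional areas from one end of the tract to the other.
    Sign convention (an assumption): k = (A_next - A_current) / (A_next + A_current).
    """
    a = np.asarray(areas, dtype=float)
    return (a[1:] - a[:-1]) / (a[1:] + a[:-1])

# A uniform tube has no area changes, so nothing is reflected at any junction.
k_uniform = reflection_coefficients([2.0, 2.0, 2.0, 2.0])

# Hypothetical areas (arbitrary units) with a narrowing followed by a widening.
k_shaped = reflection_coefficients([1.0, 3.0, 1.0, 4.0])
```

Changing the area profile over time changes these coefficients, which is one way to represent the moving vocal-tract shape described above.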

The following figure shows how the source/filter approach applies to the human vocal production organs. The lungs, vocal folds, and trachea all belong to the "source" side of the model. The various cavities, the velum, and the tongue hump are part of the "filter" side.