Anatomy of Speech Production

To build a model for machine speech, we first need to understand how humans produce speech.

In general, a speech signal is an air pressure wave that travels from the speaker's mouth to the listener's ears. Figure 1 is a schematic of the anatomy of speech production. The lungs produce the initial air pressure that is essential for the speech signal; the pharyngeal cavity, oral cavity, and nasal cavity shape the final output sound that is perceived as speech.

The pharyngeal and oral cavities (collectively known as the vocal tract) contract and relax dynamically to create a wide range of sounds through resonance. The nasal cavity opens an additional air passage to create what linguists call nasal sounds (e.g., /m/, /n/). Together, these cavities characterize the sounds we produce.
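
To make the idea of shaping sound through resonance more concrete, here is a minimal sketch in Python. It is a deliberately crude illustration and not a model taken from this text: a periodic impulse train stands in for the pressure pulses driven by the lungs, and a single two-pole resonator stands in for one resonance of the vocal tract. The function names and every numeric value (sample rate, pulse rate, resonance frequency, bandwidth) are assumptions chosen only for illustration.

```python
import math

def impulse_train(f0_hz, duration_s, sample_rate):
    """Crude excitation: one unit impulse per period, zeros elsewhere."""
    n_samples = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator that emphasizes frequencies near center_hz.

    Implements y[n] = x[n] + a1*y[n-1] + a2*y[n-2], with the pole radius
    set by the bandwidth and the pole angle set by the center frequency.
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * center_hz / sample_rate
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0  # previous two output samples
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def magnitude_at(signal, freq_hz, sample_rate):
    """Magnitude of the signal's spectrum at one frequency (direct DFT sum)."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(w * n) for n, s in enumerate(signal))
    im = sum(s * math.sin(w * n) for n, s in enumerate(signal))
    return math.hypot(re, im)

# All values below are illustrative assumptions, not measurements.
SAMPLE_RATE = 8000   # samples per second
PULSE_RATE = 100     # pulses per second in the excitation
RESONANCE = 700      # center frequency of the single cavity resonance, Hz
BANDWIDTH = 100      # bandwidth of that resonance, Hz

source = impulse_train(PULSE_RATE, 0.5, SAMPLE_RATE)
shaped = resonator(source, RESONANCE, BANDWIDTH, SAMPLE_RATE)

for freq in (700, 2000):
    print(freq, "Hz:",
          "source", round(magnitude_at(source, freq, SAMPLE_RATE), 2),
          "shaped", round(magnitude_at(shaped, freq, SAMPLE_RATE), 2))
```

In this sketch the excitation carries equal energy at 700 Hz and 2000 Hz, while the shaped signal is much stronger near the resonance. Changing RESONANCE plays the role of changing the cavity shape: the same source yields a differently colored sound. A real vocal tract has several resonances at once, plus the nasal path, so this should be read as a toy illustration only.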