Our work only begins the journey toward the ideal speech synthesizer. There is room for improvement and refinement in every step we took; given more time, we would have pursued several such refinements.
First, we could have incorporated more linguistic features into our analysis of the reference speech signals. Examining characteristics such as stress, intonation, segment duration, and coarticulation, and devising ways to transfer them to the synthesis end, would yield more convincing speech.
Dynamic generation of LPC coefficients could help smooth the synthesized speech. An abrupt change in coefficients is audible, so computing a new set of coefficients every few milliseconds would remove the obvious breaks and give us finer temporal resolution.
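One way to realize this smoothing is to interpolate between the coefficient sets of adjacent analysis frames. The sketch below shows the idea with linear interpolation; the coefficient values and sub-frame count are purely illustrative, and a production system would typically interpolate in a transformed domain (such as line spectral frequencies) to guarantee a stable filter at every intermediate point.

```python
import numpy as np

def interpolate_lpc(a_prev, a_next, n_subframes):
    """Linearly interpolate between two LPC coefficient sets.

    Returns one coefficient vector per sub-frame, ramping from
    a_prev to a_next. Direct interpolation of predictor
    coefficients can produce unstable intermediate filters; this
    is a sketch of the concept, not a production method.
    """
    a_prev = np.asarray(a_prev, dtype=float)
    a_next = np.asarray(a_next, dtype=float)
    steps = np.linspace(0.0, 1.0, n_subframes)
    return [(1.0 - t) * a_prev + t * a_next for t in steps]

# Hypothetical predictor polynomials for two adjacent frames.
frame_a = [1.0, -1.2, 0.5]
frame_b = [1.0, -0.8, 0.3]
subframe_coeffs = interpolate_lpc(frame_a, frame_b, 4)
```

Each sub-frame of excitation would then be filtered with its own interpolated coefficient set, so no single abrupt switch is ever audible.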
Taking the model further, we could have retained the zeros in the original linear prediction filter. We removed them for simplicity's sake, discarding the effect of the nasal cavity along with them. Restoring them would allow the synthesis of nasal sounds such as /n/ and /m/.
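The distinction can be illustrated with a pole-zero (ARMA) synthesis filter H(z) = B(z)/A(z) versus the all-pole case, where B(z) reduces to a constant gain. The polynomials below are hypothetical, chosen only to show that the numerator contributes the zeros (anti-resonances) that nasal coupling introduces; this is not the filter from our implementation.

```python
import numpy as np
from scipy.signal import lfilter

# Hypothetical voiced excitation: an impulse train at a 100 Hz pitch
# for an assumed 8 kHz sampling rate.
excitation = np.zeros(400)
excitation[::80] = 1.0

# Illustrative predictor (denominator) polynomial A(z).
a = [1.0, -1.3, 0.7]

# All-pole synthesis: H(z) = 1 / A(z). The numerator is a constant,
# so the spectrum has resonances (poles) but no anti-resonances.
all_pole = lfilter([1.0], a, excitation)

# Pole-zero synthesis: H(z) = B(z) / A(z). The numerator B(z)
# contributes a zero, modeling the spectral notch the nasal cavity
# adds for sounds like /n/ and /m/.
b = [1.0, -0.9]
pole_zero = lfilter(b, a, excitation)
```

The two outputs share the same resonant structure but differ wherever the zero carves a notch into the spectrum.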
Finally, we could incorporate this simple model into the grand scheme of a text-to-speech system. Such a system would take a word of text and synthesize the corresponding speech, drawing on its knowledge of word structure and phoneme characteristics.
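The front end of such a system might be sketched as follows. The lexicon, phoneme inventory, and per-phoneme parameters here are all hypothetical placeholders; a real system would need letter-to-sound rules for unknown words and far richer per-phoneme data.

```python
# Hypothetical word-to-phoneme lexicon.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "man": ["m", "ae", "n"],
}

# Hypothetical per-phoneme parameters: (LPC coefficients,
# duration in frames, voiced flag). Values are placeholders.
PHONE_PARAMS = {
    "k":  ([1.0, -0.5, 0.2], 3, False),
    "ae": ([1.0, -1.2, 0.6], 8, True),
    "t":  ([1.0, -0.4, 0.1], 3, False),
    "m":  ([1.0, -1.0, 0.4], 5, True),
    "n":  ([1.0, -1.1, 0.5], 5, True),
}

def text_to_parameters(word):
    """Map a text word to a frame-by-frame LPC parameter track."""
    track = []
    for phone in LEXICON[word]:
        coeffs, n_frames, voiced = PHONE_PARAMS[phone]
        track.extend([(coeffs, voiced)] * n_frames)
    return track

track = text_to_parameters("cat")
```

The resulting frame track would then drive the LPC synthesis filter, one coefficient set per frame.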