Training should be spread over a period of time to account for differences in background noise, speaker health, microphones, and various other factors. Furthermore, an instantaneous key could be skewed: the user will attempt to sound the same at each training session, which tends to produce dissimilarities instead (by a corollary of Murphy's Law).

A composite key could help alleviate this problem: take the five signals used in training and produce an "average" of them in terms of magnitude and fundamental frequency. The results could also improve over time if the system added successful attempts to the average key, so that recognition gets better without extra training sessions.
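The averaging idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: it assumes each training recording has already been reduced to a feature vector of equal length (for example, magnitude and fundamental-frequency values), and the function names `composite_key` and `update_key` are hypothetical.

```python
def composite_key(training_keys):
    """Average equal-length feature vectors element by element."""
    if not training_keys:
        raise ValueError("at least one training key is required")
    length = len(training_keys[0])
    if any(len(key) != length for key in training_keys):
        raise ValueError("all keys must have the same length")
    count = len(training_keys)
    return [sum(key[i] for key in training_keys) / count for i in range(length)]


def update_key(key, count, accepted):
    """Fold one accepted attempt into the running average.

    key      -- current composite key
    count    -- number of samples already averaged into key
    accepted -- feature vector of the attempt that was just accepted
    """
    if len(key) != len(accepted):
        raise ValueError("attempt and key must have the same length")
    new_key = [(k * count + a) / (count + 1) for k, a in zip(key, accepted)]
    return new_key, count + 1


# Hypothetical feature vectors from five training sessions: [magnitude, pitch in Hz].
sessions = [
    [0.80, 118.0],
    [0.84, 121.0],
    [0.79, 117.0],
    [0.82, 123.0],
    [0.85, 120.0],
]
key = composite_key(sessions)
# Later, a successful login refines the key without a new training session.
key, samples = update_key(key, len(sessions), [0.83, 119.0])
```

The running update keeps only the current key and a sample count, so old recordings do not need to be stored. A real system would probably cap the count or weight recent attempts more heavily so that the key can follow slow changes in the speaker's voice.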

The lower thresholds for matching that makelock.m currently determines are rather high, in a range between 0.90 and 0.95. They are set high to cover the worst-case scenario, in which the intruder knows your password. As seen in the results, though, intruders without knowledge of the password never succeeded in accessing the system. It should therefore be possible to lower the thresholds so that the owner's acceptance rate increases while an intruder's acceptance rate remains negligible.
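One way to explore that trade-off is to sweep candidate thresholds over recorded match scores. The sketch below assumes a match score where higher means a closer match and an attempt is accepted when its score is at least the threshold; the scores are made-up placeholders, not the project's measurements, and the function names are hypothetical.

```python
def acceptance_rate(scores, threshold):
    """Fraction of attempts whose match score meets the threshold."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s >= threshold) / len(scores)


def lowest_safe_threshold(owner_scores, intruder_scores, candidates,
                          max_intruder_rate=0.0):
    """Return the lowest candidate threshold that keeps the intruder
    acceptance rate within max_intruder_rate, or None if none does."""
    safe = [t for t in candidates
            if acceptance_rate(intruder_scores, t) <= max_intruder_rate]
    return min(safe) if safe else None


# Placeholder scores for illustration only.
owner_scores = [0.96, 0.93, 0.91, 0.88, 0.86]
intruder_scores = [0.70, 0.75, 0.82, 0.60]
candidates = [0.80, 0.85, 0.90, 0.95]

threshold = lowest_safe_threshold(owner_scores, intruder_scores, candidates)
owner_rate = acceptance_rate(owner_scores, threshold)
```

With these placeholder numbers, the sweep picks a threshold below 0.90 that still rejects every intruder attempt. On real data the intruder set should include attempts by speakers who know the password, since that is the case the high thresholds were chosen to guard against.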

Finally, a speaker's voice has many more characteristics that make it identifiable to the human ear, and a system can try to extract such distinguishing features computationally and match them to users as well. For example, formants are a function of a person's vocal tract, so tracking them could further improve results. As mentioned earlier, linear predictive coding (LPC) and cepstrum techniques may produce more accurate results, raising the rate at which the owner is accepted and the rate at which intruders are denied.
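As a rough illustration of what the LPC step involves, the following sketch computes prediction coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion. It is a generic textbook formulation, not code from this project; windowing, pre-emphasis, and the conversion of coefficients to formant frequencies are left out, and the function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of a frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion.

    Returns (a, err) where a[0] == 1.0 and the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order, and err is the
    remaining prediction error energy.
    """
    r = autocorrelation(frame, order)
    if r[0] == 0:
        raise ValueError("frame has no energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this order.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Synthetic test frame: a decaying exponential, which a first-order
# predictor models almost exactly.
frame = [0.9 ** n for n in range(200)]
coeffs, residual = lpc(frame, 2)
```

For speech, the frame would be a short windowed segment, and the order is typically chosen according to the sampling rate. Formant candidates can then be read from the angles of the roots of A(z), which requires a polynomial root finder not shown here.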

© 1999
Sara MacAlpine
JP Slavinsky
Nipul Bharani
Aamir Virani