MadSci Network: Computer Science
Dear Lance,

It is certainly possible to create synthesized speech that sounds very natural. We know quite a bit about the acoustic structure of individual speech sounds (phonemes) and can model this structure with a computer synthesizer. However, going from natural-sounding synthesis of individual phonemes to natural-sounding speech generated from text is not as easy as stringing phonemes together like beads on a necklace.

When we produce speech, our mouths are constrained in their movements. They can only move so far so fast! As a result, the way a particular phoneme is produced is highly dependent upon the phonemes that are spoken before and after it. (Try saying "dean" and "draw", feeling where your tongue hits the roof of your mouth as you speak. There is a subtle but important difference in the position where your tongue makes contact for the "d".) This context dependency is part of what gives speech a "natural" sound. Most simple text-to-speech devices just string phonemes together, ignoring all of these context dependencies, which is why they sound robotic and unnatural.

Many scientists are working to design better text-to-speech devices, but the problem has proven quite difficult. In English, especially, individual letters of the alphabet do not even represent the same sound in every word! (Compare the "i" in "bit" and "bite", for example.) The mapping from letter to sound is extremely complex and thus difficult to model in a text-to-speech device.

So will these difficulties ever be solved? My own view is that we've been approaching the problem from the wrong perspective. Instead of searching for engineering solutions, we might be better off continuing to work toward understanding how speech is produced and perceived by human speakers and listeners, in hopes of gaining clues about how to model these processes in computers.

If you'd like more information about this field of research, please feel free to contact me.

Lori L. Holt
llholt@facstaff.wisc.edu
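As a rough illustration of the letter-to-sound point in the answer above, here is a minimal Python sketch. It is not taken from any real text-to-speech system: the lookup table, the function names, and the informal ARPAbet-style sound labels are all assumptions made for this example, and the single "silent e" rule is a toy that covers only words like "bite". It shows how a one-letter-one-sound mapping gets "bite" wrong, while a rule that looks at neighbouring letters gets it right.

```python
# Hypothetical one-letter-one-sound table. The labels are informal
# ARPAbet-style symbols chosen for illustration only.
NAIVE_TABLE = {"b": "B", "i": "IH", "t": "T", "e": "EH"}


def naive_letter_to_sound(word):
    """Map each letter to one fixed sound, ignoring its neighbours."""
    return [NAIVE_TABLE[letter] for letter in word]


def context_letter_to_sound(word):
    """Apply one toy context rule on top of the naive table.

    Assumed rule: a word-final 'e' that follows a consonant is silent,
    and an 'i' two letters before it is pronounced "AY" (as in "bite").
    """
    silent_e = (
        len(word) >= 3
        and word.endswith("e")
        and word[-2] not in "aeiou"
    )
    sounds = []
    for index, letter in enumerate(word):
        if silent_e and index == len(word) - 1:
            continue  # the final 'e' makes no sound of its own
        if silent_e and letter == "i" and index == len(word) - 3:
            sounds.append("AY")  # the 'i' changes because of the later 'e'
        else:
            sounds.append(NAIVE_TABLE[letter])
    return sounds


print(naive_letter_to_sound("bit"))     # ['B', 'IH', 'T']
print(naive_letter_to_sound("bite"))    # ['B', 'IH', 'T', 'EH']
print(context_letter_to_sound("bit"))   # ['B', 'IH', 'T']
print(context_letter_to_sound("bite"))  # ['B', 'AY', 'T']
```

Even this tiny rule has to look ahead two letters to decide how to pronounce the "i", and real English needs a very large number of such rules and exceptions, which is part of why the mapping is so hard to model.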