NETtalk (Sejnowski and Rosenberg)

- network learned to pronounce English text (mapped text to phonemes)
- network input: moving window of 7 characters
- network output: phoneme code for center character in input window
- output fed to a phoneme-to-speech converter
- each input character represented by a group of 29 units (localist representation)
- 203 total input units
- 80 hidden units
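- 26 output units encoding the phoneme (a distributed code of articulatory features plus stress and syllable-boundary units, not one unit per phoneme)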
- trained on 1024 words using a side-by-side English/phoneme source
- intelligible speech after 10 training epochs; 95% accuracy on training
corpus after 50 epochs
- some hidden units developed meaningful responses (e.g., vowels vs. consonants)
- generalization: 78% accuracy on continuation of training text
- damaging the network produced graceful degradation, with rapid recovery on
retraining
- DECtalk performs better, but uses hand-coded linguistic rules developed
over a decade
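
The input encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original implementation: the 29-symbol alphabet below (26 letters plus space, period, and comma) is a guess at the symbol set, mapping unknown characters to a space is a convenience added here, and the names `ALPHABET`, `one_hot`, `windows`, and `n_weights` are hypothetical.

```python
import string

# ASSUMPTION: 26 letters plus three extra symbols for spacing/punctuation.
# The notes give only the count (29), not the exact symbol set.
ALPHABET = string.ascii_lowercase + " .,"
WINDOW = 7                          # characters per input window
UNITS_PER_CHAR = len(ALPHABET)      # 29 units per character position
N_INPUT = WINDOW * UNITS_PER_CHAR   # 7 * 29 = 203 input units
N_HIDDEN = 80
N_OUTPUT = 26


def one_hot(ch):
    """Localist code: exactly one of the 29 units is on for a character."""
    idx = ALPHABET.find(ch)
    if idx == -1:
        # ASSUMPTION: characters outside the alphabet are treated as a space.
        idx = ALPHABET.index(" ")
    vec = [0] * UNITS_PER_CHAR
    vec[idx] = 1
    return vec


def windows(text):
    """Yield (center_char, 203-element input vector) for each text position.

    The text is padded with spaces so that every character, including the
    first and last, can sit at the center of a 7-character window.
    """
    half = WINDOW // 2
    padded = " " * half + text.lower() + " " * half
    for i in range(len(text)):
        chunk = padded[i:i + WINDOW]
        vec = [bit for ch in chunk for bit in one_hot(ch)]
        yield chunk[half], vec


def n_weights():
    """Connections in a fully connected 203-80-26 net, biases excluded."""
    return N_INPUT * N_HIDDEN + N_HIDDEN * N_OUTPUT
```

For example, `list(windows("cat"))` produces three windows, one centered on each letter; each input vector has 203 elements with exactly seven units on, one per character position. The network would then map each such vector through the 80 hidden units to the 26-unit phoneme code for the center character.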

First Grade Corpus
... when we walk home from school. I walk
home with two friends and sometimes we can't run home from school
now. Because ?? everytime she wants to run, she gets very ?? and
stuff. And then she can't breathe very well and she gets sick.
That's why we can't run.
I like to go to my grandmother's house.
Well because she gives us candy. Well and we eat there sometimes.
Sometimes we sleep overnight there.