The person who voiced the original iPhone Siri has revealed herself as Susan Bennett, although Apple won’t confirm it. But what interested me in the story was how the system was created.
For four hours a day, every day, in July 2005, Bennett holed up in her home recording booth. Hour after hour, she read nonsensical phrases and sentences so that the “ubergeeks” — as she affectionately calls them; they leave her awestruck — could work their magic by pulling out vowels, consonants, syllables and diphthongs, and playing with her pitch and speed.
These snippets were then synthesized in a process called concatenation that builds words, sentences, paragraphs. And that is how voices like hers find their way into GPS and telephone systems.
I used to wonder whether they had recorded the voice uttering lots of whole words or even phrases that could be arranged in multiple ways to answer questions, but it seems that the basic component units are even smaller. Pretty impressive people, these ubergeeks.
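To get a feel for what concatenation might look like, here is a toy sketch in Python. It is only an illustration under heavy assumptions, not how Siri or any real system is built: the `UNIT_INVENTORY` table, the unit names, and the `concatenate_units` function are all hypothetical, and the numbers stand in for recorded audio samples.

```python
# Hypothetical inventory: each small sound unit maps to a short list of
# audio samples. A real system would store recorded waveforms; these
# numbers are placeholders for illustration only.
UNIT_INVENTORY = {
    "h": [0.1, 0.2],
    "eh": [0.3, 0.4, 0.5],
    "l": [0.2, 0.1],
    "ow": [0.6, 0.7, 0.6],
}


def concatenate_units(units, inventory=UNIT_INVENTORY):
    """Join the stored snippets for each unit, in order, into one sample list."""
    samples = []
    for unit in units:
        if unit not in inventory:
            raise KeyError(f"no recording for unit {unit!r}")
        samples.extend(inventory[unit])
    return samples


# The same four units can be recombined into different outputs.
hello = concatenate_units(["h", "eh", "l", "ow"])
low = concatenate_units(["l", "ow"])
```

The point of the sketch is that nothing in the inventory is a whole word. Any ordering of units yields some output, so the system can produce words and sentences that were never recorded as such.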
This does answer one question that puzzled me. I have an old iPhone 3G that my daughter handed down to me when she upgraded, so I don’t have Siri. But the first time my daughter visited after Siri came out, we all played around with her phone, asking all manner of silly but innocent questions to see what Siri would say. We were startled when Siri suddenly said, “I am horny,” stopping us dead in our tracks.
We wondered why the people at Apple would program such a sentence, but now it seems clear that this particular sentence was not recorded as a whole but assembled from smaller units, triggered by some combination of words in our questions. We could not get her to say it again.
But was the reconstruction a sheer fluke or were some of the ubergeeks having a bit of fun by secretly throwing in the possibility of such an answer?