The next Siri won't put the emPHAsis on the wrong sylLAble.
That's more or less the promise Apple made during last week's Worldwide Developers Conference keynote. Demonstrating onstage, Apple's senior vice president of Software Engineering, Craig Federighi, asked Siri about the weather.
"Here's the forecast for the next three days: Sunny, sunny, and sunny," replied Siri.
Each "sunny" sounded a shade different. Though Federighi declared it "very powerful," the developer audience didn't break into wild applause.
Maybe that's a victory in itself. With the upcoming iOS 11, the now 6-year-old Siri will sound so natural that no one will notice. By "notice," I mean those cringe-worthy moments when Siri (or really any voice assistant) attempts to pronounce a name or location, or to offer a more natural reply, and sounds like it swallowed a fly mid-sentence. (My personal favourite is when Siri mangles the name of my hometown.)
Part of that is a result of how Siri's voice was originally built. As Susan Bennett, the woman widely considered to be the first voice of Siri, recounted to The Guardian late last year, Nuance, which built Siri's original voice recognition and response, had her record "hundreds and hundreds of sentences and phrases created to get all sound combinations in the phrases."
And, no, she wasn't recording, "The weather in El Paso is 100 degrees and sunny."
Instead, Bennett and others who were the original Siri voices recorded sentence after sentence that didn't make any sense. Things like "Fasa, ask fasa ask sati" and "Say the shrading again, say the shraeding again."
With all those speech parts, Siri could construct reasonable facsimiles of voice responses for a dizzying array of questions, even if they didn't all sound exactly human.