A good article. Also, I'm checking to see whether this list is still
functioning.
>Date: Fri, 17 Dec 2004 02:03:54 -0500
>From: Alan Cantor <acantor@cantoraccess.com>
>Subject: FW: Synthesizing human emotions
>To: EASI@MAELSTROM.STJOHNS.EDU
>
>Synthesizing human emotions
>
>By Michael Stroh, Sun Staff. AP Worldstream, November 29, 2004
>
>Speech: Melding acoustics, psychology and linguistics, researchers teach
>computers to laugh and sigh, express joy and anger.
>
>Shiva Sundaram spends his days listening to his computer laugh at him.
>Someday, you may know how it feels.
>
>The University of Southern California engineer is one of a growing number of
>researchers trying to crack the next barrier in computer speech synthesis --
>emotion. In labs around the world, computers are starting to laugh and sigh,
>express joy and anger, and even hesitate with natural ums and ahs.
>
>Called expressive speech synthesis, "it's the hot area" in the field today,
>says Ellen Eide of IBM's T.J. Watson Research Center in Yorktown Heights,
>N.Y., which plans to introduce a version of its commercial speech
>synthesizer that incorporates the new technology.
>
>It is also one of the hardest problems to solve, says Sundaram, who has
>spent months tweaking his laugh synthesizer. And the sound? Mirthful, but
>still machine-made.
>
>"Laughter," he says, "is a very, very complex process."
>
>The quest for expressive speech synthesis -- melding acoustics, psychology,
>linguistics and computer science -- is driven primarily by a grim fact of
>electronic life: The computers that millions of us talk to every day as we
>look up phone numbers, check portfolio balances or book airline flights
>might be convenient but, boy, can they be annoying.
>
>Commercial voice synthesizers speak in the same perpetually upbeat tone
>whether they're announcing the time of day or telling you that your
>retirement account has just tanked. David Nahamoo, overseer of voice
>synthesis research at IBM, says businesses are concerned that as the
>technology spreads, customers will be turned off. "We all go crazy when we
>get some chipper voice telling us bad news," he says.
>
>And so, in the coming months, IBM plans to roll out a new commercial speech
>synthesizer that feels your pain. The Expressive Text-to-Speech Engine took
>two years to develop and is designed to strike the appropriate tone when
>delivering good and bad news.
>
>The goal, says Nahamoo, is "to really show there is some sort of feeling
>there." To make it sound more natural, the system is also capable of
>clearing its throat, coughing and pausing for a breath.
>
>Scientist Juergen Schroeter, who oversees speech synthesis research at AT&T
>Labs, says his organization wants not only to generate emotional speech but
>to detect it, too.
>
>"Everybody wants to be able to recognize anger and frustration
>automatically," says Julia Hirschberg, a former AT&T researcher now at
>Columbia University in New York.
>
>For example, an automated system that senses stress or anger in a caller's
>voice could automatically transfer a customer to a human for help, she says.
>The technology also could power a smart voice mail system that prioritizes
>messages based on how urgent they sound.
>
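>A minimal sketch of that routing idea in Python, assuming only two crude
>prosodic cues (RMS loudness and zero-crossing rate) with invented
>thresholds; real systems would train classifiers on far richer features:
>
>    import numpy as np
>
>    def route_call(samples, energy_thresh=0.1, zcr_thresh=0.15):
>        """Send the caller to a human when crude stress cues run high."""
>        x = samples.astype(float)
>        energy = np.sqrt(np.mean(x ** 2))                # RMS loudness
>        # Zero-crossing rate: a rough proxy for pitch/voicing activity.
>        zcr = np.mean(np.abs(np.diff(np.sign(x))) > 0)
>        return "human" if (energy > energy_thresh and zcr > zcr_thresh) else "bot"
>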
>Hirschberg is developing tutoring software that can recognize frustration
>and stress in a student's voice and react by adopting a more soothing tone
>or by restating a problem. "Sometimes, just by addressing the emotion, it
>makes people feel better," says Hirschberg, who is collaborating with
>researchers at the University of Pittsburgh.
>
>So, how do you make a machine sound emotional?
>
>Nick Campbell, a speech synthesis researcher at the Advanced
>Telecommunications Research Institute in Kyoto, Japan, says it first helps
>to understand how the speech synthesis technology most people encounter
>today is created.
>
>The technique, known as "concatenative synthesis," works like this:
>Engineers hire human actors to read into a microphone for several hours.
>Then they dice the recording into short segments. Measured in
>milliseconds, each segment is often barely the length of a single vowel.
>
>When it's time to talk, the computer picks through this audio database for
>the right vocal elements and stitches them together, digitally smoothing any
>rough transitions.
>
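>The stitching step itself fits in a few lines. Here is a minimal Python
>sketch, assuming a hypothetical list of pre-cut audio segments; nothing
>below reflects any vendor's actual engine:
>
>    import numpy as np
>
>    def concatenate_units(units, sr=16000, fade_ms=5):
>        """Join pre-recorded segments, crossfading a few milliseconds
>        at each seam to smooth the rough transitions between units."""
>        fade = int(sr * fade_ms / 1000)        # crossfade length in samples
>        ramp = np.linspace(0.0, 1.0, fade)
>        out = units[0].astype(float)
>        for unit in units[1:]:                 # each unit must outlast the fade
>            unit = unit.astype(float)
>            out[-fade:] = out[-fade:] * (1 - ramp) + unit[:fade] * ramp
>            out = np.concatenate([out, unit[fade:]])
>        return out
>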
>Commercialized in the 1990s, concatenative synthesis has greatly improved
>the quality of computer speech, says Campbell. And some companies, such as
>IBM, are going back to the studio and creating new databases of emotional
>speech from which to work.
>
>But not Campbell.
>
>"We wanted real happiness, real fear, real anger, not an actor in the
>studio," he says.
>
>So, under a government-funded project, he has spent the past four years
>recording Japanese volunteers as they go about their daily lives.
>
>"It's like people donating their organs to science," he says.
>
>His audio archive, with about 5,000 hours of recorded speech, holds samples
>of subjects experiencing everything from earthquakes to childbirth, from
>arguments to friendly phone chat. The next step will be using those sounds
>in a software-based concatenative speech engine.
>
>If he succeeds, the first customers are likely to be Japanese auto and toy
>makers, who want to make their cars, robots and other gadgets more
>expressive. As Campbell puts it, "Instead of saying, 'You've exceeded the
>speed limit,' they want the car to go, 'Oy! Watch it!'"
>
>Some researchers, though, don't want to depend on real speech. Instead, they
>want to create expressive speech from scratch using mathematical models.
>That's the approach Sundaram uses for his laugh synthesizer, which made its
>debut this month at the annual meeting of the Acoustical Society of America
>in San Diego.
>
>Sundaram started by recording the giggles and guffaws of colleagues. When he
>ran them through his computer to see the sound waves represented
>graphically, he noticed that the waves trailed off as the person's
>lungs ran out of air. It reminded him of how a weight behaves as it bounces
>to a stop on the end of a spring. Sundaram adopted the mathematical
>equations that explain that action for his laugh synthesizer.
>
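>The article doesn't print Sundaram's equations, but the spring analogy
>points at a damped oscillator, x(t) = e^(-d*t) * cos(2*pi*f*t), whose
>amplitude dies away the way a laugh runs out of breath. A toy Python
>sketch under that assumption, with every parameter value invented:
>
>    import numpy as np
>
>    def damped_laugh_envelope(duration_s=1.5, sr=16000, damping=2.0, rate_hz=5.0):
>        """Bursts at rate_hz whose peaks decay exponentially, like a
>        weight bouncing to rest on the end of a spring."""
>        t = np.arange(int(sr * duration_s)) / sr
>        bursts = np.maximum(np.cos(2 * np.pi * rate_hz * t), 0.0)  # separate "ha"s
>        return np.exp(-damping * t) * bursts
>
>    # Shaping breathy noise with the envelope yields a crude machine chuckle.
>    laugh = damped_laugh_envelope() * np.random.randn(int(16000 * 1.5))
>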
>But Sundaram and others know that synthesizing emotional speech is only part
>of the challenge. Yet another is determining when and how to use it.
>
>"You would not like to be embarrassing," says Jurgen Trouvain, a linguist at
>Saarland University in Germany who is working on laughter synthesis.
>
>Researchers are turning to psychology for clues. Robert R. Provine, a
>psychologist at the University of Maryland, Baltimore County who pioneered
>modern laughter research, says the truth is sometimes counterintuitive.
>
>In one experiment, Provine and his students listened in on discussions to
>find out when people laughed. The big surprise?
>
>"Only 10 to 15 percent of laughter followed something that's remotely
>jokey," says Provine, who summarized his findings in his book Laughter: A
>Scientific Investigation.
>
>The one-liners that elicited the most laughter were phrases such as "I see
>your point" or "I think I'm done" or "I'll see you guys later." Provine
>argues that laughter is an unconscious reaction that has more to do with
>smoothing relationships than with stand-up comedy.
>
>Provine recorded 51 samples of natural laughter and studied them with a
>sound spectrograph. He found that a typical laugh is composed of expelled
>breaths chopped into short, vowel-like "laugh notes": ha, ho and he.
>
>Each laugh note lasted about one-fifteenth of a second, and the notes were
>spaced one-fifth of a second apart.
>
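>Those two numbers translate directly into a timing skeleton. A small
>Python sketch (the smooth Hann-window burst is an invented stand-in for
>a real recorded laugh note):
>
>    import numpy as np
>
>    def laugh_note_train(n_notes=4, sr=16000, note_s=1/15, spacing_s=1/5):
>        """Place 1/15-second laugh notes with onsets 1/5 second apart,
>        matching Provine's measurements of a stereotyped laugh."""
>        out = np.zeros(int(sr * n_notes * spacing_s))
>        note = np.hanning(int(sr * note_s))    # one smooth "ha" burst
>        for k in range(n_notes):
>            start = int(k * spacing_s * sr)
>            out[start:start + len(note)] = note
>        return out
>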
>In 2001, psychologists Jo-Anne Bachorowski of Vanderbilt University and
>Michael Owren of Cornell found more surprises when they recorded 1,024
>laughter episodes from college students watching the films Monty Python and
>the Holy Grail and When Harry Met Sally.
>
>Men tended to grunt and snort, while women generated more songlike laughter.
>When some subjects cracked up, they hit pitches in excess of 1,000 hertz,
>roughly high C for a soprano. And those were just the men.
>
>Even if scientists can make machines laugh, the larger question is how
>humans will react to machines capable of mirth and other emotions.
>
>"Laughter is such a powerful signal that you need to be cautious about its
>use," says Provine. "It's fun to laugh with your friends, but I don't think
>I'd like to have a machine laughing at me."
>
>To hear clips of synthesized laughter and speech, visit
>www.baltimoresun.com/computer
>
>The first computer speech synthesizer was created in the late 1960s by
>Japanese researchers. AT&T wasn't far behind. To hear how the technology
>sounded in its infancy, visit
>http://sal.shs.arizona.edu/~asaspeechcom/PartD.html
>
>Today's most natural sounding speech synthesizers are created using a
>technique called "concatenative synthesis," which starts with a prerecorded
>human voice that is chopped up into short segments and reassembled to form
>speech. To hear an example of what today's speech synthesizers can do, all
>you need to do is dial 411. Or visit this AT&T demo for its commercial
>speech synthesizer: http://www.naturalvoices.com/demos/
>
>Many researchers are now working on the next wave of voice technology,
>called expressive speech synthesis. Their goal: to make machines that can
>sound emotional. In the coming months, IBM will roll out a new expressive speech
>technology. To hear an early demo, visit http://www.research.ibm.com/tts/
>
>For general information on speech synthesis research, visit
>http://www.aaai.org/AITopics/html/speech.html
>
>Copyright © 2004, The Baltimore Sun
>
>http://www.baltimoresun.com/news/health/bal-te.voice29nov29,1,550833.story?coll=bal-news-nation
... Creating implements of mass instruction.
Lloyd Rasmussen, Senior Staff Engineer
National Library Service f/t Blind and Physically Handicapped
Library of Congress (202) 707-0535 <http://www.loc.gov/nls/z3986>
HOME: <http://lras.home.sprynet.com>
The opinions expressed here are my own and do not necessarily represent
those of NLS.