A POSSIBLE BREAKTHROUGH IN SYNTHETIC SPEECH QUALITY?

From: Brian Buhrow (buhrow@cats.ucsc.edu)
Date: Tue Mar 11 1997 - 10:28:52 PST


Date: Wed, 05 Mar 1997 22:05:55 -0500
To: ip-sub-1@majordomo.pobox.com
From: David Farber <farber@cis.upenn.edu>

Sort of fun djf

FYI: This Comprehensive, Multi-Lingual Interactive TTS Web Site Features
German, English, Russian, Chinese, French, Romanian, Italian, and
Spanish Demos
___________

Can your PC actually speak your written text? The answer, if you
visit the newly-announced Bell Labs Text-to-Speech (TTS) Web site
(www.bell-labs.com/project/tts/), is a resounding yes. Bell Labs,
the research and development arm of Lucent Technologies -- which has
led the industry in speech synthesis research over the past seven
decades -- has designed the most comprehensive, interactive
multi-lingual TTS site on the Web, which allows users to produce
natural speech in several languages (German, French, Mexican Spanish,
Romanian, Chinese, Russian, English, Italian) directly from written
text.
   "Speech is a sound signal used for language communication. Bell
Labs researchers, world experts in signal processing, speech
modeling, and text analysis, have devoted decades of work to
improving the quality of synthetic speech," said Joe Olive, head of
the Language Modeling Research Department at Bell Labs.
   "The applications for text-to-speech synthesis are various, and
growing, and they include e-mail readers, voice-response systems, and
automatic order-verification systems," said Olive. "Such systems will
require the high-quality word pronunciation and speech
intelligibility that Bell Labs TTS systems deliver."
   The Bell Labs TTS system converts written text to speech through
sophisticated linguistic analysis, prosodic modeling, and speech
synthesis. Text typed on a user's PC is converted into a linguistic
representation, which is then enriched with phrasing, intonation, and
stress information. Sophisticated machine synthesis then turns this
enriched representation into clearly articulated speech.
   The Bell Labs TTS system not only handles English but has expanded
its language set to include French, Italian, German, Russian,
Japanese, Mandarin and Taiwanese Chinese, Spanish (Peninsular and
Mexican), and Romanian.

SPEAKING ON THE WEB

   Visitors to the Bell Labs Text-to-Speech Synthesis Web site at
http://www.bell-labs.com/project/tts/ can sample speech in up to
nine different languages and visit a demonstration area that
lets users synthesize English, German, and Mandarin Chinese
sentences in male, female, and child voices, with effects such
as raspiness. The site offers high-quality, interactive, on-the-fly
modification of voice samples.
   The Bell Labs TTS system even handles German noun compounds, which
are notorious for being long and complex, and which cannot be
prestored in a dictionary.
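   Because such compounds cannot simply be listed, one common approach
is to split them into known stems before looking up pronunciations. The
Python sketch below shows recursive splitting against a stem lexicon,
allowing linking elements such as the "s" in "Arbeitsamt". The STEMS
set, the LINKERS tuple, and the preference for the analysis with the
fewest parts are illustrative assumptions; the press release does not
say how the Bell Labs system analyzes compounds.

```python
from functools import lru_cache
from typing import Optional

# Hypothetical toy lexicon of German stems (lowercase).
STEMS = {"haus", "tür", "schlüssel", "arbeit", "amt", "bahn", "hof", "bahnhof"}
# Assumed subset of linking elements ("Fugenelemente") allowed between stems.
LINKERS = ("", "s", "es", "n", "en")


def split_compound(word: str) -> Optional[list[str]]:
    """Split a compound into known stems, preferring the fewest parts.

    Linking elements are consumed but not returned. Returns None when
    no complete analysis exists.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[tuple[str, ...]]:
        if start == len(word):
            return ()  # the whole word has been consumed
        analyses = []
        for end in range(len(word), start, -1):  # longest stems first
            stem = word[start:end]
            if stem not in STEMS:
                continue
            for linker in LINKERS:
                nxt = end + len(linker)
                # A non-empty linker must occur here and be followed by more text.
                if linker and (not word.startswith(linker, end) or nxt >= len(word)):
                    continue
                rest = best(nxt)
                if rest is not None:
                    analyses.append((stem,) + rest)
        return min(analyses, key=len) if analyses else None

    result = best(0)
    return list(result) if result is not None else None
```

With this toy lexicon, split_compound("Haustürschlüssel") returns
["haus", "tür", "schlüssel"] and split_compound("Arbeitsamt") returns
["arbeit", "amt"], while "Bahnhof" stays whole because it is itself
listed as a stem.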
   According to Jens Nagel, an engineer with Mannesmann Autocom in
Duesseldorf, Germany, "the (Bell Labs TTS system) is currently the
qualitatively best TTS engine for German, in terms of overall
intelligibility and especially the pronunciation of names."
   Said Michael Tanenblatt, chief designer of the Web site, "the Bell
Labs TTS Web site represents our strong feeling that TTS systems have
started to play an important role in everyday communications. Users
can explore a range of fun, intriguing applications on our site,
which will increase the understanding of the role that TTS systems
play in bridging communications barriers."
   Ongoing developments in the Bell Labs Multimedia Communication
Research Laboratory include a "talking face" which uses the Bell
Labs TTS system to provide a visual personality to a computer. These
future developments will be reflected on the Bell Labs TTS Web site.
   "The average PC user has a new way to explore the future of speech
synthesis, as reflected on the Bell Labs TTS site," said James
Flanagan, director of the Center for Computer Aids for Industrial
Productivity at Rutgers University, a 1996 National Medal of Science
winner and a leading expert in speech communications. "I applaud the
efforts of Bell Labs in providing such a stunning resource for speech
synthesis on the Web."

BELL LABS TTS SYSTEM: HOW IT WORKS

   Currently, all Bell Labs systems for speech synthesis are
concatenative. This means that natural speech segments are selected
and stored in an acoustic inventory.
   These speech segments are units known as diphones, which contain the
transition between two adjacent phonetic segments (from the middle of
one to the middle of the next). The Bell Labs TTS
system determines and selects elements of speech which minimize
discrepancies between adjacent segments. During synthesis, the TTS
system smoothly concatenates, or links, the stored elements.
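   The diphone approach can be sketched in a few lines of Python. This
is a toy under stated assumptions: the inventory maps hypothetical
diphone names such as "h-@" to several candidate recordings, the
"discrepancy" between neighbours is reduced to the amplitude jump at the
boundary, and smooth linking is a simple linear crossfade. The press
release does not say whether segment selection happens when the
inventory is built or at synthesis time; the sketch does it at synthesis
time, using dynamic programming to pick the units with the lowest total
join cost, and a real system would use a richer mismatch measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One stored recording in a hypothetical diphone inventory."""
    name: str                    # e.g. "h-@": mid-/h/ to mid-/@/
    samples: tuple[float, ...]   # toy waveform samples


def to_diphones(phonemes: list[str]) -> list[str]:
    """Pad with silence ('_') and name each adjacent pair of phonemes."""
    padded = ["_"] + phonemes + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def join_cost(left: Unit, right: Unit) -> float:
    """Toy discrepancy: size of the amplitude jump at the boundary."""
    return abs(left.samples[-1] - right.samples[0])


def select_units(diphones: list[str], inventory: dict[str, list[Unit]]) -> list[Unit]:
    """Viterbi-style dynamic programming: choose one candidate per diphone
    so that the summed join cost over all boundaries is minimal."""
    cands = [inventory[d] for d in diphones]
    cost = [0.0] * len(cands[0])           # best total cost ending in each candidate
    back: list[list[int]] = [[-1] * len(cands[0])]
    for prev, cur in zip(cands, cands[1:]):
        new_cost, ptrs = [], []
        for unit in cur:
            totals = [cost[k] + join_cost(p, unit) for k, p in enumerate(prev)]
            k_best = min(range(len(prev)), key=totals.__getitem__)
            new_cost.append(totals[k_best])
            ptrs.append(k_best)
        cost = new_cost
        back.append(ptrs)
    # Trace back from the cheapest final candidate.
    k = min(range(len(cost)), key=cost.__getitem__)
    chosen = []
    for i in range(len(cands) - 1, -1, -1):
        chosen.append(cands[i][k])
        k = back[i][k]
    return chosen[::-1]


def concatenate(units: list[Unit], overlap: int = 2) -> list[float]:
    """Link units with a linear crossfade over `overlap` samples at each join."""
    out = list(units[0].samples)
    for unit in units[1:]:
        s = unit.samples
        n = min(overlap, len(out), len(s))
        for j in range(n):
            w = (j + 1) / (n + 1)          # weight shifts toward the incoming unit
            i = len(out) - n + j
            out[i] = (1 - w) * out[i] + w * s[j]
        out.extend(s[n:])
    return out
```

For example, to_diphones(["h", "@"]) returns ["_-h", "h-@", "@-_"].
Choosing units by total join cost rather than one diphone at a time lets
a slightly worse unit win when it fits its neighbours better.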
   Lucent Technologies designs, builds and delivers a wide range of
public and private networks, communications systems and software,
consumer and business telephone systems and microelectronics
components.
   Bell Labs is the research and development arm for the company. It
is a worldwide R&D community of nearly 24,000 people. About 20,000
of them work in Lucent's business units, developing products, systems
and technologies that keep Lucent competitive in its markets. About
4,000 Bell Labs people work in central labs, pursuing research and
developing advanced technologies on behalf of all the Lucent
operating units.
   Lucent Technologies was formed as a result of AT&T's restructuring
and became a fully independent company separate from AT&T on
September 30, 1996.

BELL LABS TEXT-TO-SPEECH SYNTHESIS: THEN AND NOW

BELL LABS AND "TALKING MACHINES"

   Bell Labs first demonstrated an electronic speech synthesis device,
the "Voder," developed by H.W. Dudley, at the 1939 World's Fair.
The New York Times declared, in describing the machine's operation,
"My God, it talks." This early analog system was the forerunner of
Bell Labs' work in articulatory synthesis, conducted by Cecil Coker in
the 1960s, and Joe Olive's work on concatenative synthesis in the
1970s.

BELL LABS: WHERE "HAL" FIRST SPOKE

   One of the more famous moments in Bell Labs synthetic speech
research was the sample created by John L. Kelly in 1962. Kelly's
vocoder synthesizer recreated the song "Bicycle Built for Two" on
the ILIAC, with musical accompaniment from Max Mathews. Arthur C.
Clarke, then visiting friend and colleague John Pierce at the Bell
Labs Murray Hill facility, saw this remarkable demonstration and
later used it in the climactic scene of his novel and screenplay for
"2001: A Space Odyssey," where the HAL 9000 computer sings this song
as he is disassembled by astronaut Dave Bowman.
   Joe Olive, recognized as the leading expert in text-to-speech
synthesis, recently contributed a chapter, "The Talking Computer:
Text to Speech Synthesis," to the book "HAL's Legacy: 2001's Computer
as Dream and Reality," (M.I.T. Press, 1996), edited by David Stork.

LUCENT PRODUCTS AND THE BELL LABS TTS SYSTEM

   The Bell Labs TTS system is currently used in the product offerings
of several Lucent business units. The Lucent Business Communications
Systems' Intuity Conversant integrated voice and information
processing system uses TTS signal processing cards for applications
that include, among others, an e-mail reader.
   Bell Labs TTS is also integrated into the Lucent Network Systems'
Speech Processing Solutions offerings. Here, TTS allows companies to
build applications such as voice dialing, voice-activated response
systems, or reservations centers using the AYC speech boards.
   For information on availability of the PC version of the Bell Labs TTS
system, contact John Holmgren, business development manager, at
908-949-8864.
For more information about Bell Labs TTS, call the Multimedia
Communications Research Laboratory at 908-582-6435. For more
information about Lucent Speech Processing Solutions, call
800-772-5785. For more information about the Lucent Intuity
Conversant system, call 800-247-7000.
WORLD WIDE WEB SITE: http://www.bell-labs.com/project/tts/

________________



This archive was generated by hypermail 2b29 : Sun Dec 02 2012 - 01:30:04 PST