Modeling Disfluencies in Spontaneous Speech
Funding Information
- Sponsor: National Science Foundation (NSF)
- NSF program: Interactive Systems
- NSF program official: Dr. Gary W. Strong
- Grant No.: IRI-9314967
- Award Period: February 14, 1994 - February 28, 1998
Principal Investigator
Co-Investigators
Project Summary
Spoken language is the medium used first and foremost by humans for
accurate and efficient interactive problem solving. As an input
modality for human-computer interaction, spoken language can offer:
(1) accessibility to an increasing number of people, including those
with little or no training, (2) increased access to a growing set
of data resources via telephone without a computer terminal,
(3) increased power for those already familiar with computer technology,
(4) an additional communication channel for more robust communication,
for use in unusual environments, and for devices for the disabled,
(5) flexibility of modality and use of computers by humans generally, and
(6) increased applications and job opportunities in areas that will
grow out of increased exposure of people to the potential of
technology.
Although there has been significant work devoted to some
spontaneous speech phenomena, such as "slips of the tongue," other
much more frequent types of spontaneous speech "disfluencies" have
been largely ignored, e.g., false starts, hesitations, filled pauses
and related phenomena. Such disfluencies are highly prevalent in
normal human communication. Although disfluencies are less frequent in
human-machine dialog, the causes and costs (e.g., in terms of
cognitive load on the user) of this discrepancy are unknown. Further,
because current speech understanding systems do not model disfluencies
well, disfluencies that do occur are correlated with speech recognition
and understanding errors. As spoken language systems evolve to allow
more natural human-machine dialog, the rate of disfluencies is
likely to rise toward the rates observed in human-human
communication. A better understanding of the interdisciplinary
aspects of disfluencies is critical to the development of a principled
treatment of these highly frequent attributes of spontaneous speech.
This project models disfluencies at lexical, syntactic, and
acoustic-prosodic levels. The goal is to gain insight into human
communication, and to develop algorithms to robustly recognize speech
that includes disfluencies. The approach involves analysis of
disfluencies in existing, digitized corpora and in speech collected in
controlled experiments. The investigation is undertaken by a team
representing expertise in different, complementary disciplines,
including linguistics, psycholinguistics, and cognitive psychology.
As the project enters its final phase, recent efforts at SRI have
investigated how results of the descriptive research can be integrated
into SRI's speech understanding system. In particular, SRI has developed
methods for automatically detecting disfluencies, using
acoustic-prosodic information combined with specialized language
models. Related studies at Stanford have focused on syntactic
properties of disfluencies and on functional aspects. Additional
related work at MIT aims to understand the articulatory mechanisms
involved in self-interruption, as well as the relationship between
speech errors and sentence prosody.
Collaborators
- Becky Bates, Boston University
- John Bear, SRI International
- Laura Dilley, MIT
- Jean Fox Tree, University of California at Santa Cruz
- Astrid Hagen, Univ. Erlangen / MIT
- Gerald McRoberts, Stanford University
- Mari Ostendorf, Boston University
- Ken Stevens, MIT
- Andreas Stolcke, SRI International
- Tom Wasow, Stanford University
Reports