About Us

Objective

We propose to organize a two-day workshop on October 7-8, 2003, at the Georgia Institute
of Technology in Atlanta, Georgia, to bring together leaders from US academia,
industry, and government to discuss new approaches to accelerating advances in
automatic speech recognition (ASR) technology. To focus the discussion, the workshop
will be by invitation only, with invitations extended to just scores of potential participants.
At the conclusion of the meeting, all presentation materials will be collated,
and a report will be generated and disseminated to the research community and
other stakeholders. Objectives of the proposed workshop include:

  • Assessing the capabilities and limitations of state-of-the-art ASR technologies
  • Sharing new ideas and discussing innovative approaches to multi-lingual
    ASR
  • Conceptualizing cyber-infrastructure approaches to education in speech science and
    technology
  • Discussing ways to facilitate collaborative research among all relevant
    disciplines, including:

    1. A sharable platform roadmap with plug-‘n’-play features to lower ASR
      entry barriers
    2. An objective evaluation methodology for developing sharable ASR components

Background

At the dawn of the 21st century, the automatic speech recognition community
is at a crossroads. On one hand, we have learned a great deal about how to build
practical speech recognition systems for almost any spoken language without
the need for a detailed understanding of the language. Data-driven machine learning
techniques, such as hidden Markov models (HMMs) and artificial neural networks
(ANNs), are prevalent, with several research and software development packages
available to the general public. Advances in hardware, algorithms, and data structures
have made the implementation of large-vocabulary continuous speech recognition systems
affordable. On the other hand, these systems are often restrictive, in that their
users have to follow a strict set of protocols in order to effectively use spoken
applications. The technology is also so fragile that the performance of an
ASR system usually degrades dramatically in adverse environments, to the point
that the technology becomes unusable even for cooperative users. When compared
with human speech recognition (HSR), state-of-the-art ASR systems often
yield much higher error rates, even for simple recognition tasks operating in
clean environments. In noisy conditions such as those in moving vehicles, ASR
may produce error rates two orders of magnitude higher than HSR. Such a performance
gap is unacceptable for many application designers and users. Thus, there remain
a number of fundamental questions about ASR technology that the research community
must address. These questions offer both educators and students significant opportunities
for creative approaches to ASR. Because speech recognition technology is critical
to realizing the promise of universal access to information for all people, these
challenges also provide funding agencies in different countries with strong
incentives to make additional investments in research and education in this
area.

Themes

The workshop theme will center on integrating multidisciplinary sources
of knowledge, from acoustics, speech, linguistics, cognitive science, signal
processing, human-computer interaction, and computer science, into every stage
of ASR component and system design. The key to success lies in the development
of sharable software platforms and objective evaluation methodologies that will
allow every group, big or small, to develop its own system and contribute to
the overall success of ASR. In the past, and still today, only a handful of academic
groups have been able to participate in ASR research and build ASR systems.
With open architecture platforms, even individual investigators will be able
to conduct meaningful research in focused areas and make significant contributions
under this plug-‘n’-play framework. This approach to research and development
will lower ASR entry barriers and increase the talent pool in human language
and communication research by enabling a diverse community of researchers, including
underrepresented minorities, to participate through collaborative projects and
programs. Objective evaluation methodologies commonly adopted by the ASR community,
combined with shared language resources, would also be established to help researchers
assess their progress relative to their peers, which in turn provides an excellent
educational setting for students and researchers. This combination of open platform and objective evaluation
paradigm could offer new and exciting collaborative opportunities for scientists
and engineers to work together to improve our understanding of speech and eventually
bridge the performance gaps between ASR and human speech recognition. This is
expected to be a community effort geared towards establishing a collaborative
and multidisciplinary ASR Community of the 21st Century.
