Homepage --
Objectives --
Guidelines --
Table of contents --
Advisory board --
Contact us
Conversational Interface Technologies
Clare-Marie Karat, John Vergo, and David Nahamoo
- Introduction
- Why are conversational interfaces important? The goal of
conversational technologies is to close the gap between Human-Computer
Interaction (HCI) and Human-Human Interaction (HHI) by leveraging
expertise in human-to-human conversational interaction.
- First, this chapter provides an overview of the state of
conversational technologies today. The discussion of conversational
technologies covers speech recognition, natural language understanding
(NLU), text to speech (TTS), dialog management and verification
technologies. We examine the critical UI architecture and design
decisions regarding directed versus mixed initiative systems and
multi-channel versus multi-modal systems.
- For each of the conversational technologies, we describe how the
technology works, system requirements, basic capabilities and
limitations, user requirements, UI design guidelines, application
domain issues including types of applications (dictation, navigation,
telephony, etc.) and scope (radiology versus general English), and
commercially available tools, engines, and APIs. We provide examples of
successful conversational applications employing the various
technologies and UI design alternatives.
- What will the future hold? We provide a glimpse of emerging
technology that will provide more flexibility and choice in the use of
conversational technologies. We report on "true" multi-modal,
pervasive and mixed initiative conversational systems which have been
developed in research environments but are not yet available in the
marketplace (refs: Mark Lucente work, HCI Journal article).
- Speech Recognition
- How does speech recognition work?
- Speech engines and speech application architecture
- Diagram of speech recognition engine flow
- The speech recognition algorithms
- Acoustic models
- Language models
- System requirements for local decoding, installing the software,
setting up the microphone
- Basic capabilities and limitations of speech recognition
- Current accuracy levels (desktop vs. telephony, perplexity)
- Quality of microphone greatly influences recognition accuracy
- Increasing the accuracy of the engines: training, adding words, and
updating the language model
- Context of use - effects of noise, speaker independence
- Application specific language models for medicine, law, etc.
- User Requirements
- User characteristics for successful use of speech recognition software
and universal access variables
- Training in current speech recognition software
- Initial use and extended use: The learning curve
- Productivity in initial use - CHI 99 paper (Karat et al.), UAIS paper
(Sears, Karat, et al.)
- Productivity after extended use - CHI 99 paper (Karat et al.)
- Correcting text using current speech technologies
- Initial use and extended use correction strategies - CHI 99 paper
and examples
- Dictate and then correct or correct as you go
- Correction strategies of traditional users as compared to disabled
users - UAIS paper and examples of dictate and then correct versus
correct as you go
- User interface guidelines for application design
- Guidelines for successful scenarios of use
- Constrained vocabulary, speaker independent
- Speaker "independent" for large vocabulary with significant
user training
- Directed dialogue
- Employment of standard usability metrics (e.g., time on task) to
ensure productivity increases
- Unsuccessful scenarios of use - The limits of speech recognition
- Memory or cognitive load factors
- Speaker independent and large vocabulary
- Free-form conversations
- The limits of speech recognition - CACM, Sept. 2000 - Ben
Shneiderman
- Application Domain Issues
- Dictation
- Navigation
- Telephony
System architecture and requirements - issues with decreased
accuracy over phone lines
- Web-based conversational interfaces
VoiceXML - CACM, Sept. 2000 - Bruce Lucas
- Commercially available tools, engines, and APIs
- Natural Language Understanding (NLU) Technologies
- What is NLU?
Describe NLU and clearly contrast it with speech recognition. Discuss
how grammar-based speech recognition is considered by some to be
NLU. Compare spoken and typed conversations. Finally, indicate the many
areas of NLP that we are not addressing in this chapter.
- How does natural language understanding work?
- Diagram of analysis process for data input - Nicolas Nicolov ref.
- Current accuracy levels and scope of application
- Competing technologies for natural language understanding
- System requirements
- Basic capabilities and limitations
- User requirements for successful use of NLU
- User interface guidelines for NLU applications
- Application domain issues
- Commercially available tools, engines, and APIs
- Text To Speech
- How does Text to Speech (TTS) work?
- System requirements
- Basic capabilities and limitations
- User requirements for successful use of TTS
- User interface guidelines for TTS applications
- Application domain issues
- Commercially available tools, engines, and APIs
- Dialog management
- How does dialog management work?
- System requirements
- Basic capabilities and limitations
- User requirements for successful use of dialog management
- User interface guidelines for dialog management applications
- Application domain issues
- Commercially available tools, engines, and APIs
- Verification
- How do verification technologies work?
- System requirements
- Basic capabilities and limitations
- User requirements for successful use of verification technologies
- User interface guidelines for verification applications
- Application domain issues
- Commercially available tools, engines, and APIs
- Emerging Conversational Technologies
- True multi-modal interaction - speech with gesture, mouse, vision
- Mixed-initiative technologies
- Conversational mobile and pervasive devices