Spoken Dialog Systems

AMAT: A Conversational Interface 
Combining Spoken Natural Language Dialog with Virtual Reality


Papers by Curry Guinn

  • Efficient Collaborative Discourse: A Theory and its Implementation, with Alan Biermann, D. Richard Hipp, and Ronnie Smith in ARPA Workshop on Human Language Technology, Princeton, NJ, March 1993.


    An architecture for voice dialogue machines is described with emphasis on the problem solving and high level decision making mechanisms. The architecture provides facilities for generating voice interactions aimed at cooperative human-machine problem solving. It assumes that the dialogue will consist of a series of local self-consistent subdialogues each aimed at subgoals related to the overall task. The discourse may consist of a set of such subdialogues with jumps from one subdialogue to the other in a search for a successful conclusion. The architecture maintains a user model to assure that interactions properly account for the level of competence of the user, and it includes an ability for the machine to take the initiative or yield the initiative to the user. It uses expectation from the dialogue processor to aid in the correction of errors from the speech recognizer.

  • Goal-Oriented Multimedia Dialogue with Variable Initiative, with Alan W. Biermann, Michael S. Fulkerson, Greg A. Keim, Zheng Liang, Douglas M. Melamed, Krishnan Rajagopalan, in International Symposium on Methodologies for Intelligent Systems, pp. 1-16, 1997.


    Tutorial dialogue offers several interesting challenges to mixed-initiative dialogue systems. In this paper, we outline some distinctions between tutorial dialogues and the more familiar task-oriented dialogues, and how these differences might impact our ideas of focus and initiative. In order to ground discussion, we describe our current dialogue system, the Duke Programming Tutor. Through this system, we present a temperature-based model and algorithm which provide a basis for making decisions about dialogue focus and initiative.

  • Natural Language Processing in Virtual Reality, with R. Jorge Montoya, Modern Simulation and Training, pp. 44-55, June 1998. (htm), (pdf)


    Technological advances in areas such as transportation, communications, and science are rapidly changing our world--the rate of change will only increase in the 21st century. Innovations in training will be needed to meet these new requirements. Not only must soldiers and workers become proficient in using these new technologies, but shrinking manpower requires more cross-training, self-paced training, and distance learning. Two key technologies that can help reduce the burden on instructors and increase the efficiency and independence of trainees are virtual reality simulators and natural language processing. This paper focuses on the design of a virtual reality trainer that uses a spoken natural language interface with the trainee.
    RTI has developed the Advanced Maintenance Assistant and Trainer (AMAT) with ACT II funding for the Army Combat Service Support (CSS) Battlelab. AMAT integrates spoken language processing, virtual reality, multimedia and instructional technologies to train and assist the turret mechanic in diagnosing and maintaining the M1A1 Abrams Tank in a hands-busy, eyes-busy environment. AMAT is a technology concept demonstration and an extension to RTI's Virtual Maintenance Trainer (VMAT), which was developed for training National Guard organizational mechanics. VMAT is currently deployed in a number of National Guard training facilities. The AMAT project demonstrates the integration of spoken human-machine dialogue with visual virtual reality in implementing intelligent assistant and training systems. To accomplish this goal, RTI researchers have implemented the following features:

    Speech recognition on a Pentium-based PC,
    Error correcting parsers that can correctly handle utterances that are outside of the grammar,
    Dynamic natural language grammars that change as the situation context changes,
    Spoken message interpretation that can resolve pronoun usage and incomplete sentences,
    Spoken message reliability processing that allows AMAT to compute the likelihood that it properly understood the trainee (this score can be used to ask for repeats or confirmations),
    Goal-driven dialogue behavior so that the computer is directing the conversation to satisfy either the user-defined or computer-defined objectives,
    Voice-activated movement in the virtual environment, and
    Voice synthesis on a Pentium-based PC.
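
    The reliability score mentioned in the feature list above could drive a simple decision rule for when to ask for a repeat or a confirmation. The following is a minimal sketch under stated assumptions, not AMAT's implementation, which is not described here: the thresholds, the Action enum, and the choose_action function are all hypothetical, and the score is assumed to lie between 0 and 1.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical dialogue actions; not taken from AMAT."""
    ACCEPT = "accept"    # act on the utterance as understood
    CONFIRM = "confirm"  # ask the trainee to confirm the interpretation
    REPEAT = "repeat"    # ask the trainee to say it again


# Assumed thresholds for illustration only; real values would be tuned.
REPEAT_BELOW = 0.4
CONFIRM_BELOW = 0.8


def choose_action(reliability: float) -> Action:
    """Map a reliability score in [0, 1] to a dialogue action."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if reliability < REPEAT_BELOW:
        return Action.REPEAT
    if reliability < CONFIRM_BELOW:
        return Action.CONFIRM
    return Action.ACCEPT


if __name__ == "__main__":
    for score in (0.2, 0.6, 0.95):
        print(score, choose_action(score).value)
```

    Two thresholds give three bands: low scores trigger a repeat request, middling scores a confirmation, and high scores are accepted without interrupting the trainee.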

  • The Virtual Standardized Patient: Simulated Patient-Practitioner Dialogue for Patient Interview Training. With Hubal, R.C., Kizakevich, P.N., Merino, K.D., & West, S.L. In J.D. Westwood, H.M. Hoffman, G.T. Mogel, R.A. Robb, & D. Stredney (Eds.), Envisioning Healing: Interactive Technology and the Patient-Practitioner Dialogue. IOS Press: Amsterdam, 2000. (htm), (doc)


    We describe the Virtual Standardized Patient (VSP) application, which features a computerized virtual person who interacts with medical practitioners in much the same way as actors hired to teach and evaluate patient assessment and interviewing skills. The VSP integrates technologies from two successful research projects conducted at Research Triangle Institute (RTI). AVATALK™ provides natural language processing, emotion and behavior modeling, and composite facial expression and lip-shape modeling for a natural patient-practitioner dialogue. Trauma Patient Simulator (TPS) provides case-based patient history and trauma casualty data, real-time physiological modeling, interactive patient assessment, 3-D scenario simulation, and instructional record-keeping capabilities. The VSP offers training benefits that include enhanced adaptability, availability, and assessment.

  • A Test of Responsive Virtual Human Technology as an Interviewer Skills Training Tool. With Link, M.W., Armsby, P.P., and Hubal, R. Proceedings of the 2002 Annual Conference of the American Association for Public Opinion Research, St. Petersburg. 2002. (htm), (doc)


    Research on survey non-response suggests that advanced communication and listening skills are among the best strategies telephone interviewers can employ for obtaining survey participation, allowing them to identify and address respondents' concerns immediately with appropriate, tailored language. Yet, training on interaction skills is typically insufficient, relying on role-playing or passive learning through lecture and videos. What is required is repetitive, structured practice in a realistic work environment. This research examines acceptance by trainees of an application based on responsive virtual human technology (RVHT) as a tool for teaching refusal avoidance skills to telephone interviewers. The application tested here allows interviewers to practice confronting common objections offered by reluctant sample members. Trainee acceptance of the training tool as a realistic simulation of "real life" interviewing situations is the first phase in evaluating the overall effectiveness of the RVHT approach. Data were gathered from two sources -- structured debrief questionnaires administered to users of the application, and observations of users by researchers and instructors. The application was tested with a group of approximately fifty telephone interviewers of varying skill and experience levels. The research presents findings from these acceptance evaluations and discusses users' experiences with and perceived effectiveness of the virtual training tool.

  • JUST-TALK: An Application of Responsive Virtual Human Technology, with Geoffrey Frank and Robert Hubal, accepted for publication, Proceedings of the 24th Interservice/Industry Training, Simulation and Education Conference, 2002. (htm), (doc)


    In this paper, we describe an application of responsive virtual humans to train law enforcement personnel in dealing with subjects who present symptoms of serious mental illness. JUST-TALK provides a computerized virtual person to interact with the student in a role-playing environment. Students were able to converse with the virtual person using spoken natural language and to see and hear the virtual human through a combination of facial gesture, body movements, and spoken language. The JUST-TALK project, funded by the National Institute of Justice Office of Science and Technology and developed by RTI International, involved integrating virtual reality training software within a 3-day class at the North Carolina Justice Academy. The course was structured to include classroom-based lecture, videos, discussion, live human role-playing, and virtual human role-playing.
    A scientific evaluation of the class and the software system was carried out by North Carolina State University. This assessment investigated the contribution of natural language interfaces and virtual reality technology to learning in this applied setting. Results of the evaluation are extremely encouraging. The vast majority of students (88 percent) found the simulation easy to use. A majority of the students said the virtual trainer enhanced their learning in the course. As a training tool, students rated the computer simulation on par with other training methods including lecture, role-play and discussion. A total of 59 percent of students felt the simulation was better for learning or comparable to role-play; 77 percent felt simulation was better than or comparable to lecture; and 59 percent felt the simulation was better than or comparable to discussion.