Using virtual reality to make people better public speakers

Which presentations bring an audience to the edge of its seats — or send them into a snooze? USC Viterbi research project creates a virtual audience for training public speakers

Even for the greatest orators, a polished performance requires practice and feedback. But sharpening a speech in front of a crowd or even a close friend can induce anxiety in almost anyone, and even for those interested in rehearsing in front of real people, a willing group is not always available.

Enter Cicero, an interactive virtual audience solution being developed by researchers at the USC Institute for Creative Technologies and the USC Viterbi School of Engineering. Named for the Roman rhetorician, Cicero combines machine-learning models and Toastmasters tips to automatically evaluate a person’s delivery and provide constructive critiques for improvement.

“We’ve all had the experience practicing a presentation in front of mirrors or empty chairs,” said Stefan Scherer, co-leader of the effort and research assistant professor at ICT and the USC Viterbi Department of Computer Science. “But in order to get better, you need audience feedback, including non-verbal signals like nodding heads or downcast eyes that tell you if you are doing well or not. The goal of this project is to give people that feedback before it’s too late.”

Slump or soar?

To begin that process, Scherer and project co-leader Louis-Philippe Morency, director of ICT’s MultiComp Lab and a research assistant professor at USC Viterbi, made a science out of studying public speaking. They compiled the characteristics that studies and expert elocutionists have found put audience members on the edge of their chairs, and those that send them slumping in their seats.

Next, they brought in study subjects who gave speeches in front of a static virtual audience. Researchers recorded their performances, tracking components of the presenters’ speech, gaze and body movement, and measuring and monitoring more than 20 non-verbal characteristics associated with good or bad speaking performances.

The focus was on style, not substance. The team did not address the content of what people said — that aspect might be added later — but rather looked and listened for the way in which the speech was delivered. Were voices monotone or did inflections change? Did people speak in a breathy whisper or with a strong timbre? Did they make the most of the space on the stage, direct their eyes to specific people, wave their arms or clasp their hands?

“These are all measurable factors that go into determining whether a performance is effective or not,” Morency said. “People make these calculations automatically, and what we discovered is that computers can be taught to do the same.”
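To make "measurable" concrete, here is a minimal sketch of how one such factor, vocal monotony, could be turned into a number. It is an illustration only and not the researchers' method: the function name `pitch_variation`, the idea of using the spread of pitch samples, and the sample values are all assumptions made for this example.

```python
from statistics import mean, pstdev


def pitch_variation(pitch_hz):
    """Return the relative spread of voiced pitch samples.

    Hypothetical measure for illustration: 0.0 means perfectly monotone,
    larger values mean more inflection. Samples of 0 are treated as
    unvoiced or silent frames and ignored (an assumption of this sketch).
    """
    voiced = [p for p in pitch_hz if p > 0]
    if len(voiced) < 2:
        return 0.0
    # Standard deviation divided by the mean, so the result does not
    # depend on whether the speaker's voice is naturally high or low.
    return pstdev(voiced) / mean(voiced)


# Made-up pitch tracks in hertz, one value per analysis frame.
monotone_speaker = [120, 121, 119, 120, 0, 120]
lively_speaker = [110, 150, 95, 180, 0, 130]

print(pitch_variation(monotone_speaker) < pitch_variation(lively_speaker))
```

A real system would combine many such measurements, covering voice, gaze and gesture, rather than rely on a single number.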

Qualities of effective speeches

In results presented at the International Conference on Intelligent Virtual Agents, the researchers reported that the initial Cicero prototype recognized properties of effective speeches, including strong voice quality, eye contact and gesturing, nearly as accurately as trained Toastmasters who had volunteered to appraise the talks.

The evaluative engine driving Cicero is MultiSense. Developed by ICT research programmer Giota Stratou, MultiSense can instantly quantify facial expressions, posture and speech patterns. The framework, combined with simple cameras, microphones and a Microsoft Kinect sensor, can automatically analyze people’s gestures, voices, eye contact and facial expressions to provide intelligent feedback that helps them improve their performances in public speaking.

In SimSensei, another ICT research project, MultiSense automatically assesses non-verbal behaviors associated with depression and allows a virtual interviewer to respond appropriately.

In the case of Cicero, the researchers’ next challenge is to combine MultiSense and SmartBody, a character-animation system overseen by ICT research scientist and Cicero co-investigator Ari Shapiro. SmartBody determines individualized feedback behaviors for each member of the virtual audience — behaviors driven by the practice performance and informed by learning strategies designed to effect positive change.

“We are realizing we don’t need to model everything a real audience would do,” Morency said. “Rather than have a virtual listener quietly fall asleep, we might have them shift their body and cough to signal to the speaker that people seem bored. If a speaker avoids eye contact, we might have an audience member clear his or her throat to get the presenter’s attention.”

Audience engagement

The team is currently conducting a study with 60 people who each give a presentation. Some people get no feedback, while others receive feedback in the form of green or red color bars that indicate levels of audience engagement. A third group gets its feedback from an interactive virtual audience. After receiving feedback (or not), each person presents again. After a comparison, the team hopes to determine which form of feedback is most effective.
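The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the scoring scale, the condition labels and every number below are invented placeholders, not data or results from the study, and the team's actual analysis is not described in the article.

```python
from statistics import mean


def mean_improvement(scores):
    """Average change from first to second presentation, per condition.

    scores maps a condition label to a list of (first, second) rating
    pairs, one pair per speaker. Labels and scale are hypothetical.
    """
    return {
        condition: mean(second - first for first, second in pairs)
        for condition, pairs in scores.items()
    }


# Invented placeholder ratings, NOT study results.
placeholder_scores = {
    "no_feedback": [(5, 5), (6, 7)],
    "color_bars": [(5, 6), (6, 7)],
    "virtual_audience": [(5, 7), (6, 7)],
}

improvement = mean_improvement(placeholder_scores)
for condition, gain in improvement.items():
    print(condition, gain)
```

With 60 participants, a real analysis would also need a statistical test to check that differences between groups are larger than chance variation.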

Aside from helping to improve speechmakers’ skills, this phase of the Cicero study aims to advance ICT’s existing research on developing interactive virtual humans, including improving how these characters move, listen, react and perceive as they communicate with real people. The researchers are also using this project to better understand how to implement effective computer-delivered instruction, provide automatic assessments and model individualized behaviors for diverse groups of virtual humans.

ICT has long specialized in training systems to improve interpersonal skills. Cicero is sponsored in part by the U.S. Army to encourage the development of leaders who are confident speaking in front of a crowd. The National Science Foundation also provides funding. The team sees other potential applications that can improve how people present themselves and inform future human-computer interaction research, like preparing politicians for press conferences or job candidates for group interviews.

If Morency and Scherer succeed in making it easier for people across professions to more clearly get their points across, even the most apprehensive of announcers might be prompted to propose a toast.

But, the researchers cautioned, practice, whether with a virtual audience or a real one, is not the only factor when it comes to delivering a crowd-pleasing presentation.

“People project more confidence when they are enthusiastic about the message they want to deliver,” Scherer said. “There may be people who have plenty of training, but they don’t believe in what they are saying.”

And that may be the most valuable feedback of all.

