People can find a hundred ways to say the same thing, which poses a challenge to robots that are expected to keep up their end of conversations. A Disney Research team’s solution is to devise an automated method of crowdsourcing multiple lines of dialogue. After all, “hello” is a perfectly fine greeting, but not every time you see someone.
The team developed a persistent interactive personality, or PIP, that can translate high-level goals and variables into simple narratives, effectively summarizing situations it will find itself in. PIP then autonomously presents the descriptions to crowd workers to elicit appropriate speech for the context. These workers are each tasked with writing a single line of character dialogue, or describing a non-verbal behavior associated with a line of dialogue. Other crowd workers then evaluate the lines, suggesting which ones should be discarded as nonsense and scoring the ones judged acceptable.
The method can quickly generate a wide variety of expressions that are meaningful in the particular situations the robot will find itself in, said Jill Fain Lehman, senior research scientist at Disney Research. It also provides a way of incrementally and continually updating the robot’s dialogue.
The researchers will present the method at the International Conference on Multimodal Interaction Nov. 12-16 in Tokyo.
For instance, if a robot named Martin is running a trivia contest, a narrative presented to crowd workers might say, “The player has answered most of the questions right so far. This time, the player answers wrong. Martin says…” Possible answers could be: “Sorry, you broke your streak” or “Nice try, but that’s incorrect” or “You’re still doing pretty well overall.”
Some crowd workers are tasked with judging whether the answer makes sense and scoring it for quality. Still other crowd workers may be asked to suggest which words in a reply should be emphasized, or whether the robot should display an expression of surprise, sadness, or other emotion.
Lehman calls this method “semi-situated learning,” because even though the elicited lines are context-specific, the robot’s state at the time the line is uttered may include more information than the narrative expressed. Nevertheless, the language behavior tends to be meaningful in the moment.
In a case study of the trivia contest example that used crowd workers from Amazon Mechanical Turk, PIP elicited 680 crowdsourced lines of language behavior. A total of 48 were eliminated, either by other crowd workers or by PIP on the basis of average score, leaving 632. Combined with 84 sentences created by the researchers to provide game instructions and error-handling behaviors, the result was 716 lines of dialogue.
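The filtering step described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the article does not give PIP's actual scoring scale or cutoffs, so the `ElicitedLine` class, the `min_avg_score` and `max_nonsense_votes` parameters, and the sample ratings are all hypothetical. The last lines simply recheck the reported arithmetic (680 elicited, 48 eliminated, 84 added by researchers).

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class ElicitedLine:
    """One crowd-written line plus the evaluations it received (hypothetical structure)."""
    text: str
    scores: List[int] = field(default_factory=list)  # quality ratings from evaluators
    nonsense_votes: int = 0                          # evaluators who flagged it as nonsense


def keep_line(line: ElicitedLine, min_avg_score: float = 2.5,
              max_nonsense_votes: int = 1) -> bool:
    """Keep a line unless evaluators flagged it as nonsense or its average score is low.

    Both thresholds are assumptions made for this sketch, not values from the study.
    """
    if line.nonsense_votes > max_nonsense_votes:
        return False
    if not line.scores:  # never evaluated, so there is no basis to keep it
        return False
    return mean(line.scores) >= min_avg_score


candidates = [
    ElicitedLine("Sorry, you broke your streak.", [4, 5, 4]),
    ElicitedLine("Nice try, but that's incorrect.", [5, 4, 4]),
    ElicitedLine("Banana telephone.", [1, 1], nonsense_votes=3),  # flagged as nonsense
    ElicitedLine("Wrong.", [2, 2, 3]),                            # average below the cutoff
]
kept = [line for line in candidates if keep_line(line)]

# Recheck the totals reported in the case study.
elicited, eliminated, researcher_written = 680, 48, 84
total_lines = elicited - eliminated + researcher_written
```

Here `kept` holds the first two sample lines, and `total_lines` matches the 716 lines reported for the case study.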
The dialogue variations were then programmed into a robot quizmaster that was stationed in an office environment and at two public events. Players at the public events heard almost entirely unique language each time they played, but the office workers were much more likely to play the game many times and so were more likely to hear the same lines repeated. Despite some office players competing more than 30 times, no one ever heard exactly the same dialogue twice.
“We didn’t expect people to like the quiz game quite so much,” Lehman said. “PIP can notice if language with a particular user is getting repetitive; if we had let PIP use this feature, it would have updated its own dialogue model in response.”
Ultimately, she added, the plan is to use PIP to support a much broader set of interactions than the trivia game.
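Lehman's remark that PIP can notice when language with a particular user is getting repetitive can be illustrated with a minimal sketch. This is not PIP's implementation, which the article does not describe. The `LinePicker` class, the `repeat_threshold` parameter, and the context name `wrong_after_streak` are hypothetical, and the rule used here (flag a context once a user has heard most of its lines) is just one plausible way to detect repetition.

```python
import random
from collections import defaultdict


class LinePicker:
    """Pick dialogue lines per context, preferring lines a given user has not heard yet."""

    def __init__(self, lines_by_context, repeat_threshold=0.8, seed=None):
        self.lines_by_context = lines_by_context
        self.repeat_threshold = repeat_threshold  # assumed cutoff, not from the study
        self.heard = defaultdict(set)             # (user, context) -> lines already heard
        self.rng = random.Random(seed)

    def pick(self, user, context):
        pool = self.lines_by_context[context]
        heard = self.heard[(user, context)]
        fresh = [line for line in pool if line not in heard]
        # Fall back to the full pool only when every line has already been used.
        line = self.rng.choice(fresh or pool)
        heard.add(line)
        return line

    def is_repetitive(self, user, context):
        """True once the user has heard most of the lines available for this context."""
        pool = self.lines_by_context[context]
        heard = self.heard[(user, context)]
        return len(heard) / len(pool) >= self.repeat_threshold


lines = {
    "wrong_after_streak": [
        "Sorry, you broke your streak.",
        "Nice try, but that's incorrect.",
        "You're still doing pretty well overall.",
    ]
}
picker = LinePicker(lines, seed=0)
first_two = [picker.pick("alice", "wrong_after_streak") for _ in range(2)]
repetitive_after_two = picker.is_repetitive("alice", "wrong_after_streak")
third = picker.pick("alice", "wrong_after_streak")
repetitive_after_three = picker.is_repetitive("alice", "wrong_after_streak")
```

In this sketch a frequent player hears all three lines before any repeats, and the context is flagged as repetitive only after the third pick, the point at which a system like PIP could request new lines from crowd workers.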