People who were more skeptical of human-caused climate change or the Black Lives Matter movement, and who took part in a conversation with a popular AI chatbot, were disappointed with the experience but left the conversation more supportive of the scientific consensus on climate change or of BLM. That is according to researchers studying how these chatbots handle interactions with people from different cultural backgrounds.
Savvy humans can adjust to their conversation partners’ political leanings and cultural expectations to make sure they’re understood, but more and more often, humans find themselves in conversation with computer programs, called large language models, meant to mimic the way people communicate.
Researchers at the University of Wisconsin–Madison studying AI wanted to understand how one complex large language model, GPT-3, would perform across a culturally diverse group of users in complex discussions. The model is a precursor to one that powers the high-profile ChatGPT. The researchers recruited more than 3,000 people in late 2021 and early 2022 to have real-time conversations with GPT-3 about climate change and BLM.
“The fundamental goal of an interaction like this between two people (or agents) is to increase understanding of each other’s perspective,” says Kaiping Chen, a professor of life sciences communication who studies how people discuss science and deliberate on related political issues — often through digital technology. “A good large language model would probably make users feel the same kind of understanding.”
Chen and Yixuan “Sharon” Li, a UW–Madison professor of computer science who studies the safety and reliability of AI systems, along with their students Anqi Shao and Jirayu Burapacheep (now a graduate student at Stanford University), published their results this month in the journal Scientific Reports.
Study participants were instructed to strike up a conversation with GPT-3 through a chat setup Burapacheep designed. The participants were told to chat with GPT-3 about climate change or BLM, but were otherwise left to approach the experience as they wished. The average conversation went back and forth for about eight turns.
Most of the participants came away from their chat with similar levels of user satisfaction.
“We asked them a bunch of questions — Do you like it? Would you recommend it? — about the user experience,” Chen says. “Across gender, race, ethnicity, there’s not much difference in their evaluations. Where we saw big differences was across opinions on contentious issues and different levels of education.”
The roughly 25% of participants who reported the lowest levels of agreement with the scientific consensus on climate change or the least agreement with BLM were far more dissatisfied with their GPT-3 interactions than the other 75% of chatters. They rated the bot at least half a point lower on a 5-point scale.
Despite the lower scores, the chat shifted their thinking on the hot topics. The hundreds of people who were least supportive of the facts of climate change and its human-driven causes moved a combined 6% closer to the supportive end of the scale.
“They showed in their post-chat surveys that they have larger positive attitude changes after their conversation with GPT-3,” says Chen. “I won’t say they began to entirely acknowledge human-caused climate change or suddenly they support Black Lives Matter, but when we repeated our survey questions about those topics after their very short conversations, there was a significant change: more positive attitudes toward the majority opinions on climate change or BLM.”
GPT-3 responded in different styles to the two topics, offering more justification when the subject was human-caused climate change.
“That was interesting. People who expressed some disagreement with climate change, GPT-3 was likely to tell them they were wrong and offer evidence to support that,” Chen says. “GPT-3’s response to people who said they didn’t quite support BLM was more like, ‘I do not think it would be a good idea to talk about this. As much as I do like to help you, this is a matter we truly disagree on.’”
That’s not a bad thing, Chen says. Equity and understanding come in different shapes to bridge different gaps. Ultimately, that’s her hope for the chatbot research. Next steps include explorations of finer-grained differences between chatbot users, but high-functioning dialogue between divided people is Chen’s goal.
“We don’t always want to make the users happy. We wanted them to learn something, even though it might not change their attitudes,” Chen says. “What we can learn from a chatbot interaction about the importance of understanding perspectives, values, cultures, this is important to understanding how we can open dialogue between people — the kind of dialogues that are important to society.”