Students Trust ChatGPT Too Much. What About Everyone Else?
ChatGPT’s wrong answers may be problematic only to the extent that they’re believed or shared. But a new paper by Kenneth Church, Professor of the Practice at Khoury College of Computer Sciences, finds that college students trust ChatGPT’s responses to a variety of prompts enough to turn them in as homework, even when those responses contain numerous factual errors.
Church asked students to use ChatGPT, web search, and other research methods to complete a variety of tasks, from deciphering obscure metaphors to characterizing the competing viewpoints on the 19th-century Opium Wars. Despite previous class discussions about ChatGPT’s unreliability, the students submitted work that contained incorrect information and numerous oversimplifications of complex subjects.
“My students could have easily fact-checked their homework, but they chose not to do so,” Church writes in his paper, published in the journal Natural Language Engineering. “They were prepared to believe much of what ChatGPT says, because of how it says what it says and its ease-of-use. It is easier to believe ChatGPT than to be skeptical.”
To Church, the results offer a warning for society about people’s willingness to forgo their own due diligence when a quick answer is always one prompt away.
"We need to be worried about how ordinary people are going to interact with these machines," Church says. "The machine might be right a lot, but if it misleads you even occasionally, that's a real problem."
When To Trust ChatGPT
The study involved college students in Church’s natural language processing class. They were tasked with writing essays and completing various other assignments, including explaining metaphors and producing outlines, quotes, and research paper references.
ChatGPT proved to be a reliable tool for explaining metaphors and writing certain simple programs. But Church found it was “amazingly bad” at tasks like surveying the literature on a topic and providing accurate references. A number of students submitted references that did not exist or that pointed to the wrong papers. Simply checking those references would have alerted them to the problem.
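That kind of spot-check can even be largely automated. As a rough illustration (not drawn from Church’s study), a short Python script could ask a public bibliographic index such as Crossref whether a cited title matches any real paper; the function name and example title below are hypothetical:

```python
# Illustrative sketch only: spot-check whether a cited title resolves to a
# real paper by querying the public Crossref API.
import requests

def spot_check_reference(title: str) -> None:
    """Print the closest bibliographic match Crossref can find for `title`."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items:
        print(f"No match found for: {title!r}")
        return
    best = items[0]
    found_title = (best.get("title") or ["<untitled>"])[0]
    print(f"Cited title: {title}")
    print(f"Best match:  {found_title}")
    print(f"DOI:         https://doi.org/{best.get('DOI', 'unknown')}")

if __name__ == "__main__":
    # Hypothetical citation a chatbot might produce; not taken from the study.
    spot_check_reference("Attention Is All You Need")
```

A mismatch between the cited title and the best match (or no match at all) is a quick signal that the reference deserves a closer look before it goes into a homework submission.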
"I had hoped that the students would do more fact-checking than they did, especially after having discussed machine ‘hallucinations’ in class, but users do not do as much fact-checking as they should," Church writes. "Perhaps the court of public opinion needs to increase the penalties for trafficking in misinformation."
Beyond the factual errors, the finding that troubled Church most was the submission of essays that lacked depth and perspective. For the essay assignment on the First Opium War, Church also gave students a YouTube video in which a scholar discusses the competing views of the conflict. Most students seemed to ignore that nuanced discussion in favor of using ChatGPT's output more or less word for word.
"What I was trying to get at with the examples about the Opium Wars is this is a complicated set of facts that has at least six different perspectives to understand," Church says. "What I'm worried about is that it seems like ChatGPT doesn't really have a good understanding of depth and perspective. Oversimplifying things and only taking one view could end up being really bad for the world."
The Answers We Ask For
The results indicate a key limitation of today’s chatbots that users need to account for.
“Intelligent people can disagree about things,” Church says. “It seemed ChatGPT was incapable of realizing there could be two views, or comparing and contrasting those views. The chatbots will give you one side and imply it’s the only side.”
Many people find fault with ChatGPT’s answers and point fingers at OpenAI, the company that makes it. Church takes a broader view.
“With misinformation we often blame the suppliers, but I think it’s important to think about both parties: suppliers and consumers,” Church says.
Church’s approach is informed by the human-centric philosophy of the Institute, where we define experiential AI as AI with a human in the loop. The Institute’s research considers how humans interact with these tools, and our solutions combine the complementary strengths of human and machine intelligence.
“A lot of people are evaluating the machine on its own," Church says. "I'm actually more interested in the combination of people and machines. People are thinking of ChatGPT as getting right and wrong answers, but we need to evaluate it in the context of how the users are using it."
Indeed, much has been made of ChatGPT’s factual accuracy. But maybe we should be more concerned with how it responds to questions that have no single right answer, only conflicting viewpoints and nuanced interpretations that need to be weighed on their own merits. In such cases, errors are harder to catch, and users should bring the same healthy skepticism to ChatGPT’s output that they might bring to a Wikipedia page or a partisan blog, even when the chatbot answers with authority.
When you consider the user’s responsibility, the problems get more complex. But Church thinks that perspective gets us closer to a realistic solution.
“It seems like [using ChatGPT] is encouraging people who are quite smart and capable to believe a really oversimplified sound bite of what's going on in the world," Church says. "Maybe that's a statement against the world. Should I blame the suppliers, should I blame the incentives, or should I blame the consumers? It’s like junk food. Is it my fault I happen to like salt and sugar? If that’s what I’m going to buy then that’s what they’re going to sell me. Maybe we need guard rails in place to protect us from our worst instincts.”
Learn more about “Better Together: Text + Context,” a research project led by Church, and sign up for our newsletter “In the AI Loop” for monthly stories and trends in AI.