ASK your phone to show you a Neanderthal, and you’ll get a hairy, hunched brute that looks more chimpanzee than human. Ask it to describe their daily life, and you’ll hear about cave-dwelling hunters making simple stone tools. The problem? We’ve known better for decades.
Matthew Magnani and Jon Clindaniel weren’t looking to embarrass artificial intelligence. They just wanted to know what happens when we let chatbots visualize the past. So the anthropologists did something simple: they asked DALL-E 3 and ChatGPT to create images and narratives depicting Neanderthal life, then compared the results to over a century of actual scientific research. What came back was a time capsule of outdated ideas – and a warning about whose version of history AI is learning.
The pair tested four different prompts 100 times each. Two prompts asked for scientific accuracy; two didn’t. Some were detailed, specifying what Neanderthals should be doing or wearing. Others were deliberately vague. “It’s broadly important to examine the types of biases baked into our everyday use of these technologies,” says Magnani, assistant professor of anthropology at the University of Maine.
The images that emerged told a consistent story – just not the right one. Across all prompts, DALL-E 3 generated Neanderthals that looked like they’d walked out of a 1920s museum diorama. Heavily muscled males dominated every scene, stooped and covered in excessive body hair. Their faces bore exaggerated brow ridges and jutting jaws more dramatic than any actual Neanderthal skull. Women and children were largely absent. Only when the researchers explicitly requested expert knowledge did a single child appear, playing in the background.
These weren’t just artistic choices. When Magnani and Clindaniel cross-referenced the images with different eras of scientific literature, they could date the AI’s knowledge. DALL-E 3 was pulling from somewhere around the late 1980s and early 1990s. ChatGPT’s text descriptions? Even older – most consistent with research from the early 1960s, complete with vague talk of “culture” and “nature” that dominated anthropology at the time.
The chatbot’s narratives emphasized cave dwelling and hunting, describing Neanderthals making tools from just three materials: stone, hide, and wood. Fire was maintained but apparently never made – an idea abandoned by most researchers years ago. Meanwhile, the images showed technological anachronisms running in the opposite direction. Basketry appeared alongside thatched roofs with ladders. Some scenes included glass vessels and metal implements – technologies that wouldn’t exist for tens of thousands of years. It was a temporal scramble, mixing antiquated biology with impossibly advanced culture.
“It’s consequential to understand how the quick answers we receive relate to state-of-the-art and contemporary scientific knowledge,” Magnani says. “Are we prone to receive dated answers when we seek information from chatbots, and in which fields?”
To find out how dated, the researchers built a database of over 2,000 scientific abstracts about Neanderthals published between 1923 and 2023. They used computational methods to map the semantic space of Neanderthal research – essentially creating a landscape of how scientists talk about the species across time. Then they dropped the AI-generated content onto that map to see where it landed.
About half of ChatGPT’s narratives didn’t align with scholarly knowledge at all. For one prompt, that figure jumped above 80 percent. The text that did match up fell into older clusters of research. When the team calculated the “age” of AI-generated content by finding the 20 years of research it most closely resembled, ChatGPT consistently averaged around 1962 to 1964 – an era when human ecology dominated the field and Neanderthals were still widely considered primitive precursors to modern humans.
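The "dating" step the team describes – pooling a sliding 20-year window of abstracts and asking which window the AI text most resembles – can be sketched in miniature. The real study used richer computational methods over 2,000 abstracts; everything below (the bag-of-words scoring, the four-abstract toy corpus, the function names) is a simplified stand-in, not the authors' actual code:

```python
from collections import Counter
import math

def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def bag(text):
    return Counter(text.lower().split())

# Toy corpus: (year, abstract) pairs standing in for the 2,000 real abstracts.
corpus = [
    (1960, "primitive cave dwellers hunting with simple stone tools"),
    (1962, "brutish hunters in caves using stone tools and fire"),
    (1990, "genetic evidence reshapes neanderthal evolution studies"),
    (2015, "paleogenomic analysis shows interbreeding and symbolic behavior"),
]

def closest_window(ai_text, corpus, width=20):
    # Slide a `width`-year window across the corpus; pool each window's
    # abstracts into one vector and score its similarity to the AI text.
    ai_vec = bag(ai_text)
    years = sorted({y for y, _ in corpus})
    best = (None, -1.0)
    for start in range(min(years), max(years) - width + 2):
        pooled = Counter()
        for y, text in corpus:
            if start <= y < start + width:
                pooled += bag(text)
        score = cosine(ai_vec, pooled)
        if score > best[1]:
            best = (start, score)
    return best  # (window start year, similarity score)

start, score = closest_window(
    "neanderthals were primitive cave hunters with stone tools", corpus)
print(start)  # -> 1960 with this toy corpus: the 1960s window wins
```

The point of the sketch is the shape of the method, not its scale: an AI narrative full of "cave" and "hunting" language lands closest to the window pooling the oldest abstracts, which is essentially how the study assigned its 1962–1964 estimate.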
The images fared slightly better, landing in the late 1980s and early 1990s when genetic research was beginning to reshape our understanding of human evolution. But even these more recent depictions were outliers – nowhere near representative of the last two decades of paleogenomic breakthroughs, lithic analysis, and demographic studies that have revolutionized Neanderthal science.
Why is AI stuck in the past? The answer lies in what it can actually read. Long copyright terms keep most scientific articles locked behind paywalls until they enter the public domain. When Magnani and Clindaniel tried to download full-text articles from major databases for their analysis, they hit a wall: only 378 complete articles were available, most from 1927 or earlier. In the US, works published in 1927 entered the public domain in 2023, creating a sharp dividing line in what AI training models can freely access.
Open access publishing has improved things since the early 2000s, but it hasn’t erased decades of inaccessible research. The result is that generative AI trains heavily on whatever’s freely available on the web – which tends to be either very old academic material or popular content that may reflect outdated cultural assumptions. “One important way we can render more accurate AI output is to work on ensuring anthropological datasets and scholarly articles are AI-accessible,” says Clindaniel, who specializes in computational anthropology at the University of Chicago.
There’s another problem lurking in those generated images: gender. Modern Neanderthal research has worked hard to move beyond gendered stereotypes of “man the hunter,” yet the AI imagery remained stubbornly focused on heavily muscled males in action poses. This likely reflects training data that itself carries those biases – whether from Wikipedia articles, museum diorama photographs, or popular media that still gravitates toward depicting prehistoric men.
The researchers tested whether explicitly asking for “expert knowledge” would improve accuracy. It helped, but not enough. The expert prompt did generate Neanderthals with fewer chimpanzee-like features, less body hair, and more realistic facial structures. ChatGPT’s expert narratives mentioned plant gathering, open-air shelters, and child-rearing – all consistent with contemporary understanding. But the fundamental problem remained: even when prompted for expertise, AI was assembling information from outdated sources.
“Our study provides a template for other researchers to examine the distance between scholarship and content generated using artificial intelligence,” Magnani says. The methodology could work for any archaeological period, any region, any field where knowledge has evolved significantly over time. Ask AI about Bronze Age trade networks or Viking settlements or Aboriginal Australian tool use, and you’re likely to get similarly outdated results – we just don’t know yet which decades they’re stuck in.
This matters beyond archaeology. When someone asks their phone about Neanderthals, they’re not usually writing a research paper. They’re satisfying curiosity, helping a kid with homework, or settling a dinner table argument. If AI consistently delivers information from 60 years ago, it’s not just inaccurate – it’s actively miseducating millions of people who have no reason to doubt their devices.
And it’s not going to fix itself. AI companies are secretive about their training data, but we know it favors freely accessible text and images. Unless academic publishers dramatically expand open access, or AI developers license current research databases, the past that these systems imagine will remain frozen in time – or rather, frozen in whatever time period happens to be out of copyright and easy to scrape from the web.
There’s an irony here. The same technology that promises to democratize access to information is actually entrenching outdated knowledge, making it more accessible than current research. Someone with a library card and patience can still find accurate information about Neanderthals. But most people will never get past the first, confident answer from their chatbot – even if most experts would have considered that answer wrong before the person asking was born.
“Teaching our students to approach generative AI cautiously will yield a more technically literate and critical society,” Magnani says. That caution seems warranted when your digital assistant’s view of human prehistory is older than your parents.
The study, published in December in Advances in Archaeological Practice, arrived just as generative AI moved from horizon technology to everyday tool. Magnani hopes that if the experiment were repeated now, chatbots would incorporate more recent research. But unless something changes about how scientific knowledge enters the digital commons, we’re teaching machines to dream in decades-old stereotypes.
What to do until then? The researchers suggest using AI as a starting point, not an endpoint. Let it generate images or text, but then check them against current sources. Assume that anything it tells you about the past might itself date from the past – and not the past you’re asking about. And maybe, when your phone confidently shows you that hunched, hairy brute, remember that the real Neanderthals were far more interesting than that. We just need to wait for the machines to catch up.
ScienceBlog.com has no paywalls, no sponsored content, and no agenda beyond getting the science right. Every story here is written to inform, not to impress an advertiser or push a point of view.
