When Cornell University historian Jan Burzlaff fed Holocaust survivor testimony into ChatGPT, the AI missed something crucial. In the account of seven-year-old Luisa D., who fled into the Polish forests in 1942, her mother cut her own finger with a stone and spread blood on the child's lips so the girl could absorb moisture during extreme thirst. The AI's summary completely overlooked this intimate, desperate act of maternal love.
This omission reveals a fundamental gap in artificial intelligence’s ability to process historical trauma, according to Burzlaff’s research published today in the journal Rethinking History. His findings suggest that while AI can efficiently summarize historical events, it struggles with the emotional and moral complexity that defines human experience.
“If historical writing can be done by a machine, then it was never historical enough,” Burzlaff states in his paper. The postdoctoral associate in Cornell’s Jewish Studies Program tested ChatGPT against five Holocaust testimonies recorded in 1995 across La Paz, Krakow, and Connecticut.
The Pattern Recognition Problem
AI excels at identifying patterns and creating coherent narratives from historical data. Students in Burzlaff’s experimental course consistently praised ChatGPT’s efficiency in organizing testimonial content into clear themes like “childhood in hiding” and “wartime trauma.” But this strength becomes a weakness when confronting accounts that resist neat categorization.
Consider Samuel W.’s testimony from Krakow, where he recalled receiving 25 lashes as punishment in a concentration camp. “I did not groan, not even once,” he remembered. “Even the German officer liked that and praised me in German afterwards.” The AI could process the facts but missed the psychological complexity of a man being brutalized, praised for his stoicism, and recalling the moment decades later without visible emotion.
“Essentially it ignored the extent these individuals suffered on an emotional level,” Burzlaff states.
The research comes as Microsoft recently ranked historians among jobs most susceptible to AI replacement. Yet Burzlaff argues that AI’s limitations with Holocaust testimony – representing extreme human suffering – suggest it will struggle even more with subtler historical narratives.
Beyond Technical Limitations
The issue extends beyond technical capabilities to fundamental questions about how we remember and interpret the past. AI systems generate content by predicting likely word sequences based on training data, but historical understanding requires grappling with contradiction, silence, and ethical weight.
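That mechanism can be sketched, very roughly, in a few lines of code. The toy model below is a hand-built bigram table with weighted sampling, nothing like ChatGPT's actual architecture, but it makes the underlying point concrete: a system that generates text by sampling the statistically likely next word can only recombine patterns already present in its data, which is precisely why the singular and the unexpected get smoothed away. All words and counts here are invented for illustration.

```python
import random

# Toy bigram "language model": each word maps to the words that followed
# it in a (hypothetical) training corpus, with their counts. Real LLMs do
# something analogous at vast scale, over subword tokens, with neural nets.
bigram_counts = {
    "the":    {"camp": 3, "forest": 2},
    "camp":   {"was": 4},
    "forest": {"was": 1, "hid": 2},
    "was":    {"cold": 2, "silent": 1},
}

def next_word(word, rng):
    """Sample the next word in proportion to how often it followed `word`."""
    options = bigram_counts.get(word)
    if not options:
        return None  # no continuation seen in training data
    words = list(options)
    weights = [options[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

def generate(start, length, seed=0):
    """Greedy-ish generation: repeatedly sample a likely continuation."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        w = next_word(out[-1], rng)
        if w is None:
            break
        out.append(w)
    return " ".join(out)

print(generate("the", 3))
```

Every sentence such a model produces is, by construction, a high-probability path through its training data; a detail that appears only once, like a mother cutting her finger, carries almost no statistical weight.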
Jacobo B.’s testimony exemplifies this challenge. Speaking from La Paz, he described hiding a ring under his tongue in a concentration camp, knowing discovery meant death. His memories jumped between Krakow, the forests of Eastern Europe, and the Austrian Alps in an associative rush that defied linear narrative. “I was B2060,” he recalled of his prisoner number. “Nothing else.”
When AI processed such accounts, it converted complex emotional landscapes into digestible categories. The technology could identify themes but couldn’t inhabit the difficulty of representing behaviors under extreme duress.
“They summarize but do not listen, reproduce but do not interpret, and excel at coherence but falter at contradiction,” Burzlaff writes.
This limitation reflects broader concerns about AI’s role in education and public discourse. As generative AI tools become ubiquitous in classrooms and research, historians face pressure to distinguish their craft from machine-generated content.
Burzlaff’s paper offers five principles for historical writing in the AI age: prioritize interpretation over description, create rather than reproduce content, use large datasets without becoming one, refuse algorithmic approaches to ethics, and write as a person rather than responding to prompts.
The research highlights a crucial tension in our digital age. While AI can process vast amounts of historical data quickly, it cannot replace the human capacity for ethical engagement with the past. The technology operates through pattern recognition and statistical prediction, but history requires dwelling with ambiguity and moral complexity.
For educators and historians, these findings suggest AI should complement rather than replace human interpretation. The technology might help identify patterns across large archives, but the work of making meaning from those patterns remains distinctly human.
The implications extend beyond academic history to public memory and education. As AI-generated content proliferates, maintaining spaces for complex, contradictory, and unresolved historical narratives becomes increasingly important. The past, Burzlaff suggests, deserves better than algorithmic smoothing.
His research ultimately poses a question that extends far beyond Holocaust studies: in an age of artificial intelligence, what aspects of human experience resist automation? The answer may lie not in what machines can do, but in what they cannot – and should not – attempt.
Rethinking History: DOI 10.1080/13642529.2025.2546174
