In the high-stakes world of brain surgery, a pathologist’s snap judgment can determine whether a patient walks out with their tumor removed or heads straight to chemotherapy instead. Get it wrong, and you’ve either carved out healthy brain tissue unnecessarily or left dangerous cells behind. Now, an AI system called PICTURE is stepping into operating rooms with a singular talent: telling apart two aggressive cancers that even seasoned experts mixed up in nearly 40 percent of evaluations during one timed test.

The challenge sounds straightforward until you peer through a microscope. Glioblastoma, the most common and most lethal malignant primary brain tumor in adults, often masquerades as primary central nervous system lymphoma (PCNSL) under standard staining techniques. Both cancers appear in the brain. Both show irregular cell patterns and necrotic tissue. Both demand immediate decisions. But they call for very different treatments. Glioblastoma needs surgical removal. PCNSL responds better to radiation and chemotherapy, making aggressive surgery not just unnecessary but potentially harmful.

A research team at Harvard Medical School developed PICTURE (Pathology Image Characterization Tool with Uncertainty-aware Rapid Evaluations) by training it on 2,141 pathology slides collected from hospitals across four countries. The system demonstrated better than 98 percent accuracy in distinguishing the two cancer types, maintaining that performance across five independent international patient groups.

“Our model can minimize errors in diagnosis by distinguishing between tumors with overlapping features and help clinicians determine the best course of treatment based on a tumor’s true identity.”

That quote from Kun-Hsing Yu, the study’s senior author and associate professor of biomedical informatics at HMS, undersells what makes PICTURE unusual among AI diagnostic tools. Most machine learning systems behave like overconfident students who always have an answer, even when they should admit confusion. PICTURE breaks that pattern with an “uncertainty detector” that flags tumors it hasn’t encountered before and marks them for human review.

The Frozen Section Problem

During brain tumor operations, surgeons typically extract tissue samples for rapid microscopic evaluation. Pathologists freeze these samples in liquid nitrogen, slice them thin, and examine them within roughly 15 minutes. The freezing process distorts cellular features somewhat, creating a diagnostic challenge even under ideal circumstances. Based on this preliminary assessment, surgeons decide whether to continue removing tissue or halt the procedure.

About one in 20 initial diagnoses changes upon more detailed examination over subsequent days, when pathologists can work with properly fixed tissue samples. That 5 percent error rate carries serious consequences. Misdiagnose PCNSL as glioblastoma, and you’ve subjected a patient to unnecessary brain surgery. Mistake glioblastoma for PCNSL, and you’ve left aggressive cancer cells behind.

The research team tested PICTURE’s performance specifically on frozen sections, the kind of samples that matter most for real-time surgical decisions. The system correctly identified cancers with accuracy ranging from 92.4 to 98.7 percent across different hospital sites, outperforming both human pathologists and other AI models on the most diagnostically challenging cases.

In a particularly revealing experiment, nine board-certified neuropathologists from multiple continents evaluated 40 curated cases under 90-second time constraints that mirror actual clinical pressure. Their agreement with confirmed diagnoses varied widely, with accuracy scores between 49 and 87 percent. PCNSL samples were misclassified as glioblastoma in 38 percent of all expert evaluations. PICTURE correctly identified every single case the pathologists found most difficult.

Teaching Machines to Say “I Don’t Know”

The World Health Organization recognizes 109 distinct types of brain and spinal cord tumors. Many occur so rarely that even specialized pathologists encounter them only a handful of times across entire careers. A conventional AI system trained only to distinguish glioblastoma from PCNSL would force every sample, however unfamiliar, into one of those two categories, potentially missing rare cancers entirely.

PICTURE addresses this through what the researchers call out-of-distribution detection. When the system analyzed 874 samples from 67 different types of central nervous system cancers not included in its training data, plus 33 samples of normal brain tissue, it correctly flagged 91.9 percent as belonging to neither glioblastoma nor PCNSL. The tool essentially learned to recognize when it was looking at something unfamiliar.
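The article does not spell out how PICTURE measures its own uncertainty. One common way to build this kind of “neither of the above” flag, shown here only as a minimal sketch and not as the study’s actual method, is to average the class probabilities from several models and compute the entropy of that average. When the models agree confidently, entropy is low; when they disagree or hedge, it approaches its two-class maximum of ln 2 (about 0.693). The function name `classify_or_flag` and the 0.5 cutoff are hypothetical choices for illustration.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def classify_or_flag(member_probs, labels, threshold=0.5):
    """Hypothetical out-of-distribution check for an ensemble classifier.

    member_probs: one probability vector per ensemble member, each aligned with labels.
    threshold: hypothetical entropy cutoff; above it the sample goes to a human.
    Returns (decision, entropy).
    """
    n = len(member_probs)
    k = len(labels)
    if any(len(m) != k for m in member_probs):
        raise ValueError("every probability vector must have one entry per label")
    # Average the members' predictions class by class.
    mean = [sum(m[i] for m in member_probs) / n for i in range(k)]
    h = predictive_entropy(mean)
    if h > threshold:
        # Too uncertain (or the members disagree): do not force a label.
        return "flag for review", h
    best = max(range(k), key=lambda i: mean[i])
    return labels[best], h


LABELS = ["glioblastoma", "PCNSL"]

# Members agree confidently: low entropy, so a label is returned.
print(classify_or_flag([[0.97, 0.03], [0.95, 0.05], [0.96, 0.04]], LABELS))
# Members disagree: entropy near ln 2, so the sample is flagged.
print(classify_or_flag([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]], LABELS))
```

Averaging several models matters here: a single network can be confidently wrong on an unfamiliar tumor, while independently trained models are less likely to be confidently wrong in the same direction, so their disagreement raises the entropy of the average.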

“Our model shows reliable performance on frozen sections during brain surgery and in scenarios with significant diagnostic disagreement among human experts.”

The system identifies specific tissue features that distinguish the two cancers. For glioblastoma, it focuses on dense tumor regions, microvascular proliferation, pseudopalisading necrosis, and infiltrating cells. For PCNSL, it highlights areas with high cellular density showing typical lymphoid morphology. Importantly, PICTURE marks regions with surgical artifacts, bleeding, or ambiguous features as low-confidence areas, preventing overconfident misdiagnosis.
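How PICTURE combines its region-level judgments into a slide-level call is not described in the article. A minimal sketch of the general idea, assuming each small image tile gets a glioblastoma probability and a confidence score (both hypothetical fields here), is to leave low-confidence tiles out of the average so that artifacts or bleeding cannot sway the diagnosis, and to defer to a pathologist when nothing reliable remains. The `Tile` class, `slide_prediction` function, and 0.7 cutoff are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    p_glioblastoma: float  # hypothetical per-tile probability of glioblastoma
    confidence: float      # hypothetical per-tile confidence score in [0, 1]


def slide_prediction(tiles, min_confidence=0.7):
    """Average the glioblastoma probability over tiles that pass the cutoff.

    Tiles below min_confidence (for example, regions with surgical artifacts or
    bleeding) are excluded and counted. Returns (score, number_dropped); score is
    None when no tile is confident enough, signalling deferral to a pathologist.
    """
    kept = [t for t in tiles if t.confidence >= min_confidence]
    dropped = len(tiles) - len(kept)
    if not kept:
        return None, dropped
    score = sum(t.p_glioblastoma for t in kept) / len(kept)
    return score, dropped


tiles = [Tile(0.9, 0.95), Tile(0.8, 0.9), Tile(0.1, 0.3)]  # last tile: artifact
print(slide_prediction(tiles))
```

In this toy slide the third tile, which on its own points toward PCNSL, is excluded for low confidence, so the slide-level score reflects only the two reliable regions.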

The research team envisions deploying PICTURE as an initial screening layer in pathology departments, particularly valuable for medical centers without ready access to specialized neuropathologists. The system processes a complete whole-slide image in under one minute, including tissue segmentation, feature extraction using multiple foundation models, and final diagnostic inference.

More than 300,000 people worldwide receive diagnoses of brain or central nervous system tumors each year, leading to over 200,000 deaths. The current study drew predominantly on cases from white patients, and the researchers acknowledge the need for additional validation across diverse populations before widespread clinical deployment. Future development could expand the tool to recognize additional cancer types and integrate genetic and molecular data alongside visual pathology features.

For now, PICTURE represents something increasingly rare in medical AI: a system designed not to replace human judgment but to augment it precisely where human expertise proves most fallible. By knowing when it doesn’t know, the tool might prove more trustworthy than systems that never admit uncertainty.

Nature Communications: 10.1038/s41467-025-64249-6

