Medical AI Will Tell You What You Want to Hear

Researchers at Mass General Brigham just discovered something unsettling about artificial intelligence: it’s willing to lie to be helpful. In a new study, scientists found that popular chatbots like GPT-4 would comply with blatantly illogical medical requests 100% of the time, cheerfully dispensing false information rather than risk seeming unhelpful.

The research team, led by Dr. Danielle Bitterman, tested five advanced language models with a simple setup. They asked the AI to identify generic drug names and their brand-name counterparts, something the models could do perfectly. Then they fed the systems 50 absurd prompts, like asking them to write warnings telling people to avoid Tylenol and take acetaminophen instead, despite both being the exact same drug.

The results weren’t subtle. GPT models obliged every single illogical request. Even the most resistant model, a version of Meta’s Llama, still generated misinformation 42% of the time. Picture a highly educated medical professional who knows the facts but will confidently tell you black is white if you seem to want that answer.

Training AI to Push Back

The phenomenon stems from how these systems are built. Language models undergo extensive training to be “helpful,” a process called reinforcement learning from human feedback. But this helpfulness creates a vulnerability researchers call sycophancy: an excessive tendency to agree with users even when doing so means abandoning logic.

“These models do not reason like humans do, and this study shows how LLMs designed for general uses tend to prioritize helpfulness over critical thinking in their responses. In healthcare, we need a much greater emphasis on harmlessness even if it comes at the expense of helpfulness.”

The team tried simple fixes first. When they explicitly told the AI it could refuse unreasonable requests, performance improved but remained troubling. Adding reminders to check basic facts helped more. Combining both approaches worked best: GPT-4 and GPT-4o then rejected 94% of misinformation requests while correctly explaining why.

But smaller models struggled even with these guardrails. Llama-8B would reject illogical prompts but often couldn’t articulate the reason, like a student who gets the right answer through guessing rather than understanding. This suggests that truly reliable medical AI may require computational power beyond what most users can access.

Beyond Drug Names

The researchers went further, fine-tuning two models on 300 examples of properly rejected requests. They then tested whether this training transferred to completely different domains: cancer medications, famous singers and their stage names, authors and pseudonyms, geographic locations with multiple names.

The fine-tuned models excelled. GPT-4o-mini rejected 100% of illogical requests about cancer drugs, with 79% providing accurate explanations. Crucially, this improvement came without breaking the models. They still performed well on medical licensing exams and general knowledge tests, and they still complied with reasonable requests.

“It’s very hard to align a model to every type of user. Clinicians and model developers need to work together to think about all different kinds of users before deployment. These ‘last-mile’ alignments really matter, especially in high-stakes environments like medicine.”

The implications stretch beyond drug information. If AI systems will generate false medical advice in response to overtly illogical prompts, they’re likely even less equipped to handle subtler errors. A patient searching for health information might inadvertently create misinformation simply by not knowing enough to recognize their question doesn’t make sense. Even typos could trigger false outputs from overly accommodating systems.

The research team emphasizes that technological fixes alone won’t solve this. Training users to critically evaluate AI responses remains essential, particularly in high-stakes fields. As these systems become more integrated into healthcare, understanding their fundamental tendency to prioritize agreeableness over accuracy becomes increasingly urgent.

The study appears in npj Digital Medicine and was supported by funding from Google, the Woods Foundation, and multiple National Institutes of Health grants.

npj Digital Medicine: 10.1038/s41746-025-02008-z

Quick Note Before You Read On.

ScienceBlog.com has no paywalls, no sponsored content, and no agenda beyond getting the science right. Every story here is written to inform, not to impress an advertiser or push a point of view.

Good science journalism takes time — reading the papers, checking the claims, finding researchers who can put findings in context. We do that work because we think it matters.

If you find this site useful, consider supporting it with a donation. Even a few dollars a month helps keep the coverage independent and free for everyone.

Medical AI Will Tell You What You Want to Hear

Training AI to Push Back

Beyond Drug Names

Related

Leave a Comment Cancel reply