Doctors at UC San Francisco have figured out how to extract diagnostic hunches from the messiest part of the medical record: the free-text notes clinicians scribble during hospital admissions. Their target is hepatorenal syndrome, a severe and often fatal form of kidney failure that strikes people with advanced liver disease and is maddeningly difficult to catch early. By feeding 314 patients’ admission notes into a large language model, they tested whether AI could quantify the collective judgment buried in clinical prose and predict who would ultimately get the diagnosis.
The study, published in Gastro Hep Advances, examined whether GPT-4o could convert narrative admission notes into a numeric sentiment score running from 0 to 1, representing how strongly the care team appeared to suspect hepatorenal syndrome. When researchers combined these sentiment scores and additional NLP-extracted terms with traditional clinical variables such as creatinine, albumin, bilirubin, and sodium levels, predictive performance improved. The model’s area under the receiver operating characteristic curve (AUROC), a measure of how well it separates patients who develop the condition from those who do not, climbed from 0.63 using structured data alone to 0.79 when AI-derived insights were added, a shift that could translate to earlier interventions and better outcomes.
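For readers unfamiliar with AUROC, the short Python sketch below shows one common way to compute it: as the fraction of (positive, negative) patient pairs in which the positive patient receives the higher risk score. This is only an illustration and not the study's code. The function name `auroc`, the variable names, and every number in the toy lists are invented for the example.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = later diagnosed, 0 = not diagnosed.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical risk scores from a model using structured lab values only.
structured_only = [0.6, 0.4, 0.3, 0.5, 0.35, 0.2, 0.45, 0.1]
# Hypothetical risk scores after adding a note-derived sentiment feature.
with_sentiment = [0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.45, 0.1]

print(f"structured only: {auroc(labels, structured_only):.2f}")
print(f"with sentiment:  {auroc(labels, with_sentiment):.2f}")
```

On this toy data the first list scores about 0.67 and the second about 0.87. Those values come from the made-up numbers above, not from the study; the point is only that a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.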
Borrowing Technology From Amazon Reviews
The approach adapts a technique more familiar in consumer technology than in medicine. Sentiment analysis is the engine behind those five-star summaries on shopping platforms, and the researchers wondered whether the same principle could reveal the diagnostic instincts embedded in clinical documentation. GPT-4o sifted through admission notes, identifying key phrases related to ascites, diuretic intolerance, albumin administration, and other context-rich clues that clinicians record but structured databases typically miss. The model essentially listened to what doctors were already saying, then translated that into a number.
“The concept is inspired by sentiment analysis technology commonly used with reviews in online shopping platforms, where AI summarizes collective opinions,” said Jin Ge, MD, MBA. “We utilized this approach to determine if collective sentiment could predict an HRS diagnosis.”
Hepatorenal syndrome is a diagnosis of exclusion, one that depends heavily on clinical judgment, which means conflicting interpretations are common and can delay treatment. Multiple specialists often weigh in, residents update notes, and attending physicians revise impressions, creating a cacophony of opinions that rarely gets synthesized into a single coherent assessment. The AI-generated sentiment score offers a snapshot of how the entire care team collectively views a patient’s risk at the time of admission, cutting through the noise.
When the Crowd Is Wiser Than Any Single Doctor
The study suggests that AI could serve as a translator of clinical consensus in cases where providers do not fully agree. By synthesizing notes across specialties and training levels, the technology reveals patterns that individual clinicians might miss or discount. It is not replacing human judgment so much as aggregating it, turning scattered observations into a measurable signal. In theory, this could help align care decisions and expedite treatment plans, especially in cases where uncertainty reigns.
“Using the ‘wisdom of the crowd’ does not just predict outcomes, it offers a directional insight into what the clinical care team collectively thinks about a patient’s condition,” said Ge. “For cases with mixed opinions or uncertainty, AI-generated summaries could help align care decisions and expedite treatment plans.”
Although the technology is not yet implemented in clinical practice, the researchers plan to test it in real-world settings. They propose that similar sentiment systems could eventually support diagnostic decision-making, generate summary pages when many consultants contribute to care, or even assist with capturing complex patient characteristics that rarely show up in structured data fields. The approach might be especially useful for diagnoses like hepatorenal syndrome, where clinical suspicion often precedes laboratory confirmation and where the difference between early and late treatment can determine whether a patient survives long enough for a liver transplant.
For now, the study demonstrates that generative AI can unlock information long buried in narrative notes, revealing diagnostic insights that clinicians already hold but rarely have the tools to quantify. The wisdom was always there, just scattered across dozens of documents and hidden in plain sight.
Gastro Hep Advances, DOI: 10.1016/j.gastha.2025.100797