AI Tool Predicts Disease Outbreaks Using ChatGPT Technology

Scientists have developed an artificial intelligence system that uses the same technology behind ChatGPT to predict infectious disease outbreaks, marking a shift from traditional mathematical models to reasoning-based forecasting.

The tool, called PandemicLLM, outperformed existing state-of-the-art methods in predicting COVID-19 hospitalizations across all 50 U.S. states over 19 months of testing.

Beyond Number Crunching

Unlike conventional prediction models that rely purely on mathematical calculations, the new system treats disease forecasting as a reasoning problem. It processes multiple types of information simultaneously, including infection data, government policies, demographic information, and genetic surveillance of virus variants.

“COVID-19 elucidated the challenge of predicting disease spread due to the interplay of complex factors that were constantly changing,” said Lauren Gardner of Johns Hopkins, who led the research team. “When conditions were stable the models were fine. However, when new variants emerged or policies changed, we were terrible at predicting the outcomes because we didn’t have the modeling capabilities to include critical types of information. The new tool fills this gap.”

Real-Time Adaptation to New Threats

One of the system’s most promising features is its ability to incorporate new information about emerging virus variants without requiring complete retraining. During testing, researchers fed the model information about the SARS-CoV-2 BQ.1 variant, which became dominant in late 2022. The system’s performance improved by 28% when it had access to this real-time genetic information.

What makes this particularly significant is the speed of response. Traditional forecasting models struggle when biological or social conditions change rapidly. But the AI system can process text descriptions of new variant characteristics and immediately factor them into its predictions.

Confidence Levels Guide Decision-Making

Rather than producing precise numerical forecasts that often prove unreliable, the system categorizes future trends into five levels: substantial decrease, moderate decrease, stable, moderate increase, and substantial increase. Importantly, it also provides confidence levels for each prediction.
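The five-level scheme turns forecasting into a classification problem. The article does not give the cut-offs PandemicLLM uses to separate the levels or explain how it computes confidence. The Python sketch below therefore assumes hypothetical thresholds of 10% and 30% relative change in weekly hospitalizations, and treats confidence as the highest softmax probability across the five levels. The function names are illustrative, not taken from the study.

```python
import math

# The five ordered trend levels named in the article.
LEVELS = [
    "substantial decrease",
    "moderate decrease",
    "stable",
    "moderate increase",
    "substantial increase",
]


def trend_label(current, future, moderate=0.10, substantial=0.30):
    """Map the relative change in hospitalizations to a trend level.

    The 10% and 30% cut-offs are hypothetical, not PandemicLLM's values.
    """
    change = (future - current) / current
    if change <= -substantial:
        return LEVELS[0]
    if change <= -moderate:
        return LEVELS[1]
    if change < moderate:
        return LEVELS[2]
    if change < substantial:
        return LEVELS[3]
    return LEVELS[4]


def predict_with_confidence(scores):
    """Softmax over five class scores; return (most likely level, its probability)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LEVELS[best], probs[best]


print(trend_label(100, 140))  # 40% rise -> "substantial increase"
print(predict_with_confidence([0.0, 0.0, 0.0, 0.0, 5.0]))
```

Predicting a category rather than an exact count makes a per-forecast confidence easy to report: it is simply the probability the model assigns to the level it chose.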

Testing showed that when the system expressed high confidence (85% or above), its accuracy reached 75% for one-week forecasts and 77% for three-week forecasts. This reliability measure could prove crucial for public health officials making resource allocation decisions.
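The 75% and 77% figures describe accuracy on only the subset of forecasts where the model's confidence was at least 85%. A minimal sketch of that calculation follows; the function name and the tuple layout are assumptions, and the sample forecasts are made up rather than drawn from the study.

```python
def accuracy_above_threshold(predictions, threshold=0.85):
    """Accuracy over forecasts whose confidence is at least `threshold`.

    `predictions` holds (predicted_level, observed_level, confidence) tuples.
    Returns (accuracy, coverage): coverage is the fraction of forecasts kept.
    """
    preds = list(predictions)
    kept = [(p, t) for p, t, c in preds if c >= threshold]
    if not kept:
        return float("nan"), 0.0
    correct = sum(p == t for p, t in kept)
    return correct / len(kept), len(kept) / len(preds)


# Made-up forecasts for illustration only (not results from the study).
sample = [
    ("stable", "stable", 0.90),
    ("moderate increase", "moderate increase", 0.88),
    ("stable", "moderate decrease", 0.86),
    ("substantial increase", "stable", 0.40),
]
acc, cov = accuracy_above_threshold(sample)
print(f"accuracy {acc:.2f} on {cov:.0%} of forecasts")  # accuracy 0.67 on 75% of forecasts
```

The second return value, coverage, matters in practice: a stricter threshold tends to trade coverage for accuracy, so officials would still need another basis for decisions on the forecasts it sets aside.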

Learning from Pandemic Limitations

The COVID-19 pandemic exposed serious gaps in existing forecasting capabilities. Traditional models performed reasonably well during stable periods but failed dramatically when new variants emerged or policies changed. The researchers specifically designed their system to address these shortcomings.

“A pressing challenge in disease prediction is trying to figure out what drives surges in infections and hospitalizations,” Gardner explained, “and to build these new information streams into the modeling.”

The system processes four distinct data streams:

State demographics, healthcare capacity, and political characteristics
Disease surveillance data including cases, hospitalizations, and vaccination rates
Government policy information such as mask mandates and gathering restrictions
Genetic surveillance data tracking virus variants and their characteristics
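The article does not describe how these streams reach the model. Because PandemicLLM is built on a language model, one plausible approach is to serialize each state's data into text. The sketch below assumes that approach; the `StateWeek` class, its field names, the prompt wording, and all example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StateWeek:
    """Hypothetical container for one state's inputs, grouped by stream."""
    state: str
    # Stream 1: static state characteristics (political traits would go here too)
    population: int
    hospital_beds_per_100k: float
    # Stream 2: disease surveillance
    weekly_hospitalizations: list[int]
    pct_fully_vaccinated: float
    # Stream 3: government policies in force
    policies: list[str] = field(default_factory=list)
    # Stream 4: genomic surveillance, as free-text variant descriptions
    variant_notes: list[str] = field(default_factory=list)


def build_prompt(x: StateWeek) -> str:
    """Serialize all four streams into one text prompt for a language model."""
    lines = [
        f"State: {x.state}",
        f"Population: {x.population:,}; hospital beds per 100k: {x.hospital_beds_per_100k}",
        "Weekly hospitalizations (oldest to newest): "
        + ", ".join(str(n) for n in x.weekly_hospitalizations),
        f"Fully vaccinated: {x.pct_fully_vaccinated:.1f}%",
        "Active policies: " + ("; ".join(x.policies) or "none reported"),
        "Variant surveillance: " + (" ".join(x.variant_notes) or "no new variants reported"),
        "Question: over the next week, will hospitalizations show a substantial decrease, "
        "moderate decrease, stable trend, moderate increase, or substantial increase?",
    ]
    return "\n".join(lines)


# Illustrative values only; none of these numbers come from the study.
example = StateWeek(
    state="Example State",
    population=6_000_000,
    hospital_beds_per_100k=250.0,
    weekly_hospitalizations=[410, 455, 520],
    pct_fully_vaccinated=68.5,
    variant_notes=["BQ.1 is rising as a share of sequenced samples."],
)
print(build_prompt(example))
```

In this sketch, describing a newly emerging variant only means adding a sentence to `variant_notes`; the model weights and the rest of the pipeline stay unchanged, which echoes the article's point about folding in variant information without complete retraining.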

Frozen Parameters Boost Efficiency

To make the system more practical for real-world deployment, the researchers found they could “freeze” most of the underlying language model parameters during training. This approach maintained prediction accuracy while dramatically reducing computational requirements, a critical consideration for health departments with limited technical resources.
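The article does not say which parameters stay trainable. A common pattern is to keep a pretrained backbone fixed and train only a small component on top of it; in PyTorch, for example, this is done by setting `requires_grad = False` on the frozen weights. The dependency-free Python toy below illustrates the idea under that assumption: a fixed random projection stands in for the language model, and gradient descent updates only a small linear head. The names and sizes are hypothetical.

```python
import random

random.seed(0)  # deterministic toy example

IN_DIM, HID_DIM = 4, 8

# Stand-in for a pretrained language model: a fixed random projection.
# These weights are "frozen": nothing below ever updates them.
backbone = [[random.gauss(0.0, 1.0) for _ in range(IN_DIM)] for _ in range(HID_DIM)]

# Trainable head: one linear output on top of the frozen features.
head = [0.0] * HID_DIM
bias = 0.0


def embed(x):
    """Frozen feature extractor: ReLU(W x)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in backbone]


def predict(x):
    """Linear head applied to frozen features."""
    return sum(w * hj for w, hj in zip(head, embed(x))) + bias


def mse(batch):
    return sum((predict(x) - y) ** 2 for x, y in batch) / len(batch)


def train_step(batch, lr=0.01):
    """One full-batch gradient step on mean squared error; updates the head only."""
    global bias
    grad_w = [0.0] * HID_DIM
    grad_b = 0.0
    for x, y in batch:
        h = embed(x)
        err = sum(w * hj for w, hj in zip(head, h)) + bias - y
        for j in range(HID_DIM):
            grad_w[j] += 2.0 * err * h[j] / len(batch)
        grad_b += 2.0 * err / len(batch)
    for j in range(HID_DIM):
        head[j] -= lr * grad_w[j]
    bias -= lr * grad_b


# Toy data: the target is a simple function of the inputs.
xs = [[random.random() for _ in range(IN_DIM)] for _ in range(32)]
batch = [(x, x[0] + 0.5 * x[1]) for x in xs]

snapshot = [row[:] for row in backbone]  # to confirm the backbone never changes
before = mse(batch)
for _ in range(200):
    train_step(batch)
after = mse(batch)

frozen = IN_DIM * HID_DIM  # 32 backbone weights
trainable = HID_DIM + 1    # 8 head weights + 1 bias
print(f"frozen params: {frozen}, trainable params: {trainable}")
print(f"MSE before: {before:.4f}, after: {after:.4f}")
```

Only 9 of the toy model's 41 parameters change during training. In a real language model the frozen share would typically be far larger, and no parameter gradients or optimizer state are needed for the frozen weights, which is where the computational savings come from.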

The research, conducted with support from the National Science Foundation and the Centers for Disease Control and Prevention, also revealed that different types of information become more or less important during different phases of an outbreak. For instance, policy and demographic factors proved more influential during periods of increasing infections, while epidemiological trends dominated during stable or declining phases.

Future Disease Preparedness

“Traditionally we use the past to predict the future,” said co-author Hao “Frank” Yang, a Johns Hopkins assistant professor specializing in reliable AI. “But that doesn’t give the model sufficient information to understand and predict what’s happening. Instead, this framework uses new types of real-time information.”

The researchers envision adapting their approach for other infectious diseases, including influenza, bird flu, and respiratory syncytial virus. Given appropriate data streams, the system could potentially provide early warning for various types of outbreaks.

Gardner emphasized the broader implications: “We know from COVID-19 that we need better tools so that we can inform more effective policies. There will be another pandemic, and these types of frameworks will be crucial for supporting public health response.”

The study appears in Nature Computational Science. It represents a new approach to infectious disease surveillance, combining artificial intelligence reasoning with traditional epidemiological data to create more robust and adaptable forecasting tools.
