A team of researchers at the University of California San Diego has developed a new artificial intelligence tool that can learn to read medical images using only a fraction of the data typically required. The system, called GenSeg, dramatically cuts the number of expert-labeled scans needed to train diagnostic models, reducing data demands by up to 20-fold. The advance could help bring powerful imaging tools to hospitals and clinics with fewer resources, where annotated datasets are often scarce.
Learning From Just a Handful of Examples
Medical image segmentation, where each pixel in a scan is labeled as healthy or diseased tissue, is a cornerstone of many diagnostic tasks. Traditionally, training AI to perform segmentation has required thousands of pixel-by-pixel annotated images. But creating these datasets is expensive and time-consuming, often requiring highly trained specialists.
“Creating such datasets demands expert labor, time and cost,” said Li Zhang, lead author and PhD student in electrical and computer engineering at UC San Diego. “For many medical conditions, that level of data simply does not exist.”
GenSeg changes the game. It can learn from as few as 40 labeled images and still match or outperform standard methods trained on hundreds. The key is how it learns: GenSeg doesn’t just consume data, it generates its own, and it does so strategically.
How It Works
GenSeg operates in stages. First, it learns how to generate realistic images from expert-labeled segmentation masks. Then it creates synthetic image-mask pairs to supplement the small real-world dataset. These combined examples are used to train a segmentation model. Through a feedback loop, the system tweaks its image generation based on how well the model performs.
“The segmentation performance itself guides the data generation process,” Zhang explained. “This ensures that the synthetic data are not just realistic, but also specifically tailored to improve the model’s segmentation capabilities.”
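The staged process Zhang describes can be sketched as a toy feedback loop. Everything below is a hypothetical miniature, not the paper's implementation: the "generator" is just a noise parameter, the "segmenter" a simple threshold classifier, and the outer loop keeps whichever generator setting produces synthetic data that trains the best-validating segmenter.

```python
import random

random.seed(0)

# Toy stand-ins: a "mask" is a list of 0/1 labels, an "image" a list of
# floats. The real GenSeg uses deep generative and segmentation networks;
# all names and numbers here are illustrative assumptions.

def generate_image(mask, noise_scale):
    """Stage 1 stand-in: synthesize an 'image' from a segmentation mask."""
    return [m + random.gauss(0.0, noise_scale) for m in mask]

def evaluate(threshold, pairs):
    """Pixel accuracy of a threshold segmenter on (image, mask) pairs."""
    correct = total = 0
    for image, mask in pairs:
        for x, m in zip(image, mask):
            correct += int((x > threshold) == bool(m))
            total += 1
    return correct / total

def train_segmenter(pairs):
    """Stage 2 stand-in: fit the threshold that maximizes training accuracy."""
    return max((i / 20 for i in range(1, 20)),
               key=lambda t: evaluate(t, pairs))

# A tiny "real" dataset and a held-out validation split.
masks = [[random.randint(0, 1) for _ in range(16)] for _ in range(8)]
real_pairs = [(generate_image(m, 0.2), m) for m in masks[:4]]
val_pairs = [(generate_image(m, 0.2), m) for m in masks[4:]]

# Outer loop: adjust the generator according to how well the trained
# segmenter performs on validation data -- the feedback loop in the text.
best = None
for noise_scale in (0.1, 0.5, 1.0):
    real_masks = [m for _, m in real_pairs]
    synthetic = [(generate_image(m, noise_scale), m) for m in real_masks * 4]
    threshold = train_segmenter(real_pairs + synthetic)  # inner training step
    acc = evaluate(threshold, val_pairs)                 # validation feedback
    if best is None or acc > best[0]:
        best = (acc, noise_scale)

print(f"best noise_scale={best[1]}, validation accuracy={best[0]:.2f}")
```

The point of the sketch is only the structure: synthetic data are generated from masks, pooled with the real pairs for training, and the generator is then judged by the segmenter's downstream performance rather than by realism alone.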
What It Can Do
GenSeg was tested on 19 datasets across a wide range of imaging types and medical conditions. It learned to identify:
- Skin lesions from dermoscopy images
- Breast cancer from ultrasound
- Foot ulcers from standard photos
- Polyps from colonoscopy images
- Lungs from chest X-rays
- Placental vessels from fetoscopic images
In these tests, GenSeg often required just 40 to 100 labeled examples. In lung segmentation, for instance, it matched with only 9 labeled examples the performance typically achieved with 175, a roughly 19-fold gain in data efficiency.
Outperforming the State of the Art
Beyond efficiency, GenSeg beat out several leading data augmentation tools and semi-supervised methods, including nnUNet, WGAN, and mutual correction frameworks. In both in-domain (same dataset) and out-of-domain (different dataset) tests, it consistently delivered better results.
One reason is that traditional data augmentation and semi-supervised methods treat data generation and model training as separate steps. GenSeg, by contrast, integrates both in a multi-level optimization loop. This makes its synthetic data more useful, not just more realistic.
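Schematically, this integration can be written as a bilevel problem (notation ours, a sketch rather than the paper's exact formulation): the generator parameters $g$ are chosen so that a segmenter trained on the real plus synthetic data performs best on held-out validation data.

```latex
\min_{g} \; L_{\mathrm{val}}\bigl(S^{*}(g)\bigr)
\quad \text{subject to} \quad
S^{*}(g) = \arg\min_{s} \; L_{\mathrm{train}}\bigl(s;\, D_{\mathrm{real}} \cup D_{\mathrm{synth}}(g)\bigr)
```

Because the outer objective depends on the inner solution $S^{*}(g)$, improving the synthetic data and improving the segmenter are coupled rather than sequential steps.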
Flexible and Scalable
The team showed that GenSeg works with a variety of segmentation backbones, from UNet and DeepLab to the Transformer-based SwinUnet. It also performs well on both 2D and 3D segmentation tasks, such as delineating the hippocampus and the liver in volumetric scans.
While designed for low-data scenarios, GenSeg also improves results when large datasets are available. It can be dropped into existing workflows, and because it doesn’t alter the model architecture, it doesn’t increase the computational burden during diagnosis.
What Comes Next
The researchers aim to further refine GenSeg’s synthetic data generation, particularly for anatomically complex or variable cases. They also plan to incorporate direct feedback from clinicians to tailor training data more closely to real-world diagnostic needs.
“This project was born from the need to break this bottleneck and make powerful segmentation tools more practical and accessible,” said Zhang.
If successful, GenSeg could help democratize access to AI-assisted diagnostics, especially in resource-limited settings where labeled imaging data is hard to come by. In an age of data scarcity and rising healthcare costs, that is a powerful proposition.
Journal and Funding
Journal: Nature Communications
DOI: 10.1038/s41467-025-61754-6
Title: Generative AI enables medical image segmentation in ultra low-data regimes
Authors: Li Zhang, Basu Jindal, Ahmed Alaa, Robert Weinreb, David Wilson, Eran Segal, James Zou, Pengtao Xie
Published: July 14, 2025