Ask an AI image generator to picture Washington D.C. and it will probably give you the Mall, the monuments, the Potomac glinting in afternoon light. Ask it to picture Blacksburg, Virginia, a college town of roughly 45,000 people tucked into the Appalachian highlands, and something goes wrong. The mountains might be there, the green spaces broadly right. But there will be no Hokie Stone on the university buildings. There may be a lake that doesn’t exist. The image will look, as Junghwan Kim puts it, generic. “It didn’t capture what makes Blacksburg unique,” he says. That observation, straightforward enough, turned out to contain a research question that goes well beyond one embarrassed AI and one overlooked college town.
Kim, a geospatial data scientist at Virginia Tech’s College of Natural Resources and Environment, has now quantified the gap between what AI imagines about large cities and what it imagines about smaller ones. The numbers are pointed.
His team, working with colleagues from Hong Kong University of Science and Technology and the University of Alabama, asked 129 participants to rate AI-generated images of four localities in and around Virginia: Blacksburg, Richmond (population 220,000), Virginia Beach (450,000), and Washington D.C. The images were generated using OpenAI’s DALL-E 2 and structured around the five urban elements identified by mid-century city planner Kevin Lynch: landmarks, districts, paths, edges, and nodes. Participants scored each image on two dimensions, how realistic it looked and how faithfully it captured the place’s identity. Those two scores were combined into a single alignment figure, with 10 representing a perfect match and 2 essentially a generic fiction.
For landmarks, the gap was stark. Blacksburg’s landmark images scored an average of 5.0. The landmark images of the three large cities averaged 7.07. The difference was statistically significant; in plain terms, the AI simply knew more about the big places. For the other four urban elements (districts, paths, edges, nodes) the differences largely disappeared, which is its own finding, and one worth sitting with for a moment.
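For readers who want the arithmetic spelled out, here is a minimal sketch of how such an alignment figure could be computed. It assumes each of the two dimensions is rated from 1 to 5 and that the two ratings are simply added, which fits the 2-to-10 range described above but is not confirmed by the article. The function name and the sample ratings are hypothetical, and only the two landmark averages come from the reported results.

```python
from statistics import mean


def alignment_score(realism: int, identity: int) -> int:
    """Add two 1-5 ratings into a 2-10 alignment score (assumed rule)."""
    for rating in (realism, identity):
        if not 1 <= rating <= 5:
            raise ValueError("each rating must be between 1 and 5")
    return realism + identity


# Hypothetical (realism, identity) ratings from three participants
# for a single image. These are illustrative values, not study data.
ratings = [(3, 2), (4, 3), (2, 2)]
scores = [alignment_score(realism, identity) for realism, identity in ratings]
image_mean = mean(scores)

# Landmark averages as reported in the article.
BLACKSBURG_LANDMARK = 5.0
LARGE_CITIES_LANDMARK = 7.07
landmark_gap = round(LARGE_CITIES_LANDMARK - BLACKSBURG_LANDMARK, 2)

print(scores, image_mean, landmark_gap)
```

On the reported averages, the landmark gap works out to 2.07 points on a scale whose full span is only 8 points.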
Where the AI Falls Flat
Landmarks, it turns out, are where AI image generators earn their credibility or lose it. They are the most culturally specific, the most locally distinctive, the hardest to fake with generic scenery. A district can look vaguely like many districts. A path is a path. But a landmark either exists or it doesn’t, and residents know the difference instantly. In Blacksburg, the giveaways were conspicuous: buildings with odd shapes that don’t correspond to any real structure, a lake-like feature in the upper corner of one image, and the consistent absence of Hokie Stone, the distinctive dolomite used throughout Virginia Tech’s campus. These weren’t subtle errors. They were the errors of a system that had learned from too little local data to have any real idea what it was depicting.
The study found something else that cuts to the core of why this matters. Participants who had lived in Blacksburg for five or more years rated the AI images more harshly than newer residents. Not slightly more harshly. Significantly so. The researchers interpreted this as a function of accumulated local knowledge; the longer you have lived somewhere, the sharper your internal reference point becomes, the more quickly you notice when a generated image is confabulating. “People are increasingly relying on AI-generated content to learn about places,” Kim said. “If smaller cities are not well represented in the data used to train these systems, then the images people see may not reflect the real identity of those communities.” The corollary is implicit but uncomfortable: the people least equipped to notice the errors are precisely those who have never lived there, the newcomers, the visitors, the people using AI to decide whether somewhere is worth a trip.
This is, perhaps, more serious than it sounds. AI-generated imagery is already being used in travel planning, urban design pitch decks, marketing materials, and public consultations about development. A planning authority in a small town might use these tools to generate visualisations for community meetings; those visualisations might depict a town that doesn’t quite exist. The research adds a new dimension to an existing body of work documenting what some researchers call the “uneven geography” of AI capabilities, the consistent finding that generative AI performs better on tasks involving densely documented, heavily photographed, widely covered places. Earlier work found that AI imagery aligned more closely with human perceptions of Brussels and Amsterdam than of Tokyo and Seoul. A study of Stockholm found that AI reproduced mainly tourist and commercial districts, flattening a city into its most photographed angles.
The Data Behind the Image
The mechanism isn’t mysterious. “AI systems learn from enormous amounts of online data,” Kim said. “Larger cities tend to have far more images, media coverage, and digital documentation available online. Smaller towns often do not have the same level of representation.” What gets trained in is what gets generated. Washington D.C. has the Lincoln Memorial photographed from roughly every angle in roughly every light; it has been the backdrop for a century of news footage, tourist snapshots, and civic imagery. Blacksburg has a campus, a downtown, a set of distinctive local buildings, and an Appalachian horizon, but these appear in far fewer images in the datasets that train systems like DALL-E. The result is that the AI knows the big city as a specific place, and knows the small town as a category. A college town, perhaps, somewhere mountainous. Something like that.
There are limits to what this study can claim. The sample of 129 participants, recruited largely through Virginia Tech, skews young and male, and the images were generated using DALL-E 2, a model that has since been superseded by considerably more capable systems. It’s at least plausible that newer models, trained on larger and more diverse datasets, have narrowed the gap somewhat. And the study asked only about visual realism and visual identity, not the deeper emotional or social dimensions of what it means to belong to a place.
But there is a harder question lurking behind the methodological caveats. If AI-generated imagery consistently underperforms for smaller communities, and if those communities are also the ones with the fewest resources to commission conventional professional visualisations, then the technology risks amplifying exactly the inequalities it might theoretically help to reduce. Urban planners in major metropolitan areas have access to rich digital infrastructure, professional rendering tools, and well-trained AI systems that produce broadly recognisable outputs. Their counterparts in smaller towns get something that looks like a generic placeholder. “Generative AI can be a powerful tool,” Kim said. “But we also need to understand where it falls short and who may be left out.”
There is a pleasing irony in the methodology: the study recruited local residents to evaluate the images precisely because they would notice what the AI got wrong. Which suggests a possible part of the solution. Systems trained partly on the assessments of people who actually live in the places being depicted would have a correction mechanism that purely data-driven approaches lack. Whether AI developers will build that kind of local feedback into training pipelines is a different question. The incentives, currently, do not obviously point that way. The places with the loudest digital voice are, as always, the places that already had one.
https://doi.org/10.1016/j.techsoc.2026.103360
Frequently Asked Questions
Why are AI image generators better at depicting big cities than small towns?
It comes down to training data. Larger cities are vastly more photographed, documented, and covered in news and social media, so the datasets used to train image generators contain far more examples of what those places look like. Small towns appear comparatively rarely, which means the AI learns a generic approximation rather than a specific place. The gap is most visible when depicting landmarks, which are the most locally distinctive features of any city or town.
Could people who’ve never visited a place even tell when an AI image gets it wrong?
The study suggests not reliably. Participants who had lived in Blacksburg for five or more years were significantly more critical of AI-generated images than newer residents, pointing to the role of accumulated local knowledge. Someone unfamiliar with a place lacks the internal reference point needed to spot the confabulations: the non-existent lake, the oddly shaped building, the absent Hokie Stone. This is one reason AI geographic bias matters in practice; the people most likely to be misled are those with no other way to check.
Is this problem getting better as AI models improve?
Possibly, but the fundamental issue is structural rather than technical. The Virginia Tech study used DALL-E 2, which has been superseded by more capable models, and newer systems may perform somewhat better. However, unless the underlying training datasets are made more geographically representative, improved models may still reproduce the same urban-rural skew. The gap reflects the uneven distribution of digital documentation of places, not just the limitations of any one model generation.
What are the real-world consequences of AI getting small towns wrong?
AI-generated imagery is already used in urban planning visualisations, travel planning, marketing, and public consultations. If the images produced for smaller communities are systematically less accurate, those communities may end up with planning and promotional materials that misrepresent them, and residents may find it harder to push back without professional alternatives. Researchers have flagged this as an equity issue: the places with the fewest resources to commission conventional visualisations are also the ones AI depicts least faithfully.
Could involving local residents in AI training help fix the problem?
The Virginia Tech team suggests this is at least part of the answer. Their study recruited local residents specifically because those people had the ground-truth knowledge to identify what AI got wrong. A training pipeline that incorporated structured feedback from people who live in the places being depicted could, in principle, correct the biases that purely data-driven approaches miss. Whether AI developers will prioritise that kind of community-level input is an open question, and one that researchers are pushing the field to take seriously.