Carnegie Mellon researchers develop method to help computer vision systems decipher outdoor scenes

PITTSBURGH — Computer vision systems can struggle to make sense of a single image, but a new method devised by computer scientists at Carnegie Mellon University enables computers to gain a deeper understanding of an image by reasoning about the physical constraints of the scene.

In much the same way that a child might use a set of toy building blocks to assemble something that looks like a building depicted on the cover of the toy set, the computer would analyze an outdoor scene by using virtual blocks to build a three-dimensional approximation of the image that makes sense based on volume and mass.

“When people look at a photo, they understand that the scene is geometrically constrained,” said Abhinav Gupta, a post-doctoral fellow in CMU’s Robotics Institute. “We know that buildings aren’t infinitely thin, that most towers do not lean, and that heavy objects require support. It might not be possible to know the three-dimensional size and shape of all the objects in the photo, but we can narrow the possibilities. In the same way, if a computer can replicate an image, block by block, it can better understand the scene.”

This novel approach to automated scene analysis could eventually be used to understand not only the objects in a scene, but the spaces in between them and what might lie behind areas obscured by objects in the foreground, said Alexei A. Efros, associate professor of robotics and computer science at CMU. That level of detail would be important, for instance, if a robot needed to plan a route where it might walk, he noted.

Gupta presented the research, which he conducted with Efros and Robotics Professor Martial Hebert, at the European Conference on Computer Vision, Sept. 5-11 in Crete, Greece.

Understanding outdoor scenes remains one of the great challenges of artificial intelligence. One approach has been to identify features of a scene, such as buildings, roads and cars, but this provides no understanding of the geometry of the scene, such as the location of walkable surfaces. Another approach, which Hebert and Efros pioneered with former student Derek Hoiem, now of the University of Illinois, Urbana-Champaign, has been to map the planar surfaces of an image to create a rough 3-D depiction of an image, similar to a pop-up book. But that approach can lead to depictions that are highly unlikely and sometimes physically impossible.

In the new method devised by Gupta, Efros and Hebert, the image is first broken into various segments corresponding to objects in the image. Once the ground and sky are identified, other segments are assigned potential geometric shapes. The shapes also are categorized as light or heavy, depending on appearance; a surface that appears to be a brick wall, for instance, would be classified as heavy.

The computer then attempts to reconstruct the image using the virtual blocks. If a heavy block appears unsupported, the computer must substitute an appropriately shaped block, or make assumptions that the original block was obscured in the original image.

Gupta said because this qualitative volumetric approach to scene understanding is so new, no established datasets or evaluation methodologies exist for it. He said in estimating the layout of surfaces, other than sky and ground, the method is better than 70 percent accurate, and its performance is almost as good when comparing its segmentation to ground truth. Overall, Gupta assesses the analysis as very good for 30 to 40 percent of the images and adequate for another 20 to 30 percent.

This work was supported by the National Science Foundation and a Guggenheim Fellowship to Efros. The Robotics Institute is part of Carnegie Mellon’s School of Computer Science. Follow the school on Twitter @SCSatCMU.

About Carnegie Mellon University: Carnegie Mellon ( is a private, internationally ranked research university with programs in areas ranging from science, technology and business, to public policy, the humanities and the arts. More than 11,000 students in the university’s seven schools and colleges benefit from a small student-to-faculty ratio and an education characterized by its focus on creating and implementing solutions for real problems, interdisciplinary collaboration and innovation. A global university, Carnegie Mellon’s main campus in the United States is in Pittsburgh, Pa. It has campuses in California’s Silicon Valley and Qatar, and programs in Asia, Australia, Europe and Mexico. The university is in the midst of a $1 billion fundraising campaign, titled “Inspire Innovation: The Campaign for Carnegie Mellon University,” which aims to build its endowment, support faculty, students and innovative research, and enhance the physical campus with equipment and facility improvements.


  1. Hi,
    extensively post about Computer Vision.
    Doing my research I find one amazing free to download book about Computer Vision.
    Hope you will be interested because it is really amazing,trust me. :)
    This book presents research trends on computer vision, especially on application of robotics, and on advanced approaches for computer vision (such as omnidirectional vision).
    The contents of this book allow the reader to know more technical aspects and applications of computer vision.
    The intended audience is anyone who wishes to become familiar with the latest research work on computer vision, especially its applications on robots.
    This book features representative work on the computer vision, and it puts more focus on robotics vision and omnidirectional vision.
    This is the link where you can find it:


Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.