A photo is worth a thousand words, but what if the image could also represent thousands of other images?
New software developed by UC Berkeley computer scientists seeks to tame the vast amount of visual data in the world by generating a single photo that can represent massive clusters of images. This tool can give users the photographic gist of a kid on Santa’s lap, housecats, or brides and grooms at their weddings. It works by generating an image that literally averages the key features of the other photos.
Users can also give extra weight to specific features to create subcategories and quickly sort the image results. In this way, blue-winged butterflies or orange tabby cats might rise to the top of photo collections.
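To make the idea of averaging, and of giving some photos extra weight, more concrete, here is a minimal sketch in Python. It is not the AverageExplorer implementation: the function name `weighted_average`, the nested-list grayscale image format, and the per-image weights are assumptions made for illustration. It also assumes the images are already the same size and aligned.

```python
def weighted_average(images, weights=None):
    """Pixel-wise weighted mean of equally sized grayscale images.

    Each image is a list of rows, each row a list of pixel values.
    `weights` holds one non-negative number per image (hypothetical
    stand-in for "extra weight" given by a user); default is equal weight.
    """
    if not images:
        raise ValueError("need at least one image")
    if weights is None:
        weights = [1.0] * len(images)
    if len(weights) != len(images):
        raise ValueError("need exactly one weight per image")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    height, width = len(images[0]), len(images[0][0])
    average = [[0.0] * width for _ in range(height)]
    for image, weight in zip(images, weights):
        for y in range(height):
            for x in range(width):
                # Each image contributes in proportion to its weight.
                average[y][x] += weight * image[y][x] / total
    return average


dark = [[0, 0], [0, 0]]
bright = [[4, 8], [12, 16]]
plain = weighted_average([dark, bright])            # equal weights
favored = weighted_average([dark, bright], [1, 3])  # favor the bright image
```

With equal weights, every photo contributes the same amount. Raising one photo's weight pulls the average toward it, which is roughly how emphasizing a feature could push matching photos toward the top of a collection.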
The research, led by Alexei Efros, associate professor of electrical engineering and computer sciences, will be presented today (Thursday, Aug. 14) at the International Conference and Exhibition on Computer Graphics and Interactive Techniques, or SIGGRAPH, in Vancouver, Canada.
The authors noted that an estimated 3.5 trillion photos have been taken since photography was invented, 10 percent of them within the past year. Facebook reports 6 billion photo uploads per month on its site, and YouTube gets 72 hours of video uploaded every minute.
“Visual data is among the biggest of Big Data,” said Efros, who is also a member of the UC Berkeley Visual Computing Lab. “We have this enormous collection of images on the Web, but much of it remains unseen by humans because it is so vast. People have called it the dark matter of the Internet. We wanted to figure out a way to quickly visualize this data by systematically ‘averaging’ the images.”
Efros worked with Jun-Yan Zhu, UC Berkeley computer science graduate student and the paper’s lead author, and Yong Jae Lee, former UC Berkeley postdoctoral researcher, to develop the system, which they have dubbed AverageExplorer.
The researchers provided examples of potential applications of this system, such as in online shopping, where a consumer may want to quickly home in on two-inch wedge heels in the perfect shade of red. Or perhaps media analysts would like to see Stephen Colbert’s typical body posture when the face of President Barack Obama appears in the graphic over his shoulder.
Lee, now an assistant professor of computer science at UC Davis, said the system could also be used to help improve the ability of computer vision systems to distinguish key features in an image, such as the tires on a car or the eyes on a face. When users mark those features on an average image, the entire collection of images is automatically annotated as well.
“In computer vision, annotations are used to train a system to detect objects, so you might mark the eyes, nose and mouth to teach the computer what a human face looks like,” said Lee. “Lots of data is needed to accurately train the system, so reducing the amount of effort and time to do this is critical. Instead of annotating each image individually, with AverageExplorer, we only need to annotate the average image, and the system will automatically propagate the annotations to the image collection.”
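The following Python sketch shows one way annotations marked on an average image could be carried back to each photo. The article does not describe the actual mechanism, so everything here is an assumption: the `Alignment` class, the `propagate` function, and the premise that each photo is related to the average image by a simple scale and shift.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical alignment of one photo to the average image.

    Assumed model: average_point = scale * photo_point + (dx, dy).
    """
    scale: float
    dx: float
    dy: float


def propagate(points, alignments):
    """Map labeled points from the average image back into each photo.

    `points` maps a label (e.g. "left_eye") to (x, y) in the average image.
    Returns one dict of labeled points per alignment, in the same order.
    """
    per_photo = []
    for a in alignments:
        per_photo.append({
            # Invert the assumed scale-and-shift model for this photo.
            label: ((x - a.dx) / a.scale, (y - a.dy) / a.scale)
            for label, (x, y) in points.items()
        })
    return per_photo


marked = {"left_eye": (10.0, 20.0)}
photos = [Alignment(1.0, 0.0, 0.0), Alignment(2.0, 2.0, 4.0)]
annotations = propagate(marked, photos)
```

Under these assumptions, a person marks each feature once on the average image, and the per-photo coordinates follow from the alignment data rather than from further manual labeling.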
The researchers were inspired by artists like Jason Salavon, who has created average images from hundreds of photos of kids with Santa, newlyweds or baseball players to illustrate a concept. Average images can provide interesting insights, such as the convention in Western culture for brides to wear white and stand to the right of the groom in formal portraits, or for youth baseball players to get down on one knee in their official photo.
Many of the manual steps Salavon used to sort and align his images are now automated through the UC Berkeley tool.
Funding from Google, Adobe and the Office of Naval Research helped support this work.