Imagine an embassy bombing. Consider the massive amount and varied types of data that investigators need to review to determine who carried out the attack and how it was accomplished. Such a probe could involve the slow, painstaking examination of video footage, photos, internet communications, telephone records, and other material.
An international team of scientists led by researchers at Johns Hopkins University and supported by an $11-million, five-year U.S. Department of Defense grant wants to streamline such investigations by developing algorithms for extracting relevant details from multimodal data. Participating scientists from nine universities in the U.S. and the United Kingdom will convene at JHU’s Homewood campus on Wednesday for their first group meeting on the challenging project.
The team’s ultimate goal is to teach a computer system to “think” like a digital Sherlock Holmes, quickly identifying the most useful information and ignoring details it deems irrelevant.
René Vidal, a Johns Hopkins biomedical engineering professor who is principal investigator on the DoD grant, said a key goal is to develop technology that will enable a computer to characterize the information content of multimodal data. Such methods would, for example, distinguish what is happening in a particular photo rather than simply store it as another JPEG file.
“In a computer today,” Vidal said, “a picture of a car and a picture of a person are compressed in the same way. We want the computer to recognize what objects are present in a photo or video, what actions are taken, and to see what the contextual relationships among these entities are. That’s what we call the semantic constraints of the scene.”
For example, if a photo or video depicts a large truck parked behind a grocery store, where workers unload fresh fruit and vegetables, then a computer programmed to look for unusual or potentially deadly activity would likely dismiss that scene as mundane business as usual. But if the same truck approached a large group of protesters or people gathered for a celebration, the system would signal an alarm, based on the recent series of terrorist attacks on pedestrians carried out using trucks.
“The system is going to be task-dependent,” Vidal said. “If someone tells me to count the number of times a truck comes to a supermarket to unload fruits and vegetables, then a picture of that happening is very important. But if the task is to try to detect a truck that might attack people, then the truck unloading fruit at the supermarket will be considered irrelevant.”
He added that a more intelligent computer system could save the armed forces invaluable time.
“If you were in the military, and you’ve recorded 10 million conversations to find information about the enemy, you’re not going to listen to 10 million conversations to find out what you need to know,” he said. “That’s where the question of which of those 10 million conversations contains something informative comes in. And not only which conversation, but which part of the conversation has something important in it.”
Vidal pointed out that other applications of intelligent computing advances are already turning up in pilot projects such as self-driving cars and the Amazon Go retail store, which allows shoppers to fill their carts and depart without going through a conventional checkout line. Instead, the merchandise in the cart is tracked by the store’s computer system and automatically charged to the customer’s Amazon account at the exit.
The funding for Vidal’s team was awarded this year through the DoD’s Multidisciplinary University Research Initiatives program. In addition to Johns Hopkins, the U.S. schools participating in the project are Stanford University, the University of Maryland, UC Berkeley, USC, and UCLA. The U.K. researchers are from Oxford University, Imperial College, the University of Surrey, and University College London.
At Johns Hopkins, the Department of Biomedical Engineering, in which Vidal holds his faculty appointment, is shared by the university’s Whiting School of Engineering and its School of Medicine. Vidal is also a core faculty member in the university’s Center for Imaging Science.