Inferotemporal cortex: How the brain recognizes objects

When the eyes are open, visual information flows from the retina through the optic nerve and into the brain, which assembles this raw information into objects and scenes.

Scientists have previously hypothesized that objects are distinguished in the inferior temporal (IT) cortex, which is near the end of this flow of information, also called the ventral stream. Anew study from MIT neuroscientists offers evidence that this is indeed the case.

Using data from both humans and nonhuman primates, the researchers found that neuron firing patterns in the IT cortex correlate strongly with success in object-recognition tasks.

“While we knew from prior work that neuronal population activity in inferior temporal cortex was likely to underlie visual object recognition, we did not have a predictive map that could accurately link that neural activity to object perception and behavior. The results from this study demonstrate that a particular map from particular aspects of IT population activity to behavior is highly accurate over all types of objects that were tested,” says James DiCarlo, head of MIT’s Department of Brain and Cognitive Sciences, a member of the McGovern Institute for Brain Research, and senior author of the study, which appears in the Journal of Neuroscience.

The paper’s lead author is Najib Majaj, a former postdoc in DiCarlo’s lab who is now at New York University. Other authors are former MIT graduate student Ha Hong and former MIT undergraduate Ethan Solomon.

Distinguishing objects

Earlier stops along the ventral stream are believed to process basic visual elements such as brightness and orientation. More complex functions take place farther along the stream, with object recognition believed to occur in the IT cortex.

To investigate this theory, the researchers first asked human subjects to perform 64 object-recognition tasks. Some of these tasks were “trivially easy,” Majaj says, such as distinguishing an apple from a car. Others — such as discriminating between two very similar faces — were so difficult that the subjects were correct only about 50 percent of the time.

After measuring human performance on these tasks, the researchers then showed the same set of nearly 6,000 images to nonhuman primates as they recorded electrical activity in neurons of the inferior temporal cortex and another visual region known as V4.

Each of the 168 IT neurons and 128 V4 neurons fired in response to some objects but not others, creating a firing pattern that served as a distinctive signature for each object. By comparing these signatures, the researchers could analyze whether they correlated to humans’ ability to distinguish between two objects.

The researchers found that the firing patterns of IT neurons, but not V4 neurons, perfectly predicted the human performances they had seen. That is, when humans had trouble distinguishing two objects, the neural signatures for those objects were so similar as to be indistinguishable, and for pairs where humans succeeded, the patterns were very different.

“On the easy stimuli, IT did as well as humans, and on the difficult stimuli, IT also failed,” Majaj says. “We had a nice correlation between behavior and neural responses.”

The findings support the hypothesis that patterns of neural activity in the IT cortex can encode object representations detailed enough to allow the brain to distinguish different objects, the researchers say.

Nikolaus Kriegeskorte, a principal investigator at the Medical Research Council Cognition and Brain Sciences Unit in Cambridge, U.K., agrees that the study offers “crucial evidence supporting the idea that inferior temporal cortex contains the neuronal representations underlying human visual object recognition.”

“This study is exemplary for its original and rigorous method of establishing links between brain representations and human behavioral performance,” adds Kriegeskorte, who was not part of the research team.

Model performance

The researchers also tested more than 10,000 other possible models for how the brain might encode object representations. These models varied based on location in the brain, the number of neurons required, and the time window for neural activity.

Some of these models, including some that relied on V4, were eliminated because they performed better than humans on some tasks and worse on others.

“We wanted the performance of the neurons to perfectly match the performance of the humans in terms of the pattern, so the easy tasks would be easy for the neural population and the hard tasks would be hard for the neural population,” Majaj says.

The research team now aims to gather even more data to ask if this model or similar models can predict the behavioral difficulty of object recognition on each and every visual image — an even higher bar than the one tested thus far. That might require additional factors to be included in the model that were not needed in this study, and thus could expose important gaps in scientists’ current understanding of neural representations of objects.

They also plan to expand the model so they can predict responses in IT based on input from earlier parts of the visual stream.

“We can start building a cascade of computational operations that take you from an image on the retina slowly through V1, V2, V4, until we’re able to predict the population in IT,” Majaj says.


Substack subscription form sign up