To learn from experience, it is essential to know whether a past action was associated with a desired outcome. Now, scientists have demonstrated how this information can be coded by a single cell. The research, published in the July 30th issue of the journal Neuron, provides strong support for a neural mechanism that allows reward signals to be combined over time to drive successful learning.
The lateral prefrontal cortex (PFC) and the basal ganglia (BG) have been shown to play key roles in flexible association learning. “We have known for some time that neurons in both areas care about response outcome — when we tell the animals whether they were correct or wrong, those neurons fire strongly. But we found that those responses can be maintained for a long time,” explains lead study author Dr. Mark H. Histed of the Department of Neurobiology at Harvard Medical School. “In order to learn, you need to remember what you did before and whether that action was beneficial or not. These neurons carry that sort of memory.”
Dr. Histed and his colleagues, Drs. Anitha Pasupathy and Earl K. Miller from the Massachusetts Institute of Technology, studied the responses of neurons in the PFC and BG as animals performed a learning task where they were rewarded for making a correct association between a visual stimulus and an eye movement response. The researchers found that the activity of many of the neurons reflected the delivery (correct response) or withholding (incorrect response) of a reward and that this activity lasted for several seconds, the entire period between trials.
Importantly, the researchers also observed that the outcome of a single trial affected the neural representation of the learned association. Specifically, the response selectivity was stronger on a given trial if the previous trial had been rewarded and weaker if the previous trial was incorrect and therefore did not earn a reward. “In other words, only after successes, not failures, did brain processing and the monkeys’ behavior improve,” said Dr. Miller.
These results show not only that cells in the PFC and BG exhibit robust and persistent signals about the outcome of behavioral responses, but also that their selectivity is modulated by trial outcome, demonstrating how behavioral outcome signals can shape learning. “Our observations may represent a snapshot of the learning process — how single cells change their responses in real time as a result of information about what is the right action and what is the wrong one,” conclude the authors.
The researchers include Mark H. Histed, Anitha Pasupathy, and Earl K. Miller, of The Picower Institute for Learning and Memory, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA.