Researchers from Radboud University and the UMC Utrecht have succeeded in transforming brain signals into audible speech. By decoding signals from the brain through a combination of implants and AI, they were able to predict the words people wanted to say with an accuracy of 92 to 100%. Their findings are published in the Journal of Neural Engineering this month.
The research marks a promising development in the field of brain-computer interfaces, according to lead author Julia Berezutskaya, researcher at Radboud University’s Donders Institute for Brain, Cognition and Behaviour and at UMC Utrecht. Berezutskaya and her colleagues used brain implants in patients with epilepsy to infer what those patients were saying.
Bringing back voices
‘Ultimately, we hope to make this technology available to patients in a locked-in state, who are paralyzed and unable to communicate,’ says Berezutskaya. ‘These people lose the ability to move their muscles, and thus to speak. By developing a brain-computer interface, we can analyse brain activity and give them a voice again.’
For the experiment in their new paper, the researchers asked non-paralyzed people with temporary brain implants to speak a number of words out loud while their brain activity was being measured. Berezutskaya: ‘We were then able to establish direct mapping between brain activity on the one hand, and speech on the other hand. We also used advanced artificial intelligence models to translate that brain activity directly into audible speech. That means we weren’t just able to guess what people were saying, but we could immediately transform those words into intelligible, understandable sounds. In addition, the reconstructed speech even sounded like the original speaker in their tone of voice and manner of speaking.’
Researchers around the world are working on ways to recognize words and sentences in brain patterns. Berezutskaya and her colleagues were able to reconstruct intelligible speech from relatively small datasets, showing that their models can uncover the complex mapping between brain activity and speech even with limited data. Crucially, they also conducted listening tests with volunteers to evaluate how identifiable the synthesized words were. The positive results from those tests indicate that the technology succeeds not only at identifying words correctly, but also at getting those words across audibly and understandably, just like a real voice.
‘For now, there are still a number of limitations,’ warns Berezutskaya. ‘In these experiments, we asked participants to say twelve words out loud, and those were the words we tried to detect. In general, predicting individual words is less complicated than predicting entire sentences. In the future, large language models that are used in AI research can be beneficial. Our goal is to predict full sentences and paragraphs of what people are trying to say based on their brain activity alone. To get there, we’ll need more experiments, more advanced implants, larger datasets and advanced AI models. All these processes will still take a number of years, but it looks like we’re heading in the right direction.’
Berezutskaya, J., Freudenburg, Z. V., Vansteensel, M. J., Aarnoutse, E. J., Ramsey, N. F., & van Gerven, M. A. J. (2023). Direct speech reconstruction from sensorimotor brain activity with optimized deep learning models. Journal of Neural Engineering. IOP Publishing. https://doi.org/10.1088/1741-2552/ace8be