Seeing sound via crossmodal mappings

A busy kitchen is a place where all of the senses are on high alert: your brain is processing the sound of sizzling oil, the aroma of spices, the visual aesthetic of food arranged on a plate, and the feel and taste of taking a bite. While these signals may seem distinct and independent, they actually interact and are integrated within the brain’s network of sensory neurons.

Caltech researchers have now discovered that intrinsic neural connections–called crossmodal mappings–can be used by assistive devices to help the blind detect their environment without requiring intense concentration or hundreds of hours of training. This new multisensory perspective on such aids (called sensory substitution devices) could make tasks that were previously attention-consuming much easier, allowing nonsighted people to acquire a new sensory functionality similar to vision. The work is described in a paper published in the October 22 issue of the journal Scientific Reports.

“Many neuroscience textbooks really only devote a few pages to multisensory interaction,” says Shinsuke Shimojo, the Gertrude Baltimore Professor of Experimental Psychology and principal investigator on the study. “But 99 percent of our daily life depends on multisensory–also called multimodal–processing.” As an example, he says, if you are talking on the phone with someone you know very well, and they are crying, you will not just hear the sound but will visualize their face in tears. “This is an example of the way sensory causality is not unidirectional–vision can influence sound, and sound can influence vision.”

Shimojo and postdoctoral scholar Noelle Stiles have exploited these crossmodal mappings to stimulate the visual cortex with auditory signals that encode information about the environment. They explain that crossmodal mappings are ubiquitous; everyone already has them. Mappings include the intuitive matching of high pitch to elevated locations in space or the matching of noisy sounds with bright lights. Multimodal processing, like these mappings, may be the key to making sensory substitution devices more automatic.

The researchers conducted trials with both sighted and blind people using a sensory substitution device, called a vOICe device, that translates images into sound.

The vOICe device is made up of a small computer connected to a camera that is attached to darkened glasses, allowing it to “see” what a human eye would. A computer algorithm scans each camera image from left to right and, for every column of pixels, generates an associated sound whose frequency and volume depend on the vertical location and brightness of the pixels. A large number of bright pixels at the top of a column would translate into a loud, high-frequency sound, whereas a large number of dark pixels lower in the column would produce a quieter, lower-pitched sound. A blind person wearing this camera on a pair of glasses could then associate different sounds with features of their environment.
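The column-by-column scan described above can be illustrated with a short sketch. This is not the actual vOICe implementation. The frequency range, sample rate, time per column, and function names below are all assumptions chosen for illustration. The sketch only shows the general idea: higher pixels map to higher pitch, brighter pixels map to louder sound, and columns are played from left to right.

```python
import math

def row_frequencies(n_rows, f_low=500.0, f_high=5000.0):
    """One frequency per pixel row; row 0 (top of the image) is the highest.

    The 500-5000 Hz range and exponential spacing are illustrative assumptions.
    """
    if n_rows == 1:
        return [f_high]
    ratio = f_high / f_low
    return [f_low * ratio ** ((n_rows - 1 - r) / (n_rows - 1))
            for r in range(n_rows)]

def column_to_samples(column, freqs, sample_rate=8000, duration=0.02):
    """Turn one column of brightness values (0.0-1.0) into audio samples.

    Each pixel contributes a sine wave at its row's frequency, with an
    amplitude equal to its brightness, so brighter pixels sound louder.
    """
    n_samples = round(sample_rate * duration)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        total = sum(b * math.sin(2 * math.pi * f * t)
                    for b, f in zip(column, freqs))
        samples.append(total / len(column))  # keep values within [-1, 1]
    return samples

def image_to_sound(image, sample_rate=8000, column_duration=0.02):
    """Scan a grayscale image (list of rows) left to right, one column at a time."""
    n_rows, n_cols = len(image), len(image[0])
    freqs = row_frequencies(n_rows)
    sound = []
    for c in range(n_cols):
        column = [image[r][c] for r in range(n_rows)]
        sound.extend(column_to_samples(column, freqs, sample_rate, column_duration))
    return sound

# A tiny 2x2 "image": one bright pixel at the top left, everything else dark.
sound = image_to_sound([[1.0, 0.0],
                        [0.0, 0.0]])
```

In this toy example the first column produces a high-pitched tone and the second, all-dark column produces silence. A real device would also need to handle details such as camera capture and audio playback, which are left out here.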

In the trials, sighted people with no training or instruction were asked to match images to sounds, while the blind subjects were asked to feel textures and match them to sounds. Tactile textures can be related to visual textures (patterns) like a topographic map: bright regions of an image translate to high tactile height relative to a page, while dark regions are flatter. Both groups showed an intuitive ability to identify textures and images from their associated sounds. Surprisingly, the untrained (also called “naive”) group’s performance was significantly above chance, and not very different from that of the trained group.

The intuitively identified textures used in the experiments exploited the crossmodal mappings already within the vOICe encoding algorithm. “When we reverse the crossmodal mappings in the vOICe auditory-to-visual translation, the naive performance significantly decreased, showing that the mappings are important to the intuitive interpretation of the sound,” explains Stiles.

“We found that using this device to look at textures–patterns of light and dark–illustrated ‘intuitive’ neural connections between textures and sounds, implying that there is some preexisting crossmodality,” says Shimojo. One common example of crossmodality is a condition called synesthesia, in which the activation of one sense leads to a different involuntary sensory experience, such as seeing a certain color when hearing a specific sound. “Now, we have discovered that crossmodal connections, preexisting in everyone, can be used to make sensory substitution intuitive with no instruction or training.”

The researchers do not exactly know yet what each sensory region of the brain is doing when processing these various signals, but they have a rough idea. “Auditory regions are activated upon hearing sound, as are the visual regions, which we think will process the sound for its spatial qualities and elements. The visual part of the brain, when processing images, maps objects to spatial location, fitting them together like a puzzle piece,” Stiles says. To learn more about how the crossmodal processing happens in the brain, the group is currently using functional magnetic resonance imaging (fMRI) data to analyze the crossmodal neural network.

These preexisting neural connections provide an important starting point for training visually impaired people to use devices that will help them see. A sighted person simply has to open their eyes, and the brain automatically processes images and information for seamless interaction with the environment. Current devices for the blind and visually impaired are not so automatic or intuitive to use, generally requiring a user’s full concentration and attention to interpret information about the environment. The Shimojo lab’s new finding on the role of multimodal processing and crossmodal mappings starts to address this issue.

Beyond its practical implications, Shimojo says, the research raises an important philosophical question: What is seeing?

“It seems like such an obvious question, but it gets complicated,” says Shimojo. “Is seeing what happens when you open your eyes? No, because opening your eyes is not enough if the retina [the light-sensitive layer of tissue in the eye] is damaged. Is it when your visual cortex is activated? But our research has shown that the visual cortex can be activated by sound, indicating that we don’t really need our eyes to see. It’s very profound–we’re trying to give blind people a visual experience through other senses.”

