
Why A Foreign Language Sounds Like A Blur To Non-Native Ears

To a non-native listener, a foreign language can feel like one long, unbroken stream of sound. Yet in your own language, your brain effortlessly snaps that same kind of stream into crisp, separate words. Two new studies from UC San Francisco show where that magic lives and how the brain learns to pull it off.

The work points to a strip of high-level auditory cortex called the superior temporal gyrus, or STG. Long thought to handle only basic sound features like consonants and vowels, it turns out to be doing something far more ambitious. It carves continuous speech into word-shaped units, tracks how each word unfolds in time, and learns the sound patterns that define each language you know.

The Brain’s Hidden Reset Button For Words

In the Neuron study, the team recorded brain activity directly from the cortical surface of 16 people with epilepsy while they listened to short radio news stories in their native language. Using high-density electrocorticography, they could follow neural activity millisecond by millisecond as each sentence played out.

They found that populations of neurons in the STG show a distinctive pattern right at the boundary between words. Activity in the high-gamma band briefly drops about a tenth of a second after each word ends, then ramps back up as the next word begins. Crucially, this sharp reset appears at word boundaries but not at syllable boundaries inside a word, even though the acoustic signal can look quite similar.
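To make that analysis concrete, here is a minimal sketch, not the authors' code, of how a boundary-aligned average could expose the dip. The high-gamma array and the boundary times are hypothetical placeholders standing in for real recordings.

```python
import numpy as np

FS = 100                                    # envelope sampling rate, Hz
high_gamma = np.random.randn(64, 60 * FS)   # stand-in for real ECoG high-gamma
word_offsets = np.array([1.2, 2.0, 3.1])    # word-final boundaries, in seconds
syll_offsets = np.array([0.8, 1.6, 2.6])    # word-internal syllable boundaries

def boundary_locked_average(signal, events_s, fs=FS, pre=0.2, post=0.4):
    """Average the signal in a window around each event time."""
    pre_n, post_n = int(pre * fs), int(post * fs)
    epochs = [signal[:, int(t * fs) - pre_n : int(t * fs) + post_n]
              for t in events_s]
    return np.mean(epochs, axis=0)          # electrodes x window samples

word_resp = boundary_locked_average(high_gamma, word_offsets)
syll_resp = boundary_locked_average(high_gamma, syll_offsets)

# In real recordings, word_resp should show a transient dip roughly 0.1 s
# after the boundary that syll_resp lacks, even when the acoustics match.
dip_idx = int(0.2 * FS) + int(0.1 * FS)     # 0.1 s after the event
print(word_resp[:, dip_idx].mean(), syll_resp[:, dip_idx].mean())
```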

“This shows that the STG isn’t just hearing sounds, it’s using experience to identify words as they’re being spoken,” said Edward Chang, MD, chair of Neurological Surgery. “This work gives us a neural blueprint for how the brain transforms continuous sound into meaningful units.”

Between these resets, the same neural populations weave together multiple layers of information. The recordings show encoding of acoustic-phonetic features like consonants and vowels, prosodic cues such as stress and rhythm, and lexical properties like word frequency and duration. All of that is aligned to the start and end of each word, not just to raw acoustic changes.

When the researchers examined the combined activity of many electrodes at once, a striking pattern emerged. As each word unfolds, the population activity traces out a loop in an abstract state space. Longer words produce larger loops, but the phase of the loop always tracks how far through the word the listener is, from beginning to end. In other words, these neural populations keep a flexible internal clock for each word that respects its relative timing rather than its absolute length.
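The loop-and-phase idea can be illustrated with a short simulation. The sketch below is our own construction, not the paper's analysis: it builds a fake population whose top two components trace a loop, then recovers the fraction of the word elapsed from the angle of that loop.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_electrodes, n_t = 64, 50                  # one word, 50 time samples
frac = np.linspace(0, 1, n_t)               # relative position in the word

# Simulate population activity whose top components trace one loop.
basis = rng.standard_normal((2, n_electrodes))
loop = np.stack([np.cos(2 * np.pi * frac), np.sin(2 * np.pi * frac)], axis=1)
pop = loop @ basis + 0.1 * rng.standard_normal((n_t, n_electrodes))

# Project onto the top two PCs and read out phase with arctan2.
xy = PCA(n_components=2).fit_transform(pop)
phase = np.unwrap(np.arctan2(xy[:, 1], xy[:, 0]))
phase_frac = (phase - phase[0]) / (phase[-1] - phase[0])

# If the loop tracks relative time, phase_frac tracks frac for any
# word length: longer words just trace a bigger loop at a slower rate.
print(np.corrcoef(phase_frac, frac)[0, 1])
```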

To test whether artificial systems discover similar solutions, the team probed a self-supervised speech model called HuBERT. In its deeper layers, the model also learned to highlight word boundaries and to represent words as cyclic trajectories that track relative elapsed time. Those dynamics were absent in earlier layers, which mainly reflected the acoustic spectrogram. The convergence suggests that both brains and modern speech models end up using similar temporal scaffolding to extract words from continuous sound.
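Readers who want to poke at this themselves can pull per-layer representations out of HuBERT with a few lines, assuming the HuggingFace transformers implementation and the public facebook/hubert-base-ls960 checkpoint. The probing itself, such as training a boundary classifier on each layer, is only stubbed here.

```python
import torch
from transformers import AutoFeatureExtractor, HubertModel

extractor = AutoFeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model = HubertModel.from_pretrained("facebook/hubert-base-ls960")

waveform = torch.randn(16000)               # 1 s of 16 kHz audio (placeholder)
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states[0] is the input to the transformer stack; later
# entries are successive transformer layers. Per the studies, word-like
# structure should show up in the deeper layers, while early layers stay
# close to the spectrogram.
for i, layer in enumerate(out.hidden_states):
    print(f"layer {i}: frames x dims = {tuple(layer.shape[1:])}")
```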

Why Your Native Language Pops And A Foreign Language Smears

The companion Nature study asks a more familiar question from everyday life. Why do words in your native language feel cleanly separated, while speech in an unfamiliar language often collapses into a blur?

Here, the group drew on a rare ten-year collection of recordings from 20 monolingual speakers of Spanish, English, or Mandarin, plus additional bilingual and multilingual volunteers. Each participant listened to sentences in their own language and in a foreign language they could not understand, while electrodes on the brain surface captured activity in the STG and neighboring regions.

At first glance, the results seem almost disappointing. Native and foreign speech activated largely the same patches of cortex, and acoustic-phonetic tuning for basic speech features looked remarkably similar across languages. The STG responded to vowels, consonants, and rapid envelope changes in broadly the same way whether or not the listener understood what was being said.

The difference emerged when the team modeled how those same neural populations handled higher-level structure. In a listener's native language, STG activity carried extra information about where words begin and end, how frequent each word is, how long it lasts, and how surprising each phoneme is given the ones that came before. In unfamiliar speech, those word-level and sequence-level signals were much weaker.
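The logic of that comparison is an encoding model: predict neural activity from stimulus features, then ask whether word-level features add predictive power beyond acoustics. A toy version with random placeholder data, not the published pipeline, looks like this.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

n_t = 5000
acoustic = np.random.randn(n_t, 10)         # spectral / phonetic features
word_level = np.random.randn(n_t, 4)        # boundary, frequency, duration, surprisal
neural = np.random.randn(n_t)               # one electrode's high-gamma

base = cross_val_score(Ridge(alpha=1.0), acoustic, neural, cv=5,
                       scoring="r2").mean()
full = cross_val_score(Ridge(alpha=1.0),
                       np.hstack([acoustic, word_level]), neural, cv=5,
                       scoring="r2").mean()

# For native speech, the word-level features should raise R^2 above the
# acoustic-only baseline; for unfamiliar speech, the gain should shrink.
print(f"acoustic-only R^2: {base:.3f}, + word-level R^2: {full:.3f}")
```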

“It’s like a kind of reboot, where the brain has processed a word it recognizes, and then resets so it can start in on the next word,” said Matthew Leonard, PhD, associate professor of Neurological Surgery.

That reset, and the richer encoding that follows, also help explain why native speech feels so segmented even when the sound itself is ambiguous. In both Spanish and English, many word boundaries are acoustically hard to distinguish from syllable boundaries inside a word. Yet neural decoders trained on STG activity could tell those cases apart much more accurately in the listener’s native language than in a foreign language, especially when acoustic cues were most misleading.
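In spirit, that decoding analysis is a classification problem: given a short window of neural activity around a boundary, decide whether it marked the end of a word or just a syllable inside one. A bare-bones, simulated illustration, again our own rather than the study's code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_events, n_feat = 400, 64 * 30             # 64 electrodes x 30 time samples
X = rng.standard_normal((n_events, n_feat)) # windows of neural activity
y = rng.integers(0, 2, n_events)            # 1 = word boundary, 0 = syllable

acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()
print(f"decoding accuracy: {acc:.2f}")      # ~0.5 here; above chance in real data
```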

Bilingual Spanish-English participants provided a further twist. For them, word boundaries, phoneme sequences, and word frequency were robustly encoded for both familiar languages, often in the very same STG electrodes. Across a broader group of speakers of Russian, Arabic, Korean, and other languages, the strength of neural word-boundary decoding in English scaled with self-reported English proficiency. It was knowledge, not language family, that mattered.

From Blurred Sound To Discrete Words

Taken together, the two studies outline a dynamic picture of how the brain turns sound into words. Shared acoustic-phonetic processing in the STG operates for any language you hear. On top of that, years of experience with a particular language teach STG populations to track word boundaries, lexical statistics, and phoneme sequences, and to reset their activity at just the right moments. When those experience-dependent signals are missing, speech really does become a blur.

That same circuitry also clarifies why damage to specific temporal lobe regions can leave people able to hear sounds but unable to understand speech. The ears are working. The reset loops and word level codes in the STG are not.

Study Details

Auditory word form study
Zhang YZ, Leonard MK, Bhaya-Grossman I, Gwilliams L, Chang EF. "Human cortical dynamics of auditory word form encoding." Neuron, 2025.

Native versus foreign speech study
Bhaya-Grossman I, Leonard MK, Zhang YZ, Gwilliams L, Johnson K, Lu J, Chang EF. "Shared and language-specific phonological processing in the human temporal lobe." Nature, 2025.

