A thin but powerful layer of hidden DNA variation may help explain why one person develops disease while another stays healthy. In a new Genome Biology study, scientists at The Hospital for Sick Children (SickKids) show that subtle sequence changes in short tandem repeats, not just their length, can reshape gene activity across tissues and populations.
Tandem repeats are short pieces of DNA copied again and again in a row, making up about seven per cent of the human genome. When these repeats grow too long, they can break genes and cause disorders such as Huntington disease and fragile X syndrome, and they have been linked to autism spectrum disorder, schizophrenia, cancer, and cardiomyopathies. Because of this history, most tools have treated repeat length as the main thing that matters.
The new work asks a different question. Instead of only counting how many times a motif is repeated, the team looked inside the repeats themselves, asking whether the actual sequence of those repeated units also varies from person to person. By analyzing whole-genome data from 3,150 unrelated individuals in the 1000 Genomes Project and the Genotype-Tissue Expression (GTEx) project, they found that about seven per cent of short tandem repeats, or STRs, show variation in their internal sequence composition.
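The difference between counting repeats and reading them can be made concrete with a toy example. The sketch below is only an illustration, not the study's pipeline: the two alleles are invented, the helper name motif_composition is hypothetical, and the tract is assumed to start exactly on a motif boundary. The motifs AAAG and AGGG are taken from the article. Both alleles have the same length, so a length-only tool would treat them as identical, yet their internal composition differs.

```python
from collections import Counter


def motif_composition(tract: str, unit_len: int) -> Counter:
    """Split a repeat tract into consecutive units of unit_len bases
    and count how often each distinct unit occurs.

    Assumes the tract begins on a motif boundary (a simplification).
    """
    units = [
        tract[i:i + unit_len]
        for i in range(0, len(tract) - unit_len + 1, unit_len)
    ]
    return Counter(units)


# Two invented alleles of the same STR, each six units long.
allele_a = "AAAG" * 6                 # pure A-rich motif
allele_b = "AAAG" * 3 + "AGGG" * 3    # half the units shifted toward G

for name, tract in [("allele_a", allele_a), ("allele_b", allele_b)]:
    comp = motif_composition(tract, 4)
    print(name, "units:", sum(comp.values()), "composition:", dict(comp))
```

A length-based comparison sees six units in both alleles; only the per-motif counts reveal that the second allele carries an alternative motif.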
Those variable STRs did not appear at random. They were more likely to sit near mobile DNA elements called Alu sequences, especially at the poly(A)-rich tail at the 3′ end. There, the team saw a shift from A-rich motifs, like AAAG and AAAAT, toward motifs with more G and C bases, such as AGGG and AGAGG. That pattern supports a long-standing idea that many disease-related STRs arose when the poly(A) tails of Alu elements accumulated mutations and hardened into repeat tracts.
Crucially, variable STRs clustered in parts of the genome that matter for how genes are read and spliced. They showed up more often at splice junctions, in noncoding RNA exons, and in regions already known to be sensitive to tandem repeat changes. When the researchers compared people who carry the most common motif with those who carry an alternative motif at the same STR, they saw consistent differences in expression for dozens of nearby genes across 46 tissues.
The effects were not uniform across humanity. Alternative motifs were especially frequent in individuals of African ancestry and showed distinct patterns in other groups, including South Asian, East Asian, and European populations. At several loci already tied to repeat expansion diseases, such as RFC1, STARD7, and BEAN1, the team found many previously undescribed motif variants in the general population. That diversity hints that some populations may carry unique regulatory signatures in their STRs, even when they do not carry the known pathogenic expansions themselves.
A New Dimension Of Short Tandem Repeats
To find these patterns, the team built on a catalog-free approach called ExpansionHunter Denovo and a custom analysis pipeline at The Centre for Applied Genomics at SickKids. This allowed them to detect both the size and motif composition of STRs, even when the repeats were longer than the short reads generated by standard Illumina sequencing. By comparing variable and non-variable STRs across cohorts, they showed that sequence-variable repeats are more prone to expansion, more enriched near recently active Alu subfamilies, and more likely to sit in regions already implicated in monogenic repeat disorders.
The picture that emerges is that STRs are not simply molecular rulers whose length alone drives disease. Instead, they are dynamic sequences whose internal motifs can flip between A-rich and GC-rich forms, often right next door to mobile elements. Those internal shifts appear to change how easily repeats expand, how they interact with nearby DNA and proteins, and whether they end up in fragile sites or splice-sensitive regions.
“These changes in STR composition aren’t rare, they’re a normal part of human genetic diversity. This is a new dimension of genetic variation that’s been hiding in plain sight,” says study lead Dr. Ryan Yuen, Senior Scientist in the Genetics & Genome Biology program.
From Brain Development To Ancestry And Treatment
One of the clearest signals was in the brain. Variable STRs were enriched at splice junctions of genes involved in neuron projection, axon guidance, and growth. They also overlapped with tandem repeats already known to influence splicing across human tissues, with especially strong enrichment in hippocampus, hypothalamus, nucleus accumbens, and putamen. In these regions, swapping a dominant motif for an alternative motif often shifted gene expression up or down, suggesting that motif changes can fine-tune neural circuits rather than simply breaking them.
Similar stories appeared in other systems. Some genes with alternative motifs showed higher expression in heart and lung tissues, while others were downregulated in adipose tissue, arteries, and frontal cortex. Taken together, these patterns support the idea that STR sequence changes help shape human phenotypic diversity, from brain function to cardiovascular traits, in ways that have been largely invisible to standard genome analyses that track only single-nucleotide variants and repeat lengths.
“Our approach lets us see both size and sequence composition,” says Yuen. “We’re still only scratching the surface, but these regions may hold the answers to some of the unknowns in our genome and contain potential targets for future disease studies.”
For now, the work relies on short-read sequencing and statistical clustering, which cannot fully resolve the most complex or expanded repeats. As long-read technologies like Oxford Nanopore and PacBio become routine in research and clinical genomics, the authors expect to uncover even richer layers of STR diversity, including detailed motif structure within individual alleles. That next wave of data could clarify how sequence variation and repeat length interact, sharpen genetic risk prediction across ancestries, and reveal new levers for precision therapies, especially in neurodevelopmental and other repeat-associated disorders.
Genome Biology: 10.1186/s13059-025-03754-9
