From Stanford University Medical Center:
Stanford researcher finds method to define genetic ‘words’
STANFORD, Calif. – With the human genome in hand, scientists now know the roughly 30,000 words making up the language of the human body. But what do those words mean? Stuart Kim, PhD, associate professor of developmental biology and genetics at the Stanford School of Medicine, has created the first dictionary that defines them.
His work, published in the Aug. 21 advance online version of the journal Science, could help researchers understand the role of newly identified genes. It also provides a glimpse into how a gene’s function has evolved over time. “This tool tells you which genetic words are used together. If I see a new word and I see its context, I know what that word means,” Kim said.
Kim’s method works because scientists already understand the roles many genes play within a cell. Of these known genes, those involved in the same process, such as cell division, tend to be active at the same time. Relying upon context, scientists can deduce that an undefined gene active at the same time as genes with a known function is probably involved in the same process.
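The reasoning from context described above can be illustrated with a short Python sketch. It is not the researchers' published method, only a minimal majority-vote illustration; the gene names, the function labels and the helper `guess_function` are all hypothetical.

```python
from collections import Counter

def guess_function(partners, known_functions):
    """Guess a gene's role from the known roles of genes active alongside it.

    partners: names of genes active at the same time as the undefined gene.
    known_functions: mapping from gene name to its known function.
    Returns the most common known function, or None if no partner is annotated.
    """
    votes = Counter(known_functions[g] for g in partners if g in known_functions)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Hypothetical annotations, invented for illustration only.
known = {
    "geneA": "cell division",
    "geneB": "cell division",
    "geneC": "protein synthesis",
}

# geneD has no known function, so it contributes no vote.
partners_of_new_gene = ["geneA", "geneB", "geneC", "geneD"]
guess = guess_function(partners_of_new_gene, known)
print(guess)
```

A simple vote like this gives only a hypothesis about the gene's role, which would still need to be checked experimentally.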
Kim and graduate student Joshua Stuart created their genetic dictionary from gene activity data in four organisms: humans, fruit flies, a roundworm called C. elegans and yeast. Previous experiments at Stanford have yielded a wealth of information about when and in what tissues the genes in these organisms become active.
From these data, Kim and his colleagues figured out which genes happened to be churning out protein at the same time. Their results showed groups of genes with identical patterns of activity. Some genes within these groups have known activities, providing a context for the many genes whose function was previously unknown.
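One common way to find genes that are active together is to correlate their activity profiles across many experiments. The sketch below assumes Pearson correlation and an arbitrary cutoff of 0.9; the article does not say which measure or threshold the researchers used, and the activity values and gene names are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length activity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a flat profile carries no co-activity information
    return cov / (spread_x * spread_y)

def coactive_partners(target, profiles, threshold=0.9):
    """Return genes whose profiles correlate with the target at or above the threshold."""
    return sorted(
        gene
        for gene, profile in profiles.items()
        if gene != target and pearson(profiles[target], profile) >= threshold
    )

# Hypothetical activity levels measured in four experiments.
profiles = {
    "new_gene": [1.0, 5.0, 2.0, 8.0],
    "geneA": [1.1, 4.8, 2.2, 7.9],
    "geneB": [0.9, 5.2, 1.8, 8.3],
    "geneC": [6.0, 1.0, 7.0, 0.5],
}

partners = coactive_partners("new_gene", profiles)
print(partners)
```

Here geneA and geneB rise and fall with the undefined gene while geneC does the opposite, so only the first two are reported as partners.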
Kim and his colleagues tested their resource using five genes with previously unknown functions. The five genes were always active at the same time as a network of genes known to be involved in cell proliferation. They were also extremely active in cancer cells, which fail to divide normally, adding credence to the idea that these were cell division genes. To further test the role of one gene, the researchers eliminated its function in C. elegans. Some cells in those worms began rampant division. “This tells us that the five genes really are involved in proliferation,” Kim said.
“People can go to this dictionary and find out how their word is used,” Kim said. His data is available online for researchers who want to learn more about their favorite undefined gene.
Kim envisions numerous uses for this resource. Researchers who study a particular cellular function may seek out novel genes that are activated in concert with their usual genetic suspects. Other researchers may want to know the function of a gene that’s mutated in people with a genetic disease. Kim’s database could pinpoint a role for the disease gene, guiding future research in that disorder.
Kim added that his data also could be used to identify genes that have changed function over time. He said some genes are part of the same network in all organisms from yeast to humans. These genes are often involved in very basic processes such as making proteins. Other genes, such as those involved in the nervous system, may have changed little over time, but they are activated with a different group of genes in each organism. The proteins made by these genes have taken on new roles as evolution has progressed.
Eran Segal, a graduate student in computer science, is a co-first author on the paper. Daphne Koller, PhD, associate professor of computer science at Stanford, shares contributing authorship with Kim.