Scientists looking across human, fly and worm genomes find shared biology

Researchers analyzing human, fly, and worm genomes have found that these species have a number of key genomic processes in common, reflecting their shared ancestry. The findings, appearing Aug. 28, 2014, in the journal Nature, offer insights into embryonic development, gene regulation and other biological processes vital to understanding human biology and disease.

The studies highlight the data generated by the modENCODE Project and the ENCODE Project, both supported by the National Human Genome Research Institute (NHGRI), part of the National Institutes of Health. Integrating data from the three species, the model organism ENCyclopedia Of DNA Elements (modENCODE) Consortium studied how gene expression patterns and regulatory proteins that help determine cell fate often share common features. Investigators also detailed the similar ways in which the three species use protein packaging to compact DNA into the cell nucleus and to regulate genome function by controlling access to DNA.

Launched in 2007, modENCODE aims to create a comprehensive catalog of functional elements in the fruit fly and roundworm genomes for use by the research community. Such elements include genes that code for proteins, non-protein-coding genes and regulatory elements that control gene expression. The current work builds on initial catalogs published in 2010. The modENCODE projects complement the work being done by the ENCyclopedia Of DNA Elements (ENCODE) Project, which is building a comprehensive catalog of functional elements in the human and mouse genomes.

“The modENCODE investigators have provided a valuable resource for researchers worldwide,” said NHGRI Director Eric Green, M.D., Ph.D. “The insights gained about the workings of model organisms’ genomes greatly help to inform our understanding of human biology.”

“One way to describe and understand the human genome is through comparative genomics and studying model organisms,” said Mark Gerstein, Ph.D., Albert L. Williams Professor of Biomedical Informatics at Yale University in New Haven, Connecticut, and the lead author on one of the papers. “The special thing about the worm and fly is that they are very distant from humans evolutionarily, so finding something conserved across all three — human, fly and worm — tells us it is a very ancient, fundamental process.”

In one study, scientists led by Dr. Gerstein and others analyzed human, fly and worm transcriptomes, the collections of gene transcripts (or readouts) in a genome. They used large amounts of gene expression data generated in the ENCODE and modENCODE projects, including more than 67 billion gene sequence readouts, to discover gene expression patterns shared by all three species, particularly for developmental genes.

Investigators showed that the ways in which DNA is packaged in the cell are similar in many respects and that, in many cases, the species share programs for turning genes on and off in a coordinated manner. More specifically, they used gene expression patterns to match the stages of worm and fly development and found sets of genes that parallel each other in their usage. They also found that the genes specifically expressed in the worm and fly embryos are re-expressed in the fly pupae, the stage between larva and adult.

The researchers found that in all three organisms, the gene expression levels for both protein-coding and non-protein-coding genes could be quantitatively predicted from chromatin features at the promoters of genes. A gene’s promoter tells the cell’s machinery where to begin copying DNA into RNA, which can be used to make proteins. DNA is packaged into chromatin in cells, and changes in this packaging can regulate gene function.

“Our findings open whole new worlds for understanding gene expression and how we think about the role of transcription,” said co-senior author Susan Celniker, Ph.D., Head, Department of Genome Dynamics, Lawrence Berkeley National Laboratory, Berkeley, California. “modENCODE has been transformative,” she added. “It has helped set the standard for the types of data that should be generated and catalogued.”

Another group of scientists investigated how chromatin is organized and how it influences gene regulation in the three species. Using both modENCODE and ENCODE data, scientists compared patterns of modifications in chromatin that are needed for the cell to access the DNA inside, and the changes in DNA replication patterns as a result of these modifications. The investigators discovered that many features of chromatin were similar in all three species.

“We found mostly that the usage of chromatin modification by the three organisms is highly conserved,” said co-senior author Jason Lieb, Ph.D., professor of human genetics, University of Chicago. He noted there is a great deal of interest in chromatin because of its potential connection to some diseases, such as cancer. A number of studies have shown that some cancers may be driven in part by mutations in chromatin-related genes.

In a third study, scientists explored similarities in genome regulation. They focused on transcription-regulatory factors, key protein regulators that determine which progenitor cells eventually become skin cells, kidney cells or eye cells. “These are the key coordinators – they bind to switches that control a cell’s fate. One of the big questions in genomics is to determine what factors work together to turn on which genes,” said co-senior author Michael Snyder, Ph.D., director, Stanford Center for Genomics and Personalized Medicine, Stanford University, Stanford, California.

Investigators found that the transcription factors tend to bind to similar DNA sequences in the three species’ genomes, indicating that “the general properties of how regulatory information is laid out in the genomes are conserved in the three species,” Dr. Snyder noted. “The general principles of regulation are more or less similar.” Still, they found differences as well. The transcription factors bind very few of the same targets across species, and they are mostly expressed at different times.

Including these newly published studies, more than a dozen modENCODE Consortium papers have been or will be published in the journals Nature, Genome Research, Genome Biology, and the Proceedings of the National Academy of Sciences this year. These additional papers report studies involving deeper analyses within one or more species, methods development and investigations of particular biological processes. This collection of papers is the culmination of the modENCODE program, for which funding ended in 2012. More than 100 papers using modENCODE data by groups outside of the program have already been published. It is anticipated that the data and resources produced by modENCODE will continue to be used by the broader research community for years to come.

