Collaboration generates most complete cancer genome map

An international team has completed the most comprehensive study of whole cancer genomes, significantly improving the fundamental understanding of cancer and indicating new directions for developing diagnostics and treatments.

The discoveries, published today in 23 papers in Nature and its affiliated journals, are an important step toward a map of all major cancer-causing mutations in the genome.

The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Project (PCAWG, or the Pan-Cancer Project), a collaboration involving more than 1,300 scientists and clinicians from 37 countries, analyzed more than 2,600 whole genomes of 38 different tumor types — the largest publicly available whole-genome dataset in the cancer genomics field. Fifty-two members of the Broad Institute of MIT and Harvard contributed to this research over the course of the six-year project.

Using the collected data, 16 working groups examined multiple aspects of cancer development, causation, progression, and classification. Their work confirmed previous findings and generated new knowledge about cancer biology, including the discovery of a large diversity of molecular processes that generate cancer-causing mutations. The Pan-Cancer Project also improved existing methods for analyzing cancer genomes and developed new ones.

Most previous cancer genome studies focused on the roughly 1 percent of the genome that codes for proteins, known as the exome. The Pan-Cancer Project also explored the remaining 99 percent of the genome, which includes regions that regulate the activity of genes.

“This large international effort shows the breadth of the types of research and new biological insight that are possible using whole cancer genome data,” said Gad Getz, an institute member and director of the Cancer Genome Computational Analysis Group at the Broad Institute who is also the director of bioinformatics at the Massachusetts General Hospital’s (MGH) Cancer Center and a professor of pathology at Harvard Medical School (HMS). Getz, a member of the PCAWG steering committee, is a co-senior author of three of the papers.

Other institutions represented on the project’s steering committee include the Ontario Institute for Cancer Research in Canada, the Wellcome Sanger Institute in the United Kingdom, the European Molecular Biology Laboratory in Germany, and the University of California, Santa Cruz.

“It was heartening that this very large group was able to bring together disparate resources and work to come up with some groundbreaking findings,” said Rameen Beroukhim, an associate member of the Broad Institute, an associate professor at the Dana-Farber Cancer Institute, an associate professor of medicine at HMS, and a co-senior author of two of the papers.

Photo: Gad Getz, director of the Cancer Genome Computational Analysis Group at the Broad Institute (left), and Esther Rheinbay, an associate member of the institute, are part of an international consortium of scientists that analyzed more than 2,600 whole cancer genomes.

Credit: Bearwalk Cinema

Who’s driving?

Included in the suite of papers published today is an overview in Nature that describes how the collaborators collated and standardized existing genomic data from the consortium’s hundreds of research groups around the world. Moving these previously disparate datasets onto a common cloud computing platform was a major part of the project’s success. The paper also details some of the consortium’s most striking findings.

For example, the tumor genomes in the study carry, on average, four or five “driver mutations,” which play a large role in driving the growth of cancer.

Before these studies, the genetic causes of 30 percent of tumors were unexplained. By analyzing entire tumor genomes, the consortium scientists discovered additional driver mutations, leaving only 5 percent of tumors with no known drivers.

Getz and the other PCAWG steering committee members are the senior authors of the overview paper.

Driving but not coding

Another paper, published in Nature, focused more closely on driver mutations in the regions of the genome that don’t code for proteins. Only 13 percent of the drivers identified in this analysis were non-coding, a surprisingly small share given that 99 percent of the genome does not code for proteins.

“When people started sequencing whole genomes, there was an expectation that we would find non-coding drivers on the same order as the protein-coding drivers. It was a bit surprising that we didn’t find as many as we would have expected,” said Esther Rheinbay, an associate member of the Broad Institute and co-first author of the non-coding drivers paper, who is also an assistant professor of medicine at HMS and the MGH Cancer Center. Jeremiah Wala and Ofer Shapira, former members of Beroukhim’s lab, are also co-first authors, and Getz and Beroukhim are co-senior authors of this study.

Non-coding drivers may simply play a smaller role in cancer than previously assumed. But each one could also be rare and difficult to detect, which means that the search for these drivers is not over.

“One issue these studies highlight is that we still don’t have enough cancer genome data,” said Rheinbay. “We need many more whole cancer genome sequences — which will be coming in the near future because of decreasing DNA sequencing costs.”

Getz added that knowing where these non-coding drivers reside in the genome — mainly in the regulatory regions — could help focus future studies on these regions. “If we narrow the search enough, we might be able to generate cost-effective assays that would allow us to study a larger number of tumor samples at a lower cost,” he said.

Tumors over time

Another paper, in Nature, followed the development of specific tumors through time and tracked the progression of genetic changes. The scientists found that tumors of the same cancer type often shared the same cancer-initiating mutations. But as the tumors evolved, they acquired additional distinct driver mutations that were caused by different DNA-damaging processes, depending on the cancer type and the patient’s genetics and lifestyle.

“Using computational reconstruction methods, we were able to estimate the order and timing of genetic events that lead to cancer,” said Ignaty Leshchiner, a co-first author of the study, who is a group leader at the Broad Institute and a member of Getz’s lab. “We found that these genetic events often occur many years before the tumor is detected.”

This study suggests that because early-stage mutations are generally consistent within a cancer type, they could be targets for the prevention, early detection, and treatment of the disease. Getz and Beroukhim are co-authors of the paper, with Peter Van Loo of The Francis Crick Institute in the United Kingdom and Moritz Gerstung of the European Bioinformatics Institute as corresponding authors.

A mutation’s source

In another study in Nature, researchers examined the molecular processes that cause cancer mutations, including processes that damage DNA and DNA repair processes that, when broken, fail to fix that damage properly. These processes are known to leave distinct patterns, or signatures, of mutations across the genome. By looking at these “mutational signatures,” scientists can identify the molecular events that caused the mutations.

The research team used mathematical models to search millions of mutations in thousands of cancer genomes for these signatures. The scientists discovered many more signatures than were previously known and found strong associations between the new signatures and specific DNA-damaging processes.

“By analyzing the largest collection of whole cancer genomes studied thus far, we created the most comprehensive catalog of mutational signatures to date,” Getz said. “This catalog can be used to understand the mechanisms that generate mutations and drive cancer in each patient.”

Getz is a co-senior author of the study along with Steven Rozen of Duke-NUS Medical School in Singapore and Michael Stratton of the Wellcome Sanger Institute. Jaegil Kim and Nicholas Haradhvala from the Getz lab are co-first authors of the paper.

“Since every person is exposed to multiple sources of mutations throughout their lives, we need to use mathematical techniques to identify which mutations come from a particular biological source,” Kim said. “Having such a large dataset enabled us to map out these signatures with much greater precision.”

The authors also extended their analysis to additional types of mutation. “DNA can be mutated in a variety of ways, from changing single bases to removing entire sections of genetic code,” Haradhvala said. “This new dataset allowed us to analyze more types of mutation, expanding our understanding of the biological mechanisms of cancer.”

The results of the study will allow other researchers to identify the sources of cancer mutations in newly sequenced patient samples.

Looking ahead

The project has created and made available a comprehensive resource for cancer genomics researchers, including the raw genome sequencing data, software for cancer genome analysis, and multiple interactive websites exploring various aspects of the Pan-Cancer Project data.

“This global pan-cancer project demonstrates that we can learn a lot from studying whole cancer genomes,” Getz said. “We will continue learning about cancer biology and clinical applications from much larger genome datasets as they are generated in the future.”

He added that the data, findings, and methods reported in today’s 23 papers will help researchers and clinicians standardize cancer genome analysis, now that decreasing sequencing costs are making it possible to sequence more tumor genomes.

“We expect that this collection of papers will become the standard for analyzing whole cancer genomes,” Getz said.

The studies are also a major step toward personalized care for every cancer patient, bringing researchers closer to a comprehensive list of cancer-causing mutations that oncologists could one day use to pinpoint the cause of a patient’s cancer.

“I want to be at a stage where, for every patient who comes to the doctor with cancer, we will be able to figure out what’s actually driving the tumor and how we can treat it,” Getz said.

PCAWG is a collaboration between the International Cancer Genome Consortium (ICGC), an umbrella organization that launches and coordinates cancer genomic research projects, and The Cancer Genome Atlas (TCGA), the cancer genomics program that was jointly managed by the National Cancer Institute and the National Human Genome Research Institute.