We always knew that we each had our own, individual copy of The Book of Life, where the spellings of our genetic code differed ever so slightly. But a series of scientific studies published today show that it’s not only single letters but sentences, paragraphs, and even whole pages that can be missing or duplicated. In the leading publication in Nature, an international team has produced a map of such changes among 270 copies of the human genetic code that is already revealing new routes for finding genes involved in disease.
The Human Genome Project delivered a reference sequence for a human genome. To identify genes involved in disease, many focused studies, including the HapMap Project, have mapped single-letter differences (called single nucleotide polymorphisms or SNPs) between individuals and compared them to the human reference DNA sequence.
But the reference sequence has also provided the foundation for an entirely new search for variation, one that was not readily identifiable before. This is the search, not for single differences, but for larger regions that are absent from, or duplicated in, different individuals. With this analysis of copy number variation (CNV), a whole new vista of genetic variation with dramatic implications for disease studies has been revealed.
“Each one of us has a unique pattern of gains and losses of complete sections of DNA,” said Dr Matthew Hurles, one of the project’s leaders at the Wellcome Trust Sanger Institute, “and one of the real surprises of these results was just how much of our DNA varies in copy number. We estimate this to be at least 12% of the genome, similar in extent to SNPs. This has never been shown before.”
“The copy number variation that researchers had seen before was simply the tip of the iceberg, while the bulk lay submerged, undetected. We now appreciate the immense contribution of this phenomenon to genetic differences between individuals.”
The new map will change the way in which scientists search for genes involved in disease. While the SNP maps produced by the HapMap and other work are invaluable, most CNVs are missed by these maps. One striking example is resistance to infection by HIV, which is determined in part by multiple copies of the gene CCL3L1, and is essentially invisible to SNP-based maps of genomic variation.
“Many examples of diseases resulting from changes in copy number are emerging,” commented Charles Lee, one of the project’s leaders from Brigham and Women’s Hospital and Harvard Medical School in Boston, USA. “A recent review lists 17 conditions of the nervous system alone – including Parkinson’s disease and Alzheimer’s disease – that can result from such copy number changes.”
“Indeed, medical research will benefit enormously from this map, which provides new ways for identifying genes involved in common diseases.”
In comparing their results with the authoritative database of disease-related genes, Online Mendelian Inheritance in Man, the team found that 10% of these genes were associated with CNVs. Genes that are involved in the immune system and in brain development and activity – two functions that have evolved rapidly in humans – tend to be enriched in CNVs. By contrast, genes that play a role in early development and some genes involved in cell division, both critical to fundamental biology, tend to be spared.
The conclusions are dramatic: “I believe this paper will change forever the field of human genetics,” commented Professor James R. Lupski, Vice Chair, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas. “One can no longer consider human traits as resulting primarily from single base-pair changes or influenced only by SNPs. With all due respect to Watson and Crick, many Mendelian and complex traits, as well as sporadic diseases, may indeed result from structural variation of the genome.”
The global CNV map is transforming medical research in four areas. The first and major area is the hunt for genes underlying common diseases, where studies have not looked at CNVs to date. Second, the CNV map is being used in the study of familial genetic conditions. Third, there are thousands of severe developmental defects caused by chromosomal rearrangements; here the CNV map is being used to exclude variation found in unaffected individuals, helping researchers to home in on the region that might be involved. Finally, as with HIV, it will be possible to find variants that protect against other infectious diseases, such as malaria.
“In some ways, the methods we have used are molecular microscopes,” explained Dr Nigel Carter, another of the project’s leaders at the Wellcome Trust Sanger Institute, “which have transformed the techniques used since the foundation of clinical genetics, where researchers used microscopes to look for visible deletions and rearrangements in chromosomes.”
“With these new tools, we and our clinical colleagues are able to find previously undetectable deletions or duplications of the genome in a patient. The CNV map now allows us to identify which of these changes are unique to the disease.”
To increase the value of the map to researchers, the Wellcome Trust Sanger Institute and its partners have developed a database of CNVs associated with clinical conditions. The database, called DECIPHER, allows researchers around the world to submit clinical information of patients with CNV details using the internet. This patient information is then mapped onto the human genome in the public ENSEMBL browser, which enables collaborative investigations of these rare disorders. In this way, DECIPHER has already helped in the identification of new syndromes with subsequent improvements in care and genetic advice for affected individuals and families.
“The wide variation between individuals in the number of repeated or deleted portions of our DNA has not been appreciated until now,” commented Dr Mark Walport, Director of the Wellcome Trust. “This important work will help identify genetic causes of many diseases. All of the new data is in the public domain, emphasizing the commitment of research funders to making the results of research accessible to all.”
Copy number variation is the result of several different mechanisms, some of which remain poorly understood. Many studies to date suggest that larger CNVs occur in regions of the human genome that contain, or are flanked by, duplicated or repeated DNA sequences. Such regions are prone to errors when chromosomes are shuffled before being passed on from parent to child. Some smaller CNVs appear not to be dependent on these repeated sequences. The new research identifies many more of these smaller CNVs and will greatly advance our understanding of what is perhaps the most poorly understood mutational process operating in the human genome.
The map also tells us something of our shared history. As a result of our recent common origin in Africa, the vast majority of copy-number variation – around 89% – is shared among the diverse human populations studied.
Nevertheless, the pattern of CNV that each of us inherits subtly reflects our ancestry and can be used to infer in which of the three continental populations our recent ancestry lies.
Striking differences in regions of our genome between different continental populations will define variants that have allowed different populations to adapt to their different environments. One example is the strikingly increased copy number of the HIV-related CCL3L1 gene in African populations. An understanding of how genetic variation is distributed among populations not only tells us about human prehistory but also improves our ability to find disease genes.