California researchers help map human genetic variation across populations
New study makes whole-genome association studies possible
San Diego and Berkeley, CA, February 17, 2005 -- Computer scientists at two research centers affiliated with the University of California have teamed with biologists from Perlegen Sciences, Inc., to map key genetic signposts across three human populations. Their study – published in the Feb. 18 issue of Science – could make widely accessible the analysis of human variation based on whole-genome data, and speed efforts to pinpoint DNA variations that are associated with disease or with how patients respond differently to drugs.
"This project sets a new milestone in the search for genetic elements linked to complex genetic diseases such as Alzheimer's, cancer and multiple sclerosis," said co-author David R. Cox, Chief Scientific Officer at Mountain View, CA-based Perlegen. "Genome-wide analysis may soon become a standard methodology in the search for more effective, individualized treatments."
Researchers at Perlegen sequenced the single-letter variations (called single-nucleotide polymorphisms, or SNPs) in the DNA of 71 individuals of European American, African American, and Han Chinese American ancestry. Subsequently, scientists at the California Institute for Telecommunications and Information Technology (Calit2) at the University of California, San Diego, and the UC Berkeley-affiliated International Computer Science Institute (ICSI) helped analyze the resulting set of more than 100 million genotypes, drawn from the more than 1.5 million SNPs that Perlegen sequenced in each sample.
"This is the first time that a SNP data set of that scale is being sequenced," said Eran Halperin, a research scientist at Berkeley-based ICSI. "For each of the 23 pairs of chromosomes in human DNA, the resulting data set consisted of 71 genotypes, which mix together the information from both copies of the chromosome. To see a clearer picture of a variation, we really want to know the variation on each chromosome, and we can do that by inferring haplotypes – the sequences of nucleotide bases in each copy of the chromosome."
Halperin and Calit2 researcher Eleazar Eskin, who co-authored the study with Perlegen scientists, have pioneered a method for translating genotypes into haplotypes, using the HAP software tool they co-developed. For this study, the bioinformatics researchers had to process more than 190 million data points. "Using other programs, haplotyping would require at least a few months of CPU time," said Eskin, an assistant professor in Computer Science and Engineering at UC San Diego's Jacobs School of Engineering. "Using HAP on a regular laptop, this work would take only 200 CPU hours. But we were able to use a cluster of computers from Calit2's OptIPuter project, and that allowed us to perform our final entire analysis in less than 12 hours."
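To illustrate why genotypes alone are ambiguous, the following minimal Python sketch enumerates every pair of haplotypes compatible with a single unphased genotype. The 0/1/2 encoding (copies of the alternate allele at each SNP) and the function name are assumptions made for illustration. This is not the HAP algorithm, whose internals are not described here.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return the set of unordered haplotype pairs consistent with a genotype.

    genotype: sequence of 0, 1 or 2, the number of copies of the alternate
    allele at each SNP (an illustrative encoding, not HAP's input format).
    """
    # Heterozygous sites (value 1) are the only ones whose phase is unknown.
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed on both copies: 0 -> allele 0, 2 -> allele 1.
    base = [g // 2 for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het_sites, bits):
            h1[site], h2[site] = b, 1 - b
        # frozenset makes (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(frozenset((tuple(h1), tuple(h2))))
    return pairs


for pair in compatible_haplotype_pairs([1, 1, 0, 2]):
    print(sorted(pair))
```

A genotype with k heterozygous SNPs is compatible with 2^(k-1) distinct haplotype pairs, so exhaustive enumeration quickly becomes impractical as the number of SNPs grows.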
Until now, the high cost of sequencing technology has confined disease association studies to short genomic regions. The Science study indicates that genome-wide association studies will now be possible on a considerably reduced budget, as scientists build on the publicly available data and tools released by Perlegen, ICSI and Calit2.
The researchers in San Diego and Berkeley also used the HAP tool to partition the human genome into 'blocks', or regions, of limited diversity. These are regions where only a few common patterns account for the majority of the variation in the population. The resulting haplotype 'maps' across the three populations appeared qualitatively similar to the maps compiled by Perlegen using a different technique based on 'linkage disequilibrium' (LD). LD involves correlations of DNA variants in physical proximity along a chromosome, and results from a combination of processes including mutation, natural selection, and genetic drift. Linkage disequilibrium is complex and varies from one region of the genome to another, as well as between different populations. According to the study, "LD maps and haplotype maps represent somewhat different aspects of the local structure of genetic variation."
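The two views of local structure described above can be sketched in a few lines of Python. The first function measures how much of a sample is explained by the few most common haplotype patterns in a region; the second computes r², a standard pairwise measure of linkage disequilibrium. The function names, the 0/1 allele coding and the toy data are assumptions made for illustration, and neither function reproduces the procedures used by HAP or Perlegen.

```python
from collections import Counter


def common_pattern_coverage(haplotypes, start, end, top_k=3):
    """Fraction of haplotypes whose alleles in [start, end) match one of
    the top_k most frequent patterns in that region."""
    patterns = Counter(tuple(h[start:end]) for h in haplotypes)
    covered = sum(count for _, count in patterns.most_common(top_k))
    return covered / len(haplotypes)


def r_squared(haplotypes, i, j):
    """Squared correlation (r^2) between SNPs i and j, alleles coded 0/1."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n
    p_j = sum(h[j] for h in haplotypes) / n
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # a SNP with no variation: r^2 is undefined, report 0
    d = p_ij - p_i * p_j  # the classical disequilibrium coefficient D
    return d * d / denom


# Toy data, invented for illustration: 6 haplotypes over 4 SNPs.
haps = [
    (0, 0, 1, 0),
    (0, 0, 1, 0),
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (0, 0, 1, 0),
    (1, 1, 0, 0),
]
print(common_pattern_coverage(haps, 0, 4, top_k=2))
print(r_squared(haps, 0, 1))
```

In this toy sample, two patterns cover five of the six haplotypes, and the first two SNPs always vary together, so either one fully predicts the other.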
"The partitioning of genomes into highly correlated regions may be extremely useful for geneticists worldwide," added ICSI's Halperin. "They could choose to sequence a small subset of SNPs in each region, and use the high correlations between the different SNPs in order to predict the SNPs that were not sequenced."
The HAP study found substantially more blocks in the African American map than in the European American and Han Chinese maps, indicating that the greatest genetic diversity was in samples of African American descent (a finding consistent with previous studies).
Other findings in the Science paper, titled "Whole Genome Patterns of Common DNA Variation in Three Diverse Human Populations," include:
Most functional human genetic variation is not population-specific;
The majority of the 1.58 million SNPs with high-quality genotypes were common in all three populations; and
"Private SNPs" – those SNPs segregating in only one population sample – were only 18% of the total.
Maps of the haplotype structure and the variants that are common in each region can be downloaded from the Calit2 HAP site, which is hosted by the National Biomedical Computational Resource at UCSD (see Related Links below). "We hope that researchers interested in specific regions of the genome will use this site to obtain information on the human variation in those regions," said Calit2 director Larry Smarr. "This is a great example of the revolution in computational biology and its potential benefits to society in the study of cardiovascular disease, mental illness and other conditions thought to result from a complex interplay of multiple genetic and environmental factors."
The SNPs analyzed in the Science study represent only a fraction of the more than 10 million common SNPs expected to exist in the human genome. But researchers at Perlegen developed a mathematical algorithm to identify so-called 'tag SNPs' that provide guideposts for finding common variants in the human genome. "This study and software tools mean that you no longer have to wait to do whole-genome association studies," said Perlegen scientist David A. Hinds, lead author on the study. "We've effectively figured out how to reduce the genotyping burden by identifying a reduced set of tag SNPs, thus decreasing the difficulty and cost of association studies. That said, even when reducing to tag SNPs, we still need to be able to genotype at least several hundred thousand SNPs to have a comprehensive whole-genome association study."
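Perlegen's tag-SNP algorithm is not described here. As a rough illustration of the general idea only, the sketch below uses a simple greedy strategy: repeatedly pick the SNP that is highly correlated with the most SNPs not yet represented. The function name, the input format (a precomputed matrix of pairwise r² values) and the 0.8 threshold are all assumptions made for this example.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs so that every SNP is a tag or has r^2 >= threshold
    with some tag.

    r2: square list-of-lists of pairwise r^2 values (hypothetical input).
    Returns the indices of the chosen tags, in selection order.
    """
    n = len(r2)
    uncovered = set(range(n))
    tags = []

    def covers(s):
        # SNPs still uncovered that s would represent, itself included.
        return {t for t in uncovered if t == s or r2[s][t] >= threshold}

    while uncovered:
        # Choose the SNP representing the most uncovered SNPs;
        # ties go to the lowest index.
        best = max(range(n), key=lambda s: len(covers(s)))
        uncovered -= covers(best)
        tags.append(best)
    return tags


# Toy matrix, invented for illustration: SNPs 0-2 are mutually
# correlated, SNP 3 is nearly independent of the rest.
r2 = [
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.95, 0.00],
    [0.85, 0.95, 1.00, 0.20],
    [0.10, 0.00, 0.20, 1.00],
]
print(greedy_tag_snps(r2))
```

Here two tags stand in for four SNPs. Across the whole genome, the same kind of reduction is what lowers the genotyping burden that Hinds describes.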
"This research provides a tool for exploring many questions remaining regarding the causal role of common human DNA variation in complex human traits and for investigating the nature of genetic variation within and between human populations," the Science paper concludes.
Perlegen is also cooperating with the public-sector International HapMap Project, which is expected to release more detailed descriptions of genetic variations later this year. "We see these two efforts as complementary," said Perlegen's Hinds. "The HapMap project will yield a denser map, with more SNPs across a deeper set of individuals." HapMap will describe variation across individuals of Japanese, Chinese, Nigerian and European ancestry.
About ICSI
The International Computer Science Institute (ICSI) is an independent, nonprofit research center affiliated with the University of California campus in Berkeley, California. Founded in 1986, ICSI provides a vibrant, international environment for approximately eighty scientists pursuing leading-edge research in networking, algorithms, bioinformatics, artificial intelligence, computational linguistics and spoken language processing. ICSI research is sponsored by a mix of government contracts, commercial partnerships and international visitor programs. www.icsi.berkeley.edu
About Calit2
The California Institute for Telecommunications and Information Technology (Calit2) is one of four California Institutes for Science and Innovation created in late 2000 by California to ensure that the state maintain its leadership in cutting-edge technologies and industries. Its mission: to extend the reach of the Internet throughout the physical world – enabling anywhere/anytime access to the Web. More than 200 faculty members from UC San Diego and UC Irvine are collaborating on interdisciplinary projects, with funding and other support from more than 50 industry partners. www.calit2.net