Mapping and sequencing of soybean genome paves the way for improved soybean crops

Soybean, one of the most important global sources of protein and oil, is now the first major crop legume species with a published complete draft genome sequence. This sequence, which essentially provides a parts list of the soybean genome, will help scientists use the plant’s genes to improve its characteristics. The soybean sequencing study appears as the cover story of the January 13 edition of Nature.

Value of the new soybean sequence

Scientists will use the new sequence to identify which genes are responsible for particular plant characteristics, and then target specific genes to produce desired characteristics. These may include higher oil content, to promote the use of soybean oil as a biofuel; higher yields; improved resistance to the pests and diseases that currently destroy large percentages of soybean crops; better digestibility of soybeans for animals and humans; and fewer contaminants in the manure of soybean-fed swine and poultry, which can pollute farm runoff.

The research team plans to identify which soybean genes warrant targeting by:

Comparing the genomes of different varieties of soybean plants to one another.
Resequencing 20,000 soybean lines that are currently stored in the National Plant Germplasm System to identify desirable gene variants that are not captured by domesticated soybean lines.

“When soybeans were domesticated, they were selected for seed size and other traits, but there were a lot of potentially valuable genes left behind,” said Scott Jackson of Purdue University, the corresponding author on the soybean genome paper. “There may be valuable genes associated with protein content or disease resistance in the stored lines that are not currently in the cultivated lines.”

Having the new soybean sequence as a reference will significantly speed up resequencing of the 20,000 stored soybean lines and reduce its cost.

A critical prerequisite to sequencing

The first step toward sequencing the soybean genome was the production of a physical map of the genome by a research team funded by the National Science Foundation (NSF). Building this map was difficult because the soybean genome is unusually complex. For example, duplicate copies account for 70 to 80 percent of the genome’s 46,000 genes, and because these copies are scattered throughout the genome, they are particularly difficult to locate.

In addition, the soybean genome contains large numbers of transposable elements, also known as TEs. TEs are mobile pieces of DNA that can affect gene expression but are difficult to distinguish from genes.

The research team overcame these complexities and produced the map, which has a lower resolution than the sequence but was a critical prerequisite for the study’s sequencing component. The map helped the researchers sequence the genome by enabling them to: 1) distinguish between TEs and genes during sequencing; and 2) break the genome apart and then accurately reassemble it like a huge puzzle, as the whole genome shotgun strategy requires.

A closely coordinated project

Because the sequencing project depended so heavily on the mapping project, these two components of the study were closely coordinated. “The close coordination of support for this project,” said Jane Silverthorne of NSF, “was facilitated by the National Plant Genome Initiative, which is managed by the Interagency Working Group on Plant Genomes, whose members include DOE, USDA and NSF.” Funding for the mapping and sequencing study was also provided by the United Soybean Board.

A complicated genome

Containing so many TEs and gene duplicates, the soybean genome is “the most complicated genome sequenced to date,” said Jackson. The same features that made the genome hard to map and sequence may also complicate efforts to target soybean genes. “If I’m selecting for a gene, I may have difficulty locating all of the necessary duplicates of that gene,” explained Jackson. “It has a lot of backup copies.”

Confident that such difficulties will be overcome, Silverthorne describes the new soybean sequence as “a valuable tool that will enable research towards a deeper understanding of the impacts of multiple genome copies on genome organization and function.” Indeed, Jackson says that techniques developed to map and sequence the soybean genome are already being applied to other sequencing projects.

What’s more, the results of the sequencing project have already provided grist for a second paper, which will appear in The Plant Cell on January 15, 2010. Jianxin Ma of Purdue University, a member of the sequencing team, says that this second paper will explain how TEs thrive in the host genome: “We found that some ‘dead’ TEs can actually be revivified by swapping with their active TE partners, and thus restore or even enhance their ability to proliferate using the amplification machinery encoded by their partners. Although TEs are ubiquitous, what we discovered has not been seen in any other organisms yet.”
