In the midst of a heated public debate about genetically modified food, scientists have quietly reached a significant milestone in plant biology analogous to the recent sequencing of an entire human chromosome. Despite its status as a diminutive relative of the mustard plant, Arabidopsis thaliana has emerged as a powerful tool in plant molecular biology and genetics. The short generation time and relatively compact genome of Arabidopsis make it an ideal model system for understanding numerous features of plant biology, including ones that are of great pharmaceutical or agricultural value. Now, researchers involved in an international effort to sequence the entire genome of Arabidopsis have obtained—for the first time—the complete DNA sequence of chromosomes from a plant species.
The sequencing studies, reported in the December 16, 1999, issue of the journal Nature, provide a great deal of new information about chromosome structure, evolution, and gene organization in plants. Among the many new genes discovered were several involved in disease resistance and intracellular signaling, as well as homologs of a number of human disease genes. One surprising result of the studies is the extent to which vast chromosomal regions have been duplicated in the Arabidopsis genome.
“We are three or four years ahead of schedule,” says Cold Spring Harbor Laboratory scientist W. Richard McCombie, referring to the progress of the international Arabidopsis Genome Initiative toward its goal of completing the sequencing project. “This is due largely to the fact that throughout this endeavor, all of the groups involved have worked hard to share information.”
The Arabidopsis genome contains an estimated 130 million base pairs of DNA (130 Mb) distributed among five chromosomes. McCombie led a U.S. consortium that determined the DNA sequence of chromosome 4 in collaboration with The European Union Arabidopsis Genome Sequencing Consortium, led by Michael Bevan of the John Innes Centre (Norwich, UK). Cold Spring Harbor Laboratory scientist Robert Martienssen was instrumental in organizing the international sequencing effort at its outset in 1996 and played a major role in interpreting the chromosome 4 results.
“This is a landmark achievement for the Arabidopsis Genome Initiative,” says Martienssen. “The implications for plant biology are profound.” In addition to McCombie and Martienssen, Ellson Chen of Perkin Elmer Biosystems based in Foster City, California, and Richard Wilson of the Washington University Medical School Genome Sequencing Center in St. Louis, Missouri, were principal investigators in the U.S. consortium.
A team of scientists at The Institute for Genomic Research in Rockville, Maryland, led by J. Craig Venter, determined the DNA sequence of Arabidopsis chromosome 2. The complete sequences of chromosome 2 (19 Mb) and chromosome 4 (17 Mb) represent roughly one-third of the plant’s genome. McCombie predicts that sequencing of the entire Arabidopsis genome will be completed by the end of 2000. The U.S. sequencing effort is being funded by the National Science Foundation, the U.S. Department of Agriculture, and the U.S. Department of Energy.
Analysis of the chromosome 4 sequence and comparison of this sequence to that of chromosome 2 revealed several interesting features. Perhaps the most striking feature is the extent to which individual genes and entire blocks of chromosomal regions have been duplicated in the Arabidopsis genome. For example, a very large stretch of DNA is duplicated on chromosomes 2 and 4. This duplication represents approximately one-fifth of the total length of each of these chromosomes, and its existence in plants supports the emerging view that large-scale intragenomic duplications may significantly affect genome evolution in many organisms.
The potential function of approximately 60 percent of the 3,744 proteins encoded by chromosome 4 of Arabidopsis can be predicted based on their similarity to proteins of known function in Arabidopsis or other organisms. The functions of the remaining 40 percent, however, are unknown. Future studies of the Arabidopsis genome and the proteins it encodes (particularly those with no known function) will be greatly facilitated by combining the new DNA sequence information with the many genetic and molecular biological strategies and resources already available to Arabidopsis researchers.
McCombie says that the pace of the Arabidopsis sequencing project was accelerated by a first-of-its-kind effort to use high-throughput “whole-genome random BAC fingerprint analysis” to map a large eukaryotic genome in its entirety and provide an ordered set of DNA clones for sequencing (BAC, bacterial artificial chromosome). This analysis of the Arabidopsis genome was completed by Wilson, Marco Marra, and their colleagues at the Washington University Medical School Genome Sequencing Center with assistance from McCombie, Martienssen, and Larry Parnell of Cold Spring Harbor Laboratory. The random BAC fingerprinting technique has rapidly become the method of choice for mapping and sequencing the comparatively large genomes of other eukaryotic organisms, including humans. The human genome contains an estimated 3.2 billion base pairs of DNA, roughly 25 times more than Arabidopsis. Francis Collins, director of the National Human Genome Research Institute, predicts that 90 percent of the human genome sequence will be available by the spring of 2000, with a complete human DNA sequence available in 2002 or 2003.
Written by: Communications Department | publicaffairs@cshl.edu | 516-367-8455