Cold Spring Harbor, NY — An international effort to sequence the entire genome of the plant species Arabidopsis thaliana is now complete. This first-ever complete genome sequence from a plant has many implications for biology, medicine, agriculture, and the environment because it will enable detailed studies of the entire genetic structure of plants to be carried out. Such studies will yield a great deal of new information about the gene products that are involved in many aspects of plant growth and development, and how these gene products carry out their functions.
Despite its status as a diminutive relative of the mustard plant, Arabidopsis thaliana is a powerful tool in plant molecular biology and genetics. The short generation time and relatively compact genome of Arabidopsis (a flowering plant) make it an ideal model system for understanding numerous features of plant biology, including ones that are of great pharmaceutical or agricultural value.
The sequencing studies, reported in the December 14, 2000, issue of the journal Nature, provide new information about chromosome structure, evolution, and gene organization in plants. Among the many new genes discovered were several involved in disease resistance and intracellular signaling, as well as homologs of a number of human disease genes. Perhaps the most surprising result of these studies, and related studies published last year (see below), is the extent to which vast chromosomal regions have been duplicated in the Arabidopsis genome. In fact, the new study indicates that the evolution of Arabidopsis involved a whole-genome duplication, followed by gene loss and additional, extensive local gene duplications.
The Arabidopsis genome was found to contain 25,498 genes encoding proteins from 11,000 families, a functional diversity similar to that of the fruit fly Drosophila and the soil nematode worm C. elegans, two other multicellular organisms whose genomes have been completely sequenced. However, Arabidopsis has many plant-specific families of proteins (e.g., transcription factors) and lacks several kinds of proteins common to vertebrates, Drosophila, and C. elegans (e.g., the signaling pathway proteins Wingless/Wnt, Hedgehog, Notch/lin12, JAK/STAT, TGF-beta/SMADs).
“We are several years ahead of schedule,” says Cold Spring Harbor Laboratory scientist W. Richard McCombie, referring to the progress that the international Arabidopsis Genome Initiative has made toward its goal of completing the sequencing project. “Throughout this endeavor, all of the groups involved have worked hard to share information, and that has made all the difference.”
The Arabidopsis genome contains approximately 125 million base pairs of DNA (125 Mb) distributed among five chromosomes. One year ago, a U.S. consortium led by McCombie reported the DNA sequence of chromosome 4 in collaboration with The European Union Arabidopsis Genome Sequencing Consortium led by Michael Bevan of the John Innes Centre (Norwich, UK). Cold Spring Harbor Laboratory scientist Robert Martienssen was instrumental in organizing the international sequencing effort at its outset in 1996, and played a major role in interpreting the chromosome 4 results (see section entitled “Plant Biology at Cold Spring Harbor Laboratory” below).
“The completion of the Arabidopsis genome sequence has profound implications for human health as well as plant biology and agriculture,” says Martienssen. In addition to McCombie and Martienssen, Ellson Chen of Perkin Elmer Biosystems based in Foster City, California, and Richard Wilson of the Washington University Medical School Genome Sequencing Center in St. Louis, Missouri, were principal investigators in the U.S. consortium that reported the chromosome 4 results last year.
A team of scientists at The Institute for Genomic Research (TIGR) in Rockville, Maryland, led by J. Craig Venter, determined the DNA sequence of Arabidopsis chromosome 2, which was reported in Nature last year together with the chromosome 4 results. The complete sequences of chromosome 2 (19 Mb) and chromosome 4 (17 Mb) represented roughly one-third of the plant’s genome.
Today, the Arabidopsis Genome Initiative announces that it has completed the DNA sequence of the remaining chromosomes, which represent two-thirds of the entire genome. The principal teams in the new report are:
Chromosome 1: TIGR; Stanford Genome Technology Center; Plant Sciences Institute, University of Pennsylvania; Plant Gene Expression Center, UC Berkeley
Chromosome 3: European Union Arabidopsis Genome Sequencing Consortium; TIGR; Kazusa DNA Research Institute
Chromosome 5: Kazusa DNA Research Institute; The Cold Spring Harbor and Washington University in St. Louis Sequencing Consortium; European Union Arabidopsis Genome Sequencing Consortium
The major supporters of the U.S. sequencing effort were the National Science Foundation, the U.S. Department of Agriculture, and the U.S. Department of Energy.
The potential function of approximately 70% of the 25,498 genes of Arabidopsis can be predicted based on their similarity to other genes of known function in Arabidopsis or other organisms. However, the functions of the remaining 30% of Arabidopsis genes are unknown, and only 9% of Arabidopsis genes have been characterized experimentally. Future studies of the Arabidopsis genome and the proteins it encodes (particularly those with no known function) will be greatly facilitated by combining the new DNA sequence information with the many existing genetic and molecular biological strategies and resources available to Arabidopsis researchers (for example, see the “gene trap” transposable element strategy described in the section entitled “Plant Biology at Cold Spring Harbor Laboratory” below).
McCombie says that the pace of the Arabidopsis sequencing project was accelerated by a first-of-its-kind effort to use high-throughput “whole-genome random BAC fingerprint analysis” to map a large eukaryotic genome in its entirety and provide an ordered set of DNA clones for sequencing (BAC, bacterial artificial chromosome). This analysis of the Arabidopsis genome was completed by Wilson, Marco Marra, and their colleagues at the Washington University Medical School Genome Sequencing Center with assistance from McCombie, Martienssen, and Larry Parnell of Cold Spring Harbor Laboratory. The random BAC fingerprinting technique has rapidly become the method of choice for mapping and sequencing the comparatively large genomes of other eukaryotic organisms, including humans. The human genome contains an estimated 3.2 billion base pairs of DNA, roughly 25 times more than Arabidopsis.