Bioinformatics
On February 15, 2001, some 20 genome research centers around the world, including W. Richard
McCombie's group at Cold Spring Harbor Laboratory, jointly published the first draft of the complete
human genome sequence and made it freely available to researchers throughout the world.
The finished sequence is scheduled to be released in 2003, coincident with the 50th anniversary
of the discovery of the structure of DNA. The complete human genome sequence can be used in
a great many ways (see below) and has already fueled an explosion of biological and biomedical
research, much of it directed toward improving human health.
Analysis of the raw DNA sequence through the use of various bioinformatics (computer-based)
methods revealed several interesting features of the human genome, including the preliminary finding
that our genome contains approximately 30,000–40,000 protein-coding genes. A similar analysis
of restricted raw data by a private concern revealed most of the same features.
In a companion study published alongside the human genome first draft,
Lincoln Stein, his CSHL colleagues, and other members of the International SNP Map Working
Group identified some 1.4 million single nucleotide polymorphisms (or SNPs)
distributed throughout the human genome. The high-density map of human
DNA sequence variation will become useful for identifying disease-related
genes, for tailoring therapies to those patients who are most likely to respond,
and for several other applications. Importantly, this project was jointly funded by
a consortium of pharmaceutical companies, private foundations, and the federal
government, providing a model for future ways of funding large projects and
making the data freely available to the broader scientific community.
To study and manipulate genes for a wide variety of research, diagnostic, or
therapeutic purposes, scientists need to determine the precise fine structure of
genes against a backdrop of what is frequently a vast and complex genetic
landscape. Conventional bioinformatics software fails when it comes to detecting
two important features of genes: the very first segments of genes (their first exons) and the
nearby "on" switches of genes, called promoters.
Michael Zhang and his colleagues have developed a computer program called First Exon Finder
(or FirstEF) that is especially good at finding these first segments and "on" switches of genes. The
program is tailored toward detecting these features in the human genome sequence, but it is also
useful for annotating other mammalian genomes. Although the total number of genes in an organism's
genome depends on subtle, nonuniversal definitions of what constitutes a gene, Michael believes, on
the basis of his analysis of the human genome using FirstEF, that there are 50,000–60,000 fundamental
protein-coding human genes and that the preliminary estimate of 30,000–40,000 human genes is too low.
To test these computer predictions, we have established a joint research initiative involving the
laboratories of Michael Zhang, Greg Hannon, and Dick McCombie. These laboratories will experimentally
verify the newly predicted genes. Greg will incorporate the findings into a project whose goal is to
determine the function of human genes that have not yet been assigned a function.