

2000 Annual Report Index
Highlights of 2001



Bioinformatics

On February 15, 2001, some 20 genome research centers around the world, including W. Richard McCombie's group at Cold Spring Harbor Laboratory, jointly published the first draft of the complete human genome sequence and made it freely available to researchers throughout the world. The finished sequence is scheduled to be released in 2003, coincident with the 50th anniversary of the discovery of the structure of DNA. The complete human genome sequence can be used in a great many ways (see below) and has already fueled an explosion of biological and biomedical research, much of it directed toward improving human health.

Analysis of the raw DNA sequence through the use of various bioinformatics (computer-based) methods revealed several interesting features of the human genome, including the preliminary finding that our genome contains approximately 30,000–40,000 protein-coding genes. A similar analysis of restricted raw data by a private concern revealed most of the same features. In a companion study that accompanied the publication of the human genome first draft, Lincoln Stein, his CSHL colleagues, and other members of the International SNP Map Working Group identified some 1.4 million single nucleotide polymorphisms (or SNPs) distributed throughout the human genome. The high-density map of human DNA sequence variation will become useful for identifying disease-related genes, for tailoring therapies to those patients who are most likely to respond, and for several other applications. Importantly, this project was jointly funded by a consortium of pharmaceutical companies, private foundations, and the federal government, providing a model for future ways of funding large projects and making the data freely available to the broader scientific community.

To study and manipulate genes for a wide variety of research, diagnostic, or therapeutic purposes, scientists need to determine the precise fine structure of genes against a backdrop of what is frequently a vast and complex genetic landscape. Conventional bioinformatics software fails when it comes to detecting two important features of genes: the very first segments of genes, and the nearby "on" switches of genes, called promoters.

Michael Zhang and his colleagues have developed a computer program called First Exon Finder (or FirstEF) that is especially good at finding these first segments and "on" switches of genes. The program is tailored toward detecting these features in the human genome sequence, but it is also useful for annotating other mammalian genomes. Although the total number of genes in an organism's genome depends on subtle, nonuniversal definitions of what constitutes a gene, on the basis of his analysis of the human genome using FirstEF, Michael believes that there are 50,000–60,000 fundamental protein-coding human genes and that the preliminary estimate of 30,000–40,000 human genes is too low. We will test these computer predictions and have established a joint research initiative involving the laboratories of Michael Zhang, Greg Hannon, and Dick McCombie. These laboratories will experimentally verify the new predicted genes. Greg will incorporate the findings into a project whose goal is to determine the function of human genes that have heretofore not been assigned a function.





Copyright © 2008, Cold Spring Harbor Laboratory