In 1990, the effort to sequence the human genome began, led, in part, by Cold Spring Harbor Laboratory’s then-director Jim Watson. Scientists anticipated the project would reveal about 100,000 genes occupying the 3.3 billion bases of DNA. The human genome sequence would help us understand and treat thousands of diseases with genetic contributions. Sequencing cost between $2 and $5 per base, and advanced labs were capable of reading a couple thousand bases per day. And in 1990, Maria Nattestad was born in Copenhagen.
Today, we know the complete sequences of many individual human genomes, including Jim Watson’s. And we know that humans have only about 22,000 genes, meaning most of our DNA is doing something other than making proteins. We have genome sequences for fruit flies, tsetse flies, platypuses, oil palms, rice, wolves, pine trees, panda bears, and bananas, among many other species. The cost of sequencing has plummeted to about 5 cents per one million bases; at Cold Spring Harbor Laboratory’s Woodbury Genome Center, Dick McCombie’s group can sequence hundreds of billions of bases per day. And sequencing capacity worldwide is still growing. But we still haven’t made full use of the information in the genome. The limitation today is not reading the sequences but rather making sense of the billions and billions of bases.
Maria is now a graduate student in the Watson School at Cold Spring Harbor Laboratory. When she was 13 years old her family moved from Denmark to Las Vegas and then, three years later, to northern California, where she later majored in biology at the University of the Pacific. After studying gene expression in the lab as an undergraduate, Maria has decided to study how computer science can help us learn more about the genome. She is interested in developing methods that will provide the massive computational power needed for genome analysis. Sequencing capacity is currently far greater than computing capacity. Unless quantitative biologists, like Maria, can figure out more efficient computational methods, an enormous amount of DNA sequence data will go unused. That would mean losing critical information about how to best treat patients and missing out on ways to improve crops.
In Maria’s rotation project with Watson School Associate Professor Mike Schatz, she wanted to come up with a fast and accurate way of putting short DNA sequences back together into the long, linear sequences that make up the genome. In particular, Maria was interested in repetitive DNA segments that make up large chunks of the genome. These sequences appear in many contexts, over and over, but it’s impossible to know where they go without knowing the unique sequences that they’re next to. Despite the improvements in DNA sequencing technologies, it’s not always possible to get those sequences directly. So to find the sequences nearby, Maria relied on what happens when DNA wraps up inside the cell nucleus—nearby bits of DNA end up close enough together that they can be chemically linked to each other. Then the linked DNA can be sequenced, revealing both the repeated segment and the unique bit. Maria’s computational method uses DNA’s wrapping properties to accurately predict where these repetitive sequences sit within the entire genome. With this information, an important but poorly understood part of our DNA becomes accessible to study.
When the human genome project was initiated in the 1990s, Jim Watson insisted it should be completed in 15 years because he wanted to be around to benefit from the biological insight the sequence would bring. Almost a quarter century later, researchers are still developing the quantitative tools to understand the information stored within those billions and billions of bases of DNA. Maria now joins this major effort in genome analysis, at a time when computational methods are leading biological discovery. And Jim Watson is still looking forward to seeing how life works.