In Kbase, cloud computing brings disparate data streams into focus
Cold Spring Harbor, NY — In the decade that has passed since the completion of the first draft sequence of the human genome, biologists have grown increasingly aware of a problem ironically generated by the success of their work. Biological experiments in the age of genomics—including DNA sequencing, gene expression profiles, studies of cell-signaling pathways, protein binding, and other information-rich inquiries—generate quantities of raw data so immense that they threaten to overwhelm researchers’ ability to make sense of them.
Two Cold Spring Harbor Laboratory (CSHL) investigators are among the leaders of a multi-institutional effort announced this week by the U.S. Department of Energy (DOE) to address the problem in one particular area of research involving plant and microbial life. The team has been awarded funding to merge many separate streams of biological information into a single, integrated cyber-“knowledgebase” (called Kbase, for short) focused specifically on these two fundamentally important forms of life.
A knowledgebase is an essential tool of systems biology—an approach to the study of life that depends on integrating multiple information types and bringing them into meaningful relation, providing a basis to measure and model biological activity within an organism or across groups of organisms.
A particularly exciting aspect of the project is that it will enable scientists to discover currently unknown relationships that exist between species and between groups of species and the surrounding environment—interrelated and interdependent communities of microbes and plants, in this case.
“In contrast to a conventional database, a knowledgebase is really an entire body of knowledge,” explains Doreen Ware, Ph.D., of the U.S. Department of Agriculture and a CSHL Adjunct Associate Professor. “In Kbase we will focus on a specific assortment of plants and microbes that the Energy Department hopes to exploit to produce biofuels, to sequester carbon in the ecosystem, and to clean up environmental pollution.” Ware has been named principal investigator of the portion of Kbase devoted to plant life.
Quantitative biologist Michael Schatz, Ph.D., a CSHL Assistant Professor and a co-investigator on Kbase, explains a key dimension of the project. “It’s not as if we have been asked to go out and grow or collect plants and microbes,” he says. “What we’ve really been challenged to do by the Department of Energy is to find ways of integrating different kinds of data and different kinds of tools that can be used to analyze those data.”
Schatz offers the analogy of Google, which enables anyone with internet access “to tap into all of human activity, all of human knowledge,” to the extent it has been recorded in digital form. Today, he notes, there is no portal like Google for scientists who work with plants and microbes. “There are many different ‘silos’ of information that have been painstakingly collected; and there are a number of existing tools that bring some strands of data into relation. But there is no overarching tool that can be used across silos,” Schatz says.
“We think by creating such a collection of tools and data sources, we’re going to be able to facilitate question-asking about huge datasets. It is our hope that this will help us make progress on improved ways to generate biofuels or on how to get the maximum yield out of plants even when the climate is very hot, dry, or wet. All of this knowledge is extractable from data that has already been or is now being generated. The challenge is how, in a sense, to liberate it, so it can be put to use.”
Thanks to the power of cloud computing, scientists across institutions will be able to query Kbase in a highly flexible fashion, and on a democratized basis, since Kbase will be accessible to scientists everywhere. This will eliminate the need for science teams to separately gather and store essentially similar data sets, as a condition for conducting experiments.
The entire Kbase effort, spanning plants, microbes, and metacommunities (microbes in the context of the vast communities in which they live, both in the environment and within other living things), will be led by Adam Arkin of Lawrence Berkeley National Laboratory. Co-principal investigators include Rick Stevens of Argonne National Laboratory, Robert Cottingham of Oak Ridge National Laboratory, and Sergei Maslov of Long Island’s Brookhaven National Laboratory, who, in concert with CSHL’s Ware, will be deeply involved in the plant section of Kbase.
Founded in 1890, Cold Spring Harbor Laboratory has shaped contemporary biomedical research and education with programs in cancer, neuroscience, plant biology and quantitative biology. Home to eight Nobel Prize winners, the private, not-for-profit Laboratory employs 1,100 people including 600 scientists, students and technicians. The Meetings & Courses Program annually hosts more than 12,000 scientists. The Laboratory’s education arm also includes an academic publishing house, a graduate school and the DNA Learning Center with programs for middle and high school students and teachers. For more information, visit www.cshl.edu