Written by Michael Hübner, Postdoctoral researcher in Spector Lab
Today, we welcome guest blogger Michael Hübner, a postdoctoral researcher in Professor David Spector’s lab. Dr. Hübner co-founded the Bioscience Enterprise Club, a resource for students and postdocs to explore science careers outside of academia.
300,000,000 sequence fragments. 20,000 megabytes of data. That is how much a CSHL researcher generates when reading the 3 billion letters in the human genome. Such experiments will reveal which of our more than 20,000 genes are turned on or off in a particular cell type, and in diseases such as cancer, Alzheimer’s disease or autism. This information will allow us to better understand the causes of these diseases, and ultimately, help us design therapies for them. Multiply this amount of data by the number of experiments run labwide at CSHL each day, and you have Big Data.
Until a few years ago, sequencing the human genome or the active genes in a particular cell type was a major effort. Today, it has become an almost routine experiment that a single scientist, together with the state-of-the-art sequencing facility at CSHL, can accomplish in 2 weeks. Generating data is no longer the problem. The challenge is how to store, manage and analyze such massive amounts of it. This is not a task that any home computer or office software can handle. Biologists now realize they must learn the computing and programming skills needed to analyze their data on the CSHL supercomputer framework.
Together with other members of the Bioscience Enterprise Club (BEC), I wanted to help scientists learn these critical skills and provide basic bioinformatics training for the scientific community at CSHL. So BEC joined with the iPlant initiative and Software Carpentry to organize a 2-day computational workshop. The course was fully booked, with 40 scientists not only from CSHL, but also from The New York Genome Center, Stony Brook University, City University of New York, and the New York Botanical Garden (which sequences plants). The feedback we received after the course told us that many scientists want to continue learning these skills, so we started a Bioinformatics Working Group to provide regular workshops and training. With genome-wide sequencing projects becoming more and more feasible, and affordable, biomedical research and computational data analysis will continue to merge.
If you are interested in Big Data at CSHL, and how it affects many areas of our lives, you are invited to a public lecture by CSHL Assistant Professor Mike Schatz on Wednesday evening, June 18th, at 7 pm in Grace Auditorium. Mike’s talk is titled “Big Data: How biological data science can improve our health, foods and energy.” We hope to see you there!