Big Data meets DNA

June 17, 2014
Michael Hübner | Postdoctoral researcher in Spector Lab

Today, we welcome guest blogger Michael Hübner, a postdoctoral researcher in Professor David Spector’s lab. Dr. Hübner co-founded the Bioscience Enterprise Club, a resource for students and postdocs to explore science careers outside of academia.

300,000,000 sequence fragments. 20,000 megabytes of data. That is how much data a CSHL researcher generates when reading the 3 billion letters in the human genome. Such experiments reveal which of our more than 20,000 genes are turned on or off in a particular cell type, and in diseases such as cancer, Alzheimer’s disease or autism. This information will allow us to better understand the causes of these diseases and, ultimately, help us design therapies for them. Multiply this amount of data by the number of experiments run labwide at CSHL each day, and you have Big Data.

Until a few years ago, sequencing the human genome or the active genes in a particular cell type was a major effort. Today, it has become an almost routine experiment that a single scientist, together with the state-of-the-art sequencing facility at CSHL, can accomplish in two weeks. Generating data is no longer the problem; the challenge is how to store, manage and analyze the massive amounts of it. This is not a task that any home computer or office software can handle. Biologists now realize they must learn the computing and programming skills needed to analyze their data on the CSHL supercomputer framework.

Together with other members of the Bioscience Enterprise Club (BEC), I wanted to help scientists learn these critical skills and provide basic bioinformatics training for the scientific community at CSHL. So BEC joined with the iPlant initiative and Software Carpentry to organize a two-day computational workshop. The course was fully booked, with 40 scientists attending not only from CSHL but also from The New York Genome Center, Stony Brook University, City University of New York, and the New York Botanical Garden (which sequences plants). The feedback we received after the course told us that many scientists want to continue learning these skills, so we started a Bioinformatics Working Group to provide regular workshops and training. As genome-wide sequencing projects become more feasible and more affordable, biomedical research and computational data analysis will continue to merge.

If you are interested in Big Data at CSHL and how it affects many areas of our lives, you are invited to a public lecture by CSHL Assistant Professor Mike Schatz on Wednesday evening, June 18th, at 7 p.m. at Grace Auditorium. Mike’s talk is titled “Big Data: How biological data science can improve our health, foods and energy.” We hope to see you there!


LabDish Blog
