Ph.D., University of Maryland, 2010
DNA sequencing has become a critical and ubiquitous tool in biological research, but recent improvements in sequencing technologies are challenging our capacity to store and analyze the huge volume of data being generated. Moving forward, one of the main challenges facing computational biologists is the creation of analysis systems whose efficiency can match the dramatic improvements in sequencing throughput.
For example, the 1000 Genomes Project aims to catalog the genomes of 1000 individuals from all regions of the globe. Each genome requires analysis of 100 GB of data captured in billions of short DNA sequences called reads, and takes hundreds or thousands of hours to analyze with conventional algorithms. The scale of this project dwarfs previous studies, but it is a mere milestone towards realizing the much larger goals of personal genomics.
If we have not yet reached the breaking point for traditional models of computation for biology, it is just over the horizon. The only long-term solution is to combine research in computational biology with advances from high-performance computing. My research explores the best uses of this combination, including applying the MapReduce distributed programming model and massively parallel graphics processing units to problems in genomics.
Please visit Michael's Lab home page.
Schatz, M.C., Delcher, A.L., Salzberg, S.L. 2010. Assembly of large genomes using second-generation sequencing. Genome Res. 20: 1165-1173.
Langmead, B., Schatz, M.C., Lin, J., Pop, M., Salzberg, S.L. 2009. Searching for SNPs with cloud computing. Genome Biol. 10: R134.
Schatz, M.C. 2009. CloudBurst: Highly Sensitive Read Mapping with MapReduce. Bioinformatics 25: 1363-1369.
Phillippy, A.M., Schatz, M.C., Pop, M. 2008. Genome Assembly forensics: finding the elusive mis-assembly. Genome Biol. 9: R55.
Carlton, J.M., Hirt, R.P., Silva, J.C., Delcher, A.L., Schatz, M., et al. 2007. Draft Genome Sequence of the Sexually Transmitted Pathogen Trichomonas vaginalis. Science 315: 207-212.