Quantitative Biology

The Simons Center for Quantitative Biology (SCQB) is Cold Spring Harbor Laboratory’s home for mathematical, computational, and theoretical research in biology. Research at the SCQB focuses broadly on revealing how genomes work, how they evolve, and what makes them go wrong in disease. Members of the SCQB also develop computational tools and genomic technologies that are broadly useful to the community.

The SCQB is supported by a generous endowment from the Simons Foundation. Additional funding has been provided by the Starr Foundation and Lavinia and Landon Clay.

We are a growing group with positions at various levels.

SCQB non-faculty jobs

Simons Center for Quantitative Biology Annual Reports

2019 Annual Report (pdf)
2018 Annual Report (pdf)

Our faculty are experts in the mathematical and physical sciences who address open problems in biology, often in close collaboration with experimentalists. Most research in the center falls in the general areas of gene regulation, evolutionary genomics, disease-related human genomics, and genomic technology development. However, our work also touches on neuroscience, immunology, and plant biology, among other fields.

Members of the SCQB maintain close collaborative ties across CSHL and with many other New York area groups, including Stony Brook University and the New York Genome Center.



Adam Siepel, Ph.D.

QB Curriculum

Justin Kinney, Ph.D.

QB Seminar Series

Molly Hammell, Ph.D.

Center Staff

Sr. Scientific Administrator & Assistant to the Chair

Irene Gill

Sr. Scientific Administrator

Idee Mallardi

QB Science Manager

Katie Brenner

Quantitative Biology External Advisory Committee

The Simons Center for Quantitative Biology External Advisory Committee meets annually to provide strategic advice and general guidance.

Andrew G. Clark, Ph.D.
Professor, Department of Molecular Biology and Genetics
Cornell University

David L. Donoho, Ph.D.
Professor, Department of Statistics
Stanford University

Molly Przeworski, Ph.D.
Professor, Biological Sciences
Columbia University

Eric D. Siggia, Ph.D.
Head of Laboratory of Theoretical Condensed Matter Physics
The Rockefeller University

Eero P. Simoncelli, Ph.D.
Investigator, Howard Hughes Medical Institute
Silver Professor, New York University

Steven L. Salzberg, Ph.D. (Chair)
Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science, and Biostatistics
Director, Center for Computational Biology
Johns Hopkins University

As data generation has grown increasingly efficient and inexpensive, the interpretation of large data sets has emerged as a limiting step for advances in biology. Researchers at the SCQB aim to make sense of this “big data” through the development of innovative modeling, algorithmic, and machine-learning methods, drawing broadly from techniques in mathematics, computer science, and physics. Research in the center is diverse but is organized around four major themes: Gene Regulation, Evolutionary Genomics, Genomic Disease Research, and Genomic Technology Development.

Gene Regulation

Kinney and McCandlish are interested in developing both theoretical and experimental methods, along with computational and mathematical tools, for elucidating the relationship between biological sequences and biological functions ranging from gene expression to protein function.

Hammell studies several topics related to gene regulation, including the behavior of small non-coding RNAs, inference of gene regulatory networks, and the impact of transposable elements on gene expression. She has also developed methods for the analysis of single-cell RNA-seq data.

Siepel is broadly interested in modeling the regulation of gene expression in mammals, ranging from transcription factor binding and chromatin accessibility, to transcription initiation and elongation, to the determination of RNA stability.

Fluidigm C1 scRNA-seq data
Highly expressed and variable genes were used to classify Fluidigm C1 scRNA-seq data. Higher levels of heterogeneity can be observed among 451Lu cells as compared to A375 cells. Genome Research, https://genome.cshlp.org/content/28/9/1353

Evolutionary Genomics

McCandlish develops theory and mathematics to address a number of open questions in evolutionary genetics, including the dynamics of evolution when mutation is rate-limiting or exhibits biased patterns, and the evolutionary implications of epistasis, i.e., interactions between mutations or genes.
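
As a minimal illustration of what "interactions between mutations" means, the sketch below measures pairwise epistasis as the deviation of a double mutant's log-fitness from the sum of the two single-mutant effects. The function name `pairwise_epistasis` and all fitness values are hypothetical, chosen only for illustration; this is a common textbook definition and is not presented as the method used in the work described here.

```python
import math


def pairwise_epistasis(w_wt, w_a, w_b, w_ab):
    """Epistasis on the log-fitness scale for two mutations, a and b.

    Returns 0 when the mutations act independently (multiplicative
    fitness), a negative value when the double mutant is less fit than
    expected, and a positive value when it is fitter than expected.
    """
    effect_a = math.log(w_a / w_wt)    # effect of mutation a alone
    effect_b = math.log(w_b / w_wt)    # effect of mutation b alone
    effect_ab = math.log(w_ab / w_wt)  # combined effect of a and b
    return effect_ab - (effect_a + effect_b)


# Hypothetical fitness values (wild type normalized to 1.0).
no_interaction = pairwise_epistasis(1.0, 0.9, 0.8, 0.72)  # 0.72 = 0.9 * 0.8
negative = pairwise_epistasis(1.0, 0.9, 0.8, 0.5)         # worse than expected
positive = pairwise_epistasis(1.0, 0.9, 0.8, 0.85)        # better than expected
```

The log scale is used so that independent (multiplicative) fitness effects add, which makes "no interaction" correspond to a value of zero.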

Siepel uses evolutionary methods to identify regulatory elements, to reconstruct early human history, including interbreeding events with Neandertals, and to estimate the fitness consequences of new mutations in the human genome. He is also applying similar methods to agriculturally important plants such as maize and rice.

In addition, Iossifov uses evolutionary signatures to aid in the identification of genes associated with autism spectrum disorder, and Krasnitz uses phylogenetic methods to study the evolution of tumors.

evolutionary tree of ancient human demography
Gronau I, Hubisz MJ, Gulko B, Danko CG, and Siepel A. 2011. Bayesian inference of ancient human demography from individual genome sequences. Nat. Genet., 43: 1031–1034. https://www.nature.com/articles/ng.937

Genomic Disease Research

Iossifov aims to understand the genetics of autism spectrum disorder (ASD) through the analysis of large genomic data sets, in close collaboration with Mike Wigler’s research group and the New York Genome Center.

Krasnitz develops mathematical and statistical tools to characterize the cellular composition, genomic disruptions, evolutionary history, and invasive capacity of malignant tumors, often in collaboration with clinical oncologists.

Hammell studies the role of transposable element activation in neurodegenerative diseases, particularly amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD).

Atwal is interested in diverse modeling and statistical inference problems having to do with cancer and immunology, often through consideration of single-cell sequencing data.

diagnostic prostate biopsy results
Screenshot from a new genome-viewing program (single cell genomic viewer, or SCGV) that displays the output of CSHL’s method for analyzing diagnostic prostate biopsies. Arrayed in columns from left to right are genomic profiles of each of several hundred prostate cells sampled from an individual’s 13 biopsy cores. In the uppermost section, note the phylogenetic trees (green) that reflect each cell’s copy-number profile. Most of the cells are normal, but two areas of interest are evident (arrows). These are the locations of two clones and subclones in the biopsy cores, a strong signal of the presence of cancer. https://www.ncbi.nlm.nih.gov/pubmed/29180472

Genomic Technology Development

Levy, Krasnitz, and Iossifov work closely with the Wigler laboratory in the development of new DNA and RNA sequencing methods, single-cell genomic technologies, and cancer diagnostics.

Kinney is a pioneer in the development of massively parallel reporter assays for characterizing the relationship between regulatory sequences and gene expression, including both transcription and RNA splicing.

The sort-seq approach to massively parallel reporter assays (MPRAs). https://www.ncbi.nlm.nih.gov/pubmed/20439748. Illustration by Talitha Forcier.

More detailed information about research at the SCQB is available from the faculty websites of the SCQB members and associate members.

In addition to its research activities, the SCQB serves as a hub for education, training and research in the quantitative life sciences.

For more information please contact SCQB@cshl.edu.


SCQB Seminar Series

The SCQB Seminar Series is a weekly symposium featuring a rotating roster of graduate students, postdocs and invited guests. Seminars are held most Wednesdays at noon during the academic calendar year.

See a list of previously invited speakers (pdf).

QB Meetings and Conferences

Members and Associate Members of the SCQB faculty organize relevant QB Meetings and Conferences hosted at CSHL and around the NY area.

QB Scientific Tea

The SCQB community, which includes faculty, postdocs, graduate students, staff, and special guests, is invited to attend weekly catered informal gatherings to discuss research and other relevant topics.

Journal Clubs

Members of the SCQB host a biweekly Sequence/Function Journal Club and a monthly Deep Learning Journal Club during the academic calendar year.

Opportunities for Postdoctoral Researchers

The Simons Center for Quantitative Biology Fellows Program

The CSHL Simons Center for Quantitative Biology Fellows Program supports research fellows, who function independently but with mentoring from the senior faculty. The program is designed for exceptional quantitative biologists who have recently received their Ph.D. or M.D. degree and who are sufficiently talented and experienced to forgo standard postdoctoral training.

Interdisciplinary Scholars in Experimental and Quantitative Biology Program (ISEQB)

The Interdisciplinary Scholars in Experimental and Quantitative Biology (ISEQB) program is a funding opportunity for postdoctoral research, open to applications in all areas of research at CSHL, including genetics, cancer, plant biology, and neuroscience. The ISEQB is designed to help recruit new postdocs or fund existing CSHL postdocs who are interested in both wet-lab and dry-lab research. The program aims to catalyze collaborative research and to promote the growth of the QB community at CSHL. For details on how to apply, affiliates of CSHL can visit the ISEQB flyer.

Course Work

Watson School QB Bootcamp at CSHL

The Watson School QB Bootcamp is a 2.5-day rapid introduction to Python and the CSHL computer cluster, taught each fall by the SCQB faculty. It gives incoming Watson School students a working knowledge of programming in preparation for the full-semester Specialized Discipline Course in Quantitative Biology.

Specialized Discipline Course in Quantitative Biology at CSHL

The Specialized Discipline Course in Quantitative Biology is a 16-week course that aims to equip incoming Watson School students with basic training in computer programming, modern statistical methods, and physical biology. Using a probabilistic and Bayesian approach, the course covers probabilities, statistical fluctuations, Bayesian inference, significance testing, diffusion, information theory, neural signal processing, dimensionality reduction, Monte Carlo methods, population genetics, and DNA sequence analyses.

Advanced Coursework in Quantitative Biology

The Simons Center for Quantitative Biology (SCQB) provides Advanced Coursework in Quantitative Biology to graduate students, postdocs, and scientific staff. With support from the Watson School of Biological Sciences, we are currently offering Coursera’s Machine Learning, a massive open online course (MOOC), with onsite teaching assistance provided by SCQB postdoctoral researchers.

Alexander J, Kendall J, McIndoo J, Rodgers L, Aboukhalil R, Levy D, Stepansky A, Sun G, Chobardjiev L, Riggs M, Cox H, Hakker I, Nowak DG, Laze J, Llukani E, Srivastava A, Gruschow S, Yadav SS, Robinson B, Atwal G, Trotman L, Lepor H, Hicks J, Wigler M, Krasnitz A. Utility of single-cell genomics in diagnostic evaluation of prostate cancer. Cancer Res. 78(2):348-358, 2018.

Buja A, Volfovsky N, Krieger AM, Lord C, Lash AE, Wigler M, Iossifov I. Damaging de novo mutations diminish motor skills in children on the autism spectrum. Proc. Natl. Acad. Sci. U.S.A., 115(8):E1859-E1866, 2018.

Danko CG, Choate LA, Marks BA, Rice EJ, Zhong W, Chu T, Martins AL, Dukler N, Coonrod SA, Wojno EDT, Lis JT, Kraus WL, Siepel A. Dynamic evolution of regulatory element ensembles in primate CD4+ T-cells. Nat. Ecol. Evol., 2(3):537-548, 2018.

Bienvenu F, Akcay E, Legendre S, and McCandlish DM. The genealogical decomposition of a matrix population model with applications to the aggregation of stages. Theor. Popul. Biol., 115:69–80, 2017.

Huang Y-F, Gulko B, Siepel A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat. Genet. 49(4):618–624, 2017.

Kato M, Vasco DA, Sugino R, Narushima D, Krasnitz A. Sweepstake evolution revealed by population-genetic analysis of copy-number alterations in single genomes of breast cancer. R. Soc. Open Sci., 4(9):171060, 2017.

Kumar V, Rosenbaum J, Zihua W, Forcier T, Ronemus M, Wigler M, Levy D. Partial bisulfite conversion for unique template sequencing. Nucleic Acids Res., 46(2):e10, 2017.

Stoltzfus A and McCandlish DM. Mutational Biases Influence Parallel Adaptation. Mol. Biol. Evol., 34(9):2163–2172, 2017.

Adams RM, Mora T*, Walczak AM*, Kinney JB*. Measuring the sequence-affinity landscape of antibodies with massively parallel titration curves. eLife 5:e23156, 2016.

Atwal GS, Kinney JB. Learning quantitative sequence–function relationships from massively parallel experiments. J Stat Phys. 162(5):1203–1243, 2016.

Kuhlwilm M, Gronau I, Hubisz MJ, de Filippo C, Prado J, Kircher M, Fu Q, Burbano HA, Lalueza-Fox C, de la Rasilla M, Rosas A, Rudan P, Brajkovic D, Kucan Z, Gusic I, Marques-Bonet T, Andres AM, Viola B, Paabo S, Meyer M, Siepel A, and Castellano S. Ancient gene flow from early modern humans into Eastern Neanderthals. Nature, 530(7591):429–433, 2016.

Wang Z, Andrews P, Kendall J, Ma B, Hakker I, Rodgers L, Ronemus M, Wigler M, Levy D. SMASH, a fragmentation and sequencing method for genomic copy number analysis. Genome Res. 26(6):844–851, 2016.

Members of the SCQB have created a number of freely available software tools and web resources for the research community. Here is a list of all the available software tools.


Developed by the Kinney lab, SUFTware (Statistics Using Field Theory) provides fast and lightweight Python implementations of Bayesian Field Theory algorithms for low-dimensional statistical inference. SUFTware currently supports the one-dimensional density estimation algorithm DEFT (Density Estimation Using Field Theory).


Developed by the Siepel lab, Phylogenetic Analysis with Space/Time models (PHAST) is a freely available software package for comparative and evolutionary genomics. Best known as the engine behind the Conservation tracks in the University of California, Santa Cruz (UCSC) Genome Browser, PHAST also includes tools for phylogenetic modeling, and for manipulating alignments, trees and genomic annotations.


Developed by the Siepel lab, INSIGHT is a method for inferring signatures of recent natural selection from patterns of polymorphism and divergence across a collection of short dispersed genomic elements.


Developed by the Siepel lab, ARGweaver is a software package for sampling ancestral recombination graphs (ARGs) from multiple aligned genome sequences. It also provides tools for examining local genealogies, times to the most recent ancestor, allele ages, local coalescence rates, and other statistics that can be derived from the ARG.


Developed by the Hammell lab, TEToolkit is a package for including transposable elements in differential enrichment analysis of sequencing data sets.


Developed by the Hammell lab, Single-Cell RNA-seq Analysis and Klustering Evaluation (SAKE) is a robust platform for scRNA-seq analysis that provides quantitative statistical metrics at each step of the analysis pipeline.


Developed by the Gillis Lab, EGAD is a package that implements a series of highly efficient tools to calculate functional properties of gene networks.


Developed by the Gillis Lab, MetaNeighbor allows users to quantify cell type replicability across single-cell RNA data sets. For a broader array of software (including those packages outside of R) from the Gillis lab please visit: https://github.com/gillislab

Alexander Dobin

Computational genomics; transcriptomics; epigenomics; gene regulation; big data; precision medicine

Tatiana Engel

Computational and theoretical neuroscience; machine learning; statistical physics

Jesse Gillis

Gene networks; gene function prediction; guilt by association; neuropsychiatric; hub genes; multifunctionality; computational genomics

Alexei Koulakov

Theoretical neurobiology; quantitative principles of cortical design; computer science; applied mathematics

W. Richard McCombie

Genomics of psychiatric disorders; genomics of cancer; computational genomics; plant genomics

Partha Mitra

Neuroscience and theoretical biology

Doreen Ware

Genomics; genome evolution; genetic diversity; gene regulation; plant biology; computational biology

Seungtai (Chris) Yoon

Autism, SFARI, AGRE, SNVs, CNVs, whole-genome/exome sequencing, single-cell sequencing and bulk/single RNA sequencing

Molly Hammell

To ensure that cells function normally, tens of thousands of genes must be turned on or off together. To do this, regulatory molecules (transcription factors and non-coding RNAs) simultaneously control hundreds of genes. My group studies how the resulting gene networks function and how they can be compromised in human disease.

Ivan Iossifov

Every gene has a job to do, but genes rarely act alone. Biologists have built models of molecular interaction networks that represent the complex relationships between thousands of different genes. I am using computational approaches to help define these relationships, work that is helping us to understand the causes of common diseases including autism, bipolar disorder, and cancer.

Justin Kinney

From regulating gene expression to fighting off pathogens, biology uses DNA sequence information in many different ways. My research combines theory, computation, and experiment in an effort to better understand the quantitative relationships between DNA sequence and biological function. Much of my work is devoted to developing new methods in statistics and machine learning.

Peter Koo

Deep learning has the potential to make a significant impact in biology and healthcare, but a major challenge is understanding the reasons behind a model's predictions. My research develops methods to interpret this powerful class of black box models, with a goal of elucidating data-driven insights into the underlying mechanisms of sequence-function relationships.

Alexander Krasnitz

Many types of cancer display bewildering intra-tumor heterogeneity on a cellular and molecular level, with aggressive malignant cell populations found alongside normal tissue and infiltrating immune cells. I am developing mathematical and statistical tools to disentangle tumor cell population structure, enabling an earlier and more accurate diagnosis of the disease and better-informed clinical decisions.

Dan Levy

We have recently come to appreciate that many unrelated diseases, such as autism, congenital heart disease and cancer, are derived from rare and unique mutations, many of which are not inherited but instead occur spontaneously. I am generating algorithms to analyze massive datasets comprising thousands of affected families to identify disease-causing mutations.

David McCandlish

Some mutations are harmful but others are benign. How can we predict the effects of mutations, both singly and in combination? Using data from experiments that simultaneously measure the effects of thousands of mutations, I develop computational tools to predict the functional impact of mutations in protein coding sequences.

Hannah Meyer

A properly functioning immune system must be able to recognize foreign invaders among the multitude of cells in the body. This ability is essential both to fight infection and to prevent autoimmune diseases. We study how a specific type of immune cell, the T cell, is educated to make this distinction during development.

Saket Navlakha

Biological systems must solve problems to survive, and their solutions can be viewed as “algorithms.” Our goal is to uncover these algorithms, translate them to improve computer science, and use them to spark new hypotheses about biological function and dysfunction.

Adam Siepel

I am a computer scientist who is fascinated by the challenge of making sense of vast quantities of genetic data. My research group focuses in particular on questions involving human evolution and transcriptional regulation.