
Quantitative Biology

The Simons Center for Quantitative Biology (SCQB) is Cold Spring Harbor Laboratory’s home for mathematical, computational, and theoretical research in biology. Research at the SCQB focuses broadly on revealing how genomes work, how they evolve, and what makes them go wrong in disease. Members of the SCQB also develop computational tools and genomic technologies that are broadly useful to the community. The SCQB is supported by a generous endowment from the Simons Foundation. Additional funding has been provided by the Starr Foundation and Lavinia and Landon Clay.

Announcements

The SCQB is a growing group with positions at various levels. View openings and apply »

We are accepting applications for our new postdoc training program in machine learning. Learn more and apply »


Our faculty are experts in the mathematical and physical sciences who address open problems in biology, often in close collaboration with experimentalists. Most research in the center falls in the general areas of gene regulation, evolutionary genomics, disease-related human genomics, and genomic technology development. However, our work also touches on neuroscience, immunology, and plant biology, among other fields.

Members of the SCQB maintain close collaborative ties across CSHL and with many other New York area groups, including Stony Brook University and the New York Genome Center.

Leadership

Chair

Adam Siepel, Ph.D.

QB Curriculum

Justin Kinney, Ph.D.

QB Seminar Series

Molly Hammell, Ph.D.

Center Staff

Sr. Scientific Administrator & Assistant to the Chair

Susan Fredricks

Sr. Scientific Administrator

Idee Mallardi

QB Science Manager

Katie Brenner

Quantitative Biology External Advisory Committee

The Simons Center for Quantitative Biology External Advisory Committee meets annually to provide strategic advice and general guidance.

Andrew G. Clark, Ph.D.
Professor of Molecular Biology and Genetics
Cornell University

David L. Donoho, Ph.D.
Anne T. and Robert M. Bass Professor of Humanities and Sciences
Professor of Statistics
Stanford University

Molly Przeworski, Ph.D.
Professor of Biological Sciences and Systems Biology
Columbia University

Eric D. Siggia, Ph.D.
Viola Ward Brinning and Elbert Calhoun Brinning Professor
Head of Laboratory of Theoretical Condensed Matter Physics
The Rockefeller University

Eero P. Simoncelli, Ph.D.
Silver Professor of Neural Science, Mathematics, Data Science, and Psychology
New York University
Scientific Director, Center for Computational Neuroscience
Flatiron Institute, Simons Foundation

Steven L. Salzberg, Ph.D. (Chair)
Bloomberg Distinguished Professor of Biomedical Engineering, Computer Science, and Biostatistics
Director, Center for Computational Biology
Johns Hopkins University

Simons Center for Quantitative Biology Annual Reports

As data generation has grown increasingly efficient and inexpensive, the interpretation of large data sets has emerged as a limiting step for advances in biology. Researchers at the SCQB aim to make sense of this “big data” through the development of innovative modeling, algorithmic, and machine-learning methods, drawing broadly from techniques in mathematics, computer science, and physics. Research in the center is diverse but is organized around four major themes: Gene Regulation, Evolutionary Genomics, Genomic Disease Research, and Genomic Technology Development.

Gene Regulation

Kinney and McCandlish develop theoretical and experimental methods, along with computational and mathematical tools, for elucidating the relationship between biological sequences and biological functions ranging from gene expression to protein function.

Hammell studies several topics related to gene regulation, including the behavior of small non-coding RNAs, inference of gene regulatory networks, and the impact of transposable elements on gene expression. She has also developed methods for the analysis of single-cell RNA-seq data.

Siepel is broadly interested in modeling the regulation of gene expression in mammals, ranging from transcription factor binding and chromatin accessibility, to transcription initiation and elongation, to the determination of RNA stability.

Meyer studies central T cell tolerance induction and gene regulation in the thymus. Her work combines genomics studies with in silico models to understand the fundamental principles of thymus biology.

Koo studies the functional impact of genomic mutations through a computational lens using data-driven machine learning solutions. He is broadly interested in applications for studying gene regulation and protein (dys)function.

Highly expressed and variable genes were used to classify Fluidigm C1 scRNA-seq data. Higher levels of heterogeneity can be observed among 451Lu cells as compared to A375 cells. Genome Research, https://genome.cshlp.org/content/28/9/1353

Evolutionary Genomics

McCandlish develops theory and mathematics to address a number of open questions in evolutionary genetics, including the dynamics of evolution when mutation is rate-limiting or exhibits biased patterns, and the evolutionary implications of epistasis, i.e. interactions between mutations or genes.

Siepel uses evolutionary methods to identify regulatory elements, to reconstruct early human history, including interbreeding events with Neandertals, and to estimate the fitness consequences of new mutations in the human genome. He is also applying similar methods to agriculturally important plants such as maize and rice.

In addition, Iossifov uses evolutionary signatures to aid in the identification of genes associated with autism spectrum disorder, and Krasnitz uses phylogenetic methods to study the evolution of tumors.

Navlakha works at the interface of theoretical computer science, machine learning, and systems biology. He primarily studies how collections of molecules, cells, and organisms process information and solve interesting computational problems critical for survival.

Evolutionary tree of ancient human demography.
Gronau I, Hubisz MJ, Gulko B, Danko CG, and Siepel A. 2011. Bayesian inference of ancient human demography from individual genome sequences. Nat. Genet., 43: 1031–1034. https://www.nature.com/articles/ng.937

Genomic Disease Research

Iossifov aims to understand the genetics of autism spectrum disorder (ASD) through the analysis of large genomic data sets, in close collaboration with Mike Wigler’s research group and the New York Genome Center.

Krasnitz develops mathematical and statistical tools to characterize the cellular composition, genomic disruptions, evolutionary history, and invasive capacity of malignant tumors, often in collaboration with clinical oncologists.

Hammell studies the role of transposable element activation in neurodegenerative diseases, particularly amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD).

Screenshot from a new genome-viewing program (single cell genomic viewer, or SCGV) that displays the output of CSHL’s innovative method for analyzing diagnostic prostate biopsies. Arrayed in columns from left to right are genomic profiles of each of several hundred prostate cells sampled from an individual’s 13 biopsy cores. In the uppermost section, note the phylogenetic trees (green) that reflect each cell’s copy-number profile. Most of the cells are normal, but two areas of interest are evident (arrows). These are the locations of two clones and subclones in the biopsy cores, a strong signal of the presence of cancer. https://www.ncbi.nlm.nih.gov/pubmed/29180472

Genomic Technology Development

Levy, Krasnitz, and Iossifov work closely with the Wigler laboratory in the development of new DNA and RNA sequencing methods, single-cell genomic technologies, and cancer diagnostics.

Kinney is a pioneer in the development of massively parallel reporter assays for characterizing the relationship between regulatory sequences and gene expression, including both transcription and RNA splicing.

The sort-seq approach to massively parallel reporter assays (MPRAs). https://www.ncbi.nlm.nih.gov/pubmed/20439748. Illustration by Talitha Forcier.

More detailed information about research at the SCQB is available from the faculty websites of the SCQB members and associate members.

In addition to its research activities, the SCQB serves as a hub for education, training and research in the quantitative life sciences.

For more information please contact SCQB@cshl.edu.


Events

SCQB Seminar Series

The SCQB Seminar Series is a weekly symposium featuring a rotating roster of graduate students, postdocs and invited guests. Seminars are held most Wednesdays at noon during the academic calendar year.

QB Meetings and Conferences

Members and Associate Members of the SCQB faculty organize relevant QB Meetings and Conferences hosted at CSHL and around the NY area.

  • Probabilistic Modeling in Genomics
  • Biological Data Science
  • NY Population Genomics Workshop

QB Scientific Tea

The SCQB community, which includes faculty, postdocs, graduate students, staff and special guests, is invited to attend weekly catered informal gatherings to discuss research and other relevant topics.

Journal Clubs

Members of the SCQB host a biweekly Sequence/Function journal club and a monthly Deep Learning journal club during the academic calendar year.


Opportunities for Postdoctoral Researchers

The CSHL Fellows Program

The CSHL Fellows Program supports research fellows, who function independently but with mentoring from the senior faculty. The program is designed for exceptional quantitative biologists who have recently received their Ph.D. or M.D. degree and who are sufficiently talented and experienced to forgo standard postdoctoral training.

Interdisciplinary Scholars in Experimental and Quantitative Biology Program (ISEQB)

The Interdisciplinary Scholars in Experimental and Quantitative Biology (ISEQB) program is an innovative funding opportunity for postdoctoral research, open to applications in all areas of research at CSHL, including genetics, cancer, plant biology and neuroscience. It is designed to help recruit new postdocs or fund existing CSHL postdocs who are interested in both wet-lab and dry-lab research. The program aims to catalyze collaborative research and to promote the growth of the QB community at CSHL.


Course Work

School of Biological Sciences QB Bootcamp at CSHL

The School of Biological Sciences QB Bootcamp is a 2.5-day rapid introduction to Python and the CSHL computer cluster, taught each fall by SCQB faculty. It gives incoming students a working knowledge of programming in preparation for the full-semester Specialized Discipline Course in Quantitative Biology.

Specialized Discipline Course in Quantitative Biology at CSHL

The Specialized Discipline Course in Quantitative Biology is a 16-week course that aims to equip incoming students with basic training in computer programming, modern statistical methods and physical biology. Using a probabilistic and Bayesian approach, the course covers probability, statistical fluctuations, Bayesian inference, significance testing, diffusion, information theory, neural signal processing, dimensionality reduction, Monte Carlo methods, population genetics and DNA sequence analysis.

Advanced Coursework in Quantitative Biology

The Simons Center for Quantitative Biology (SCQB) provides Advanced Coursework in Quantitative Biology to graduate students, postdocs and scientific staff through independent study programs and online coursework.

Chen WC, Zhou J, Sheltzer JM, Kinney JB, McCandlish DM. Non-parametric Bayesian density estimation for biological sequence space with applications to pre-mRNA splicing and the karyotypic diversity of human cancer. bioRxiv, 2020.

Chorbadjiev, L., J. Kendall, J. Alexander, …, A. Krasnitz (2020). “Integrated Computational Pipeline for Single-Cell Genomic Profiling.” JCO Clin Cancer Inform 4: 464-471.

Hejase, H. A., A. Salman-Minkov, L. Campagna, …, A. Siepel (2020). “Genomic islands of differentiation in a rapid avian radiation have been driven by recent selective sweeps.” Proc Natl Acad Sci USA.

Hubisz, M. J., A. L. Williams and A. Siepel (2020). “Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph.” PLoS Genet 16(8): e1008895.

Ireland, W. T., S. M. Beeler, E. Flores-Bautista, …, J. B. Kinney and R. Phillips (2020). “Deciphering the regulatory genome of Escherichia coli, one hundred promoters at a time.” Elife 9.

Koo PK, Ploenzke M. Improving representations of genomic sequence motifs in convolutional networks with exponential activations. Nat. Machine Intel. 3:258-266, 2021.

Koo PK, Ploenzke M. Deep learning for inferring transcription factor binding sites. Curr Opin Syst Biol. 19:16-23, 2020.

Li, S., J. Kendall, S. Park, …, A. Krasnitz, D. Levy and M. Wigler (2020). “Copolymerization of single-cell nucleic acids into balls of acrylamide gel.” Genome Res 30(1): 49-61.

McCandlish, DM and G. I. Lang (2020). “Evolution of epistasis: small populations go their separate ways.” J Mol Evol 88(5): 418-420.

Meyer, HV, T. J. W. Dawes, M. Serrani, W. Bai, et al. (2020). “Genetic and functional insights into the fractal structure of the heart.” Nature 584(7822): 589-594.

O’Neill K, Brocks D, Gale Hammell M. Mobile genomics: tools and techniques for tackling transposons. Philos Trans R Soc Lond B Biol Sci 375(1795):20190345, 2020.

Shen, Y., S. Dasgupta and S. Navlakha (2020). “Habituation as a neural algorithm for online odor discrimination.” Proc Natl Acad Sci USA 117(22): 12402-12410.

Tam, O. H., N. V. Rozhkov, R. Shaw, …, M. Gale Hammell (2019). “Postmortem Cortex Samples Identify Distinct Molecular Subtypes of ALS: Retrotransposon Activation, Oxidative Stress, and Activated Glia.” Cell Rep 29(5): 1164-1177.e1165.

Zhou, J. and D. M. McCandlish (2020). “Minimum epistasis interpolation for sequence-function relationships.” Nat Commun 11(1): 1782.

Members of the SCQB have created a number of freely available software tools and web resources for the research community. Here is a list of all the available software tools.

SUFTware


Developed by the Kinney lab, SUFTware (Statistics Using Field Theory) provides fast and lightweight Python implementations of Bayesian Field Theory algorithms for low-dimensional statistical inference. SUFTware currently supports the one-dimensional density estimation algorithm DEFT (Density Estimation Using Field Theory).

PHAST

Developed by the Siepel lab, Phylogenetic Analysis with Space/Time models (PHAST) is a freely available software package for comparative and evolutionary genomics. Best known as the engine behind the Conservation tracks in the University of California, Santa Cruz (UCSC) Genome Browser, PHAST also includes tools for phylogenetic modeling, and for manipulating alignments, trees and genomic annotations.

INSIGHT

Developed by the Siepel lab, INSIGHT is a method for inferring signatures of recent natural selection from patterns of polymorphism and divergence across a collection of short dispersed genomic elements.

ARGweaver

Developed by the Siepel lab, ARGweaver is a software package for sampling ancestral recombination graphs (ARGs) from multiple aligned genome sequences. It also provides tools for examining local genealogies, times to the most recent ancestor, allele ages, local coalescence rates, and other statistics that can be derived from the ARG.

TEToolkit

Developed by the Hammell lab, TEToolkit is a package for including transposable elements in differential enrichment analysis of sequencing data sets.

SAKE

Developed by the Hammell lab, Single-Cell RNA-seq Analysis and Klustering Evaluation (SAKE) is a robust platform for scRNA-seq analysis that provides quantitative statistical metrics at each step of the analysis pipeline.

EGAD

Developed by the Gillis lab, EGAD is a package that implements a series of highly efficient tools to calculate functional properties of gene networks.

MetaNeighbor


Developed by the Gillis lab, MetaNeighbor allows users to quantify cell type replicability across single-cell RNA data sets. For a broader array of software from the Gillis lab (including packages outside of R), please visit: https://github.com/gillislab

Alexander Dobin


Computational genomics; transcriptomics; epigenomics; gene regulation; big data; precision medicine

Tatiana Engel


Computational and theoretical neuroscience; machine learning; statistical physics

Jesse Gillis


Gene networks; gene function prediction; guilt by association; neuropsychiatric; hub genes; multifunctionality; computational genomics

Alexei Koulakov


Theoretical neurobiology; quantitative principles of cortical design; computer science; applied mathematics

W. Richard McCombie


Genomics of psychiatric disorders; genomics of cancer; computational genomics; plant genomics

Partha Mitra


Neuroscience and theoretical biology

Doreen Ware


Genomics; genome evolution; genetic diversity; gene regulation; plant biology; computational biology

Seungtai (Chris) Yoon


Autism, SFARI, AGRE, SNVs, CNVs, whole-genome/exome sequencing, single-cell sequencing and bulk/single-cell RNA sequencing

Ivan Iossifov


Every gene has a job to do, but genes rarely act alone. Biologists have built models of molecular interaction networks that represent the complex relationships between thousands of different genes. I am using computational approaches to help define these relationships, work that is helping us to understand the causes of common diseases including autism, bipolar disorder, and cancer.

Justin Kinney


Research in the Kinney Lab combines mathematical theory, machine learning, and experiments in an effort to illuminate how cells control their genes. These efforts are advancing the fundamental understanding of biology and biophysics, as well as accelerating the discovery of new treatments for cancer and other diseases.

Peter Koo


Deep learning has the potential to make a significant impact in basic biology and cancer, but a major challenge is understanding the reasons behind the models' predictions. My research develops methods to interpret this powerful class of black box models, with a goal of elucidating data-driven insights into the underlying mechanisms of sequence-function relationships.

Alexander Krasnitz


Many types of cancer display bewildering intra-tumor heterogeneity on a cellular and molecular level, with aggressive malignant cell populations found alongside normal tissue and infiltrating immune cells. I am developing mathematical and statistical tools to disentangle tumor cell population structure, enabling an earlier and more accurate diagnosis of the disease and better-informed clinical decisions.

Dan Levy


We have recently come to appreciate that many unrelated diseases, such as autism, congenital heart disease and cancer, are derived from rare and unique mutations, many of which are not inherited but instead occur spontaneously. I am generating algorithms to analyze massive datasets comprising thousands of affected families to identify disease-causing mutations.

David McCandlish


Some mutations are harmful but others are benign. How can we predict the effects of mutations, both singly and in combination? Using data from experiments that simultaneously measure the effects of thousands of mutations, I develop computational tools to predict the functional impact of mutations and apply these tools to problems in protein design, molecular evolution, and cancer.

Hannah Meyer


A properly functioning immune system must be able to recognize diseased cells and foreign invaders among the multitude of healthy cells in the body. This ability is essential both to prevent autoimmune diseases and to fight infections and cancer. We study how a specific type of immune cell, the T cell, is educated to make this distinction during development.

Saket Navlakha


Biological systems must solve problems to survive, and their solutions can be viewed as “algorithms.” Our goal is to uncover these algorithms, translate them to improve computer science, and use them to spark new hypotheses about biological function and dysfunction.

Adam Siepel


I am a computer scientist who is fascinated by the challenge of making sense of vast quantities of genetic data. My research group focuses in particular on questions involving molecular evolution and transcriptional regulation, with applications to cancer and other diseases as well as to plant breeding and agriculture.