They say one is the loneliest number. But when it comes to DNA sequencing, many scientists today are recognizing the value of studying one cell rather than many.
Take, for instance, Robert Aboukhalil, a fourth-year student at the Watson School of Biological Sciences (WSBS), who is using his skills as a computational biologist to make single-cell genomic data easy for both scientists and clinicians to access, understand and use. His ultimate aim is to improve the diagnosis and treatment of human illnesses, chiefly cancer.
“My research centers around the use of single-cell DNA sequencing technologies to unravel tumor evolution,” said Robert, who works primarily with Cold Spring Harbor Laboratory professors Dr. Michael Wigler and Dr. Mickey Atwal. “Specifically, my work focuses on building software tools for the analysis and visualization of single-cell copy-number variants, and developing algorithms to identify clonal populations in tumors and retrace their evolution.”
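For readers curious what copy-number analysis involves, the sketch below illustrates the general idea in Python. It is a deliberately simplified toy, not Ginkgo’s algorithm, and the function names `copy_number_profile` and `group_clones` are hypothetical. It assumes that sequencing reads from one cell have already been counted in fixed genomic bins, that read counts scale linearly with copy number, and that most bins sit at a normal copy number of 2. Each bin’s count is divided by the cell’s median count, scaled to that baseline and rounded to an integer. Cells with identical profiles are then grouped as candidate clones. Real tools also correct for biases such as GC content, segment the genome before assigning copy numbers, and cluster cells by similarity rather than exact match.

```python
from statistics import median


def copy_number_profile(bin_counts: list[int], ploidy: int = 2) -> list[int]:
    """Estimate an integer copy number for each genomic bin of one cell.

    Simplifying assumptions (illustrative only): the median bin is at
    `ploidy` copies, and read counts scale linearly with copy number.
    No bias correction or segmentation is performed.
    """
    baseline = median(bin_counts)
    if baseline == 0:
        raise ValueError("median bin count is zero; cannot normalize")
    # Ratio to the median bin, scaled to the assumed baseline ploidy
    return [round(count / baseline * ploidy) for count in bin_counts]


def group_clones(cells: dict[str, list[int]]) -> dict[tuple[int, ...], list[str]]:
    """Group cells whose estimated copy-number profiles are identical.

    Exact matching is a crude stand-in for the distance-based clustering
    that real single-cell tools use on noisy data.
    """
    clones: dict[tuple[int, ...], list[str]] = {}
    for name, counts in cells.items():
        profile = tuple(copy_number_profile(counts))
        clones.setdefault(profile, []).append(name)
    return clones


# Hypothetical binned read counts for three cells (five bins each)
cells = {
    "cell1": [100, 98, 102, 205, 50],  # gain in bin 4, loss in bin 5
    "cell2": [90, 92, 88, 180, 45],    # same pattern, fewer total reads
    "cell3": [100, 101, 99, 100, 100], # no copy-number changes
}

for profile, names in group_clones(cells).items():
    print(profile, names)
# (2, 2, 2, 4, 1) ['cell1', 'cell2']
# (2, 2, 2, 2, 2) ['cell3']
```

Normalizing by the median rather than the mean keeps a large amplification in a few bins from dragging the baseline upward, which would otherwise make normal bins look like losses.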
For years, it’s been customary for scientists working in genomic laboratories to scrutinize a tissue’s genomic makeup using bulk DNA sequencing methods. Thousands or even millions of cells may be analyzed in the typical bulk-sequencing experiment.
In this approach, DNA is extracted from the many cells in an isolated tissue sample, broken into fragments and run through a DNA sequencer; the resulting sequences are then assembled into a single consensus sequence representing the cell population as a whole.
Single-cell DNA sequencing, which analyzes the genetic data of one cell at a time, is by contrast a relatively new technology, and it plays an especially powerful role in cancer research. Focusing on one cell rather than many allows scientists to study heterogeneous tissues, such as tumors; rare cells, such as circulating tumor cells; and recombination in germ cells.
But it’s not an easy process. First, a single cell must be isolated, either mechanically or with an automated cell sorter, which is often a challenging task. Then its DNA is extracted and amplified to create many copies, a process that predictably introduces errors, and the amplified DNA is run through a sequencing machine. Finally, the sequenced DNA must be pieced together computationally, in a way that accounts for the errors introduced during amplification.
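To make the idea of a consensus sequence concrete, here is a toy Python sketch of majority-vote consensus. It assumes the reads have already been aligned to the same positions, which is a large simplification: real pipelines align reads to a reference genome and weigh base-quality scores, insertions and deletions. The function name `consensus` is hypothetical. At each position the most common base wins, so a variant carried by only a few cells vanishes from the result, which is why bulk sequencing can hide rare cell populations.

```python
from collections import Counter


def consensus(aligned_reads: list[str]) -> str:
    """Majority-vote consensus over equal-length, pre-aligned reads.

    '-' marks a position a read does not cover; it is skipped in the vote.
    A position no read covers is reported as 'N'.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to equal length")
    result = []
    for i in range(length):
        # Tally the bases observed at position i, ignoring uncovered reads
        counts = Counter(read[i] for read in aligned_reads if read[i] != "-")
        # most_common(1) returns [(base, count)]; ties go to the first base seen
        result.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(result)


# Reads from many cells; only one carries a T at index 3
reads = ["ACGAT", "ACGAT", "ACGTT", "ACGAT", "AC-AT"]
print(consensus(reads))  # ACGAT
```

The minority T at index 3 is outvoted, illustrating how a consensus summarizes the population as a whole while discarding cell-to-cell differences.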
To make single-cell DNA sequencing technology more accessible, Robert has worked with fellow WSBS Ph.D. candidate Tyler Garvin in the lab of Dr. Michael Schatz, seeking an in silico solution. Together they have developed “Ginkgo,” an open-source, web-based software tool for single-cell DNA analysis and visualization.
“Tyler and I decided to start working on Ginkgo because, although single-cell DNA sequencing was increasingly becoming a popular tool, it is a complex procedure, both experimentally and computationally,” said Robert.
Data science has always appealed to Robert, who fondly recalls wresting open a computer case as a child and finding himself face-to-face with panels of intricate circuitry. He is a self-taught programmer whose fascination with computers inspired him to pursue a bachelor’s degree in computer engineering at McGill University in Canada.
Also intrigued by biology, Robert decided after graduating from McGill to combine his interests by studying computational biology at WSBS. In addition to his Ph.D. coursework and research, he is heavily involved in science communication: he runs two magazines, CSHL’s Current Exchange and Technophilic, gives talks and presentations, and blogs about science and his research. He also runs his own event solutions business, 11factorial, and recently published a book of step-by-step lessons in data analysis, aimed at anyone who wants to analyze large amounts of data, their own or someone else’s, quickly.
“I find data science fascinating because it combines data munging, software engineering, machine learning and statistics,” said Robert. “It also involves sifting through the noise, identifying key trends and oddities in data, and answering questions about those trends. For my PhD, I chose to apply data science to biology because of the vast amounts of data being generated in this field, and because it is filled with exciting questions that need answers.”
According to Robert, two types of scientists use Ginkgo. The first are those without a strong background in computational biology who have single-cell DNA data but are unsure how best to analyze it. The second are those Robert calls “power users”: scientists with a computational background who want a quick, automated method of data analysis whose results they can download and use for further analyses offline.
“Using Ginkgo makes it very easy [to understand single-cell DNA data] because it not only does the analysis but also gives you tools to better visualize and explore the data,” said Robert.
Those tools include single-cell copy-number variant (CNV) and phylogenetic analysis, data visualizers, and the ability to share data with collaborators.
Together, these features make scientists’ research more efficient, said Robert. “Furthermore,” he added, “Ginkgo lets users share their data or results with other scientists, and that facilitates collaborations.”
Beyond collaborations among scientists, Ginkgo also enables collaboration between scientists and clinicians, which could ultimately improve outcomes for patients.
“In the near future, you could imagine using single-cell DNA sequencing for early diagnosis of cancer,” said Robert. “Although the sequencing would be done in a lab, the results would be made available to pathologists and oncologists to look at via the web. Instead of just a static number or graph, this would give the clinicians greater power to zoom into different facets of the data, and ultimately provide better treatment.”