Using “guilt by association” to classify cells

illustration of big data
Cold Spring Harbor Laboratory Associate Professor Jesse Gillis and his lab developed a computational tool called MetaNeighbor that can analyze large amounts of data, combining and standardizing findings. They are using it to compare and catalog cell types and functions. Image: sunward5/Adobe Stock

Biologists are trying to figure out what makes a cell unique in form and function, but they are not certain which components are key to making similar cells behave differently. Cold Spring Harbor Laboratory (CSHL) Associate Professor Jesse Gillis and his lab are tackling this problem with a new statistical method that analyzes the cell’s many components. Their technique is analogous to figuring out how a kitchen works by looking at a detailed list of all its component parts, including sinks, cabinets, screws, nails, and hinges, and comparing the parts lists of millions of kitchens. With careful analysis, one can see which components are used in sets, which parts are common to all kitchens, and which are unique to only some kitchens.

Gillis’ group developed a computer tool called MetaNeighbor to perform this task. The program uses RNA transcripts, which are copies of DNA that contain instructions on how to build proteins. Using statistical methods, the program figures out which sets of transcripts, and in what amounts, are most significant to a cell’s function and identity. MetaNeighbor tracks hundreds of sets of transcripts to profile each cell’s function, then groups cells based on how similar their profiles are to each other.

MetaNeighbor: guilt by association

Maggie Crow, a former postdoc in Gillis’ lab, originally developed MetaNeighbor in 2018 to define a set of standardized parts for cells. The team is continuing to expand the tool. The program analyzes a portfolio of transcripts per cell to characterize its unique profile, known as a transcriptome. The transcriptome defines what is needed to build the cell’s anatomical features. The key is measuring the amount of each transcript.

For example, in the brain, neurons have protruding axons and dendrites that transmit signals (A). A cell with five times as many dendrites as another will have five times the amount of dendrite-related transcripts, in turn creating five times as many associated proteins (B).

illustration of brain cells
(A) Two brain cells have common structural features, including the nucleus, dendrites, and an axon. (B) The common structural features are made of proteins of different sizes, shapes, and locations. For example, proteins that build a dendrite are represented here as one green rectangle and three black circles per dendrite. These proteins are created with instructions from RNA transcripts. Illustration: Ben Wigler

By looking at the pattern of transcript levels, scientists can infer what the cell probably looks like: a neuron with five times more transcripts related to building dendrites probably has five times as many dendrites. In some cases, they may also be able to infer the function. A cell with transcripts for a certain neurotransmitter receptor probably reacts to that neurotransmitter.

illustration of brain cells
Knowing what transcripts are being used in what amounts allows a scientist to infer the shape and function of a cell from its component parts. In this example, with one green box and three black circles per dendrite, we can infer that the cell on the right would have five times as many dendrites as the one on the left. If the neuron had receptors for a certain neurotransmitter (the dark pink triangles), then it probably reacts to that neurotransmitter. Image: Ben Wigler.

Ben Harris, a graduate student in Gillis’ lab, points out that the relationship between transcripts within each cell is key to how MetaNeighbor analyzes a transcriptome. When transcripts are used in groups, one transcript is always present in the same ratio to another transcript, such as 3 to 1. Harris discovered that these ratios remain intact across all cell types, whether the associated transcripts are being used at low or high levels, to build one dendrite or five.
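The ratio idea can be shown with a small Python sketch. It is a toy illustration only: the transcript names and counts are hypothetical, borrowed from the green box and black circles in the figure, and it is not how MetaNeighbor itself is implemented.

```python
# Toy illustration with hypothetical transcript counts (not real data).
def ratio(counts, a, b):
    """Return the ratio of transcript a to transcript b within one cell."""
    return counts[a] / counts[b]

# A cell building one dendrite: one "green box" and three "black circles".
cell_one_dendrite = {"green_box": 1, "black_circle": 3}

# A cell building five dendrites: every dendrite-related transcript scales 5x.
cell_five_dendrites = {name: 5 * n for name, n in cell_one_dendrite.items()}

ratio_low = ratio(cell_one_dendrite, "black_circle", "green_box")
ratio_high = ratio(cell_five_dendrites, "black_circle", "green_box")

# The absolute amounts differ, but the 3-to-1 ratio is the same in both cells.
print(ratio_low, ratio_high)
```

The absolute amounts differ fivefold between the two cells, yet the ratio between the two transcripts is unchanged.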

Once MetaNeighbor establishes the ratios of transcripts to each other within a given cell, a form of statistical “guilt by association,” Gillis’ lab can collect large amounts of data from many different cells. The program can then group together cells that have similar transcriptomes and, therefore, similar functions and/or shapes.
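Grouping cells by how similar their transcript profiles are can be sketched in a few lines of Python. This is a simplified stand-in, assuming a plain Pearson correlation and a single nearest labeled cell; the article does not describe MetaNeighbor's actual statistics, and the cell names, labels, and transcript levels below are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length transcript profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical reference cells: (label, levels of four transcripts).
labeled_cells = {
    "cell_A": ("many_dendrites", [50, 150, 10, 5]),
    "cell_B": ("few_dendrites", [10, 30, 40, 60]),
}

def most_similar_label(profile, references):
    """Assign the label of the reference cell with the most similar profile."""
    best = max(references.values(), key=lambda item: pearson(profile, item[1]))
    return best[0]

# A new, unlabeled cell is "guilty by association" with its closest match.
new_cell = [45, 140, 12, 8]
predicted = most_similar_label(new_cell, labeled_cells)
print(predicted)
```

Because correlation compares the pattern of transcript levels rather than their absolute size, a cell can match a reference even when its overall expression is higher or lower.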

From building blocks to the whole organism

The same types of screws, nails, and boards that are found in a kitchen could also be part of living room or bedroom furniture. Once scientists know the building blocks for one type of cell, they can figure out how the rest of an organism works using the same strategy.

Researchers who are part of the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative – Cell Census Network (BICCN), funded by the National Institutes of Health, are using MetaNeighbor to categorize mammalian brain cells. The Network brings together many labs and computational scientists to create a standard reference set of brain cells for mice, primates, and humans. The advantage of MetaNeighbor is that it can group cells even if the methods of data collection vary. It can even combine data from different labs into one comprehensive data set.

MetaNeighbor is also helping scientists understand plants better, enabling them to design better and more sustainable crops. For example, Gillis collaborated with CSHL Professor David Jackson to create an anatomical map of the activity of key developmental genes in baby corn. Using Gillis’ methods, they tagged these genes to determine when and where they turn on and off as the corn grows.

Gillis also assisted in a study on tomatoes by CSHL Professor and Howard Hughes Medical Institute Investigator Zachary Lippman and CSHL Adjunct Associate Professor Michael Schatz. The researchers sequenced and distinguished genetic relationships between 100 tomato varieties, revealing 230,000 large-scale differences in DNA between them.

Squeaky hinges and loose screws

MetaNeighbor gives researchers better insight into what cells do, where they do it, and when they start or stop doing it. The program makes it possible to standardize research across labs. Armed with the right parts lists, researchers can map out how individuals differ within a species or how species differ from each other. Scientists can look for “loose screws” or “squeaky hinges” when comparing a new cell to a standard model. These standards and analysis tools will allow researchers to work with each other more easily and speed the pace of biological discovery.

MRI image of the human brain
The brain is made of layers of many different kinds of cells, as seen in this colorful MRI of the human brain. Defining cell types and circuits is critical to understanding disease, yet to date, nobody has compiled a catalog of all the brain’s cell types. By standardizing the definition of every cell type, researchers of the Brain Research through Advancing Innovative Neurotechnologies Initiative – Cell Census Network hope to build an accurate atlas of the human, marmoset, and mouse brains. Image: highwaystarz/Adobe Stock

Written by: Jasmine Lee, Content Developer/Communicator | 516-367-5940

Jesse Gillis

Associate Professor
Cancer Center Member
Ph.D., University of Toronto, 2007