A “behind the screens” look at how biology is addressing its “most wonderful problem”—too much data. Associate Professor Gurinder S. “Mickey” Atwal joins us to explain the essential enigma that is quantitative biology.
Read the related story: Biology, behind the screens
(Chatter sounds trickle in, growing steadily louder)
BS: Hey guys… I’m Brian
AA: And I’m Andrea…
BS: And we’re needing to almost shout here because well… this noise. This raucous crowd you’re hearing behind us… that’s hundreds of high schoolers.
AA: 312 high school students to be exact.
BS: And despite this happening at the cusp of summer, this isn’t a clip from day camp. What you’re hearing is a scientific poster session where these high schoolers – many of them freshmen – got to present their own experimental findings. These are ambitious kids, so I expected to be impressed. But what really blew me away wasn’t the amount of work they did—it was how little it resembled the kind of biology I learned in high school.
BS: So what do we have going on here? What am I looking at?
GP: For me… scientifically… I had to learn how to code in Python for that which was REALLY INTERESTING to say the least. But also how to analyze the data we obtained. Which wasn’t just simple graphs. We had thousands and thousands of sequences that we had to go through!
BS: That was Giovanna Prucia, a 17-year-old junior from Connetquot High School, and the project she and Chris Paciello showed me was SUPER different from what you’d see at your average grade-school science fair.
AA: Yeah! I’ve heard of Python before… and that’s a programming language, isn’t it? That’s not exactly something you normally learn in high school, right?
BS: I definitely didn’t. And Victoria DeAmbrosia, a science teacher from William Floyd High School, was also pretty shocked about what she’s got kids learning these days.
VD: And a lot of the students go to the next level. Those who did barcoding last year, they got to do microbiomes this year, in which they’re doing these really complex statistical analyses – both types of projects get to use bioinformatics tools…
AA: This is crazy! Brian, not too long ago, I too was an aspiring biologist… and computer coding… statistical analyses?! These were NOT the kinds of things I focused on.
BS: Well… biology is changing! It’s looking like more and more scientists are going to spend as much time behind a computer as at a lab bench. And that’s for a REALLY good reason…
FC: “We’re all STRUGGLING with the wonderful problem of having too much data. Big Data, as it’s featured on the cover of Nature magazine. Big Data as we talk about around the table at institute director meetings on Thursday mornings. Big Data as I am even now being asked by people in the White House ‘what are you going to do about this?’ as now everybody recognizes that we are in a circumstance about needing to be very thoughtful and creative about how we handle the very large quantities of biological data that are pouring out of many different approaches… toward understanding how life works and how disease occurs.”
BS: That’s Francis Collins – the man who has been directing the National Institutes of Health for nearly a decade. The clip I just played was from 2012, when he was serving under President Obama, but Collins’ goals and concerns have hardly changed since.
AA: I can understand why. Collins was also the head of the Human Genome Project – that massive scientific undertaking that, once accomplished, left the world with a WHOLE lot of data and very little idea of what it all meant. I mean, we just wrapped up a two-part series that was all about how we’re still sifting through the sequenced human genome, slowly but surely making sense of it all. One could even hazard the guess that Collins feels responsible for all the new work that needs to be done!
BS: Sure! But the other big issue is that for centuries, traditional biologists have been able to get by on their own – mostly. In this way, the cycle of observe-hypothesize-experiment-observe has been a closed system. And that’s not just for biology, but for lots of sciences! Here’s Kafui Dzirasa of the NIH’s BRAIN Initiative really bringing that point home while chatting with Collins at a meeting called “Faster Cures” held last year.
K: “Big data has gotten bigger right? So 500 years ago big data was staring out at the galaxy and mapping out the planets and how they were orbiting around the Sun. So then there was a role of an individual investigator sitting and observing and framing things. The problems today are SO complex that one person CAN’T handle all of that at all!”
AA: Ok. I can see that. For last year’s season finale of Base Pairs, we talked about how the problem of mapping the brain is so complex that it can’t be done by hand –
BS: Or by eye, er– microscopy, so to speak
AA: Right, so instead neuroscientist Tony Zador is recruiting RNA sequencing to map the brain computationally, and THEN neuroscientists can pick specific neurons and circuits to investigate more traditionally.
BS: It’s an elegant solution. But what they have yet to really work out is how to identify which neurons are significant for any one problem. Likewise in genomics, biologists have countless genes to choose from when investigating biological function or disease
AA: – Thanks to the Human Genome Project –
BS: but they struggle to select key targets for study… And why is this? CSHL Associate Professor Mickey Atwal suggests that it may just have to do with the fact that people are really bad at making predictions.
MA: In a way I feel like we’re almost hard-wired to get things wrong. There’s one example I can remember. Where, I went to a roulette table. And the most common thing to do at a roulette table is to bet either red or black and it’s roughly 50% (except for the one or 2 green ones)…
And these roulette table managers are smart. So what they figured out is if you show the customers what other players have bet in the past, that’s gonna offset how people think about random. Meaning, that if a person sees that there were 6 reds played in the past, there is a compulsion, within them, to bet that the next one is gonna be black!
And you know, there is almost a primitive part of me that felt that urge! “Of course it’s gonna be black, there have been 6 reds in a row!” But if you’re grounded in understanding probability theory you understand that it makes no difference. Each one is an individual instance! And yet, I saw this time and time again. So, we’re really bad at understanding whether something is a statistical fluke or a real signal.
BS: What’s wild is that even in this roulette example, the data we’re dealing with is very small. Just 36 numbers, 3 colors, and some results. And YET, even Mickey – who is a trained physicist and quantitative biologist – even he feels the urge to bet irrationally – to feel that those results actually influence the next outcome – even when he KNOWS that in reality, they are what academics call statistical noise. It’s no wonder we can’t make heads-or-tails of Big Data!
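Mickey's point that each spin is "an individual instance" can be checked with a small simulation. What follows is only a sketch under stated assumptions: the wheel layout (18 red, 18 black, 1 green pocket), the function name `black_rate_after_red_streak`, and the number of spins are illustrative choices, not anything described in the episode.

```python
import random

# Assumed wheel for illustration: 18 red, 18 black and 1 green pocket.
WHEEL = ["red"] * 18 + ["black"] * 18 + ["green"]


def black_rate_after_red_streak(n_spins, streak=6, seed=0):
    """Simulate independent spins; return (P(black), P(black | red streak), cases)."""
    rng = random.Random(seed)
    black_total = 0          # blacks over all spins
    cases = 0                # spins that directly follow `streak` or more reds
    black_after_streak = 0   # blacks among those spins
    red_run = 0              # length of the current run of reds
    for _ in range(n_spins):
        colour = rng.choice(WHEEL)
        is_black = colour == "black"
        if red_run >= streak:
            cases += 1
            black_after_streak += is_black
        black_total += is_black
        red_run = red_run + 1 if colour == "red" else 0
    conditional = black_after_streak / cases if cases else float("nan")
    return black_total / n_spins, conditional, cases


overall, after_reds, cases = black_rate_after_red_streak(1_000_000)
print(f"P(black) over all spins:       {overall:.3f}")
print(f"P(black) right after six reds: {after_reds:.3f} ({cases} cases)")
print(f"Theoretical value 18/37:       {18 / 37:.3f}")
```

Because the simulated spins are independent, both estimates should land close to 18/37. Any gap between them is sampling noise, the same kind of fluke the urge to bet on black mistakes for a signal.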
AA: But Brian. Isn’t that what quantitative biologists DO? Help biologists filter out that noise from big data sets?
BS: In part. Yeah.
(interview clip) BS: So what IS quantitative biology?
MA: Yeah…. Really good question (laughter) so I think there are as many answers to that question as there are quantitative biologists. It’s not really answered to anyone’s satisfaction. And I think there’s a really good reason for that!
In most areas of science (chemistry, geography) you are defined by your object of study. And this is especially true in biology. So neuroscience is defined by the neural system. And plant biology is studying plants, right? Quantitative biology isn’t. The role of a quantitative biologist is to ask certain kinds of questions and to ask for certain kinds of solutions… so what that means is that our domain of study can cut across many fields of biology.
And the kind of questions we ask… are different from the usual question a biologist would ask. Can we simplify what we see into something abstract so that we can make a predictive model of that? Can we make a model of the phenomenon we observe? And can we test those predictions? And to do this you really have to formulate the problem differently than how a traditional biologist would.
(interview clip) BS: Would this fall along the lines of, say, Punnett squares?
BS: You might remember these little charts from high school biology called Punnett squares, and even today, students use them to predict the outcome of a basic genetic cross.
MA: So that’s a really good example! I think that’s arguably the first time most biologists experience an equation… And that’s a really simple example because it gives you a prediction of what is the expected observation given a set of hypotheses.
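To make the "expected observation given a set of hypotheses" concrete, here is a minimal sketch of a Punnett square as a calculation. It assumes a single gene with two alleles written `A` and `a`, and that each parent passes on either of its two alleles with equal probability. The function name `punnett` is hypothetical and used only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(parent1, parent2):
    """Expected offspring genotype frequencies for a single-gene cross.

    Each parent is a two-letter genotype such as "Aa". Every pairing of one
    allele from each parent is treated as equally likely (an assumption).
    """
    # Sort the two alleles so that "aA" and "Aa" count as the same genotype.
    combos = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(combos.values())
    return {genotype: Fraction(n, total) for genotype, n in combos.items()}


expected = punnett("Aa", "Aa")
for genotype, freq in expected.items():
    print(genotype, freq)
```

The result is a prediction that can be compared against offspring counts from a real cross, which is the model-then-test loop Mickey describes.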
BS: You see, Andrea. Quantitative biologists are often the reinforcements that traditional biologists need in this age of big data. They’re essentially that outside help that Dr. Dzirasa was talking about in his discussion with Director Collins.
AA: So, they provide a new perspective, allowing predictions and observations to be made on a concrete statistical level. That way biologists can then formulate new hypotheses and experiments based on what is learned.
BS: Right! Right now, Mickey is working in collaboration with a number of specialists in trying to better understand breast cancer, and his lab here at CSHL is bringing that essential QB perspective to the table.
MA: Now, immunotherapy is a buzzword and you may have read about this in popular press, and there’s been some really exciting developments in the treatment of lung cancer and skin cancer, melanoma, but it hasn’t fared so well in breast cancer. So one of the things that keeps me up at nighttime is trying to understand why not? Why aren’t the immune cells, which we know are found in breast cancers, why aren’t they doing their job and killing the cancer cells? What is it about the cancer cells that somehow tricks the immune cells into not attacking them?
So, the research team that we’ve built and grouped together is really focused on understanding the communication between the different kinds of cells that you find in a growing tumor. So we have actual biopsies from patients in a clinic based in Los Angeles, actually shipped here to Cold Spring Harbor. And with our DNA sequencing facilities, we are able to measure the activity of thousands of genes in individual cells.
AA: Oh my. THAT’S a lot of data. And how each cell expresses those genes can differ wildly. One could even think of the environment around a tumor as a neighborhood. You’ve got your behaving cells expressing their genes in one way – they’re good citizens. And then there’s cancer cells acting badly. But there’s also lots of other cell “personalities,” if you will, who also might act strangely.
BS: That chatter alone creates a lot of statistical noise
AA: Right, and when everyone is talking with everyone…
MA: it’s a bit like a needle-in-a-haystack problem. So we have to develop algorithms that can sift through mountains of data and try to find out which genes are really important, and more importantly for this project, which genes are really important for the cells to bypass the immune system and actually allow the cancer cells to grow without the immune system killing them off.
BS: Essentially, breast cancer cells are really good at conning their neighborhood. Those “good-citizen” cells Andrea mentioned can’t tell that their nasty neighbors are ruining the neighborhood and are happy to communicate with them. And because the cancer cells are acting so darn “neighborly,” the immune system – or the local police in this metaphor – doesn’t realize that they’re criminals.
AA: But if Mickey and his collaborators are successful, the hope is that they can identify ways to quiet those problematic cell-to-cell conversations, putting a stop to cancer’s neighborly act.
BS: Mickey’s project is one of MANY so-called “Big Science” collabs – this is one funded by the group “Stand Up to Cancer” – and it shows the power of various scientific disciplines all aiming their efforts at one objective. However, Mickey argues that for these projects to truly move science forward in this age of Big Data, everyone needs to become a little more familiar with QB.
MA: You certainly don’t want to be in a position where you’re shuttling off your data to somebody else and you’re treating them like a black box. They somehow perform their magic. And they say “these are probably the targets for your disease.” Right? You want to have some sort of conversation. You want to be more intelligent than that.
Even if you’re not going to do the experiments themselves, I still think it’s really important for the experimentalists and the classical biologist to at least be able to understand what are the state-of-the-art techniques that are required to sift through mountains of statistical data.
I really do think that it’s going to be the next generation of biologists – undergraduates, graduates, and postdocs – all need to be trained in computational and quantitative skills.
AA: Hmm, well… there is good news. Here at Cold Spring Harbor Laboratory, the students of our Watson School of Biological Sciences
BS: – that’s our Ph.D. Program –
AA: every student admitted to the program is required to take what can be best described as a computational “boot camp” where they learn PYTHON – that premier programming language for MANY important scientific databases.
BS: And amazingly, it’s mostly being taught to students who have only ever known textbooks and a lab bench.
MA: We say “Hey! This is a computer. This is what a computer does. This is what it doesn’t do! This is how we can write code to perform basic commands.” And by the second day they’re actually analyzing next-generation sequencing data by themselves!
AA: Mickey teaches this boot camp and as you might expect, he (and his subject) are not exactly popular with new students.
BS: Can you blame them? They came here to do science… and instead Mickey’s got them sitting behind a computer! Doing code!
AA: And yet… we know that this is exactly how lots of science gets done.
MA: I’m sure there’s a whole bunch of them who hate me here when they first arrive… (laughter)
MA: And what’s interesting is that, because it’s such a new skillset, such a new concept, and it’s so immersive on the first day that by the end of the first day they usually end up… dreaming… well, thinking about Python obsessively. And you can get to this state where you can just spend hours in front of a computer coding away! It can become quite addictive.
BS: According to Mickey, it’s rare to have a true convert – a student who actually leaves their well-paved bio-science path to wade into the unknown of quantitative biology. However, he did explain that by the end, the idea that you can frame a theoretical scientific question mathematically becomes pretty popular among the students.
AA: So popular, in fact, that for three years now, Mickey has been elected to receive the Winship Herr Teaching Award by the Watson School’s freshman class.
BS: It’s basically a teacher popularity contest, and each year, Mickey acts like he has no idea why he’s won it.
(graduation ceremony clip) MA: I don’t know how and who decides these awards uhh… but whoever you are… Russian hackers included… your check is in the mail (laughter).
AA: (laughing) I spoke with Mickey about this for a story on our LabDish blog, but it’s easy to see why his course is actually popular. While other instructors are simply reinforcing old skillsets and scientific methods, Mickey is teaching these students something fresh! It’s practically a new way to think for many of these young scientists.
BS: And that’s what I actually found most surprising during my chat with Mickey. While this strategy – this way of approaching theoretical problems seems rather new to many scientists – it’s actually been around for decades.
MA: What’s not appreciated enough in biology I think is the Watson and Crick paper, their famous paper, THAT’S A THEORY PAPER! There’s no new data that’s reported. It’s basically a theoretical conjecture on solving some equations based off crystallography.
BS: And yet, in that paper – in the 1953 Nature paper in which Watson and Crick proposed that the DNA molecule was shaped like a double-helix – there is a single line that many scientists can recite.
AA: “It has not escaped our notice,” it begins, “that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.”
BS: This bon mot is famous probably as much for its coyness and understatement as for the significance of its prediction… and that’s a good embodiment of what quantitative biology really is. It’s not a field of study, but a strategy and even a way of thinking (pause) with predictive prowess that is too often underappreciated.
AA: Mickey is but one of our scientists employing quantitative biology in a quest to answer some really important questions, so like always, this won’t be the last you hear of this subject.
BS: Cancer, autism, neuroscience, the evolution of humanity, and SO much more – all are subjects that employ QB, and all are things we have or will talk about in episodes of Base Pairs.
AA: So stay with us! And as always… “more science stories soon!”