Mail-order genetic testing—more accurately known as genotyping—has become a growing trend. Simply spit into a tube, and companies claim they can tell you about where your family comes from, what conditions and illnesses you might be predisposed to, and what your family tree looks like. One of the more surprising uses of this data is finding criminals, like the Golden State Killer.
The Golden State Killer investigation makes you wonder: how public or private is your genetic data? And what could someone tell about you if they had access to your genome?
Read the related story: Genomes, justice, and the journey here
AA: Hey all, I’m Andrea
BS: And I’m Brian
AA: And this, as you know… is Base Pairs.
BS: But maybe you don’t know about Base Pairs! Maybe you just clicked on this podcast to give us a try, and that’s cool too, because we’re not about to throw you into unfamiliar territory.
AA: Well, we ARE going to talk about a huge trend in genetics in a minute. But we’re going to start with something LOTS of people have been talking about throughout this summer of 2018.
BS: That would be the unmasking – so to speak – of the Golden State Killer. The cracking of a decades-old cold case is a subject of wonderment in itself, and we’ll get into the details of that case in a second, but HOW this case was solved… how a notorious serial killer was finally brought to justice… THAT’s very relevant to what we’re going to talk about in this episode… which is about personal genetic testing, and what having that data actually means.
AA: But first, a timeline. And this gets unsettling, so listener discretion is advised.
[archive clip 1978] It’s so warm in Concord tonight that people have their windows and doors open, but Sacramento police are saying “lock up tight.” Sacramento’s east area rapist may still be in town. He raped a 29-year-old housewife near the Ygnacio Valley shopping center around 5:30 this morning. Her husband was tied up nearby and had to listen… [fade]
BS: That’s a clip from 1978 out of ABC 7’s broadcast archives… a clip that describes one of nearly 50 rapes committed between 1976 and 1986. In addition to these abhorrent crimes, Sacramento and the east Bay Area were terrorized by a string of burglaries and murders that could all be tied back to the same assailant – the East Area Rapist – later to be known as the Golden State Killer.
[archive clip 1979] there is a real sense of frustration among women tonight… despite all their locked doors and bolted windows, they are all still very much afraid tonight of the East Area Rapist, and that makes them all in a sense, his victims. [fade]
AA: No one knows exactly why the crimes stopped in 1986, but for the three decades that followed, little more was discovered about the notorious man that had terrorized California.
BS: It was only this year, on April 25, 2018, that an arrest was made, pinning the crimes on 72-year-old Joseph James DeAngelo. He was fired from the Exeter, California police department in 1976, just before the crimes began, and has lived in the Sacramento area ever since. Amazingly, the purportedly damning case against DeAngelo is rooted not in witness testimony, but instead in the DNA from a 37-year-old rape kit.
CR: I didn’t know about it right away. …The first [news report] that came out didn’t mention us at all, and I wondered if we were involved in that. … Then I forgot about it. The next day, they had… another meeting with the press, and in that one, they announced GEDmatch was the big reason why they were able to … find this guy.
BS: That’s Curtis Rogers, the co-founder of the open source database called GEDmatch that police used to track down the Golden State Killer through genome matching. When I spoke with him, it was also about to be his birthday.
CR: tomorrow is my 80th birthday. That’s frightening … It just keeps happening! The years keep coming! Anyway, I started GEDmatch– Well, I’ve been in touch with genealogy ever since I was a teenager lightly, but as I got older, I got more involved in it.
BS: Curt had originally been part of the Rogers Surname Project, for Family Tree DNA – a sort of genealogical scavenger hunt for ancestry that all Rogers family members could access. According to Curt, ancestry is often how a lot of people become interested in genealogy.
CR: everyone started writing emails back and forth. “Do you have McGillicutty in your family tree?” “No, but do you have a McCarthy?” “No, but do you have …” Blah, blah, blah. Go on for hours and hours…
and I met this guy by computer who is now my partner, John Olson, and… I asked him if we could do a computer comparison of family trees so we wouldn’t have to do all this back and forth, and he came up with it. He came up with a great algorithm, and it was just too much for my little Rogers Project, so based on that, we started GEDmatch…
There’s now the criminal thing that we didn’t anticipate at all, but certainly law enforcement’s trying to use it for that.
AA: It’s incredible that some site dedicated to comparing family trees could be used by the police to track down a killer! How exactly does GEDmatch work? What is it?
BS: Well, I’m sure you know about direct-to-consumer genome sequencing. Spit in a tube. Send it off. Get a whole bunch of colorful graphs back. That sort of thing.
AA: Sure! You’re describing services like 23andMe, MyHeritageDNA, or AncestryDotCom. A few services have even cropped up exclusively for dog pedigrees!
BS: Right, but when these companies send back their results, they’re working within a limited pool of data. That’s where GEDmatch comes in.
CR: we are not in competition with the testing companies. We are helping them. We are supplementing them.
BS: GEDmatch basically is a hub where anyone can upload their genetic data and cross-compare with results from other services. They even provide a LOT of different tools for amateur and professional genealogists alike. And since Curt and John basically started their site as hobbyists for hobbyists, they’ve left the site entirely open-source. In this way, the family trees its users compile – which are lists of potential relatives based upon similarities in genetic data – are accessible to anyone with a login.
AA: Even law enforcement.
BS: Exactly. While detectives behind the Golden State case have not revealed their exact process, what’s understood is that they uploaded DNA data from that 37-year-old rape kit. They located a close match in the GEDmatch database – someone like a first or second cousin – and then they contacted that individual to find out if they had any relatives around a target age who lived in the Sacramento area between 1976 and 1986.
AA: Ah. And since these genetic results are public domain, investigators didn’t even have to file a warrant. That’s… pretty game-changing.
BS: It’s been estimated that since the Golden State Killer’s case, GEDmatch has been used by law enforcement in eight other criminal cases – ranging from identifying unidentified victims to zeroing in on murderers.
CR: I was really concerned for a long time that, “Is this invasion of privacy? Is it being used for something it shouldn’t?” I came to the conclusion that we really didn’t have a choice…
We could put a policy up there, “Hey, we’re only requiring that we get a warrant from the courts before we give up any information.” So what? We would never know. We would never be able to enforce it really. What we did decide is that we really have to educate our users, let them know, give them as much warning as we can of some of the uses and especially of law enforcement.
AA: Well, I can definitely get behind the idea of educating these genome-curious consumers. So let’s talk about what exactly this genetic information they’re uploading CAN actually be used for. And to do that, we need to talk about what the data we’re discussing really is.
BS: What it really is? What do you mean? It’s information in your genetic code—the sequence of letters in your DNA.
AA: You do get information about your genetic code, but there’s an important distinction here: most direct-to-consumer tests don’t exactly sequence your genome. Sequencing means going along the DNA and reading out every letter along the way—either all 3 billion of them or some subset—and that’s still relatively expensive.
BS: If they don’t sequence the whole genome, then how are the direct-to-consumer testing companies figuring out what’s in people’s DNA?
AA: For the most part, these consumer companies use a faster, more economical approach called genotyping.
DM: Genotyping looks at some hundreds of thousands of spots in the genome…
AA: That’s CSHL Professor Dick McCombie.
DM: …and you can infer a lot from that, because DNA, parts of the chromosome tend to go in chunks.
BS: So, genotyping involves looking at particular spots in the genome that you’ve decided you’re interested in and then trying to fill in the blanks based on genome sequences that researchers already have from other people.
AA: Yeah, that’s part of how these testing companies like 23andMe are able to make their tests so affordable.
BS: They don’t have to sequence a person’s entire genome, because they can look at existing human genome sequences and figure out some of the most important or interesting spots to check.
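As a rough illustration of the idea Andrea and Brian describe here, the Python sketch below reads a sample at only a handful of chosen positions and fills in the rest from the closest match among sequences already known from other people. Everything in it is a made-up toy: the ten-letter "genomes", the names `REFERENCE_PANEL` and `MARKER_POSITIONS`, and the simple best-match rule are assumptions for illustration, not the method any testing company actually uses.

```python
# Toy reference panel: full sequences already known from other people.
# (Hypothetical data; real panels hold far more and far longer sequences.)
REFERENCE_PANEL = [
    "ACGTACGTAC",
    "ACGAACGTTC",
    "TCGAACCTTC",
]

# The few "spots" the genotyping test actually reads (hypothetical choice).
MARKER_POSITIONS = [0, 3, 6, 8]


def genotype(sample, positions):
    """Read the sample only at the chosen positions."""
    return {pos: sample[pos] for pos in positions}


def impute(genotyped, panel):
    """Fill in the unread positions from the best-matching panel sequence."""

    def matches(candidate):
        # Count how many genotyped spots agree with this panel sequence.
        return sum(candidate[pos] == base for pos, base in genotyped.items())

    best = max(panel, key=matches)
    # Keep the bases we actually measured; borrow everything else.
    return "".join(genotyped.get(i, best[i]) for i in range(len(best)))


if __name__ == "__main__":
    hidden_sample = "TCGAACCTTG"  # last letter is a variant absent from the panel
    observed = genotype(hidden_sample, MARKER_POSITIONS)
    print(observed)
    print(impute(observed, REFERENCE_PANEL))
```

In this toy run the filled-in sequence agrees with the hidden sample everywhere except the final letter, which was never read and does not appear in the panel. That is the trade-off of inferring rather than sequencing: it is cheap, but it can only fill in what other people's genomes already contain.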
AA: Dick didn’t have that luxury, though, when he was getting his career started back in the 1980s.
AA [in clip]: Where was genome sequencing technology at when you first started in this field?
DM: I want to say it was a hope, but I’m not even sure it was that.
BS: So, this was before the Human Genome Project had even started—the historic effort to put together the first full human genome sequence.
AA: It was. Dick is one of the sequencing pioneers who was involved in the Human Genome Project from its early days. He was also part of some of the first major discussions about publicly releasing human genetic data.
DM: I was at the original Bermuda meetings back in the mid 90s for open release of data. There were a series of meetings in the mid 90s, I think three of them, in Bermuda, organized by NIH and the Wellcome Trust in England, where 30 people or so got together at each one — and worked out the data release protocols for the Human Genome Project, which were that when we were actually working on the human reference genome, we had scripts on the computers that would go through our directories every Friday and take everything more than a certain size —and automatically submit it to the public repository for public access.
BS: Wow, they not only made all of the genetic data public, but they did it in real time. What made them decide to do that?
AA: One important factor is that the Human Genome Project used genetic data from about a dozen different people—who had all consented to this use, of course—to create a sort of mosaic that could serve as what scientists call a “reference genome.” It essentially serves as a model or template for sequencing and analyzing the genomes of other individuals of the same species.
BS: So, it’s not like they were putting out any one person’s entire genome. They were putting together a more general representation of what a human genome looks like.
AA: Right, and then the really big question was, would each lab get patents on the genes they sequenced and sell them for profit, or would they make all of the data publicly available to advance scientific research?
BS: That is really important.
AA: Access to this first human genome sequence has helped countless scientists conduct human genome studies in their own labs. And yet, when Dick was telling me about how he was a part of these huge discussions about the open release of data, he also told me something that surprised me.
DM: I could say personally, when we were first doing the whole genome sequencing with the new instruments, and the company was giving us free reagents and we were running tests and stuff, I did think about putting some of my own DNA on there and sequencing myself. I decided not to for a variety of reasons.
BS: I’m surprised that someone who’s devoted his career to this genome sequencing hasn’t looked at his own genome.
AA: He’s holding back partially because he knows himself.
DM: I know I worry. There’s 3 billion bases for me to look at and say, “I don’t like that,” but not know what it means.
AA: But he’s also holding off in part because he doesn’t know how other people might use his data, if they were to get access to it.
DM: I’m a big believer in public data access in general. However — I do worry about data privacy a lot. — The example of catching a murderer is a great example, and everyone says, “It’s great they caught a murderer,” including me, but I worry that there’s the possibility of using data like that for inappropriate purposes. By that, I mean totally publicly available data.
BS: Right. Inappropriate purposes like discrimination. Say, a potential employer takes a look at your genome, sees that you have a mutation that causes some terrible disease that will force you to retire early, and doesn’t hire you as a result.
AA: That’s one big concern, and there’s been some progress in protecting against this type of discrimination. There’s legislation known as GINA—the Genetic Information Nondiscrimination Act—that was signed into law in the United States in 2008 and makes it illegal for employers to use genetic information in the way you described. It also prevents health insurance companies from using information from your genome to make decisions about your eligibility for insurance, what you pay, or what you’ll be covered for.
BS: But a lot has changed in the last 10 years. For one thing, I know that I wasn’t hearing about these direct-to-consumer genetic tests nearly as much back then.
AA: There’s still a lot of work to do to make sure that genetic information is used for good.
DM: I think despite the GINA legislation, which I think was a big step forward, the genetic privacy legislation that passed a few years ago, that doesn’t cover everything. — I don’t think it’s a solved issue. It’s one that is evolving hopefully at reasonably close to the same speed as the technology’s evolving, because technology is evolving very, very fast.
BS: This is a conversation that we all need to be having. Genotyping has already made it much easier for people to access a small fraction of their own genetic information, and look at how powerful that information has already proven to be. It allowed law enforcement to crack a cold case that’s over three decades old. I would imagine that as sequencing technology continues to become cheaper, it will be more common, maybe even routine, for people to have their whole genomes sequenced.
AA: I thought that too—that the only reason we aren’t already getting sequenced as a routine part of our healthcare must be that it’s too expensive. But Dick told me that’s not really the issue.
DM: I actually think the biggest hold up isn’t the cost of the sequencing, but the … being able to understand what the sequence means. For instance, radiology has dealt with this issue of, well, what if you do an X-ray for one thing and you see something else? They deal with that every day. Sequencing hasn’t. What if you sequence someone to find out if they have a mutation in this gene, but you find they have a mutation in this [other] gene, and you think it may be bad, but you don’t know what it means. That’s really a problem. I mean, I was actually really sick a few years ago and had multiple MRIs done, and MRIs cost more than a genome sequence I think.
AA: I looked it up, and Dick’s right. While prices vary, an MRI can cost more than a whole genome sequence.
BS: You’re kidding me. I would have never guessed that. But isn’t Dick working on ways to further improve sequencing methods? If the cost is no longer holding back medical use of genetic information, then why is he still working on that?
AA: The motivation behind improving sequencing technologies isn’t just cost. It’s also quality. Dick told me that a key method that researchers have long used to put together sequences tends to miss some really important stuff present in many genomes, and especially in cancer genomes. Sometimes, fairly large chunks of DNA move from one part of a chromosome to another part, or even to a different chromosome. These mutations are called structural variations.
BS: Cancer cells have really screwed up genomes, so it makes sense that they would have more of these big rearrangements.
AA: Moving entire sections of DNA around is problematic. Dick was looking into this with his colleagues, including Michael Schatz, a CSHL Adjunct Associate Professor who is also an Associate Professor at Johns Hopkins University. Dick and Mike knew of a particular breast cancer cell line that had a lot of this rearranging going on.
DM: We picked it because we knew it was a messed-up breast cancer cell line, chromosome wise. It was really rearranged.
AA: They mapped this cancer’s genome using newer sequencing technology and recently published a paper that reveals about 20,000 never before seen structural variations in this breast cancer cell line alone.
BS: 20,000 new structural variations?! Ok, now I’m starting to see why there’s still a need for better sequencing technology. But still, if these are big chunks of the DNA, how are they getting missed?
AA: Before I learned about how DNA sequencing really works, I kind of assumed that it was a lot more straightforward than it actually is. I imagined a machine unwinding the double helix of DNA and reading out the ‘letters’ one by one until it reaches the end of that piece of DNA.
BS: That seems like the easiest way to do it.
AA: While that would be nice, it turns out to be hard to do. Instead, most sequencing machines so far have been able to read only short pieces of DNA at a time.
BS: I’ve heard genome researchers talk a lot about “long reads” and “short reads”—and I know they’re not referring to their summer book list. They’re talking about volumes of the genome.
AA: In the world of literature, a short read refers to something that’s generally quicker to get through, but it has kind of the opposite meaning in genome research. Once scientists get these short or long reads from the sequencing machine, they have to put them in the correct order.
BS: And with long reads they have fewer pieces to manage.
AA: Exactly. It’s just like a puzzle, really.
DM: A puzzle with four pieces versus a puzzle with 400 pieces.
AA: Researchers use software to compare the sequence of each piece to the reference genome—what we were talking about earlier—and then figure out where in the genome it came from. Everyone at least has little variations that make us unique, but the software can work around those. It’s really difficult to figure out where these bigger structural variations are coming from using short reads, however. And that’s partly because they often happen in DNA that’s really repetitive.
BS: Yeah, I’m not sure how many people know it, but our DNA is chock full of sequences of letters that appear several times in a row.
AA: About half of the human genome is repetitive sequences! So it’s problematic that short reads aren’t so good at mapping these areas.
DM: The short reads basically can’t be mapped back uniquely to repetitive regions because they’re all the same. They can’t tell if it’s this repeat on this chromosome, or this one on another chromosome. Whereas the long reads get the repeat, but they also get the flanking regions that are unique, and so you can unambiguously map them back to the genome.
BS: Since long reads are like big puzzle pieces instead of little ones, the software has more context to work with. It can see more of the full picture of the puzzle on that one piece, so it’s easier to figure out where it’s supposed to go—even if it’s like one of those really difficult puzzles where the image is something repetitive, like a big crowd of people at a sports game.
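The puzzle-piece point can be sketched in a few lines of Python. The toy reference below contains the same repeat twice, separated by unique flanking sequence. A short read taken from inside the repeat matches several places, while a longer read that spans the repeat plus its flanks matches only one. The sequences and function names are invented for illustration, and the sketch uses exact string matching, which is a simplification: real mapping software tolerates mismatches and works on vastly larger genomes.

```python
# Toy reference: unique flank + repeat + unique flank + same repeat + unique flank.
REPEAT = "CAGCAGCAG"
REFERENCE = "TTGGA" + REPEAT + "GTTTA" + REPEAT + "ATTCC"


def find_positions(reference, read):
    """Return every start position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Step forward by one so overlapping matches are also found.
        start = reference.find(read, start + 1)
    return positions


def is_uniquely_mappable(reference, read):
    """A read can be placed unambiguously only if it matches exactly one spot."""
    return len(find_positions(reference, read)) == 1


if __name__ == "__main__":
    short_read = "CAGCAG"           # lies entirely inside the repeat
    long_read = "GGACAGCAGCAGGTT"   # first repeat copy plus unique flanks
    for name, read in [("short", short_read), ("long", long_read)]:
        hits = find_positions(REFERENCE, read)
        print(name, hits, "unique" if len(hits) == 1 else "ambiguous")
```

Here the short read has four candidate locations spread across both copies of the repeat, so the software cannot tell which copy it came from. The long read lands in exactly one place because its flanking letters occur only once in the reference.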
AA: There’s a catch, though.
DM: Unfortunately, a puzzle that’s got four pieces is a lot more expensive. The prices on those keep coming down, though. While it was 100,000 four or five years ago [AA: 100,000 dollars, that is] we’re still trying to figure out exactly what the price is, but it’s probably in the area of 10 to 15,000 now.
AA: Because of the expense, a lot of researchers haven’t yet adopted long read sequencing methods. But the breast cancer findings that Dick and his colleagues recently published really show the benefits of using long reads to sequence the genome.
DM: We want to do two things. One, see what we’re missing, and secondly — we’re trying to do combinations of technology to drive the cost down.
BS: So, this technology is a lot more expensive, but it gives you a higher quality genome sequence and isn’t as laborious to produce.
AA: Right now, Dick and other researchers are working to make it more affordable, and to provide the knowledge necessary to interpret the information within people’s genomes.
BS: And that makes me think we’re headed toward more people getting their whole genomes sequenced – like I said before – and more genetic information seems like it would spell more ethical issues – like the job and healthcare discrimination issues which have already come up with genotyping.
AA: I thought it might too, and I asked Dick about whether he thinks that we need to be even more protective over whole genome sequence information than genotyping data.
DM: I think the issues are the same for either. Both have a lot of information in them that should be handled in a way respectful of reasonable privacy concerns I think. I don’t think that whole genome drastically changes that in most cases.
AA [in clip]: Okay. That’s interesting. In terms of the ethical risks, we kind of, we’re there.
DM: Yeah, yeah. I think we are, yeah.
BS: That makes sense, actually. We need to protect people from having their genetic information used against them, period. If we are careful about that, having access to even more genetic information shouldn’t be an issue. And as sequencing technology continues to improve, there are undoubtedly more benefits to be gained on the medical side, like the discoveries that Dick’s team made in breast cancer.
AA: Plus, the more people who get sequenced and allow their data to be used for research, the better the chances are that researchers will be able to pinpoint which genetic variations contribute to which diseases—even very genetically complicated ones like psychiatric disorders.
DM: In theory, if you sequenced everyone in the world and you have a detailed phenotype of everyone, you could write some computer program and come back 10 years later and it would tell you what it all means. That’s kind of a glib answer, but in general, that’s true. If you could compare lots of, if you looked at — people with schizophrenia say, and say, “These are the regions of the genome that seem to be associated with schizophrenia.”
BS: More genetic information means more ways to learn about yourself and to help reveal new ways to keep everyone healthy. I asked Curtis, the gentleman from GEDmatch who we heard from earlier, about where he saw this personal genetic information thing going in the future.
CR: I suspect that, at some point, everyone is going to have their DNA done. It may be done at birth, and they will then have this to help them health-wise or whatever else purpose they want. There will be this whole genome that they will have. They will own it.
AA: Ownership is an important thing to emphasize. These days, a lot of people have kind of forgotten that their personal information, genetic or otherwise, is something that they own and that is valuable. Billions of people around the world have willingly given intimate personal details to companies like Facebook for free. What will they do with their personal genetic information?
BS: Right. This is just a start for one heck of a deep discussion. It’s going to be important to make sure that consumers understand how powerful their genetic information can be. So talk to your friends, family – really to measure what YOU want to do with this – and this will sound familiar – this “power of genetic information.”