Some of the most sought-after gifts this holiday season are at-home DNA tests, but there may be more to personal genotyping than simply learning more about ourselves. We sat down with Dr. Yaniv Erlich, chief scientific officer at MyHeritage DNA, to get his unique perspective on the use of personal genetic information and privacy concerns, using genetics for justice, and the pros and cons of finding out about your genetic code.
BS: Hey everybody, I’m Brian and this is Base Pairs, the podcast about the power of genetic information. First, I want to thank everybody for their patience; we’ve been on somewhat of a hiatus, but this is one of those special episodes I mentioned at the very end of the last episode of Season 3. That last episode was episode 17, which we called “Genomes, Justice and the Journey Here” (lots of alliteration going on), and in that episode we talked a lot about what many of you have probably been thinking about: personal genetic testing services like 23andMe or MyHeritage DNA.
So just about a month after that episode aired, and we’re talking mid-October 2018 now, a new paper came out in the journal Science that goes further into this subject. The paper is called “Identity Inference of Genomic Data Using Long-Range Familial Searches”. According to the authors, the purpose of the paper was to gauge the power of what’s being called “genomic triangulation”, the same strategy that was used with a public genealogy database to track down the Golden State Killer, and that has since been used to identify Jane and John Doe victims and people behind other violent crimes. On our podcast we talked a little about the privacy implications of searches like this, and the paper follows those same lines, discussing who is most likely to be implicated by these searches and, more importantly, mitigation strategies that both companies and individuals can take to protect everyone’s privacy.
I was able to catch up with one of the authors of that paper, Yaniv Erlich. He’s an alumnus of Cold Spring Harbor Laboratory and he’s also the chief scientific officer of MyHeritage DNA. Genetic privacy is a subject that has always been near and dear to Yaniv. Back in 2014, he and Arvind Narayanan from Princeton University explored this issue, and they were later recognized by Science magazine as the ones who predicted what police would wind up using to track down criminals like the Golden State Killer. That’s what we talk about first in our discussion, and then we transition to his newer paper a bit later. So enjoy the conversation, and I’ll be back in a little bit.
YE: Yeah, we had this 2014 paper in Nature Reviews Genetics where we mapped all the different strategies to breach genetic privacy. We wanted to create, for data custodians and for researchers interested in the domain, a summary of all the methods that we think could be used to breach genetic information and to learn or infer private things from it. The point was to create a taxonomy of different attacks, so that when we talk about different things we know exactly where we are in this taxonomy, and we can communicate across these different disciplines.
One of the routes that we identified was to use genealogical triangulation. Before that Nature Reviews Genetics study, we published another study in Science where we showed that you can infer the surnames of males from the Y chromosome. And if you have the surname, then with a few more identifiers you can really zoom in and get to the person.
And one of the suggestions when we talked with the NIH and other data custodians was, let’s just remove the Y chromosome and this will solve the problem. And we were like, that doesn’t sound right: you start removing pieces from the genome, and then maybe we can execute the attack with other pieces of the genome. That’s why we came up with this idea: what if you can find a second cousin or a third cousin or some distant relative of a person using GEDmatch?
We mentioned GEDmatch in that Nature Reviews Genetics manuscript. This gives you a much smaller search space to nail down the person. And then we found out that this is how the Golden State Killer was captured. The idea for the study that we published a week or so ago in Science had been on my lab’s to-do list for a long time. In fact, we had a summer student working on it, and then I moved to MyHeritage and put it on hold, yeah.
BS: I’m curious: we’re talking specifically about the Golden State Killer case, but I know you’ve been keeping a list of other cases in which a similar triangulation has been used. What kind of crimes are these, and how many have you found so far?
YE: I think the list currently has 19 individuals. In 15 of them, law enforcement agencies identified criminals, mostly in murder and rape cases. The other four cases are bodies that were unclaimed and unidentified, such as the Buckskin Girl and a few more. In fact, it’s quite interesting: the Golden State Killer received all the news, but just to make sure we give the right acknowledgement to everyone…
Three weeks before that, a not-for-profit project called the DNA Doe Project showed that they had identified the Buckskin Girl using this technique. The police had the body. I was on their Facebook page, I saw that, and I was like, that’s interesting. And then three weeks later came the Golden State Killer, but the Buckskin Girl case was not reported as widely.
They had the body of a young woman in her 20s. It was a violent death, not some sort of accident. They tried to identify her, and there’s also a wiki page for missing people that tries to gather information, but nothing really worked. So they genotyped her; I think they had enough DNA, since a body has quite a lot of DNA. They genotyped her and uploaded the data to GEDmatch. I think they found a first cousin once removed, which is close enough, and with a few more identifiers they were able to reach out to her family. It’s a very sad story. Her father died just about a year ago, and her mother is a bit ill. The mother said that her daughter was a very free spirit, and when she disappeared, the mother thought she was going on some long trip or something like that. She thought it was probably one of her things: she just didn’t want to have contact with the family, and at some point in her life she would come back. So the mother stayed in the same house, she never changed her phone number, and she never reported it to the police, because she thought this was what her daughter wanted. She didn’t know that her daughter had died like 30 years ago, but at least they were able to identify her.
BS: It sounds like, thanks to these genotyping databases, you’re able to give closure to people.
YE: Yeah, let’s zoom out and remember these databases in general, not just GEDmatch but direct-to-consumer genomics. It’s all about connecting people. At MyHeritage we were able to help hundreds if not thousands of adoptees looking for their birth families. We had cases like the Yemenite Jewish babies who were lost and adopted without good records, and we were able to reunite the families.
We have cases of Holocaust survivors finding each other after years, or just regular genealogists looking for another branch of their family using these databases. We should remember that in general these databases serve a public purpose, and I think they increase the happiness in the world. And I also want to emphasize that people who haven’t spoken with adoptees sometimes don’t understand the void these people have in their past, not knowing where they come from.
And in many cases it’s a hopeless search, because the paper records are not good, sometimes they are forged, and sometimes they just cannot locate a family. Helping these people is something that even the UN Convention on the Rights of the Child speaks to: Article 8 is about the right to identity, and accurate records from birth are part of that. The ability to tell someone where they come from, from the very moment they were born, is a human right.
We should remember this, because we’re also going to talk about the conflicts, maybe the less desirable outcomes of these databases, but we should keep in mind that 99.9% of the searches are for the public benefit: people just finding their families.
BS: That’s amazing. It’s an important thing to note, because even in our podcast we mostly mentioned the medical benefits of this kind of information, and of course the fact that it can be involved in police searches, and people just trying to fill the holes in their family tree almost as a hobby. This is a whole different human-interest side of it that really cuts close to home; it’s hard for me to think of who I would be if I didn’t know who my family was.
YE: Yeah, and for some adoptees this is… I’ve worked with some of them, and it’s hard to explain if you’re not an adoptee, but you see the amount of suffering they have. The inability, in some cases, even to know their own story. I worked with one person who is very close to me. He’s an adoptee from Brazil, and his paper records were just forged, not accurate. He tried to find his mother and couldn’t; he flew to Brazil and couldn’t get anywhere.
But then… one of the things he thought was, maybe someone kidnapped me from my crib. Brazil is not always… it’s not a first-world country. Maybe someone just took him and his mother is still looking for him. But then we were actually able to find his half-sister using our database, and she’s also an adoptee from Brazil. He’s from Israel, his half-sister is from New Zealand, and she’s older than him, so he knows that he was the second child adopted from the same mother. And suddenly it’s a different story: not, maybe my mother is looking for me and someone abducted me as a baby, but something else.
Okay, maybe it was a woman under some social and financial stress, and she needed to give the babies up for adoption. It’s suddenly a very different story that he can tell about himself.
BS: It’s interesting because if you’re able to stitch their story together just by putting these two people together-
BS: It goes beyond just the, oh, yeah you were related to this person and you might be, so many cousins removed from-
YE: From a genetic perspective it’s quite neat. We knew they were half-siblings, half-brother and sister. Then Yoav Naveh, the DNA director of MyHeritage, suggested that we look at the X chromosome, because they should share parts of it if they are from the same mother, right? He’s male, so his only X chromosome came from his mother, and she should have gotten some of the same parts of their mother’s X chromosome that he got.
BS: And it’s worth mentioning this wouldn’t be possible with the older approach. Before genome-wide genotyping, it was mostly just Y-chromosome tracing, right?
YE: Exactly, yeah. With the Y chromosome we couldn’t find it, because she doesn’t have a Y chromosome, and mitochondrial DNA would match so many people in the world. So we tested the X. It was before lunch; after lunch I got back to the computer and found that they share X-chromosome DNA. I called him and told him, you’re from the same mother.
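The X-chromosome reasoning above reduces to a simple inheritance rule, sketched below as a toy illustration (the function names are hypothetical, not MyHeritage code). A son’s only X comes from his mother, while a father passes his X only to daughters, so a half-brother and half-sister can carry X segments from their shared parent only if that parent is the mother. Maternal half-siblings are expected to match on about half of the maternal X they each received, though the exact amount varies with recombination.

```python
def x_from_parent(parent: str, child_sex: str) -> bool:
    """True if a child inherits an X chromosome from the given parent."""
    if parent == "mother":
        return True  # every child gets one of the mother's two X chromosomes
    if parent == "father":
        return child_sex == "female"  # sons get the father's Y, daughters his X
    raise ValueError(f"unknown parent: {parent!r}")


def can_share_x(shared_parent: str, sex_a: str, sex_b: str) -> bool:
    """Can two half-siblings carry X segments from their shared parent?

    Ignores the pseudoautosomal regions and chance (identical-by-state) matches.
    """
    return x_from_parent(shared_parent, sex_a) and x_from_parent(shared_parent, sex_b)


# The case in the interview: a half-brother and a half-sister.
print(can_share_x("mother", "male", "female"))  # True
print(can_share_x("father", "male", "female"))  # False
```

So finding shared X segments between a half-brother and half-sister points to the mother, which is the inference described in the interview.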
BS: And from there, I hope they’re still looking into their story, but at least now they are in contact.
YE: Exactly, and it’s also a different story for him now, just to understand where he comes from. It’s a totally different story from “I’m alone in this world.” Before, who knew what happened? Every document he had, when he tried to validate it with the authorities in Brazil, couldn’t be confirmed, like his birth certificate. He went to the hospital where he was supposedly born. It’s a very small hospital in a rural town of 30,000 people. Calling it a hospital is a huge compliment.
It’s a few connected rooms or something like that, and they looked at the records: no, not a single baby was born on the day written on your birth certificate. You’re puzzled: what’s going on? Some of the details do check out: oh, the person who signed the adoption paperwork here was a judge who used to be here about 30 years ago. It’s like a maze of mirrors. Some things look legitimate and some don’t, but the DNA is the only piece of information that we know for sure.
BS: It builds a solid foundation-
BS: To do almost any investigation. Maybe something went wrong at the hospital.
YE: I think it’s more than an investigation; it’s about identity formation. We’re helping these people feel part of their identity and, in some places, form its foundations.
BS: That’s amazing. Now, obviously there’s the other side of it. We can talk about the benefits of knowing certain medical markers. But in the podcast itself, we mentioned that there is a difference between a full genome sequence and the genotyping done by most services, MyHeritage DNA being one of them. I just want to go over that very quickly: what is the difference?
YE: The difference is that in whole-genome sequencing, you look at, quote, everything. It’s nearly everything, and you basically look without any prior knowledge of what you’re looking for. With genotyping, we focus on specific areas in the genome that we know are polymorphic, that have been documented as polymorphic before, at least in European populations. That’s genotyping. Now, the trick is that from these polymorphic sites you can impute back quite a lot of the genome. Although you didn’t sequence the entire genome, and you just got a snapshot of 700,000 markers, you can impute back the status of about 40 million markers that are segregating in European populations.
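To make the imputation idea concrete, here is a deliberately tiny nearest-neighbour sketch. It assumes already-phased haplotypes and a made-up five-haplotype reference panel; production imputation tools (for example IMPUTE2, Beagle or Minimac) instead fit hidden Markov models over large reference panels, so treat this only as an illustration of filling untyped sites from genomes seen before.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical reference panel: each string is one phased haplotype over
# eight SNP positions, with alleles coded "0"/"1".
REFERENCE_PANEL = [
    "00110100",
    "00110110",
    "11001011",
    "11001001",
    "01110100",
]


def impute(observed: List[Optional[str]], panel: List[str], k: int = 3) -> List[str]:
    """Fill untyped sites (None) by majority vote among the k panel
    haplotypes with the fewest mismatches at the typed sites."""

    def mismatches(hap: str) -> int:
        return sum(1 for obs, ref in zip(observed, hap) if obs is not None and obs != ref)

    nearest = sorted(panel, key=mismatches)[:k]
    filled = []
    for i, obs in enumerate(observed):
        if obs is not None:
            filled.append(obs)  # typed site: keep the array call
        else:
            votes = Counter(hap[i] for hap in nearest)
            filled.append(votes.most_common(1)[0][0])
    return filled


# Only positions 0, 2, 4 and 6 were "genotyped"; the rest are blanks.
print("".join(impute(["0", None, "1", None, "0", None, "0", None], REFERENCE_PANEL)))
# 00110100
```

Using an odd `k` avoids most voting ties in this toy; real tools output probabilities (dosages) rather than a single hard call, which is why imputation is most reliable for common variants.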
BS: You’re comparing it back to, almost, a reference set of genomes?
YE: Yeah, and the concept here… I know you probably have this game in the States, where they show you a word and some of the letters are blanked out.
BS: I know what you’re talking about.
YE: Yeah, there’s a TV show in Israel; I forget what it’s called in the States.
BS: It could be like Hangman.
YE: Yeah, it’s like Hangman. The concept is similar. The genotypes that we obtain are like the revealed letters in the Hangman game, and all the places that we didn’t genotype are like the blanks you need to fill. Now, there are so many possibilities, right? There are 26 letters in English, and you have multiple positions, so the number of potential words grows exponentially. But here’s the thing: we know that in English, certain letters usually don’t come next to each other. There’s some covariance: a Q next to another Q is not valid English, and an X is quite rare to find anyhow.
You use all these hints, and your brain can do it very fast; the sentence also needs to make sense. So you use all these cues, and quite quickly you can get back to the word. Now, how do you do that? Because you have a mental dictionary of the words in English, of what makes sense and what doesn’t. Same thing in genomics: you genotype these samples, and now you need to fill in all the blank pieces. But since you have already seen many genomes in the past, you have a mental dictionary, and you can go back and get this completion with some accuracy. It’s a bit error-prone, it’s not perfect, but it can be quite accurate, especially for common variants.
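The Hangman analogy can be written out directly. With two blanks there are 26² = 676 raw ways to fill them, but checking against even a tiny dictionary leaves a single candidate. The word list below is made up for illustration; in the genomic version, the “dictionary” is the set of previously sequenced haplotypes.

```python
# Hypothetical mini-dictionary standing in for the "mental dictionary".
DICTIONARY = ["GENOME", "GENTLE", "GOLDEN", "GNOMES"]


def complete(pattern: str, words: list) -> list:
    """Return the words consistent with a Hangman pattern ('_' = blank)."""
    return [
        w for w in words
        if len(w) == len(pattern)
        and all(p == "_" or p == c for p, c in zip(pattern, w))
    ]


pattern = "GE_O_E"
print(26 ** pattern.count("_"))      # 676 unconstrained fillings of the blanks
print(complete(pattern, DICTIONARY))  # ['GENOME']
```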
BS: Is that a difference between, say, MyHeritage DNA and 23andMe, or do they work the same way?
YE: No, we use nearly the same platform. Nearly every company does; there are four big companies in this space: Ancestry, 23andMe, MyHeritage and FTDNA. All these companies genotype rather than sequence, and the point is that it’s just much, much cheaper to genotype: we’re talking tens of dollars, versus whole-genome sequencing, which is on the order of hundreds of dollars. And this is a consumer product, not something you sell to businesses, so it’s very sensitive to price.
BS: At the same time, we’ve talked previously about how genotype data is not nearly as dangerous for people to get their hands on, because they don’t have the whole sequence and might not be able to find some information.
YE: I actually think the genotype can give you quite a lot. Let’s think about it with an example from Cold Spring Harbor: Jim Watson. He had whole-genome sequencing, right?
BS: Right, during the Human Genome Project.
YE: Yeah. Now, the thing is that he didn’t want to know, or to disclose, his APOE status. APOE is the gene where, if you have a certain allele combination, you are at very high risk for Alzheimer’s. At that time Jim was around his 80th birthday, and he thought, “I’m getting old now. I don’t want to know, and I don’t want anyone to know. So let’s release my entire genome, because I don’t care about any other trait except Alzheimer’s. Just cut the APOE region out of my genome.”
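For reference, the allele combination behind APOE status is usually read from two common SNPs, rs429358 and rs7412, whose alleles define the ε2, ε3 and ε4 haplotypes (ε4 being the Alzheimer’s risk allele). The sketch below assumes phased input, one (rs429358, rs7412) pair per chromosome; with unphased array calls, the double heterozygote is ambiguous between ε1/ε3 and ε2/ε4 and is usually reported as ε2/ε4 because ε1 is rare. The dictionary and function names are illustrative.

```python
from typing import Tuple

# (rs429358 allele, rs7412 allele) on one chromosome -> APOE haplotype.
APOE_HAPLOTYPES = {
    ("T", "T"): "e2",
    ("T", "C"): "e3",
    ("C", "C"): "e4",
    ("C", "T"): "e1",  # rare
}


def apoe_genotype(hap1: Tuple[str, str], hap2: Tuple[str, str]) -> str:
    """Combine two phased haplotypes into a genotype such as 'e3/e4'."""
    a, b = sorted([APOE_HAPLOTYPES[hap1], APOE_HAPLOTYPES[hap2]])
    return f"{a}/{b}"


print(apoe_genotype(("T", "C"), ("C", "C")))  # e3/e4
```

Cutting out these two positions does not hide the answer, because, as described next in the interview, neighbouring variants that co-segregate with them let you impute them back.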
The thing is that you can impute it back. Even though APOE is no longer genotyped, you can suddenly impute its status from the common variants that co-segregate with it in the rest of the genome. Someone actually published a paper on this in the European Journal of Human Genetics in 2009, Peter Visscher’s group, showing that they were able to impute back his APOE status. Of course, they did not want to disclose his status, so as a positive control they took the Craig Venter genome, cut out APOE the same way as in the Jim Watson genome, imputed it back, and showed that they got the correct result for his genome.
Now, we talked about whole-genome sequencing, but the thing is that the markers used for imputation are the same ones on genome-wide genotyping arrays. So you can use this technique on genotyping-array data to learn APOE status. You can also estimate responses to various drugs, because these are common variants. You can also calculate polygenic risk scores for different types of diseases. Take heart disease: there is recent work from the Broad Institute showing that if your genotype falls within a specific 2.5% of people, you are in a category of risk as high as familial hypercholesterolemia for heart disease. And they can get that from the common SNPs, not from whole-genome data, just from arrays.
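A polygenic score of the kind mentioned here is, at its core, a weighted sum: for each variant, the risk-allele dosage (0, 1 or 2 copies, or a fractional imputed dosage) times an effect size from a genome-wide association study, with the person then ranked against a reference distribution. The sketch below uses made-up variant names and weights; published scores use far more variants and carefully derived weights.

```python
from bisect import bisect_left
from typing import Dict, List

# Hypothetical effect sizes (log-odds per copy of the risk allele).
WEIGHTS: Dict[str, float] = {"rsA": 0.12, "rsB": -0.05, "rsC": 0.30}


def polygenic_score(dosages: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0-2, possibly imputed)."""
    return sum(weight * dosages[snp] for snp, weight in WEIGHTS.items())


def percentile(score: float, reference: List[float]) -> float:
    """Percent of reference scores strictly below `score`."""
    ref = sorted(reference)
    return 100.0 * bisect_left(ref, score) / len(ref)


def in_top_fraction(score: float, reference: List[float], fraction: float = 0.025) -> bool:
    """Flag people in, e.g., the top 2.5% of the reference distribution."""
    return percentile(score, reference) >= 100.0 * (1 - fraction)


person = {"rsA": 2, "rsB": 0, "rsC": 1}
print(round(polygenic_score(person), 2))  # 0.54
```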
BS: That’s big. But I know there are restrictions in place, at least in the US, where you can’t provide certain medical advice based on somebody’s genotyping, right?
YE: Sure, but when we talk about what is dangerous, we’re essentially not talking about people who follow the rules.
BS: Right. Could you give an example of what does work versus advice you can’t give, or what would be dangerous and not following the rules, just to clarify?
YE: It’s like, if it was normal research… not research, just a clinical setting: your ability to give back the results of, let’s say-
BS: The BRCA1 gene.
YE: The BRCA1 gene or something like that. To return these results, you either need to get clearance from the FDA for a product, or, if it’s under some sort of physician-patient relationship, you can do it under what’s called a laboratory-developed test, which the FDA basically does not regulate, so you can return this information without prior FDA approval of the product. There are other regulations, like running the test in a CLIA lab and so on, but those are more nuances.
That’s the regular setting. But if you want to hack into someone’s genome, you just get the Jim Watson genome and impute back APOE, and now you’ve learned something that isn’t even given back to Jim Watson, and you can use that information for whatever benefit you want to get from it.
BS: When I was talking with Curtis Rogers, co-founder of GEDmatch, he mentioned that his dream for the future would be that everybody owns their genotype or their genome and, with permission, could give it to their clinician at any time and say, hey, I want to know what’s going on. He wants everybody to be genotyped at birth, and so on. It would almost be like your Social Security number: a secret that you own, but tied to important information. What do you see for the future?
YE: I would separate two things. I do see the value of genotyping individuals. But I want to give people the option before they’re genotyped; I want to educate people and let them choose, not do it in a mandatory way. The moment my son was born was one of the most magical moments of my life, and the first few days were quite stressful, and I don’t want that moment to be about thinking of his APOE status 80 years from now, his BRCA status, or all these things. He was born, he was healthy, so we don’t want to over-diagnose him for anything at that point.
I don’t see the value of this information at birth. And now let’s think about it: my son is seven years old, and I have a daughter who is four. They hear me talk about DNA all the time, right? They’re curious, and both asked me to run a DNA test on them with MyHeritage. For me it’s super easy, right? I have these kits at home; I don’t even need to buy them somewhere. I argued against it. I told them, you should grow up and understand the meaning of this information. I don’t think even I totally understand it, after working in this field for decades. But be more educated about this information and think for yourself, and if you want to do it, sure, yeah.
Nearly all my family took a DNA test because they wanted to know where they come from, but I do want to give my kids a choice and be more strategic about it. I do share Curtis’s view of the value of the information, but my execution plan would be very different.
Also, DNA is not a secret. My son shares half of the secret with me, and my daughter shares another half; not exactly, there is some overlap between the two. But if I had, I’m thinking, about 11 kids, you could already get my genome, because that’s like 99% of it. So I don’t think it’s a great idea to use it as a secret or password, but it’s a great idea to integrate it more into clinical care.
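The “11 kids” remark can be checked with a short calculation. At any site, each child inherits one of the parent’s two haplotypes at random, so the chance that all k children got the same one (leaving the other unseen) is 2·(1/2)^k = 2^(1−k). The expected fraction of sites where both parental haplotypes show up among the children is therefore 1 − 2^(1−k): already above 99% at 8 children and about 99.9% at 11. This sketch assumes you can tell which parental haplotype each child carries (for example, with the other parent’s genome); linkage makes neighbouring sites correlated but does not change the expectation. The function names are illustrative.

```python
import random


def expected_fraction_recovered(k: int) -> float:
    """Expected fraction of a parent's sites at which both haplotypes
    appear in at least one of k children (Mendelian transmission)."""
    if k < 1:
        return 0.0
    return 1.0 - 2.0 ** (1 - k)


def simulate(k: int, sites: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check, treating sites as independent."""
    rng = random.Random(seed)
    both_seen = 0
    for _ in range(sites):
        inherited = {rng.randint(0, 1) for _ in range(k)}  # which haplotype each child got
        both_seen += len(inherited) == 2
    return both_seen / sites


for k in (1, 2, 8, 11):
    print(k, expected_fraction_recovered(k))
```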
BS: I guess that’s an important question, then. A lot of people would make the argument, okay, I don’t want to be involved in this because of what people can learn about me. But it might be unavoidable, in the sense that somebody can get your genome if they pick up your cup and swab it and what have you. It reminds me of Gattaca.
YE: It’s funny that you mentioned Gattaca, because I think we’ve gone past that point. In Gattaca they had the DNA of a person who was not supposed to be there, right?
YE: And they couldn’t figure out who this person was. Today we are already beyond that point, with triangulation and all of that. The predictions, forget about it, we are not there. There’s a nice scene in the movie where a baby is born and they say, life expectancy, 33 point something years. Of course we’re not there, and we will never be there, because the heritability of life expectancy is not that high, and it’s a good thing that we won’t be able to do that. But in terms of forensic capabilities, we are better than Gattaca today. It’s nearly as fast as in Gattaca: there’s a study from our lab showing that you can go in one hour from saliva, all the way through sequencing on a MinION sequencer, to identifying someone. We published that study in 2017, and we even have a movie showing it.
BS: These are those little… for lack of a better word, it looks like an oversized USB stick.
YE: Yeah, or a miniature phone, something in between. It weighs about 100 grams.
BS: It can plug into the side of a computer.
YE: We showed that it works even with this thing called the Bento Lab, a Bento-box-sized centrifuge and a few other tools.
BS: Is it worth mentioning that a Bento box is a box for a Japanese lunch? So it’s like a lunch box?
YE: Yeah, it’s a lunch box. It’s called the Bento Lab because it’s modeled after a Bento box: you just have a centrifuge, a heater and a few more things. So we used just this Bento Lab plus a MinION sequencer. I took a swab from my cheek and gave it to my postdoc, and we recorded everything with a clock, right, from the beginning all the way to the end, on the roof of the New York Genome Center. Very quickly we were able to genotype and then to know whether the person is in the database or not. Not familial searches, just whether that person is in the database. For generating the information, we’re not there yet with respect to Gattaca. But after you generate the information, we’ve already passed Gattaca. We passed that point probably six or seven years ago.
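The MinION experiment asks a narrower question than a familial search: does this sample match someone already in a database? One common way to frame that with sparse, error-prone reads is a log-likelihood ratio per candidate, measuring how much more likely the observed read alleles are if the sample came from that person rather than from a random member of the population. The sketch below shows that generic idea with an assumed 10% per-read error rate and made-up SNPs, genotypes and names; it is not the exact method of the lab’s 2017 study.

```python
import math
from typing import Dict, List, Tuple

Read = Tuple[str, bool]  # (SNP id, whether the read showed the ALT allele)


def alt_read_prob(alt_dosage: int, error: float) -> float:
    """P(a read shows ALT) for someone carrying 0, 1 or 2 ALT copies."""
    frac = alt_dosage / 2
    return frac * (1 - error) + (1 - frac) * error


def log_likelihood_ratio(reads: List[Read], genotype: Dict[str, int],
                         alt_freq: Dict[str, float], error: float = 0.1) -> float:
    """log P(reads | candidate) - log P(reads | random person); error in (0, 1)."""
    llr = 0.0
    for snp, saw_alt in reads:
        p_candidate = alt_read_prob(genotype[snp], error)
        f = alt_freq[snp]  # population ALT allele frequency
        p_random = f * (1 - error) + (1 - f) * error
        if not saw_alt:
            p_candidate, p_random = 1 - p_candidate, 1 - p_random
        llr += math.log(p_candidate / p_random)
    return llr


def best_match(reads: List[Read], database: Dict[str, Dict[str, int]],
               alt_freq: Dict[str, float], error: float = 0.1):
    """Score every database entry; a clearly positive top score suggests a match."""
    scores = {name: log_likelihood_ratio(reads, g, alt_freq, error)
              for name, g in database.items()}
    return max(scores, key=scores.get), scores


# Hypothetical data: three SNPs, two people in the database.
alt_freq = {"s1": 0.5, "s2": 0.5, "s3": 0.5}
database = {"person_1": {"s1": 2, "s2": 0, "s3": 2},
            "person_2": {"s1": 0, "s2": 2, "s3": 0}}
reads = [("s1", True), ("s2", False), ("s3", True)]
name, scores = best_match(reads, database, alt_freq)
print(name)  # person_1
```

With real nanopore data there would be hundreds of reads at scattered SNPs, and the score for the true person keeps growing with each read while scores for others drift negative.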
BS: We’re talking a little bit about the future here. Gattaca is a sci-fi future. You talked about it almost as a right to know your background, so that adoptees might have a better idea of where they’re from. Is that your ideal future, where people are given, almost, the free right to know who and where they are? Is that something the government might help with, helping people find their family?
YE: I’m not sure the government needs to be involved in intimate things like that. The government needs to make sure people can create businesses that serve people and operate ethically. But I do think the technology is getting better every year. Look at the prices of direct-to-consumer companies. I purchased my test before MyHeritage even offered this; I bought it from 23andMe in 2012, not on a DNA Day sale, and I think it cost like $300 or $200, something like that.
And now you can get these tests for much less: at the American Society of Human Genetics meeting we sold these boxes for $55, right, and we started at $79 two years ago. The price will for sure go down, because it’s just getting cheaper and cheaper to genotype people. And also, without going into the specific details, it’s not just the genotyping; there’s also a supply chain that you need to tune. Every year, all the companies keep adapting their supply chains, so they can shave another dollar here and another dollar there, and then keep lowering the price.
It’s things like: how do you do the shipping in the most cost-effective way? How do you collect DNA? All these things let you keep reducing the price, so it will get cheaper and cheaper for sure.
BS: We’ve talked a little bit about the future, and everything’s getting cheaper and cheaper. Is there anything in particular you wanted to talk about that we haven’t mentioned so far?
YE: I think, you know, we published this paper showing that you can catch criminals and you can connect adoptees, which is great. But these services, especially GEDmatch, are open to everyone with an internet connection. The police can use them to find criminals, and I guess everyone agrees the day the Golden State Killer was captured was a happy day for humanity, with the exception of his family and himself; so it was a very happy day, and that’s a good thing. But the same strategy on these websites can be used to identify other people. We calculated that today, for about 60% of US individuals of European descent, this technique can work: it can identify a third cousin or closer relative for them.
BS: It’s at 60%?
YE: 60% of the US.
BS: That’s like two thirds of the US.
YE: Of US Europeans. They’re already about two thirds of the population, so it’s two thirds of two thirds. As for the other third, it also works for African Americans, not at the same level, but based on our calculations we’re talking about a 30% chance. And it works for other ethnicities; I think we found the lowest success rate for Asians in our database, but that will change as more people take the test.
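A simplified way to see why database coverage matters so much: if a person has n relatives at the third-cousin level or closer, and each relative is independently in the database with probability equal to the database’s coverage f of the population, the chance of at least one hit is 1 − (1 − f)^n. The paper’s own model is more detailed; here n and f are free, illustrative parameters rather than figures from the study. For example, 1% coverage and n = 100 already gives about a 63% chance of a hit.

```python
def prob_at_least_one_match(coverage: float, n_relatives: int) -> float:
    """P(at least one relative is in the database), assuming each of
    n_relatives is included independently with probability `coverage`."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return 1.0 - (1.0 - coverage) ** n_relatives


for coverage in (0.001, 0.005, 0.01, 0.02):
    print(coverage, round(prob_at_least_one_match(coverage, n_relatives=100), 3))
```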
BS: More people of that heritage for a larger pool of data.
YE: Here’s the thing, and this is something that bothers me. I don’t think the police are the problem here. It’s very easy to vilify the police, and there are many problems with the police, but the police in general… what are we talking about, that they’ll start using it for what? For people with parking tickets? Probably not.
BS: Isn’t it too expensive?
YE: It creates an asymmetry: it means that a foreign player can cast genetic surveillance on a large part of the US population. And what does that mean for the ability of the US government to operate, in covert operations, for instance?
And this is a concern with adversaries of the US. Let’s say there is some covert operation of US forces; it’s very hard not to leave your DNA behind. If you pee during these operations, or even just sweat and touch something, there is a chance you will leave your DNA behind.
In this case, the ability to identify nearly anyone in the US population with this small database means that the people who were part of the operation can now be identified. And the whole point of such an operation is that they cannot be identified, because identification creates risk for them and for their families. See what the Russians did to their own spy who flipped sides, the one in the UK. He was not even a big spy, a very small fish, and still it was worth it for them to invest and send these two “tourists” with a neurotoxin.
I don’t want to be fear-mongering; I just want to be realistic about things, and to say that we think we can do something about it. I think there is a mitigation scheme that, if all companies adopt it, can really prevent these types of harmful consequences.
BS: What kind of steps would this include?
YE: It’s surprisingly simple. What if all companies, and even labs, not just companies, but let’s just say companies for this discussion, signed the raw genetic data, which is just a text file, with a cryptographic key before giving it to the user? The file is still plain text, but you have one more line of gibberish: the company’s signature. Now when this file is uploaded to GEDmatch or a similar site, the site can look at the signature and run a very quick algorithm that says the file was not tampered with, the signature is valid, and it belongs to company X.
If this is the case, GEDmatch will process the file seamlessly for the user. This operation takes a fraction of a second; the user will not even know that there was something new about GEDmatch. If the file doesn’t have the signature, you serve the user a different web page, and on this web page you can ask who exactly you are. Maybe you’re a user who tampered with the file; then please get your data from the company again. If you are the police, and GEDmatch wants to support the police, you have a different onboarding process where you submit some police paperwork showing that you represent law enforcement and that you’re searching for the person you say you’re searching for, some legal work.
Some streamlined legal work, not building this contract from scratch every time, but some onboarding process. And if it’s not the police and it’s not a normal user, and it’s some person from, say, the KGB, they will not get through.
I said KGB ’cause they don’t exist anymore. But whatever person from a foreign intelligence service that is an adversary of the US, they can just… they probably will not reach out to GEDmatch, and now it would in fact be much more complicated for them to get the data.
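A minimal Python sketch of the signing idea described above, for readers who want to see the moving parts. Everything here is an assumption for illustration: the `# signature` trailer line, the company name `company_x`, the function names, and the use of textbook RSA over a SHA-256 digest with publicly known Mersenne primes, which makes it insecure. A real deployment would presumably use a vetted library and a standard scheme such as Ed25519 or RSA-PSS, with the private key held only by the testing company and the public key shared with databases like GEDmatch.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Toy textbook-RSA key pair for a hypothetical "company_x". P and Q are the
# well-known Mersenne primes 2**127 - 1 and 2**521 - 1, so anyone can factor N:
# illustration only, no real security.
P = 2**127 - 1
Q = 2**521 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent; needs Python 3.8+

SIG_PREFIX = "# signature "  # hypothetical trailer-line format


def _digest(company: str, body: str) -> int:
    """SHA-256 over the company name and file body, as an integer below N."""
    data = (company + "\n" + body).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sign_raw_file(body: str, company: str) -> str:
    """Company side: append one signature line to the plain-text raw data file."""
    if not body.endswith("\n"):
        body += "\n"
    signature = pow(_digest(company, body), D, N)  # only the company knows D
    return f"{body}{SIG_PREFIX}{company} {signature:x}\n"


def verify_raw_file(text: str, public_keys: Dict[str, Tuple[int, int]]) -> Optional[str]:
    """Upload side: return the signing company, or None if the file is
    unsigned, was edited after signing, or names an unknown company."""
    body, sep, trailer = text.rpartition(SIG_PREFIX)
    if not sep:
        return None
    try:
        company, sig_hex = trailer.split()
        signature = int(sig_hex, 16)
    except ValueError:
        return None
    if company not in public_keys:
        return None
    n, e = public_keys[company]
    if pow(signature, e, n) != _digest(company, body):
        return None
    return company


def route_upload(text: str, public_keys: Dict[str, Tuple[int, int]]) -> str:
    """Signed files are processed directly; anything else gets separate onboarding."""
    return "process" if verify_raw_file(text, public_keys) else "separate_onboarding"


if __name__ == "__main__":
    keys = {"company_x": (N, E)}  # the public half a database would hold
    raw = "# rsid\tchromosome\tposition\tgenotype\nrs0000001\t1\t1000\tAA\n"  # made-up row
    signed = sign_raw_file(raw, "company_x")
    print(route_upload(signed, keys))                      # process
    print(route_upload(signed.replace("AA", "AG"), keys))  # separate_onboarding
    print(route_upload(raw, keys))                         # separate_onboarding
```

The design choice that matters is asymmetry: the database only needs the public key to check a file, so it can verify uploads without being able to forge them, and a single edited genotype invalidates the signature.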
BS: This is almost like a genotyping tamper seal…
YE: It’s also interesting for MyHeritage to take on this type of study, and this was done with the blessing of the CEO and the board of MyHeritage. Of course, it was partly because I’m interested in genetic privacy, but his point was that people vilify Silicon Valley, or just high-tech companies in general.
We’re in Israel, but we think of ourselves as a Silicon Valley company to some extent, and the perception is that these companies just keep moving forward without any ethical thinking about what they do. He wanted to show that we are a company that actually thinks about what we do: how to protect our users, how to create sustainability. And importantly, we did it not five years after the first case was in the news but immediately; a week after the first case, we started working on this manuscript, because the horses are still in the barn.
They haven’t left the barn yet, so we always think about what we can do to prevent issues five to 10 years from now. And it’s important to emphasize that we are now building the infrastructure for the next decade of genomics. What we see is just the beginning: genotyping has reached something like 17 million people all over the world. It’s nothing, a drop in the bucket. We are building the infrastructure now, so it is important to think about these issues right now.
BS: So you believe this is going to become something more than just people opting in out of curiosity, that a lot of people will do it because it becomes a cultural sensation?
YE: I think so, and I also want to build the technical means that will allow people to have full control over what to do with their data while reducing the externalities of their actions on other people, which is kind of what you want from a liberal society.
BS: That’s excellent. I guess something I wanted to ask is: do you think there’s a certain amount of fear and a certain amount of misunderstanding of science that just comes with the territory? We see this frequently enough with GMOs, or in some cases vaccines or global warming. So, what about genotyping and genetic sequencing?
YE: I think there is a general mistrust in the population. You look at poll data, especially from the United States, about trust in different institutions, and it’s all going down the drain. Compare things from the ’70s to now, and it’s consistently a monotonically decreasing function. And I think that currently genotyping is not associated with government activity; it’s associated with companies, so I think there is more trust in this domain. And also, in contrast to what I just described, it seems like across the board with data, people give Facebook their data, people use Google and share their most personal searches, which is much scarier than genomics.
And all of that is an interesting, contradictory phenomenon to think about: there is general mistrust, or trust is going down, but on the other hand, you do give this data to companies. From a more philosophical perspective, I think it says something about the difference, first, between attitudes and behavior, and also about how much people prioritize their privacy compared to the other types of utility that they get from websites. And it seems that people put utility much, much higher than privacy.
Of course, if you ask people, “Do you care about your privacy?”, they care about their privacy, but they use Facebook, which ended up giving their data to Cambridge Analytica. And I also use Facebook; I’m not some saint or something like that. I think the most striking example is the case of Ashley Madison. Just to remind our listeners, Ashley Madison was, or is, a website for cheaters with the slogan “Life is short, have an affair.”
BS: Ashley Madison?
YE: Yes. Yeah, well, pronounce it properly, like… it’s Ashley.
BS: No, no, it’s… I’m just talking with my Long Island accent.
YE: That’s my Israeli accent, it’s Ashley. Is Ashley okay? Anyhow, the thing is that the website was hacked two years ago, and 36 million profiles were leaked. This is data that is far worse than genomics, far worse. We’re talking about… first, I downloaded the data; this is how I know what was there. I actually looked at it.
BS: You did?
YE: Yes: email addresses, credit card numbers. The passwords were not protected; basically they were hashed with MD5, which is child’s play to break. We haven’t used MD5 for, I don’t know, many years already, but they used MD5 to hash the passwords. So the passwords… so you could compromise not just this account but also other accounts of the users. And sexual preferences, sexual orientation, and their being a cheater.
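To make the MD5 point concrete, here is a small Python sketch of why a fast, unsalted hash falls to a precomputed lookup, next to a salted, deliberately slow alternative from the standard library (PBKDF2). The wordlist, function names and the 600,000-iteration default are illustrative assumptions, and this is not a reconstruction of how the leaked site actually stored its passwords.

```python
import hashlib
import hmac
import os
from typing import Dict, Iterable, Tuple

# Hypothetical tiny wordlist; real attackers use lists with millions of entries.
COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]


def md5_hex(password: str) -> str:
    """Unsalted MD5: the same password always yields the same hash."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def crack_unsalted_md5(leaked: Iterable[str], wordlist: Iterable[str]) -> Dict[str, str]:
    """Hash each candidate once, then look up every leaked hash in that table."""
    table = {md5_hex(word): word for word in wordlist}
    return {h: table[h] for h in leaked if h in table}


def hash_password(password: str, iterations: int = 600_000) -> Tuple[bytes, int, bytes]:
    """Salted, deliberately slow PBKDF2-HMAC-SHA256 from the standard library."""
    salt = os.urandom(16)  # unique per user, so one precomputed table no longer helps
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key


def check_password(password: str, salt: bytes, iterations: int, key: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, key)


if __name__ == "__main__":
    leaked = [md5_hex("letmein"), md5_hex("correct horse battery staple")]
    print(crack_unsalted_md5(leaked, COMMON_PASSWORDS))  # recovers only "letmein"
```

Once a hash is reversed, the plaintext can be tried on the same user’s other accounts, which is the password-reuse risk mentioned above; a per-user salt plus a slow key-derivation function makes that precomputation useless and each guess expensive.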
BS: Oh my gosh.
YE: This is far worse than any leakage in genomics. 36 million people. Just to give you a sense of scale, I looked at the number of email addresses with Israeli (.il) suffixes, and there were 200,000 addresses like that. In Israel we have 2 million households, which means that about 10 percent of the country used the website. Anyhow, the point is that for this website it was a catastrophe, and I thought these guys had gone too far. I thought there would be a before and after Ashley Madison, that it would not be the same internet.
And there was even a New York Times article about whether we were going to see an Ashley Madison recession, and how, because divorce rates were going to jump through the roof, the price of small apartments would go up because of demand. And I thought, I’m going to make some money: I know privacy, I will buy small apartments and make some money out of the Ashley Madison thing. Anyhow, nothing happened. A few people committed suicide, and that’s it. It’s gone, gone with the wind. The funny thing is that right now Ashley Madison is a more popular website than Cold Spring Harbor.
If you look at Alexa site info, which measures the traffic to websites, Ashley Madison is a top-5,000 website. Google is first, Facebook is second; this is a top-5,000 website. Cold Spring Harbor, I think, is not even in the top 50,000, and neither are some NIH websites I know of that protect your genetic data. People keep using the website. That tells you about the trade-off between utility and privacy, and it’s an example of where people’s minds really are.
BS: I guess you could also say, all press is good press.
YE: I’m not sure. I think at some point that’s bad press: things like losing all your users’ data, plus accounts that they said they deleted. You had to pay a special fee to wipe your account, so they took the money and they never deleted it.
BS: Oh my gosh.
YE: Or they deleted it in a way that was compromised a bit.
BS: And people are still trusting this site?
YE: People, they use the website, they use the website.
BS: That’s a utility.
YE: I think now they’ve switched the slogan from “Life is short, have an affair” to something like “Have your moment,” or something like that. It’s totally new branding.
BS: It’s interesting, but at the same time there’s Ashley Madison, and even people trusting things like Facebook Messenger and Facebook and Twitter and what have you, even though it is easy enough to gather information on these things. Even ad analytics: our own Facebook page tells us about our followers, what their likes and dislikes are, because they told Facebook. We want to take what they like and dislike and tie it into our science and give them good storytelling, but not everybody has such genuine intentions. But nobody’s worried about that because it’s so integrated into their lives. Genotyping isn’t really integrated.
YE: That’s right, but I think the difference with DNA is that even if some people say, “Oh, I don’t want anything to do with it,” the externality here is that your third cousin decides to go and do it, and now, you know, we saw what happened with the Golden State Killer. Good luck with that. I think that’s what sets DNA apart from all the other types of information: none of them has this ability to affect very distant relatives.
BS: So it is worth mentioning, because some of our viewers and listeners might bring it up themselves in comments and questions. I think most people who follow this kind of news are aware that MyHeritage was hacked not too long ago. Was it 2016 or 2017?
YE: Half a year ago.
BS: Half a year ago, but that was exclusively emails, correct?
YE: That was emails and probably hashed passwords. The interesting thing is that we were hacked in October 2017, and we couldn’t know about it until a company that monitors the dark web found some sort of list of email addresses of MyHeritage customers that someone was offering to sell. So they contacted us, and we verified that the list was in fact authentic and not some kind of scam. Since we have users joining the website every few seconds, we could see the moment in time at which they obtained the list: you see all the users, and then it stops, which really helped the investigation.
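The timing trick described above can be sketched in Python, assuming the leaked list was a one-time snapshot of the user table and that each account’s signup timestamp is on record. The data layout, the function name and the sample data are hypothetical, and this is not MyHeritage’s actual forensic procedure.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

Signup = Tuple[datetime, str]  # (account creation time, email)


def estimate_leak_window(
    signups: Iterable[Signup], leaked_emails: Iterable[str]
) -> Optional[Tuple[datetime, Optional[datetime]]]:
    """Bracket when a user list was copied, assuming it was a snapshot: the
    newest leaked account existed before the copy, and the next account
    (absent from the list) was created after it."""
    ordered: List[Signup] = sorted(signups)
    leaked = set(leaked_emails)
    last_leaked = None
    for i, (_, email) in enumerate(ordered):
        if email in leaked:
            last_leaked = i
    if last_leaked is None:
        return None  # nothing in the list matches our users
    start = ordered[last_leaked][0]
    # None means the list already contains the newest account on record.
    end = ordered[last_leaked + 1][0] if last_leaked + 1 < len(ordered) else None
    return start, end
```

Because new accounts arrive every few seconds, the interval between `start` and `end` is short, which is what makes the cutoff useful for narrowing down when the list was taken.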
And everything was discussed in public… we reported it immediately. Not like Yahoo… which was like, “let’s think about it for three years before we let everyone know.” We reported it immediately, which is the ethical thing to do. And then our engineers: basically, we called in all the engineers in the company and they worked 24/7, really, people slept on couches. And the point was, first of all… we had these security features in our roadmap that we wanted to roll out, like two-factor authentication and better code review of certain parts of the website.
Also, just to remind you, you think of this website as one website, but in fact it has many layers that were built over time, because it started as standalone software just for genealogy, then it was integrated into a website. Then we had Geni on top of it, then historical records. We also bought some companies and integrated their code, and then DNA, and so on. So the good thing is, you know, this was just the email addresses and the passwords, no DNA, which is-
BS: What most people were going to be worried about.
YE: Most worried about. But since we are a DNA company, the news was like, “A DNA company was hacked,” and you try to explain, yeah, but these were just email addresses, and they were from the genealogy part of things, not from the DNA.
BS: Oh wow, okay.
YE: We don’t have 92 million people with DNA data; I wish we had. And basically the engineers had a roadmap, and they needed to execute these security features now, within days, and they worked really, really hard. We completed basically the whole security plan for 2018 within a week or two. And then we changed many things in the way we do stuff in the company, like who has access: we reduced the number of people who have access to the data and put in more sensors to detect anomalous activities. And we hired… in Israel, the great thing is that there is a strong community for cybersecurity, so we hired consultants from top tech companies, people with intelligence backgrounds, to help us rethink our security practices. And so it was, I think, in a way turning lemons into lemonade. It sucks that it happened. On the other hand, at least it was just the email addresses and not more sensitive information, and it now allows us to protect everything else that we have accumulated about our users much better. It was a very good teachable moment, I think.
BS: Thank you. I think we covered it all. I appreciate it.
YE: Sure, thank you very much.
BS: That was a lot of fun.
YE: Yeah, it was fun.
BS: All right, that’s it. If you’re still with us right now, thanks so much for tuning in and listening to this whole chat. We might do more of these long-form, unedited Q&As; it’s really up to whether everybody liked it. Be sure to follow Cold Spring Harbor Laboratory on Facebook and Twitter, and you can let us know what you’ve liked about Base Pairs thus far, what we might want to change, and whether you also like this long, special-episode format. Keep in touch and look forward to more!