Genetic Data: Why sharing is such a dilemma

The last two decades have seen a boom in the dissemination of genetic data. However, this also comes with some risks.

By
  • Dr. Scott Thiebes
  • Prof. Ali Sunyaev

This article is also available in German.

In the last two decades, the distribution and processing of genetic data (i.e., digital representations of an organism's genetic sequence) have increased rapidly thanks to enormous scientific and technological advances. Today, access to genetic data is no longer the exclusive preserve of research; it has long since found its way into various areas of life, including healthcare, law enforcement and the consumer market.

Companies such as 23andMe or Ancestry offer "direct-to-consumer genetic tests" for genealogy, paternity testing and health reports in the global consumer market. Genetic data is also becoming increasingly attractive to hackers and other malicious actors. Only at the beginning of October 2023 did it become known that hackers had stolen the family tree data of millions of customers of 23andMe, one of the most prominent providers of consumer genetic tests, and offered it for sale online.

In healthcare, for example, genetic data is regularly used to test pre-symptomatic relatives of cancer patients for specific genetic mutations that cause the disease. In law enforcement, genetic tests to identify the perpetrators of serious offences such as murder or rape have long since become routine.

What makes genetic data so attractive, both for the use cases described above and for people with dishonest intentions, is its particular set of properties. Above all, genetic data allows conclusions to be drawn about the health and behaviour of individuals. For example, certain mutations in the BRCA1 and BRCA2 genes indicate an increased risk of breast cancer. In contrast to many other kinds of health-related data, such as vital signs recorded at a single point in time, the information content of genetic data does not decrease over time. On the contrary, it increases as our understanding of human DNA advances and genetic analysis methods become more sophisticated.

In addition, a person's genetic sequence is practically unique and changes very little over time. The probability of two people having exactly the same genetic sequence is close to zero. Even identical twins can have genetic differences. Genetic data therefore serves as a unique identifier over a long period of time. At the same time, we share part of our genetic sequence with our blood relatives. This means that our own genetic data allows direct conclusions to be drawn about family relationships and the health of our relatives.

All these characteristics mean that sharing genetic data, for all its potential benefits, such as its importance for research or for the diagnosis and treatment of serious diseases, also carries considerable privacy risks.

A prominent example of the various privacy risks arising from the sharing of genetic data is the case of the notorious Golden State Killer, Joseph James DeAngelo. Between 1976 and 1986, DeAngelo committed several rapes and murders on the US West Coast. Although investigators found DNA samples from the killer at some of the crime scenes, they were unable to match them to DeAngelo at the time.

The turning point only came in 2017, when investigators uploaded DNA samples from crime scenes to the publicly accessible genetic data website GEDmatch. There they identified several distant relatives who had apparently previously shared their own genetic data on the same website. After extensive investigations, this ultimately led to the identification and arrest of DeAngelo in 2018 and to his conviction in 2020, more than 30 years after his last known offences and without him ever having made his own genetic data publicly available.

Although it is certainly positive that DeAngelo's distant relatives unintentionally contributed to the capture and conviction of a serious criminal by sharing their own genetic data, the case of the Golden State Killer and the increasing dissemination of genetic data nevertheless raise many questions about genetic privacy. These questions by no means affect only a small part of society. A study published in the journal Science in 2018, for example, concludes that 60 per cent of the white US population could be re-identified through an anonymised DNA sample, even if they have never shared their own genetic data with a genealogy database or a government agency.

Genetic data plays an indispensable role in modern biomedical research and has already led to numerous scientific breakthroughs. Broader access to genetic data from a more diverse population raises hopes for further medical advances in the future. The possibility of data donation provided for in the German Patient Data Protection Act therefore seems right and important, especially in connection with genetic data. However, it does not take much imagination to realise that such information, and genetic data in particular, is becoming increasingly interesting for cybercriminals and other malicious actors.

It remains an open question whether the end, namely progress in medical research, justifies the means, namely the donation of genetic data. Paradoxically, the decision probably rests with each individual potential donor, even though it can also have a direct impact on relatives. In addition to technological safeguards, it is therefore essential that potential donors are fully informed about the possible risks. This includes not only the fact that genetic data is difficult to anonymise effectively, but in particular also the fact that sharing it entails privacy and data protection risks for relatives.

Dr Scott Thiebes from the Karlsruhe Institute of Technology (KIT) examined the dilemma of genetic data donations in his dissertation and received an award for this work at DAFTA. Professor Ali Sunyaev heads the Critical Information Infrastructures (cii) research group at KIT.

(mack)