Missing link: On the DNA trail of the perpetrators

DNA traces can be the decisive clue in the search for a perpetrator. If traces from several people are mixed, analysis is difficult, but not impossible.

A laptop screen shows a schematic representation of a body and its organs with a DNA double strand

Traces must be secured at the crime scene.

(Image: Gorodenkoff/Shutterstock.com)

13 min. read
By Imke Stock
This article was originally published in German and has been automatically translated.

DNA analysis has revolutionized law enforcement. As a forensic tool, it has become an indispensable part of modern criminalistics. DNA traces have helped to solve countless crimes, close cold cases and exonerate innocent people.

Since the discovery of genetic fingerprints in 1984, DNA analysis methods have become increasingly sophisticated. Where a visible bloodstain was once required, today the trace can be smaller than a flake of skin. The only requirement is that the trace contains cells with cell nuclei. Ideally, the analysis yields a single DNA profile of one person. Often, however, investigators find mixed traces, i.e. DNA traces from different people that overlap. This happens particularly frequently with minimal and micro-traces.

Every person leaves biological traces in their environment: on average, a person loses 50 to 100 hairs per day. In addition, thousands of skin cells fall off the body every minute. Apart from minimal residues of classic biological traces (body tissue and fluids such as blood, saliva or hair), traces of shed skin cells can now also be examined as contact traces in DNA analysis. In pilot studies, DNA traces were found in air and dust samples and in air conditioning filters at fictitious crime scenes.

This article explains how complex mixed traces can be analyzed thanks to algorithms and compared with the DNA database and why a hit still does not provide absolute truth for guilt or innocence.

"Missing Link"

What's missing: In the fast-paced world of technology, we often don't have time to sort through all the news and background information. At the weekend, we want to take this time to follow the side paths away from the current affairs, try out other perspectives and make nuances audible.

The exchange principle according to Eduard Locard is a basic principle of forensics and scientific criminology: every contact leaves a trace.

Traces can be found at the crime scene, on the victim, on the perpetrator, on the instrument of the crime or at other locations relevant to the case. The more intensive or the longer the perpetrator's contact with an object or person, the more traces may be present.

For example, DNA traces could be found on the handle of a handbag that the perpetrator snatched from the victim while walking past and later threw into a bush. Where a (non-visible) DNA trace is suspected, the area is rubbed with a cotton swab. Whether the trace can be analyzed only becomes clear after examination in the laboratory.

Securing DNA trace evidence on a pistol with a cotton swab.

(Image: BKA)

In the laboratory, the lengths of certain DNA sections from a trace are made visible as lines. By analogy with the ridge pattern of the fingertips, the classic fingerprint, these DNA patterns have become known as the "genetic fingerprint". The classic fingerprint is unique: even identical twins have different fingerprints. Because they share the same genetic make-up, however, identical twins have an identical genetic fingerprint (DNA profile).

Deoxyribonucleic acid (DNA), the carrier of genetic information, is found in the form of chromosomes in the nucleus of all cells in the human body (with the exception of erythrocytes). DNA consists of four basic building blocks: the bases adenine (A), guanine (G), thymine (T) and cytosine (C).

Schematic representation of DNA: the chromosome is located in the cell nucleus; at its end, the basic building blocks can be seen in the double helix.

(Image: OpenClipart-Vector/Pixabay.com)

A person's entire DNA consists of several billion of these basic building blocks, which form a ladder-like double strand (double helix) in different sequences. The majority of all human DNA is non-coding. This means that no information is stored there that is responsible for the formation of a characteristic such as eye color or hair color.

To create a DNA profile, the DNA is examined more closely in defined, non-coding areas. These areas consist of short sequences of building blocks, the so-called short tandem repeats (STRs), which are repeated a varying number of times in succession. These repeats are counted.

Evaluation and typing of a DNA trace in an electropherogram showing the DNA sequences.

(Image: BKA)

The number of STR repeats is expressed as simple numerical values; the DNA is "typed". Counting currently takes place in 16 areas, each of which represents one trait system. Because every person inherits one set of chromosomes from the mother and one from the father, each trait system has two values (so-called alleles), one maternal and one paternal. A DNA profile is therefore a numerical pattern that lists a total of 32 values for 16 trait systems. Highly variable areas were deliberately selected as trait systems, so the number of STR repeats found there can vary greatly from person to person. In addition, the sex-specific amelogenin marker is determined: two X chromosomes (XX) in a woman and one X and one Y chromosome (XY) in a man.
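The numerical pattern described above can be pictured as a small lookup table. The following is a minimal sketch only: the locus names and repeat counts are illustrative values, and the helper names `make_profile` and `count_values` are made up for this example rather than taken from any forensic software.

```python
from typing import Dict, Tuple

# A DNA profile maps each trait system (locus) to its two alleles,
# expressed as STR repeat counts. Names and values are illustrative.
Profile = Dict[str, Tuple[int, int]]

def make_profile(raw: Dict[str, Tuple[int, int]]) -> Profile:
    """Store each allele pair in sorted order so profiles compare reliably."""
    return {locus: tuple(sorted(pair)) for locus, pair in raw.items()}

def count_values(profile: Profile) -> int:
    """Total number of allele values listed in the profile."""
    return sum(len(pair) for pair in profile.values())

example = make_profile({
    "D3S1358": (17, 15),  # heterozygous: two different repeat counts
    "TH01": (9, 6),       # heterozygous
    "FGA": (22, 22),      # homozygous: the same repeat count inherited twice
})

print(example)
print(count_values(example))  # 3 trait systems x 2 alleles
```

With all 16 trait systems filled in, `count_values` would return the 32 values mentioned in the text.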

The finer the DNA analysis methods, the more likely it is that even small amounts of DNA material are detected. A trace that has more than two alleles in one trait system is usually a mixed trace, provided there is no mutation or genetic peculiarity. Traces can overlap: they can have reached the trace carrier in different situations, from different sources and at different times. For example, two people could shed skin cells and a third person could cough out mucosal cells and thus add to the existing mixture from the first two. Contact traces in particular often yield a mixed trace instead of a single DNA profile. The more people have had contact with an object, the more different DNA can theoretically be found on it.
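The rule of thumb that more than two alleles in one trait system point to a mixture can be written down in a few lines. This is a simplified sketch: it ignores mutations, genetic peculiarities and laboratory artifacts, and the function names and sample data are assumptions made for illustration.

```python
from math import ceil
from typing import Dict, Set

# locus -> set of distinct alleles observed in the trace (illustrative data)
Trace = Dict[str, Set[int]]

def is_probably_mixed(trace: Trace) -> bool:
    """A single person contributes at most two alleles per trait system."""
    return any(len(alleles) > 2 for alleles in trace.values())

def min_contributors(trace: Trace) -> int:
    """Lower bound on the number of people: each adds at most two alleles."""
    most = max((len(alleles) for alleles in trace.values()), default=0)
    return ceil(most / 2)

trace = {
    "D3S1358": {15, 16, 17},  # three alleles: cannot come from one person
    "FGA": {22, 24},
}

print(is_probably_mixed(trace))
print(min_contributors(trace))
```

The result is only a lower bound: contributors who share alleles can hide behind each other, so the true number of people may be higher.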

Symbolic image of a complex mixed trace: colorful Smarties stand for the different DNA sources in one trace.

(Image: Voronina Svetlana/Shutterstock.com)

In the example of the handbag robbery, the victim's DNA would probably be found on the bag. Depending on how hard the perpetrator pulled on the bag and how they handled it after fleeing, more or less of their DNA could be found as shed skin cells. If the victim had previously lent the bag to another person, that person's DNA could also be found on it.

DNA could also be transferred indirectly via other people or objects, or be the result of contamination, and put investigators on the wrong track.

When traces are examined and analyzed in the laboratory, the challenge is to take such a mixed trace apart, i.e. to separate the individual DNA profiles.

Symbolic image: a mixed trace sorted into its individual Smarties "DNA traces".

(Image: 3d_kot/Shutterstock.com)

The informative value of DNA traces is not always unambiguous, and DNA is sometimes classified as uncertain evidence. During the proceedings, investigators and forensic experts have to clarify not only who the trace probably came from, but also how, when and under what circumstances it was created, and whether it has anything directly to do with the crime. The court must be able to conclude whether the trace is relevant to the crime and whether a person is sufficiently likely to be its source.

When a DNA trace is compared with a possible perpetrator, the trait systems are checked for a match. As the trait systems cover only a section of the total DNA, statistical calculations are carried out in a second step to determine the probability of identification. These calculations estimate how rarely a particular combination of traits (the DNA profile) occurs in a reference population. The result of a match between a DNA trace and the DNA profile of a possible trace originator is then a "biostatistical probability" in the form of a probability quotient.
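How rare a profile is in a reference population can be estimated from allele frequencies. The sketch below uses the common textbook simplification that alleles combine independently within a trait system (Hardy-Weinberg proportions) and that trait systems are independent of each other (product rule). The allele frequencies are invented for illustration and do not come from any real population table.

```python
from typing import Dict, Tuple

def genotype_frequency(a: int, b: int, freqs: Dict[int, float]) -> float:
    """p^2 for a homozygous pair, 2pq for a heterozygous pair."""
    p, q = freqs[a], freqs[b]
    return p * p if a == b else 2 * p * q

def random_match_probability(
    profile: Dict[str, Tuple[int, int]],
    allele_freqs: Dict[str, Dict[int, float]],
) -> float:
    """Multiply the genotype frequencies of all trait systems."""
    rmp = 1.0
    for locus, (a, b) in profile.items():
        rmp *= genotype_frequency(a, b, allele_freqs[locus])
    return rmp

# Hypothetical allele frequencies in a reference population.
allele_freqs = {
    "D3S1358": {15: 0.25, 17: 0.20},
    "FGA": {22: 0.20},
}
profile = {"D3S1358": (15, 17), "FGA": (22, 22)}

rmp = random_match_probability(profile, allele_freqs)
print(f"about 1 in {1 / rmp:.0f}")
```

With only two trait systems the profile is still common; each additional highly variable trait system multiplies in another small factor, which is why 16 systems together produce extremely small values.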

The Likelihood Ratio (LR) method is used to assess the probative value of a DNA finding in relation to a specific person as the source of the trace evidence. The probabilities are calculated on the basis of mutually exclusive hypotheses. Each hypothesis describes a clear scenario for the occurrence of the trace.

The calculation determines how many times more probable the DNA finding is under hypothesis 1 (this person/suspect contributed to the DNA trace or is its source) than under the alternative hypothesis 2 (a random other person from the population is the source of the DNA).

In the case of mixed traces, the calculations for the probable identification of the trace originator become more complex, as many combinations of alleles and probabilities in the composition of the trace must be considered. The LR method can be applied to incomplete, partial or mixed DNA profiles and yields likelihood ratios for alternative explanatory scenarios. For example, the reference profiles of both the suspect and the victim can be included in the calculations as possible contributors to the DNA mixture. As the DNA profiles of relatives are similar, LR is also used in cases in which the possible contributors are related to each other.

In the example of the handbag robbery, if there are only two DNA profiles in the mixed trace, the following two hypotheses can be considered:

Hypothesis 1: The mixed trace comes from the victim and the suspect.

Hypothesis 2: The mixed trace comes from the victim and another unknown person who is not related to the suspect.

The likelihood (probability) of the trace being created is then calculated under each hypothesis. The LR value is a quotient of probabilities: the probability that the evidence occurs under hypothesis 1 is divided by the probability that the evidence occurs under hypothesis 2. It is therefore a statement about the ratio of two probabilities, not about whether a hypothesis is true or false.
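For the two handbag hypotheses, the quotient can be sketched with a deliberately simple model. The assumptions are strong and are not those of any particular forensic software: exactly two contributors, no allele drop-out or drop-in, peak intensities ignored, Hardy-Weinberg proportions, independent trait systems, and invented allele frequencies. All function names are hypothetical.

```python
from itertools import combinations_with_replacement
from typing import Dict, Set, Tuple

Genotype = Tuple[int, int]

def genotype_frequency(genotype: Genotype, freqs: Dict[int, float]) -> float:
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def locus_lr(trace: Set[int], victim: Genotype, suspect: Genotype,
             freqs: Dict[int, float]) -> float:
    """P(trace | victim + suspect) / P(trace | victim + unknown person)."""
    if not set(victim) <= trace:
        raise ValueError("victim allele missing; this model assumes no drop-out")
    # Hypothesis 1: the two known people must explain the trace exactly.
    p_h1 = 1.0 if set(victim) | set(suspect) == trace else 0.0
    # Hypothesis 2: sum over every genotype an unknown person could have
    # that, together with the victim, produces exactly the observed alleles.
    p_h2 = sum(
        genotype_frequency(g, freqs)
        for g in combinations_with_replacement(sorted(trace), 2)
        if set(victim) | set(g) == trace
    )
    if p_h2 == 0.0:
        raise ValueError("trace not explainable by victim plus one unknown")
    return p_h1 / p_h2

def mixture_lr(trace, victim, suspect, allele_freqs) -> float:
    """Multiply the per-locus ratios (independence assumed)."""
    lr = 1.0
    for locus, alleles in trace.items():
        lr *= locus_lr(alleles, victim[locus], suspect[locus], allele_freqs[locus])
    return lr

# Illustrative data only.
trace = {"D3S1358": {15, 16, 17}, "FGA": {22, 24}}
victim = {"D3S1358": (15, 16), "FGA": (22, 22)}
suspect = {"D3S1358": (16, 17), "FGA": (24, 24)}
allele_freqs = {
    "D3S1358": {15: 0.2, 16: 0.3, 17: 0.1},
    "FGA": {22: 0.2, 24: 0.1},
}

print(round(mixture_lr(trace, victim, suspect, allele_freqs), 1))
```

In this toy example the finding is a little over 180 times more probable if the suspect contributed than if an unrelated unknown person did. Real casework models additionally account for drop-out, peak heights and relatedness.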

When an LR result is interpreted, a significant misunderstanding can arise: the "prosecutor's fallacy", which confuses two conditional probabilities. Suppose a person is charged with purse snatching. The only evidence presented by the prosecution is a DNA trace found on the handle of the handbag, which matches the DNA profile of the person charged. According to the DNA expert opinion, there is a 1 in 10 million chance that a random other person (not the defendant) has the same DNA profile as the trace found. The prosecutor could wrongly conclude that the probability that the defendant is innocent is also 1 in 10 million. Guilt or innocence cannot be determined by probability calculations. A DNA match alone is generally not sufficient evidence to identify someone as the perpetrator and convict them. This always requires further investigation into the circumstances.
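Why the two conditional probabilities differ can be shown with Bayes' rule in odds form: posterior odds = LR x prior odds. The sketch below only addresses the question of who is the source of the trace, not guilt, and the prior probabilities are hypothetical numbers chosen for illustration.

```python
def posterior_probability(lr: float, prior_probability: float) -> float:
    """Probability that the person is the source of the trace, given the
    likelihood ratio and a prior probability from all other evidence."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = lr * prior_odds
    return posterior_odds / (1 + posterior_odds)

LR = 10_000_000  # a random match chance of 1 in 10 million, as in the text

# Hypothetical priors: without other evidence, the defendant is just one of
# roughly 10 million people who could have left the trace, versus a case in
# which the investigation has already narrowed the circle to 1,000 people.
for prior in (1 / 10_000_001, 1 / 1_000):
    print(f"prior {prior:.2e} -> posterior {posterior_probability(LR, prior):.4f}")
```

With the first prior the posterior is only about 50 percent, far from the "1 in 10 million chance of innocence" the fallacy suggests. The same DNA finding carries very different weight depending on the other circumstances of the case.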

Apart from this problem of interpretation, difficulties also arise when hypotheses are not mutually exclusive and exhaustive. This is particularly the case for complex mixed traces with more than two contributors. In such cases, separate hypotheses covering different numbers of possible contributors and their combinations have to be considered.

The DNA analysis database (DAD) of the Federal Criminal Police Office (BKA) has existed for over 25 years. To date, more than 390,000 DNA hits have been obtained, leading to more than 300,000 cases being solved. As of December 31, 2023, a total of 1,183,479 DNA data records were stored, of which 804,737 were personal data records and 378,742 were trace data records.

Mixed traces cannot simply be stored in the DAD. In some mixed traces, differences in the intensity of the characteristics reveal a dominant DNA profile, which is regarded as that of the main contributor to the trace. This DNA profile can be extracted and compared with a possible trace originator and/or stored in the DAD. Since 2020, mixed traces that do not contain a dominant DNA profile can also be compared with the DNA database in relevant unsolved cases. The BKA uses the algorithm-supported SmartRank software for this purpose.

SmartRank: Example of a mixed trace.

(Image: GitHub)

SmartRank: Example of a known DNA profile. The Locus column shows the STR trait systems; the right-hand column shows the two alleles of each as values.

(Image: GitHub)

The open-source software was developed for more efficient searches of large DNA databases with partial trace profiles or complex mixed traces. SmartRank calculates the LR value for all DNA profiles in the DAD whose patterns could be contained in the mixed trace. The aim is to keep the rate of false negative and false positive matches to a minimum. DNA profiles that exceed a defined LR threshold and could be potential (co-)originators of the trace are listed in a ranking according to their probability. These clues can lead to new investigative approaches in cases that previously seemed hopeless.
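The final filtering and ranking step can be illustrated as follows. This is not SmartRank's code or interface: the function name, the profile identifiers, the LR values and the threshold are all hypothetical, and the sketch assumes that the ranking is ordered by LR value and that an LR has already been computed for every candidate profile.

```python
from typing import Dict, List, Tuple

def rank_candidates(lr_by_profile: Dict[str, float],
                    threshold: float) -> List[Tuple[str, float]]:
    """Keep profiles whose LR exceeds the threshold, highest LR first."""
    hits = [(pid, lr) for pid, lr in lr_by_profile.items() if lr > threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical LR values for four database profiles against one mixed trace.
lr_by_profile = {
    "profile-A": 3.2e6,
    "profile-B": 0.4,     # the trace speaks against this profile
    "profile-C": 1.5e3,
    "profile-D": 8.0e1,   # supportive, but below the chosen threshold
}

for pid, lr in rank_candidates(lr_by_profile, threshold=1_000):
    print(f"{pid}: LR = {lr:.1e}")
```

The choice of threshold reflects the trade-off named in the text: a high value reduces false positive candidates, a low value reduces the risk of missing a true contributor.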

(mack)