Boystown investigations: Catching criminals on the darknet with a stopwatch

The investigation file for the Boystown trial contains references to a timing analysis that can be used to deanonymize Tor users. There is no remedy in sight.


According to the investigation files, traffic analysis played a key role in deanonymizing the operator of the darknet paedophile platform Boystown, as reported by the political magazine Panorama. The investigators did not exploit a security vulnerability in the Tor anonymization service. Instead, they used temporal correlations to trace the path of the data through the Tor network to the recipient.

To anonymize users of the Tor browser, the connection is encrypted at least three times and routed through three different servers spread across the Internet before it reaches its destination. The first of these is the so-called entry node, also known as the entry guard, to which the Tor browser establishes an end-to-end encrypted connection. Only this node knows the user's true IP address.

Via the entry node, the Tor browser establishes an end-to-end encrypted connection to another Tor node, the so-called middle node. The middle node only knows the IP address of the entry node, so it does not know which user is behind it. The entry node, in turn, does not know what the Tor user and the middle node exchange, as it only sees the encrypted communication between the two.

To anonymize Tor users, data traffic is encrypted end-to-end three times and routed via three Tor nodes around the world. If you compare incoming and outgoing data packets, you can establish a connection without having to decrypt the data due to the deliberately low latency in the Tor network.

Via the middle node, the Tor browser contacts at least one other node, the exit node. The middle node cannot read the data, however, because the connection between the Tor user and the exit node is also end-to-end encrypted. The exit node, in turn, does not know where the user is located, as it only knows the IP address of the middle node. Only the exit node establishes the connection to the target website (hopefully encrypted via HTTPS). If the destination is a so-called hidden service on the darknet, the data is routed via three more Tor nodes and encrypted again each time.

Cascading at least three times ensures that the entry node knows the user, but has no idea what they are using the Tor network for. The middle node is virtually the most clueless: it knows neither the originator of the data packets nor the destination nor the purpose, and is merely the middleman between the entry and exit nodes. The exit node, on the other hand, knows where the data is flowing to, but has no idea who the originator is.
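The layering described above can be illustrated with a toy model. This is only a sketch: the keys, the node names and the XOR "cipher" are assumptions made for illustration and have nothing to do with the cryptography Tor really uses. It merely shows that each node can strip exactly one layer and that only the last hop sees the plaintext.

```python
import hashlib

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHAKE-256 keystream (NOT secure)."""
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

# Hypothetical per-hop keys that the client shares with each node.
HOP_KEYS = {"entry": b"k-entry", "middle": b"k-middle", "exit": b"k-exit"}
PATH = ["entry", "middle", "exit"]

def wrap(payload: bytes) -> bytes:
    """Client side: apply the exit layer first (innermost), the entry layer last."""
    cell = payload
    for hop in reversed(PATH):
        cell = keystream_xor(HOP_KEYS[hop], cell)
    return cell

def relay(cell: bytes) -> list[tuple[str, bytes]]:
    """Each node removes its own layer; record what it holds afterwards."""
    seen = []
    for hop in PATH:
        cell = keystream_xor(HOP_KEYS[hop], cell)
        seen.append((hop, cell))
    return seen

views = relay(wrap(b"GET /index.html"))
for hop, data in views:
    print(hop, data)
```

In this sketch the entry and middle nodes print unreadable bytes, and only the exit node recovers the request. Note that the length of the data never changes from hop to hop, which is exactly the kind of property a timing and volume analysis can latch onto.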

In addition, the Tor browser changes the middle and exit nodes after ten minutes at the latest so that connections cannot be traced over a longer period of time. This is what makes it so difficult for investigators to find out the identity of Tor users.

In the so-called correlation analysis, also known as timing analysis, the authorities take advantage of the fact that Tor is a low-latency network: data is routed through it in real time wherever possible. The delay is usually so short that even live streams and live chats can be run via Tor. For example, if a Tor user starts downloading a large file, an investigator monitoring the exit node's traffic might notice a corresponding increase in packet volume. Due to the low latency, the outgoing traffic to a specific server, the middle node, would increase at the same time. The middle node would thus be exposed without the authorities gaining access to the exit node or decrypting the data.

An increase in incoming and outgoing traffic could also be observed at the middle node in the same temporal context, which would reveal the entry node. And in the next stage, the user could be deanonymized if the entry node could be observed. With around 8000 Tor nodes worldwide, however, it hardly seems feasible to monitor a relevant number of them for such temporal correlations.
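How such a temporal correlation can be measured is easy to sketch. The following example assumes that an observer has per-second byte counts for the traffic at the exit node and for the links to a few candidate relays. All figures and relay names are made up, and the Pearson coefficient is just one plausible measure, not necessarily the one investigators use.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no timing information
    return cov / sqrt(vx * vy)

# Made-up traffic volumes (KB per second) seen at the exit node ...
exit_traffic = [2, 3, 2, 40, 42, 41, 3, 2]

# ... and on the links to three hypothetical neighbouring relays.
candidates = {
    "relay-A": [5, 4, 6, 5, 4, 6, 5, 5],
    "relay-B": [1, 2, 1, 38, 41, 40, 2, 1],
    "relay-C": [30, 2, 1, 3, 29, 2, 1, 30],
}

scores = {name: pearson(exit_traffic, series) for name, series in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

The relay whose traffic rises and falls in step with the exit node gets by far the highest score, even though not a single byte has been decrypted.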

Tor is, however, particularly vulnerable when it is used for live chats and instant messengers, because its low latency means a message is transmitted instantly from the sender via the Tor nodes to the recipient. According to Panorama, this is precisely what the authorities in the Boystown case are said to have exploited by communicating with the alleged operator via the chat software Ricochet, which encrypts the data and transmits it anonymously via the Tor network.

Since the investigators, as the originators, knew exactly when they were sending a new message, it was sufficient to monitor several hundred Tor nodes for simultaneously incoming data packets of a similar size – presumably by the authorities renting a corresponding number of fast, well-connected servers and placing them on the network as Tor nodes. Since the Tor browser switches exit and middle nodes every few minutes and favors nodes with low latency and high bandwidth, it was only a matter of time before their Ricochet interlocutor used the investigators' Tor node as a middle node. This allowed them to determine the entry node.
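The matching step can be sketched as follows. The example assumes that the monitored relays log arrival time, size and neighbouring address for each packet. The record layout, the tolerances and all values are hypothetical, since the investigation files do not describe the tooling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    relay: str   # monitored relay that saw the packet
    peer: str    # neighbouring node on the other side of the connection
    t: float     # arrival time in seconds
    size: int    # size in bytes

def count_hits(send_times, sent_size, packets, max_delay=0.5, size_tol=64):
    """Count, per (relay, peer) pair, the packets that arrive shortly after
    a known send time and have roughly the size of the sent message."""
    hits = {}
    for p in packets:
        if abs(p.size - sent_size) > size_tol:
            continue
        if any(0 <= p.t - s <= max_delay for s in send_times):
            key = (p.relay, p.peer)
            hits[key] = hits.get(key, 0) + 1
    return hits

# Times at which the sender knows a chat message left (made-up values).
send_times = [10.0, 25.0, 61.0]

observed = [
    Packet("mon-1", "guard-X", 10.12, 1040),
    Packet("mon-1", "guard-X", 25.09, 1040),
    Packet("mon-1", "guard-X", 61.20, 1040),
    Packet("mon-2", "relay-Y", 10.30, 1030),   # coincidence, happens once
    Packet("mon-1", "relay-Z", 40.00, 1040),   # right size, wrong time
    Packet("mon-3", "relay-W", 25.10, 4000),   # right time, wrong size
]

hits = count_hits(send_times, sent_size=1024, packets=observed)
print(hits)
```

A single coincidence means little. A pair that lines up with every message sent is a strong candidate for the link between the investigators' middle node and the suspect's entry node.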

To get the suspect's IP address, they would have had to redirect him to one of the investigators' entry nodes, or monitor or take over the node he was using. In previous attacks, Tor users were made to switch within a short time to entry nodes controlled by the attackers and were then exposed. As a result, the Tor browser now keeps the same entry node for several days to several weeks, which is why it is now referred to as an entry guard. It could have taken months before the suspect in the Boystown case switched to an entry guard controlled by the authorities.

It is still unclear how, but the investigators apparently knew that the suspect was using O2 as his internet provider. They therefore chose a different approach: based on the correlation analysis at the middle node, they had already found out the IP address of the entry guard, and could hope that the suspect would continue to use it in the coming days and weeks. So the next time the suspect was online in Ricochet, all they had to do was ask Telefónica for the addresses of all the O2 customers who were connected to this very entry guard at that moment. The result should have been a fairly short list.
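In principle, such a query boils down to a simple filter over connection records. The sketch below assumes that the provider holds records with customer ID, destination address and connection period. That layout, the customer IDs and the addresses (taken from the documentation range 192.0.2.0/24) are purely illustrative.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    customer: str   # hypothetical customer identifier
    dst_ip: str     # destination address of the connection
    start: float    # connection start (seconds, made-up clock)
    end: float      # connection end

def customers_connected(flows, guard_ip, at_time):
    """Customers with a connection to guard_ip that was open at at_time."""
    return sorted({
        f.customer
        for f in flows
        if f.dst_ip == guard_ip and f.start <= at_time <= f.end
    })

flows = [
    Flow("cust-0017", "192.0.2.10", 100.0, 900.0),
    Flow("cust-0233", "192.0.2.10", 950.0, 1200.0),  # connected too late
    Flow("cust-0412", "192.0.2.77", 50.0, 2000.0),   # different entry guard
    Flow("cust-0588", "192.0.2.10", 400.0, 700.0),
]

shortlist = customers_connected(flows, guard_ip="192.0.2.10", at_time=500.0)
print(shortlist)
```

Out of four customers in the example, two remain. Such a list says who was connected to the entry guard at the time in question, and nothing more.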

This is by no means proof that one of these people is the Boystown operator. However, narrowing down the list to just a few people allows the authorities to concentrate their investigations. Contact with the entry guard and the temporal connections between data packets are at best a small indication. Convicting the perpetrator remains traditional police work; correlation analysis has only helped to sift out a few suspects from the thousands of Tor users worldwide.

The correlation analysis method has been known for a long time and is said to have played a role in the seizure of the darknet forum Deutschland im Deep Web (DiDW) back in 2017. At the time, there were daily telltale connection failures of the hidden service, which turned out to be DSL forced disconnections of the operator's Internet access. In issue 22/2017, c't also reported in detail on the method and the crucial role played by the middle node in correlation attacks.

Making the Tor network more robust against such correlation attacks will be difficult. If there were several times the current 8000 Tor nodes, attackers and investigators would need far more servers for one of them to be selected by the suspect with sufficient probability. However, the core problem with timing analyses is the very low latency that makes the Tor network attractive to users, and countermeasures would come at its expense. For example, nodes could collect data packets, compress them or mask them with additional random data so that not every incoming data packet is immediately recognizable as an outgoing packet of almost the same size.
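The effect of a larger network on the investigators' chances can be estimated with a simple back-of-the-envelope calculation. The sketch assumes that the middle node is picked uniformly at random for each new circuit, which is a deliberate simplification because Tor favors fast nodes. The figure of 300 monitored nodes is likewise only an assumption standing in for "several hundred".

```python
import math

def p_selected(attacker_nodes, total_nodes, circuits):
    """Probability that at least one of `circuits` circuits uses an
    attacker-run middle node, assuming uniform random selection."""
    p_single = attacker_nodes / total_nodes
    return 1 - (1 - p_single) ** circuits

def circuits_needed(attacker_nodes, total_nodes, target=0.95):
    """Smallest number of circuits after which p_selected reaches target."""
    p_single = attacker_nodes / total_nodes
    return math.ceil(math.log(1 - target) / math.log(1 - p_single))

today = circuits_needed(300, 8000)      # assumed 300 nodes, today's network
larger = circuits_needed(300, 80000)    # same effort, ten times as many nodes
print(today, larger)
```

With these assumptions, a tenfold larger network means the suspect has to build roughly ten times as many circuits before one of them passes through a monitored node, or the investigators have to operate ten times as many servers.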

Tor users should take care to use as few real-time applications as possible, as these are particularly susceptible to correlation analyses. There is no general protection, as even a compromised hidden service could split images and other data into packets of very specific sizes or send them at specific intervals, creating a characteristic signal that investigators can easily track through the darknet.

In the long term, the Tor project will have to come up with a solution. After all, state investigators are not always out to find paedophiles. In some countries, it is opposition politicians, dissidents or simply dissenters who are hunted on the darknet and, in the worst case, pay for inadequate anonymization with their lives.

(mid)


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.