16 billion credentials: No new leak, lots of old data
A report about an alleged leak of 16 billion credentials is currently doing the rounds. However, much of the data is old.
(Image: Black_Kira/Shutterstock.com)
Many media outlets are currently reporting on an alleged massive data leak in which 16 billion credentials for "Apple, Facebook, Google and others" (as Forbes put it in its headline) have fallen into the wrong hands. The source is once again Cybernews, which has attracted attention in the past with massive exaggerations and the sensationalist promotion of data dumps containing old, already known leaked data. In this case, too, the excitement about alleged data leaks is misplaced.
Under the almost fitting title "The 16-billion-entry data leak that no one has ever heard of", Cybernews now writes that anonymous security researchers have found 30 exposed data dumps since the beginning of the year, each with tens of millions to 3.5 billion entries, adding up to 16 billion credentials. There have been no separate reports on the individual dumps, except for one with 184 million entries. The dumps were only exposed for a short time, as unsecured Elasticsearch or object storage instances.
Content of the data dumps
"The researchers claim that most of the data in the leaked data sets is a mixture of details from infostealer malware, credential stuffing sets and repackaged leaks" is how Cybernews itself describes the find. According to the report, the researchers were unable to deduplicate the data effectively, but "it is safe to assume that overlapping entries are definitely present. In other words, it is impossible to say how many people or access points were actually exposed".
According to the report, however, the researchers found most of the information in a clear structure: a URL followed by login details and passwords, as collected and stored by "modern infostealers". The databases had names such as "Logins" or "Credentials", but the researchers also found geographic labels such as "Russian Federation" and service names such as "Telegram". These, too, indicate that (known) data was being processed there.
Data from infostealers regularly ends up in openly accessible data dumps, which are frequently discovered. Troy Hunt's Have I Been Pwned project now also collects this data and can warn registered users if their data appears in such dumps. Hunt had already assessed the "mother of all breaches" (MOAB), as Cybernews hyperbolically dubbed a data find in early 2024, and concluded that it was a compilation of long-known data. Hunt is looking into our request for an assessment of these supposedly new leaks.
Breach, leak or simple collection?
In reporting on such incidents, accuracy sometimes takes a back seat to the desire for a catchy headline. When English-language media refer to a "breach", they usually mean data stolen directly from a company or website operator, such as Google or Apple. That is clearly not the case here, although the headlines suggest it. At most, following the media narrative, one could speak of a "leak", i.e. data inadvertently made public by criminals.
The "clear structure" of the data is also common in the scene and familiar to anyone even halfway versed in the infostealer ecosystem: these are so-called "txtbases", i.e. credentials exchanged in text format. The scene usually uses the format "service|username|password", and gigabytes of txtbase files can be downloaded free of charge from openly accessible messenger groups.
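To illustrate how simple the txtbase format is to process, here is a minimal Python sketch that splits one line into its three fields. It assumes the "service|username|password" layout described above and that only the password may itself contain a "|"; real files vary (some use a URL as the service field or a different separator), and the names `Credential` and `parse_txtbase_line` are hypothetical.

```python
from typing import NamedTuple, Optional


class Credential(NamedTuple):
    service: str
    username: str
    password: str


def parse_txtbase_line(line: str) -> Optional[Credential]:
    """Split one 'service|username|password' line; return None if malformed."""
    line = line.rstrip("\r\n")  # drop the trailing line break only
    # maxsplit=2: everything after the second '|' is kept as the password
    parts = line.split("|", 2)
    if len(parts) != 3 or not all(parts):
        return None  # wrong field count or an empty field
    service, username, password = parts
    # Passwords are left untouched, since whitespace may be part of them
    return Credential(service.strip(), username.strip(), password)
```

For example, `parse_txtbase_line("facebook.com|alice@example.org|hunter2|x")` yields a `Credential` whose password is `"hunter2|x"`, because splitting stops after the second separator.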
As a quick exercise on this bridge day (a working day between a public holiday and the weekend), we logged on to a well-known exchange site for such data records and downloaded almost 70 text files with a total size of around 7 GB. They contain around 122 million entries, including 4 million for Meta's social network Facebook alone. The overlap is considerable, however: half of the Facebook account names appear twice or more in our sample.
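An overlap check of this kind takes only a few lines of code. The following minimal Python sketch assumes the "service|username|password" layout, matches the service by a simple case-insensitive substring such as "facebook", and compares account names case-insensitively. It reads "appear twice or more" as the share of distinct account names that occur in at least two entries; the function name `duplicate_share` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def duplicate_share(lines: Iterable[str], service_keyword: str) -> tuple[int, int, float]:
    """Count entries for one service and the share of repeated account names.

    Returns (entries, distinct_names, share_of_names_seen_at_least_twice).
    """
    keyword = service_keyword.lower()
    names: Counter = Counter()
    for line in lines:
        parts = line.rstrip("\r\n").split("|", 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        service, username, _password = parts
        if keyword in service.lower():
            names[username.strip().lower()] += 1
    if not names:
        return 0, 0, 0.0
    entries = sum(names.values())
    repeated = sum(1 for count in names.values() if count >= 2)
    return entries, len(names), repeated / len(names)
```

Because it accepts any iterable of lines, an open file can be passed directly, e.g. `with open(path, encoding="utf-8", errors="replace") as f: duplicate_share(f, "facebook")`, so even multi-gigabyte files are streamed rather than loaded into memory.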
While the heise security editorial team uses command-line tools such as grep and awk (and does not store the data obtained in a leak database), credential expert Troy Hunt takes a far more professional approach: last February, he processed a database of 23 billion entries and documented the process meticulously on his blog.
In total, over 10,200 files are available for download on the txtbase exchange we visited, with an average of 1.8 million lines per file according to our sample. Extrapolated, this one source alone holds roughly 18 to 19 billion credentials, some 15 to 20 percent more than the headline-grabbing "mega-leak". And that is without any darknet hullabaloo or payments to cybercriminals: no leak and no hidden catch, so to speak.
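The extrapolation is a back-of-the-envelope multiplication. This minimal Python sketch uses the rounded figures from our sample (10,200 files, 1.8 million lines each) and compares the result with the 16 billion in the Cybernews report. Because the inputs are rounded and the file count is a lower bound, the result is only a ballpark, and it counts lines, not unique credentials. The helper names are hypothetical.

```python
def extrapolate_total(files: int, avg_lines_per_file: float) -> float:
    """Estimated total entries = number of files x average lines per file."""
    return files * avg_lines_per_file


def relative_excess(estimate: float, reference: float) -> float:
    """How much larger the estimate is than the reference, as a fraction."""
    return estimate / reference - 1


total = extrapolate_total(10_200, 1.8e6)  # rounded sample figures
excess = relative_excess(total, 16e9)     # vs. the reported 16 billion
print(f"{total / 1e9:.1f} billion lines, {excess:.0%} above 16 billion")
```

With these rounded inputs, the product is about 18.4 billion lines, roughly 15 percent above 16 billion. Slightly higher unrounded values push the estimate toward 19 billion.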
Panic again misplaced
With this knowledge, it is clear that panic over the "new revelation" is misplaced. As before, cybercriminals are trying to exploit old data finds, for example by using credential stuffing to break into services. Users should remain alert to any unusual activity on the services they use and change their passwords if they suspect anything. Activating multi-factor authentication or, better still, using passkeys is recommended for stronger protection.
Infostealers also remain a widespread phenomenon. We recently came across malvertising disguised as macOS tips, and malware authors also hide infostealers in game betas and fake apps. This is why, in "Operation Endgame", law enforcement is targeting the cybercriminals who run a lucrative ecosystem around infostealers.
(cku)