RE-LAION-5B: Image database without abuse images

References to images of child abuse were found in LAION-5B, a dataset used to train AI image generators. A corrected version has now been published.


(Image: generated by heise online with Midjourney)

This article was originally published in German and has been automatically translated.

The organization LAION has released RE-LAION-5B, a revised version of its LAION-5B dataset. It references 5.5 billion publicly available images and is used, for example, to train AI models.

The datasets do not contain the images themselves, only a hash value of each image file and the URL at which LAION found the image on the Internet. At the end of 2023, the Stanford Internet Observatory discovered 1673 references to images of child abuse. LAION immediately took its dataset offline and asked users to stop using it and to delete any copies.

LAION then worked with the Stanford researchers and with child protection organizations to search its database for references to illegal content. A total of 2236 relevant links were found and removed. The resulting RE-LAION-5B database is now available for anyone to use under the Apache 2.0 license. Further details on the database can be found on the organization's homepage.
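
To illustrate the kind of clean-up described above, here is a minimal Python sketch that drops entries whose hash or URL appears on a list of flagged items. It is not LAION's actual tooling: the record layout (`ImageRef`), the function name `remove_flagged`, the hash strings, and the example URLs are all assumptions made for this illustration, and the article does not say which hash algorithm or file format the dataset uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRef:
    """One dataset entry: a URL plus a hash of the image file (hypothetical layout)."""
    url: str
    image_hash: str  # assumed to be a hex digest; the algorithm is not specified


def remove_flagged(refs, flagged_hashes, flagged_urls):
    """Return (kept_entries, number_removed).

    An entry is dropped if either its hash or its URL is on a flagged list.
    """
    flagged_hashes = set(flagged_hashes)
    flagged_urls = set(flagged_urls)
    kept = [
        ref for ref in refs
        if ref.image_hash not in flagged_hashes and ref.url not in flagged_urls
    ]
    return kept, len(refs) - len(kept)


# Made-up sample data, for illustration only.
sample_refs = [
    ImageRef("https://example.com/a.jpg", "aa11"),
    ImageRef("https://example.com/b.jpg", "bb22"),
    ImageRef("https://example.com/c.jpg", "cc33"),
]
kept_refs, removed_count = remove_flagged(
    sample_refs,
    flagged_hashes={"bb22"},
    flagged_urls={"https://example.com/c.jpg"},
)
```

Matching on both the hash and the URL is a design choice of this sketch: the same file can reappear under a different address, and a known address can later serve a different file.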

In parallel with the clean-up, LAION developed a filter system that is intended to make it harder for illegal content to enter the database in the future. A detailed portrait of LAION appears in c't 6/24.

(jo)