Like peeing in the sea: Poisoning AI does not help
Trying to poison data helps as little as peeing in the sea – it remains a sea. So says the provider of protection software.
(Image: MAGNIFIER/Shutterstock.com)
If data is incorrect or even deliberately irritating, it is said to be poisoned. This in turn is intended to irritate or disrupt AI models. If they learn from poisoned data, they deliver results based on the wrong information. But this is not as easy as it sounds at first, says Xe Iaso. She is the founder of a software company that takes a different approach to protecting content from AI models – therefore has an interest.
In an interview with 404 Media, Iaso compares the poisoning of data used for training purposes to the fact that an individual person can pee in the sea, but it still remains a sea. Iaso also criticizes the fact that this may use up resources that do not need to be used.
In fact, the effectiveness of tools such as Nightshade, in which images are provided with false information about the images, is questionable if only very few artists or individuals do it. If people all over the world pee unfiltered into the sea, this could possibly have an effect on water quality after all.
Videos by heise
Excluding bots instead of poisoning data
Instead of poisoning, Iaso proposes its software. This could be used on a technical level to prevent crawlers from tapping content for AI training. Anubis forces bots to solve cryptographic computing tasks in the browser. This is expensive for those who send bots out. It is a kind of invisible captcha, but humans do not have to solve the tasks.
However, it has recently been shown that large-scale campaigns to poison training data are already having an effect. Russia is said to operate numerous websites – with the sole intention of providing AI models with selected information both during training and in real-time searches. Real-time search is particularly vulnerable to attacks. For example, information and instructions can be hidden on websites so that humans do not see them. Incorrect information can then also have far-reaching consequences for the overall conclusion, for example in reasoning models.
(emw)