Credit card data and phishing emails: outwitting AI agents made easy
Researchers have tested AI agents. The assistants are apparently not armed against any attack scenario, no matter how simple.
(Image: photoschmidt/ Shutterstock.com)
AI agents on the Internet are easy to attack. And that means, among other things, that they reveal sensitive information and credit card details. Researchers have specifically tested the attack scenarios on various Anthropic agents –, but the results are likely to be fundamentally valid. In another study, ChatGPT has already been poisoned using a similar setting.
Affected are AI agents that are allowed to move freely on the Internet, i.e. do something that the client has given them. The providers all rave about the fact that agents will one day be able to book a trip for us and take care of other tedious tasks. In addition to a large language model (LLM) that understands content, agents usually consist of a large action model that can do things – The names vary here. In the case of Anthropic, an AI agent takes screenshots of websites and can thus recognize where an input field is and where a click is required.
Videos by heise
Incorrect information poisons AI agents
In the study, the researchers from Columbia University and the University of Maryland used a Reddit post to lead AI agents to a fake website on which there were supposedly products for sale, such as a "Himmelblau Königskühl Diplomat refrigerator". The AI agent was persuaded to enter credit card details and user data. On another site, the scientists were able to plant viruses on the agents. On instruction, the AI agents downloaded files from untrustworthy sources.
If the user of an AI agent was logged into their email client via the browser, the researchers were even able to get the agent to send phishing emails from this account.
(Image: Screenshot aus der Studie)
Anthropic's AI agent for scientific purposes is a special case: ChemCrow is actually supposed to help with research. However, the authors were able to trick it into replacing harmless chemicals with dangerous chemicals by means of a fake paper –, albeit purely in the written output; an AI agent is not yet capable of actually mixing anything together.
The researchers say that all attacks were successful without any special technical knowledge. A manipulated technical article that ended up in a database, a fake store, a public instruction to the AI agent, Reddit, all of this was possible with minimal specialist knowledge. The methods are also known as jailbreaking, prompt injection and data poisoning – and can be used to attack all LLMs. In the case of an acting AI agent, however, the consequences are more serious.
The researchers are therefore calling for better monitoring systems for the agents, such as more mandatory human intervention and URL checks. A reasoning process could also help. This means that the AI agents should monitor their own actions better.
In addition to those from Anthropic, there are already AI agents from OpenAI, a general one called Operator and one specializing in research called deep research. The AI search engine Perplexity also has an assistant on the market. Google is working on Jarvis and says it is already entering the era of agents with Gemini 2. However, if these all entail such major security risks as shown in the study, it is likely to be a particularly exciting era.
(emw)