Grammatical errors make prompt injections more likely
LLMs are more susceptible to prompt injections or simply skipping the metaphorical crash barriers if you make mistakes in the prompt.
(Image: agsandrew / shutterstock.com)
Writing without periods and commas can help – at least if you want to outwit a Large Language Model (LLM). Very long sentences with the worst possible grammar and errors ensure that the AI models throw their guard rails overboard and do what is written in the very long prompt. This was discovered by security researchers at Palo Alto Networks Unit 42.
It appears that LLMs do not activate their security precautions in time due to the missing punctuation marks, but instead read the entire prompt first and process it as a whole.
The security researchers have developed a solution. They use so-called logits, i.e., raw values that an LLM assigns to a potentially upcoming word. Alignment training is used to teach the logits to prefer rejection tokens. This means that if an LLM encounters something that should activate metaphorical crash barriers, it reacts to it with priority. Unit 42 has developed the Logit Gap Steering framework for this purpose.
AI browsers susceptible to prompt injections
AI models are susceptible to so-called prompt injections. This means that a prompt overrides the actual built-in guard rails. For example, an LLM can be told to behave like a pirate from now on. So far, so harmless. Of course, when in doubt, it is also a matter of capturing data in this way.
Videos by heise
Just recently, the browser manufacturer Brave discovered a vulnerability in Perplexity's AI browser Comet that can be exploited using prompt injections. Attackers hide commands on a website or in comments that an AI agent interprets as a user instruction when summarizing a page. According to Brave, email addresses and one-time passwords could be tapped in this way via Comet. Perplexity is said to have provided an update for Comet that makes access more difficult. However, all AI browsers and AI models are potentially affected.
Even Sam Altman, CEO of OpenAI, recently warned that the agent in ChatGPT can be attacked and that there are not yet sufficient security measures in place. It is therefore better not to let the AI agent access all emails or account data.
(emw)