OpenAI: Prompt injections remain a problem for AI browsers

AI agents and browsers are better protected against prompt injections. However: The problem will persist for years, according to OpenAI.

listen Print view
The letters AI float in the air.

The letters AI fly around hooks and warning triangles.

(Image: tadamichi/Shutterstock.com)

3 min. read

Prompt injections will be a persistent problem for AI browsers and the agents they contain, according to OpenAI. There are apparently no prospects of real security. Instead, the company compares the attack to humans falling for scams, and that there has been no way to protect them so far.

Nevertheless, OpenAI assures in a blog post that they have once again made their AI browser Atlas at least more secure against prompt injections. Just not completely secure. Internally, the so-called Red Teaming continuously discovers new attack possibilities against which the browser – or rather the AI model behind it – is secured. Secured means that the model is given a concrete example and corresponding courses of action are defined. To this end, OpenAI uses, among other things, an LLM-based attacker to train agents.

This also means that it is a kind of cat-and-mouse game that attackers and AI companies play. Each side continuously devises new attacks. OpenAI writes that they assume that attacks will at least become increasingly difficult and expensive. "Ultimately, our goal is for you to be able to trust a ChatGPT agent when using your browser as much as you would trust a highly competent, security-conscious colleague or friend."

How far such trust extends probably varies from person to person. But it also indicates that there is no definitive security or control.

Videos by heise

In the case of prompt injections, the agent in the browser or an AI model is fundamentally tricked into behaving in a certain way or, in the case of an agent, acting upon it. It can be as simple as instructions like a prompt for the agent being placed on a website. This can be done, for example, with white text on a white background, so that it is not visible to humans, but is visible to the agent.

In the blog post, OpenAI gives the example that an attacker could write a malicious email that tricks an agent into forwarding sensitive data, such as tax documents, to an email address controlled by the attacker. It is a typical agent scenario that such emails are processed and summarized automatically.

Sam Altman had also warned of potential dangers from prompt injections for the AI browser Atlas and the ChatGPT agent. Shortly thereafter, it emerged that an attack had already occurred. ChatGPT revealed personal data from emails to attackers.

(emw)

Don't miss any news – follow us on Facebook, LinkedIn or Mastodon.

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.