Prompt Injection Attacks on Apple Intelligence

Apple's language models are potentially vulnerable. Researchers demonstrate several new methods.

listen Print view
Symbolic image Prompt Injection

Symbolic image Prompt Injection: Danger for local Apple AI?

(Image: Shutterstock.AI Generator / Shutterstock)

3 min. read

Two papers presented at the recently concluded RSAC security conference describe novel attack vectors on Apple Intelligence. The corresponding vulnerabilities in the area of so-called prompt injections, where AI prompts are manipulated, are said to have already been fixed by the manufacturer. They exploit, among other things, that Apple uses weaker local models before cloud-based, more complex large language models are used.

Prompt injections are intended to make AI systems deliver outputs that developers actually prohibit, such as profanity or information about criminal activities. So-called guardrails are used for this purpose, which are intended to block such outputs. With 100 random prompts using the method, the rules were broken in as many as 76 percent of cases. The study (Study, technical details) comes from three security experts who work for the RSAC security conference's research team. Apple was informed in October and is said to have made internal changes to its operating systems and its Private Cloud Compute Server Infrastructure (PCC).

Videos by heise

Within Apple Intelligence, local and PCC models are used seamlessly. The system detects when it makes sense to go to the servers with a request. The models can also be used free of charge by app providers. On iOS, macOS, and iPadOS, functions such as the so-called Writing Tools, which are used for text optimization, can be used with compatible computers, and there are also image generators with Image Playground and Genmoji, which are direct components of the systems. Prompts can be entered, among other things, to make changes to texts - the researchers were able to manipulate this output. Apple currently does not plan for chatbot operation.

Among the attack methods used by RSAC were so-called Neural Execs, where prompts are translated into a language that appears nonsensical to humans, but the output delivered by the LLM then corresponds to something that should not actually be possible. Another hack was the use of Unicode languages that are written from right to left.

Malicious instructions went through here. In total, the RSAC researchers managed to bypass both the internal guardrails of the models and Apple's downstream filters. The main problems seem to lie in the weaker local models. In principle, this is not surprising - they also tend to hallucinate more. Weaker models are also generally considered easier to attack. For example, OpenClaw recommends not using weak models to avoid security issues.

Empfohlener redaktioneller Inhalt

Mit Ihrer Zustimmung wird hier ein externer Preisvergleich (heise Preisvergleich) geladen.

Ich bin damit einverstanden, dass mir externe Inhalte angezeigt werden. Damit können personenbezogene Daten an Drittplattformen (heise Preisvergleich) übermittelt werden. Mehr dazu in unserer Datenschutzerklärung.

(bsc)

Don't miss any news – follow us on Facebook, LinkedIn or Mastodon.

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.