Apple Intelligence - The First Prompt Injection
Apple Intelligence was jailbroken within days of its Beta release...
On 30th July 2024, Apple released its Apple Intelligence Beta to the world. The release was largely well-received, but within 9 days, Evan Zhou demonstrated a fascinating prompt injection proof of concept. In this post, we will look at what the proof of concept does, how it works, and what this means for the future of Apple Intelligence.
Contents
How Does Apple Intelligence Work?
System Prompts
Finding A Target
Special Tokens
Executing The Exploit
Patch
Final Thoughts - The Future
How Does Apple Intelligence Work?
Apple Intelligence has 3 modus operandi:
On-device processing - For basic requests, Apple will process them via machine learning models running on device chips
Private Cloud Compute processing - For more complex requests, Apple will send them off to a private server they own
OpenAI processing - When a request requires more real-world context, Apple will send it to ChatGPT and provide the answer to a user
The only features available in the Apple Intelligence beta are certain text-based operations with on-device processing. Tech nerds are still eagerly awaiting the promised image-generation capabilities and ChatGPT integration among several other cool features.
You can read more about Apple Intelligence here:
System Prompts
2 days after the beta release, reddit user devanxd2000 found the system prompts used for on-device processing. These are located in the /System/Library/AssetsV2/com_apple_MobileAsset_UAF_FM_GenerativeModels directory on MacOS, and serve as guidelines to instruct AI models on how to behave.
Several prompts were found - below is the verbiage used by Apple to guide their email assistant:
Finding A Target
Evan Zhou wanted to see if he could use the leaked system prompts to create a prompt injection attack. Prompt injection attacks use specially crafted input to cause an LLM to behave in a way not intended by the developers.
Evan decided to target the Writing Tools, which rewrite text in a certain tone by feeding it to an LLM. His aim was inducing Apple Intelligence to answer his text as opposed to rewriting it, signifying arbitrary LLM behavior and a bypass of Apple’s system prompt.
Evan found the following prompt for the Professional Tone writing tool:
Special Tokens
Let’s examine how this prompt template works in more detail:
{{ specialToken.chat.role.system }} - The template starts by injecting the system prompt, guiding the LLM on how to act
{{ specialToken.chat.component.turnEnd }} - Ends the system role
{{ specialToken.chat.role.user }} - Signifies the start of the user content
{{ specialToken.chat.component.turnEnd }} - Ends the user role
{{ specialToken.chat.role.assistant }} - Allow the assistant to rewrite the text
Apple uses these “special tokens” as dividers to separate different pieces of text used in their LLM prompt.
Theoretical Attack Chain
By injecting special tokens in the {{ userContent }} section where the user’s input is placed, Evan could theoretically end the turn containing Apple’s system prompt, inject a new system prompt of his choosing, then put his user prompt beneath a {{ role.user }} token.
Finally, he could use {{ turnEnd }} and {{ role.assistant }} tokens to neatly close the injection, preventing any system errors. This crafted input is similar to classic web application attacks such as Cross Site Scripting (XSS).
This would allow him to inject arbitrary prompts into the AI, as shown below:
Evan found a Special Tokens Map located above the prompt template, allowing him to substitute in the correct values for his prompt injection:
Executing The Exploit
Evan’s final prompt injection is displayed below:
The most interesting part of this is the text located under the {{ system<n> }} token. Since the LLM already had Apple’s system prompt in its context, Evan cleverly manipulated the AI to switch roles by disguising his request as a system test.
Apple Intelligence responded to Evan’s “Hello” user input instead of summarizing the entire prompt, showcasing a successful prompt injection.
Evan mentioned that Apple Intelligence was actually interpreting the user input as its own text, leading it to simply extend the string as opposed to truly responding. Despite this, his findings are truly fascinating and will open the door for more advanced injections.
Patch
As outlined in this GitHub Gist, the prompt injection still works in the current beta version. Apple can patch this in one of 2 ways:
Change the special tokens, then obfuscate them
Add logic to strip special tokens from user input
While these patches seem trivial, in practice there are several potential pitfalls that Apple could experience:
The old special tokens may still work
The new special tokens may still be guessable
The stripping logic may be bypassable with classic web application hacking techniques
Final Thoughts - The Future
The Apple Intelligence Beta Prompt Injection serves as a fantastic reminder that cybersecurity is a continual arms race between attackers and defenders. Attackers will continually innovate with more elaborate and novel techniques, while defenders will implement more robust mitigations. Evan’s research is fantastic and exposes an immediate flaw in Apple’s AI solution - that their usage of special tokens allows users to break out of the user input field.
Being able to see Apple’s system prompts is amusing and underwhelming - including phrases such as “Do not hallucinate” seems naive and unlikely to work. Time will tell how effective these prompts are and whether they change.
While this is a beta, I find it mildly concerning that Apple did not think of this prompt injection in advance. I believe this oversight is representative of the entire industry, where very few people are educated in AI security. Continual findings such as this will drive organizations to hire AI security experts, making the skillset valuable and crucial in our AI-driven future.
Check out my article below to learn more about Apple Intelligence. Thanks for reading.
Very good!
nice research job!