Security researchers at SafeBreach have detailed a novel indirect prompt injection technique affecting Google Gemini's voice assistant. The research, titled "Gemini's Secret Affair: Exploiting Gemini Voice Assistant Through Instant Messaging Apps," demonstrates how an unauthorized party could manipulate the assistant's ability to summarize message notifications.
The team, led by security research team lead Or Yair, showed that threat actors can hide instructions in foreign-language text or muted hyperlinks. When Gemini processes notifications containing this content, it executes the instructions silently. In testing, the injected actions ranged from controlling smart-home devices and starting unauthorized video streams to impersonating trusted contacts and altering the large language model's (LLM's) long-term memory.
SafeBreach shared these findings with Google through responsible disclosure, and Google has since deployed content classifier updates to mitigate the issue. There is currently no evidence that the technique has been used outside controlled research environments.
Bypassing guardrails with Fake Context Alignment
Yair calls this method Fake Context Alignment. It builds on earlier SafeBreach research involving calendar invitations and on a technique known as Delayed Tool Invocation, which embeds an instruction to carry out an unsafe action once the user gives a subsequent approval.
Fake Context Alignment bypassed existing guardrails by presenting Gemini's security mechanisms with what looks like a legitimate authorization exchange while hiding the true context from the user. For instance, an unauthorized party might send a WhatsApp message containing hidden Chinese characters and a muted hyperlink. If the user asks Gemini to summarize their messages (a common practice while driving or multitasking), the assistant reads the benign portion of the message aloud but silently processes the hidden commands.
In this scenario, the hidden instructions tell Gemini to phrase a tool-authorization question in Chinese and keep that text entirely inside a muted hyperlink, so it is never read aloud. The user hears only a normal-sounding English prompt, replies affirmatively to what seems like a harmless question, and unknowingly authorizes the hidden action. Yair noted that combining foreign-language characters with a muted hyperlink was the most reliable approach during testing.
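The attack depends on a gap between the text the assistant emits and the text the user actually hears. A voice front end could check for that gap before speaking. The sketch below is a hypothetical illustration, not Google's mitigation: it assumes the assistant's reply arrives as segments tagged with whether the speech layer will read them aloud (the Segment type and its spoken flag are invented for this example). It flags any unspoken text and notes writing systems that appear only in the unspoken part, mirroring the Chinese-inside-a-muted-link pattern described above.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    spoken: bool  # hypothetical flag: True if the voice interface reads this segment aloud


def scripts_in(text: str) -> set[str]:
    """Return a coarse set of writing systems in text, based on Unicode character names."""
    scripts = set()
    for ch in text:
        name = unicodedata.name(ch, "")  # e.g. "CJK UNIFIED IDEOGRAPH-4F60", "LATIN SMALL LETTER A"
        if name.startswith("CJK"):
            scripts.add("CJK")
        elif name.startswith("LATIN"):
            scripts.add("LATIN")
    return scripts


def hidden_content_findings(segments: list[Segment]) -> list[str]:
    """Flag output the user would not hear, plus scripts that appear only in unspoken text."""
    spoken_scripts = set()
    for seg in segments:
        if seg.spoken:
            spoken_scripts |= scripts_in(seg.text)

    findings = []
    for seg in segments:
        if not seg.spoken and seg.text.strip():
            findings.append(f"unspoken text: {seg.text!r}")
            # Scripts the listener never hears are a strong hint that context is being hidden.
            for script in sorted(scripts_in(seg.text) - spoken_scripts):
                findings.append(f"script absent from spoken text: {script}")
    return findings
```

A real deployment would need a far more careful notion of "script mismatch", since users legitimately receive multilingual messages; the point of the sketch is only that the check runs against what the speech layer will actually say rather than against the full model output.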
Securing LLMs against untrusted input
Google has addressed this specific issue, and Gemini users do not need to take any action. Even so, the findings offer a broader lesson for securing AI systems. Yair emphasizes that AI assistants must treat all external content as untrusted by default.
Because any external input can act as an instruction, indirect prompt injections call for a different defensive approach from traditional software vulnerabilities. There is currently no single patch that permanently prevents prompt injections in public-facing models. Instead, organizations deploying LLMs should prioritize the following security practices:
Implement active security controls: Develop and maintain guardrails and content classifiers that monitor system behavior and validate inputs on the vendor side.
Monitor for context shifting: Track all communication channels between the user and the AI assistant to detect when partial or obfuscated outputs alter the conversation context.
Tighten access controls: Limit the permissions granted to LLMs, ensuring that an AI agent cannot execute high-risk actions without explicit, verified user authorization.
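The last practice is the most direct counter to Fake Context Alignment, because the attack works by changing what the user believes they are approving. A minimal sketch, assuming a hypothetical host application that receives tool calls as structured data: high-risk calls are paused, and the confirmation question is generated by the host from the call itself rather than taken from model output, so hidden text in a reply cannot redefine what "yes" means. The tool names, the ToolCall type, and the confirm callback are invented for illustration; this is not how Gemini implements authorization.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool names standing in for the high-risk actions described in the research.
HIGH_RISK_TOOLS = {"control_smart_home", "start_video_stream", "send_message", "update_memory"}


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def confirmation_prompt(call: ToolCall) -> str:
    """Build the question from the structured call, never from model-generated text."""
    rendered = ", ".join(f"{key}={value!r}" for key, value in sorted(call.args.items()))
    return f"Allow the assistant to run {call.name}({rendered})?"


def authorize(call: ToolCall, confirm: Callable[[str], str]) -> bool:
    """Allow low-risk calls; require an explicit 'yes' to a host-generated prompt for high-risk ones."""
    if call.name not in HIGH_RISK_TOOLS:
        return True
    reply = confirm(confirmation_prompt(call))  # e.g. speak the prompt, return the transcribed answer
    return reply.strip().lower() in {"yes", "y"}
```

In a voice interface, confirm would speak the prompt and return the transcribed reply. Requiring an explicit answer to a question the model did not write trades some convenience for a much better chance that the user heard the actual action and its arguments before approving it.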