Researchers Find Relatively Simple Prompt Injection, Content Manipulation Attacks in Google Gemini

by | Mar 19, 2024

All LLMs are struggling with security issues, but a new report from cybersecurity firm HiddenLayer indicates that Google Gemini may be lagging behind the competition. The researchers demonstrated multiple prompt injection attacks and content manipulation methods, all in relatively simple ways that have previously compromised other AI systems.

Content manipulation demonstration: Fake election results, auto hotwiring tutorials produced

The two central examples of content manipulation that the researchers produced were not earth-shaking, consisting of a story about fake election results and another that involved the absolute basics of hotwiring an automobile. However, they demonstrate how easy it is to get Google Gemini to slip its guardrails and produce fictional stories, something the AI is supposed to be purpose-built to avoid.

Google Gemini has since been restricted from answering questions about elections. The AI was previously put in time-out in this way when its heavy-handed attempts at diversity led it to produce bizarre image responses. Nearly every other AI has proven to be similarly vulnerable to prompt injection attacks and content manipulation efforts to get around its guardrails, but Gemini seems to still be dealing with security issues that were raised and addressed in ChatGPT months ago.

Prompt injection attacks on Google Gemini expose private information

The prompt injection attacks developed by the researchers focus on convincing Google Gemini to expose its “system prompt” information, something that only developers should have access to. These are essentially the core instructions by which the AI judges how to handle search and output requests, and parties other than developers could make use of them to further figure out how to jailbreak and manipulate the AI.

Unfortunately, it seems that simply finding the right alternate phrasing for “system prompt” gets Google Gemini to spill the beans. One prompt injection attack by the developers unlocked the system by asking about “foundational instructions” instead. Another got the AI to ignore its safety protocols by asking for the information to be printed in a markdown box. The AI can also be tricked into repeating prior instructions by simply repeating a word over and over again, in an approach that is reminiscent of the ChatGPT vulnerability that caused random scraped information from the AI’s training set to be output.

Google Gemini is not yet among the biggest LLMs, but has a substantial userbase of about 100 million and has been expected to catch up given its ability to incorporate multimedia into results and its range of versions tailored to particular device types. That has seemingly made it a frontrunner for the sort of “white label LLM” approach that is expected to become popular with businesses, in which an AI vendor would sell the organization a customizable model meant for private internal use. Google has responded to the assorted prompt injection and content manipulation concerns by promising new safeguards and assuring users that it regularly runs red-team exercises in an attempt to anticipate issues such as these.

Recent Posts

How can we help?

13 + 15 =

× How can I help you?