Theoretical GenAI Worm Attack Raises Alarms; Self-Replicating Malware Shown to be Viable Against ChatGPT

by | Mar 11, 2024

A research team composed of members from several universities and Intuit Research have demonstrated a theoretical GenAI attack that corrupts the databases of the likes of ChatGPT and Gemini Pro, causing them to pass on a form of self-replicating malware that can be used to send emails and steal files.

Though such an attack has not yet been seen in the wild, this research demonstrates that a “GenAI worm” is well within reach for hackers and is a possibility that security teams must now prepare for. Called Morris II (in honor of the first computer worm), the attack is a zero-click that simply requires a GenAI email assistant to read the content of a malicious email or view a malicious image packed with the same instructions.

Study calls for regulation, rethinking of GenAI systems

Both state-sponsored hackers and profit-seeking criminals have been observed playing around with GenAI to enhance their attacks, but to date they have not done anything as sophisticated as producing self-replicating malware. Most of the utility thus far has been adding proficiency in languages that the attackers are not familiar with, for purposes such as enhancing the believability of phishing messages.

The GenAI worm takes the next step of corrupting an email assistant and convincing it to send out malicious emails. The “self-replicating malware” aspect is in the fact that these messages can be crafted to corrupt other similar AI systems upon receipt. And all of this is done without any user interaction required.

The attack is fundamentally similar to existing attacks that overrun memory to escape natural boundaries of systems and software. But in this case, the GenAI assistant being attacked is fed a series of malicious prompts that convince it to use its ability to retrieve outside information (“retrieval-augmented generation” or RAG). Naturally, the hackers have it fetch more malicious data that corrupts it. The system is now capable of delivering self-replicating malware by automatically repeating this process via email to other systems.

This attack is still only theoretical, but is very worrisome as GenAI email assistants are a technology sector that is very possibly outpacing security developments in its adoption (as also happened when companies very quickly pivoted to cloud services during the Covid-19 pandemic). One of these self-replicating malware worms escaping into the wild would essentially prove that theory.

OpenAI says that this type of attack is unlikely to actually be successful because its own filtering processes and limits on API access would prevent the system from replicating this malicious input as output. Some security researchers agree with this assessment. Others feel that these filters and rules are not adequate, and even if this particular technique does not prove to be real-world viable there will eventually be one that is. If that is true, the only solution might be siloing of AI functions to prevent different systems from talking to each other in a way that can facilitate attacks.

Is GenAI the next big attack vector?

When it emerged in the late 1980s, the original Morris Worm sparked a revolution in thinking about cyber defense. This development is not likely to have such a profound effect, but will hopefully spark some thinking about effective defense before this type of self-replicating malware appears in the wild.

One of the most worrying aspects of this research is that the prompts that eventually jailbreak and corrupt the GenAI can be embedded in an image. That opens up possibilities beyond simply emailing attack prompts to a victim (though the zero-click nature of this attack makes it dangerous enough on its own).

Also concerning is that the self-replicating malware can fish out sensitive personal and financial information from email accounts as it makes its way around. The researchers demonstrated that it could pull Social Security and credit card numbers from inboxes, among other things.

OpenAI has said that it is working to fix the issue (which was only demonstrated with ChatGPT 4.0) and that additional filtering and checks will eliminate the problem. Regulation is developing across the world to address AI security issues, but largely not yet in place, putting organizations in what could be a tough position in terms of determining how to respond to this (as of yet) theoretical threat.

Recent Posts

Attempted Audio Deepfake on LastPass is “The New Normal” for Voice Phishing
Attempted Audio Deepfake on LastPass is “The New Normal” for Voice Phishing

Employee targeted in the voice phishing attack received several different deepfake call attempts and at least one voicemail message, but did not respond as it’s exceedingly rare for anyone to communicate internally via WhatsApp, let alone for the CEO to randomly start peppering an employee with messages after business hours.

How can we help?

8 + 10 =

× How can I help you?