Hacking Auto-GPT and escaping its docker container

Abstract

  • We showcase an attack which leverages indirect prompt injection to trick Auto-GPT into executing arbitrary code when it is asked to perform a seemingly harmless task such as text summarization on an attacker controlled website
  • In the default non continuous mode, users are prompted to review and approve commands before they are executed by Auto-GPT. We found that an attacker could inject color-coded messages into the console (fixed in v0.4.3) or benefit from the built-in unreliable statements about future planned actions to obtain user approval for malicious commands
  • Self-built versions of the Auto-GPT docker image were susceptible to a trivial docker escape to the host system with the minimal user interaction of restarting the Auto-GPT docker after it is terminated by our malicious code (fixed in v0.4.3)
  • The non-docker versions v0.4.1 and v0.4.2 also allowed custom python code to execute outside of its intended sandboxing via a path traversal exploit after a restart of Auto-GPT

Read blog post here
Link
We care about your privacy so we do not store nor use any cookie unless it is stricly necessary to make the website to work
Got it
Learn more