
Hacking Auto-GPT and escaping its docker container
Date : 2023-06-29
Abstract
- We showcase an attack which leverages indirect prompt injection to trick Auto-GPT into executing arbitrary code when it is asked to perform a seemingly harmless task such as text summarization on an attacker controlled website
- In the default non continuous mode, users are prompted to review and approve commands before they are executed by Auto-GPT. We found that an attacker could inject color-coded messages into the console (fixed in v0.4.3) or benefit from the built-in unreliable statements about future planned actions to obtain user approval for malicious commands
- Self-built versions of the Auto-GPT docker image were susceptible to a trivial docker escape to the host system with the minimal user interaction of restarting the Auto-GPT docker after it is terminated by our malicious code (fixed in v0.4.3)
- The non-docker versions v0.4.1 and v0.4.2 also allowed custom python code to execute outside of its intended sandboxing via a path traversal exploit after a restart of Auto-GPT
Read blog post here
Recently on :
Artificial Intelligence
Security | Surveillance | Privacy
WEB - 2025-11-13
Measuring political bias in Claude
Anthropic gives insights into their evaluation methods to measure political bias in models.
WEB - 2025-10-09
Defining and evaluating political bias in LLMs
OpenAI created a political bias evaluation that mirrors real-world usage to stress-test their models’ ability to remain objecti...
WEB - 2025-07-23
Preventing Woke AI In Federal Government
Citing concerns that ideological agendas like Diversity, Equity, and Inclusion (DEI) are compromising accuracy, this executive ...
WEB - 2025-07-10
America’s AI Action Plan
To win the global race for technological dominance, the US outlined a bold national strategy for unleashing innovation, buildin...
WEB - 2024-12-30
Fine-tune ModernBERT for text classification using synthetic data
David Berenstein explains how to finetune a ModernBERT model for text classification on a synthetic dataset generated from argi...