Scenarios: Model- and harness-agnostic test scenarios for demonstrating prompt injection patterns
Prime Radiant is an AI research lab. Broadly, we're building tools that help people get things done. One of our guiding principles is that AI can and should be used to help people do things. It's increasingly clear to us that AI has the potential to massively transform human society and it's crucially important to us that we’re building tools that work for people, rather than the other way round.
One of the key challenges in building personal digital assistants relates to security: how can we ensure that these assistants won't be tricked into acting in ways that harm the people who use them?
Prompt injection is the category name for a class of attacks that exploit the fact that language model systems mix instructions and arbitrary user input together in the same stream of text. The lethal trifecta describes a common pattern of prompt injection attacks which combine three elements:
- Access to private data - information that the user wants to process with their agent but does not want exposed to the world.
- Exposure to potentially malicious input - the agent reads web pages, emails, or other content that an attacker could conceivably manipulate to insert malicious instructions.
- Some way to exfiltrate data - once tricked, a way the agent could send that private data out to the attacker.
Unfortunately, this combination also represents the most obviously useful form of agentic personal assistant! Everyone wants an assistant that can access their email (providing both private data and potentially malicious input) and act on their behalf (sending emails or using the web = exfiltration).
We’ve built Scenarios, a test suite that can help document and illustrate how these attacks might be structured, using simulations of real-world systems.
How Scenarios is structured
A principal goal of the project is to be model- and harness-agnostic. You can use Scenarios to test how vulnerable your tools are to the simulated attacks documented by the scenarios.
New models are released all the time, and we want to make it as easy as possible to run scenarios against any of them.
Similarly, there are many different ways to build agentic systems. A rite of passage for developers getting started with LLMs is to roll their own - a basic "agent" is an LLM running in a loop making tool calls, and a basic system can often be built in a few dozen lines of code. (Jesse’s best is currently 646 bytes.)
If you haven't tried building an agent yet you totally should!
As such, Scenarios aims to provide examples that are not tied to any particular harness. A scenario is a folder with data files and YAML. It should be possible to take any agent harness and write minimal code to parse that YAML and provide access to those files.
The repo includes two reference implementations - one using the LLM Python CLI utility and one that integrates with Claude Code. Scenarios use tools, which are both described as text and also served as a reference Python implementation with an MCP server.
The repo includes instructions for running scenarios with those default harnesses.
Here's an example run captured using Showboat.
Do not use scenarios to "prove" your system is secure
My one concern with releasing scenarios is that I don't want developers misusing the project as a set of tests they can use to prove that their system is secure against prompt injection attacks!
If you come up with a system prompt or agent harness that avoids the scenarios in this repo you have not demonstrated that your system is secure - merely that you have worked around the examples represented here.
The goal of the project is to help demonstrate and explore variants of these attacks. This is not a comprehensive tool for testing your own defenses.
I want to be explicit: do not use this project to claim your system is secure against prompt injection. I would be extremely disappointed to see it used that way.
If you do manage to build a harness that avoids all of the scenarios represented here, I challenge you to contribute back a new scenario that your harness fails to handle!
There are infinite ways an agentic system could be tricked by a malicious input. Solutions to this problem need to sit outside of the realms of prompts and probabilistic filters.
Contributions welcome
What we’re releasing today is only the tip of the iceberg. We need your help to build out the library of scenarios.
If you have ideas for new scenarios and want to contribute to the project, please do! Open an issue in the repository to talk to us about your plans.
