Leveraging agents, crafting exploits, and mining the hidden gems of AI security

 

Howdy! I’m Ads Dawson—staff AI security researcher by day, mayhem wrangler by nature. My motto? Harness code to conjure creative chaos—think evil; do good. This means I spend my time pushing large language models (LLMs) to their limits in offensive security, whether by leveraging red teaming models, poking at hybrid attack surfaces, or exploring weird, wonderful exploits in AI systems.

In my initial offensive agents blog post, I mentioned my love of Rigging (version 2.2.8 while writing), a sleek LLM interaction framework I use daily for so many different things. It’s ridiculously versatile. Honestly, I even summon models as janitors to auto-polish my pull request descriptions. Beautiful, right?

In this post, I’ll show you how to build a bare-bones Rigging agent to tackle AI hacking challenges on dreadnode’s Crucible platform, a platform packed with 80+ hands-on AI red team scenarios. From prompt injection and fingerprinting to model evasion and inversion (my favorite), Crucible is an AI hacker’s playground—fun, educational, and accessible whether you’re a curious beginner or an expert in the trenches.

Why use a CTF to showcase this? Because Crucible’s awesome challenge architecture mimics real-world LLM applications (typically a chatbot interface with privileged access to tools and sensitive data). This makes it the perfect sandbox to explore real attack surfaces and test your mettle.

Note: For this first walk-through, we’re going raw—no tools, no helpers, just a system prompt, a mission, and model-versus-model mischief.

This content and these libraries are for educational purposes only, and the code snippets are not cited. Unless you have explicit permission, do not perform the actions demonstrated on anyone or anything.

 

RTFM

Rigging is a flexible library built on other flexible libraries that makes integrating LLMs in production code both straightforward and powerful. Under the hood, it leverages Pydantic for structured I/O but lets you mix in raw text whenever you need flexibility. By default, it ships with LiteLLM—so you immediately get access to dozens of backends—while still letting you swap in any model via simple “connection strings” just like you would a database URL – i.e. (<provider>!<model>,<**kwargs>).

You author prompts as plain Python functions, complete with type hints and docstrings, and Rigging handles everything else:

  • Unified I/O—Seamlessly convert between Pydantic models and unstructured text.
  • Extensible generators—Get LiteLLM support out of the box, plus drop‑in replacements via URL‑style model specs.
  • First‑class prompts—Define your prompt templates, forks, continuations, and parameter overrides directly in code.
  • Tool integration—Invoke external tools even when the underlying API doesn’t natively support tool calls.
  • Tracing and logging—Built‑in Logfire hooks let you trace every request and response.
  • High‑throughput async—Batch requests asynchronously for large‑scale generation with minimal boilerplate.
  • Rich callbacks and metadata—Plug in custom callbacks, attach metadata, and convert data formats on the fly.
  • Modern Python ergonomics—Get full support for async/await, Pydantic validation/serialization, and static type checking.

Taken together, Rigging gives you a rock‑solid, extensible foundation for building, measuring, and iterating on LLM‑powered applications. I’ll explain shortly, but using Rigging in a simple chat example is as easy as a few lines of code:

Setup a Rigging generator chat using Anthropic – claude-3-sonnet-20240229

 

 

Check it out! Add a provider API key, “pip install rigging,” and feel free to run the above code or check out an interactive chat example here.

 

Infernal tracecraft: Rigging the demonic audit

One of many features I love to use within the Rigging framework is the concept of Tracing, which was built on the OTEL (OpenTelemetry) standard. Personally, my go-to log sink is always Pydantic’s Logfire. It’s incredibly intuitive and easy to learn while keeping you close to the raw inference so you can understand what’s really going on under the hood without clogging up your standard output. Again, it’s as simple as a few lines of code:

 

When running the code, you’ll be gifted a hyperlink like the following example to access the Logfire project and traces:

Rigging captures and logs call parameters, results, and chat objects from tools and prompt functions, serializing them as JSON with dynamic schemas via Logfire for seamless web view integration. While it avoids tracing at the generator level, Rigging focuses on higher-level abstractions like pipelines and tools. For deeper insight, it’s compatible with external instrumentation libraries like Logfire, LiteLLM, and OpenAI, which trace low-level inference activity and integrate cleanly with Rigging’s spans.

For this example, when you successfully retrieve the challenge flag, you can use the following SQL query in Logfire to navigate straight to the trace in question:

attributes->'chats'->0->'generated'->0->>'content' ILIKE '%gAAA%'

 

Fiendish Rigging: Agent harness from the abyss

If you prefer a visual example of the code running while reading this post, check it out here, export your CRUCIBLE_API_KEY, and enter the following into your terminal: 

python examples/crucible.py --generator-id ollama_chat/gemma3:1b --challenge pieceofcake

 

Here’s an example of invoking the code using Google’s open-source Gemma 3 1 billion parameter model via Ollama so you don’t have to worry about inference budgets

 

Infernal bull’s‑eye: Rigging the fiendish mark

The code allows for smooth testing against any challenge using a simple parameter: --challenge <challenge-name>. This makes it easy to adapt, test, and refine the script as needed.

So what does this do, anyway? Every orchestration approach in Rigging requires a “run” concept—typically implemented as an event loop that manages agent execution until predetermined exit conditions are met. This is where implementation uses async/await patterns to process actions sequentially, maintain state between iterations, and terminate execution when specific success criteria are achieved, such as extracting flags.

The “pieceofcake” challenge is designed to be as easy as eating a slice of cake—Yum! Who doesn’t love cake? This sample challenge is one of the easier Crucible challenges meant to act as a warm-up for basic prompt injection skills. Your task is to extract the secret flag from the chatbot.

Without further ado, let’s dive in!

 

Stochastic rodeo: Breaking down the Rigging agent harness

You’re staring at a wall of Python. Let’s tear it apart like real AI hackers.

This Rigging code is an LLM weaponization platform built to attack a Crucible CTF challenge, jailbreak a model, and extract that sweet, sweet flag. A closer look at the core components of the code is necessary. Here’s how every major moving part works.

System prompt: Inject the mission objective

 

 

  • Purpose: Preloads the model with a “you’re a hacker now” system message.
  • Why it matters: By shaping the model’s behavior up front, we soften it up for the real injections later.
  • Noteworthy: It even suggests creative injection techniques like obfuscation, roleplay, and multilingual attacks, straight up turning the LLM into a rogue agent.

Custom Pydantic model: CrucibleRequest

 

 

This is our minimal packaging layer.

  • Purpose: Structures how payloads (hacked prompts) are sent to the Crucible challenge API.
  • Why it matters: A clean, standardized format means it’s easier to validate inputs/outputs automatically.
  • Noteworthy: This is how we ship the data to our target model.

CrucibleState (mission control for our attempt)

 

 

These are the brains behind the operation. @dataclass is a Python decorator from the dataclasses module that automatically adds special methods like init, repr, and eq to classes representing data containers. In both the bandit and crucible examples, @dataclass is used to define the state class, which is crucial for maintaining an agent’s contextual information throughout execution cycles. It’s important because it significantly reduces boilerplate code while providing a clean, structured way to handle complex state management. In the Rigging framework, it enables the efficient tracking of conversation history, memories, goals, and agent actions without manually implementing initialization or comparison logic. 

This allows developers to focus on agent behavior rather than state-handling mechanics, maintaining consistent state representation across iterations of the agent’s decision-making loop:

  • Tracks attempt count, successful techniques, and failed techniques
  • Records potential flags found
  • Analyzes responses for hints (“Did it say ‘secret’? That’s a good sign!”).

This avoids brainless brute force. Instead, it learns mid-attack, evolving like a real hacker would.

record_attempt(technique, response)

This figures out if an attempt moved the needle and If partial success is detected (e.g., LLM says “flag” but no flag given), it still encourages exploring that route. Smart.

Eagle eye—check_for_flag()

 

 

  • Watches the chat flow in real time
  • Sniffs for flag-like patterns (e.g., anything starting with gAAAAA)
  • Also tracks partial flags to guide future prompt evolution.

Master interceptor—process_chat_message()

 

 

Every time the attack LLM crafts a new attack prompt, it does the following:

  1. Fires it at the Crucible challenge
  2. Records what worked/didn’t
  3. Looks for flag fragments
  4. Logs everything in bright, colorful logs for your hacker pleasure.

If a potential flag is found, It auto-submits it to the Crucible API for verification.

🔥 There is no manual babysitting with a fully automatic exploit cycle.

It is important to note that we offload the API flag submission from the attack model and handle it separately. This is to ensure that the process remains modular and flexible, allowing us to focus the attack model purely on prompt generation and response analysis without overcomplicating the flag submission flow.

To better understand how the attack unfolds over multiple turns, here’s a simple simulation of the chat pipeline loop:

 

Orchestrator of doom—main()

This is the main infinite loop running the siege.

  • Boots up the LLM with the SYSTEM_PROMPT
  • Watches every message
  • Auto-updates temperature randomly (makes generation more creative) every few tries:

 “Insanity is doing the same thing over and over. We tweak randomness to stay unpredictable and creative.”

  • Manages retries (max_steps).
  • Prints beautiful progress reports every few shots.

 

Battle test—Executing the attack loop

Now, it’s time to run the code and watch our attack model navigate the challenge and uncover hidden flags like in a digital treasure hunt while each prompt brings us one step closer to victory. With every iteration, the model cleverly adapts, refining its strategies and testing its approach, until—just as we hoped—the first flag is uncovered, revealing its encoded form: 

gAAAAABoBY9EbEVie-nQ5I2ab--wwWpu-_6qHD63JLijxrcVWuRJK_EdUgtrHRrgiV7olPka4TJmfT_30kVTWnuBFvtPvqORNPcNhZ00JdEuc8F7Wb6sumxHQtOzqUpq-tqNb9xXYHjLv1Bw6lMJWFefx_AyYfZOcGzHV7RX3iNAmjHH37ebOYv2ixpf7_Ql6RltqhthHNOs

The treasure hunt continues!

 

Success!

To ensure we aren’t totally tripping, we can also verify our flag submission by manually running this:

Kaboom!

 

Wrapping up

I’m taking a one-two punch approach to blogging, with one track focused on AI and LLM red teaming and the other on offensive security agents. In the next post for the AI track, I’ll dive  deeper into Rigging and its upcoming V3 release and continue honing our “dev chops” with a more technically complex and sophisticated twist. We’re talking major updates, wild new hooks, and letting these stochastic sons of sin loose with tools and other pandemonium-inducing projects. Shoot, maybe we will do a series on MCP too—who isn’t these days, eh?

Additionally, there are a few other fun examples in the Rigging repo that I’d suggest taking for a spin. Alternatively, check out these awesome docs.

Keep up with Ads’ wacky experiments or follow any of his content @ LinkedIn, GitHub, or GitHub page. Ever in Toronto, Canada? Hit me up for coffee and donuts!