What I learned building AI agents for bug bounty hunting

Author: Bronxi

About a year ago, I wondered how I could use AI within a workflow I had already automated in Bash. I had built that workflow to search for information disclosure bugs in PDF or XLS files indexed by Google. At the time, I had no idea how to integrate AI into it, as I had only used AI in chat mode. While researching, I noticed many people discussing multi-agent setups, and that jump-started my exploration of AI agents.

I began by looking into creating my own multi-agent setup. I came across different orchestrators, including LangChain, but ultimately decided to use a simpler one that’s built on top of LangChain called CrewAI. It took a long time to find a base project or something similar, but I eventually found a YouTube video showing how to use CrewAI to create OSINT agents. With that as a starting point, I modified the agents to perform the same tasks my Bash automation already did:

Passive URL discovery + Researcher Agent
Web Scraper Agent to search for PDF and XLS(X) files
Compliance Agent to identify sensitive information
Writer Agent to generate the report
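
To picture the hand-off between these four roles, here is a minimal sketch in plain Python. It does not use CrewAI and is not the original implementation; every function name, the `Context` class, and the example URLs are hypothetical stand-ins, and each stage body is a placeholder for the real work.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    """Shared state handed from one stage to the next."""
    target: str
    urls: list[str] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)
    report: str = ""


Stage = Callable[[Context], Context]


def researcher(ctx: Context) -> Context:
    # Placeholder: a real stage would run passive URL discovery for ctx.target.
    ctx.urls = [
        f"https://{ctx.target}/docs/a.pdf",
        f"https://{ctx.target}/about.html",
        f"https://{ctx.target}/files/b.xlsx",
    ]
    return ctx


def scraper(ctx: Context) -> Context:
    # Keep only URLs that point at PDF or XLS(X) files.
    wanted = (".pdf", ".xls", ".xlsx")
    ctx.documents = [u for u in ctx.urls if u.lower().endswith(wanted)]
    return ctx


def compliance(ctx: Context) -> Context:
    # Placeholder: a real stage would inspect the file contents.
    ctx.findings = [f"review needed: {doc}" for doc in ctx.documents]
    return ctx


def writer(ctx: Context) -> Context:
    # Turn the accumulated findings into a plain-text report.
    lines = [f"Report for {ctx.target}"] + [f"- {f}" for f in ctx.findings]
    ctx.report = "\n".join(lines)
    return ctx


PIPELINE: list[Stage] = [researcher, scraper, compliance, writer]


def run_pipeline(target: str, stages: list[Stage]) -> Context:
    """Run each stage in order, passing the same context along."""
    ctx = Context(target=target)
    for stage in stages:
        ctx = stage(ctx)
    return ctx
```

Calling `run_pipeline("example.com", PIPELINE).report` returns the text built by the last stage. The point of the shape is that each stage only sees what earlier stages left in the context, which is also why a weak early stage degrades everything after it.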

I was so excited to see my agents working together! Each agent passed its output to the next, and the final agent used everything gathered so far to produce a report and deliver it to me. At that time, I used the OpenAI API for the whole flow. I didn’t fully understand, at least not as I do now, how much the choice of model matters. I’ll talk more about model types later.

So, what happened next?

My workflow yielded exactly what you may have guessed: many false positives. The agents flagged information disclosure vulnerabilities by the dozen, but when I manually reviewed the results, I found that many files contained words like password or confidential only as part of an explanation or documentation. They did not contain actual passwords, nor was the information actually confidential.
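
The gap between a keyword being mentioned and a secret actually being present can be illustrated with a small heuristic. This is a sketch under stated assumptions, not the check used in the workflow described here: the keyword list, the six-character minimum, and the placeholder set are all arbitrary choices made for the example.

```python
import re

# Assumed keyword list, for illustration only.
KEYWORD = re.compile(r"\b(password|passwd|secret|api[_-]?key)\b", re.IGNORECASE)

# A keyword followed by ':' or '=' and a token of at least 6 non-space characters.
ASSIGNMENT = re.compile(
    r"\b(password|passwd|secret|api[_-]?key)\b\s*[:=]\s*(\S{6,})",
    re.IGNORECASE,
)

# Values that look like assignments but are obviously not real secrets (assumed).
PLACEHOLDERS = {"********", "changeme", "<password>", "your_password", "xxxxxx"}


def classify_line(line: str) -> str:
    """Return 'candidate', 'mention', or 'clean' for one line of extracted text."""
    match = ASSIGNMENT.search(line)
    if match and match.group(2).lower() not in PLACEHOLDERS:
        return "candidate"  # keyword with a concrete-looking value: review it
    if KEYWORD.search(line):
        return "mention"    # keyword appears only in prose or with a placeholder
    return "clean"
```

A line such as `Your password must contain at least 8 characters.` is classified as a mention, while `password: Summer2024!x` is classified as a candidate. A heuristic like this only narrows the pile; every candidate still needs a manual look.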

How to correct false positives

Now that I could identify the gaps in my agents, it was time to start the real work. First, I reviewed which parts of the workflow worked better with my traditional Bash script and which worked better with multi-agents. This comparison took a bit of time and patience, but it was well worth it. When it was done, only one step remained in the hands of an agent: the Writer Agent, which generates the report. Once that was sorted, my classic Bash automation worked much better again.
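
One way to picture that split is a deterministic script that produces the findings and a single model call that only writes them up. The sketch below assumes the findings arrive as a list of dictionaries and that the model is reachable through a caller-supplied function; `build_report_prompt`, `write_report`, and `fake_llm` are hypothetical names, and no real provider API is shown.

```python
import json
from typing import Callable


def build_report_prompt(findings: list[dict]) -> str:
    """Serialize already-validated findings into a prompt for the writer step."""
    payload = json.dumps(findings, indent=2, sort_keys=True)
    return (
        "Write a concise information disclosure report.\n"
        "Use only the findings in the JSON below; do not add new ones.\n\n"
        + payload
    )


def write_report(findings: list[dict], llm: Callable[[str], str]) -> str:
    """Delegate only the write-up to the model; skip the call when there is nothing to report."""
    if not findings:
        return "No findings to report."
    return llm(build_report_prompt(findings))


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return f"[draft report based on {len(prompt)} characters of prompt]"
```

Passing the model in as a plain function keeps the discovery and validation logic testable without any API key, and makes it easy to swap models for the one step that uses them.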

My Bash automation directly modeled how I search manually. I trusted it because that manual process had been validated and rewarded time and time again. I mention this because it’s common for new hackers to start automating without truly understanding the how and the why, regardless of whether they’re using AI.

Pro tip: Develop a clear objective and a pathway to reach it before automating anything. This will save you headaches in the long run, and you will understand what the script or AI agent is actually doing. Also, intellectually and professionally, you should know what you are doing, right?

What did I learn?

AI is not always necessary. Of course, I continue to use AI (or rather, different LLMs) for hacking and my day job. My main point is that there is a time and a place for AI. There are many proven methods and tools that have been refined over years of trial and error, such as open-source tools and the sharp eyes of many hackers. Human intuition and creativity in particular cannot be replicated by LLMs, at least for now.

Is AI useless in bug bounty?

AI can certainly be useful in bug bounty. It’s a fundamental assistant, and it can even find valid vulnerabilities. I’ve had good results using AI to parse through code (both for interesting paths and for information disclosure) and to identify IDORs. These are vulnerabilities that you could find on your own with enough time and dedication, but AI lets you find them much faster.

Not everything is of the same quality

Several open-source repositories and paid tools promise to perform pen testing and bug bounty almost fully unattended (i.e., without humans). When I see these, I ask myself two questions:

Which model does it actually work well with?
Is there really no user input required to start this?

The answers to these questions make it clear that such solutions are not truly unattended. Their behavior can vary greatly depending on the prompt and the contextual information the user provides before starting. A more powerful model will generally produce better research and PoCs, but these tools still require human intervention.

This is because not all LLMs are created equal. Each model varies in power and capacity, and understanding which model to use is essential for your testing. As a rule, the more powerful the model, the more its tokens cost. I have had excellent results with the most recent versions of Anthropic’s Sonnet and Opus models.

You don’t always need the most advanced model for a given task. It’s important to be critical and truly understand what you’re asking of AI-driven automation. Take into consideration the workflow, the level of difficulty, the tools the AI will need access to, and the tasks.

Keep it simple and consider these basic steps before you start:

Clearly outline the workflow that the AI automation needs to follow.
Provide the most specific guidance possible and consider which LLM you’re using.
Choose the appropriate model for each task.
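
The last step, choosing a model per task, can be as simple as a lookup table. The sketch below is illustrative only: the tier names, task names, and prices are hypothetical placeholders, not real model identifiers or real pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                      # hypothetical label, not a real model ID
    usd_per_million_tokens: float  # hypothetical price


TIERS = {
    "small": ModelTier("small-model", 1.0),
    "large": ModelTier("large-model", 15.0),
}

# Assumed mapping from task to tier; adjust to your own workflow.
TASK_TIER = {
    "summarize_findings": "small",
    "write_report": "small",
    "review_source_code": "large",
    "draft_poc": "large",
}


def pick_model(task: str) -> ModelTier:
    """Look up the tier for a task; unknown tasks fall back to the large tier."""
    return TIERS[TASK_TIER.get(task, "large")]


def estimate_cost(task: str, tokens: int) -> float:
    """Rough cost in USD for running `tokens` tokens through the task's model."""
    tier = pick_model(task)
    return tokens / 1_000_000 * tier.usd_per_million_tokens
```

Falling back to the larger tier for unknown tasks is a design choice that favors quality over cost; the opposite default is just as defensible if budget is the main constraint.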

No model will do the job the way you do, but if you take appropriate steps ahead of time, it will come pretty close.

AI in bug bounty can be an amplifier of proven personal skills and methodologies, not a substitute. Even AI-based solutions that rank on different bug bounty platforms have highlighted that the human eye is still necessary at varying stages of the reporting workflow. Embrace AI and keep it human.

Happy hacking, powered by AI!

Hacking in the AI age with Bronxi

I want to tell you how I discovered what AI can do and how I learned to improve the way I hack, based on my own experience.

About a year ago, I asked myself how I could use AI inside a flow I had already automated in Bash. I had created that flow as a way to look for information disclosure in PDF or XLS files indexed by Google. At that point, I honestly had no idea how to use AI for that kind of task: I only used it in chat mode.

I started researching AI and quickly noticed that many people were talking about multi-agents. That is how my path with AI agents began.

I looked into how to create my own multi-agents and found several orchestrators, LangChain among them. In the end I settled on a simpler one, based on LangChain, called CrewAI. It took me quite a while to find a base project or something similar, until I finally came across a YouTube video showing how to use CrewAI to create OSINT agents. I took that example as a starting point and modified the agents so they performed the same tasks my Bash automation already did:

Passive URL search + Researcher Agent
Web Scraper Agent to look for PDF and XLS(X) files
Compliance Agent to identify sensitive information
Writer Agent to generate the report

Seeing the agents work together was really exciting. Each agent handed its output to the next, and the last one used all the previous information to generate a report and deliver it to me. At that time I used the OpenAI API key for the whole flow. To be clear, back then I did not understand (at least not the way I do now) the importance of the model being used. Today the type of model makes a huge difference, but I will get to that later.

And what do you think happened next?

Exactly what you are thinking: lots of false positives. The multi-agents detected supposed cases of information disclosure in large quantities, but when I reviewed the results manually with my human eye, I found files that contained words like password or confidential only as part of an explanation or documentation. Looking at them more closely, it was clear they did not contain real passwords or truly sensitive information. While I was glad I had gotten the agents to work together, the reality is that the results were not good.

How I corrected the false positives

Once I could identify the shortcomings of the approach, the real work began. I decided to review which parts of the flow worked better with my traditional Bash script and which worked better with multi-agents. That comparison took time and patience, but it was worth it. In the end, only one step was left to an agent: the Writer Agent in charge of generating the report. Once that was adjusted, my classic Bash automation worked much better again.

That automation was also a direct model of how I searched manually. In other words, I trusted it because it replicated the same process I use when I hunt manually, a process that had already been validated and rewarded on multiple occasions. I mention this because today it is very common to see people, especially those just starting out, automating with and without AI without really understanding the how and the why.

Practical tip: have a clear objective and a sense of the path to reach it before automating anything. That will save you headaches and let you understand what your script or AI agent is really doing. Besides, intellectually and professionally, you should know what you are doing, right?

What did I learn from all this?

That AI is not always necessary. Of course I still use AI (or rather, different LLMs) both for my day-to-day work and for bug bounty hunting. The main point is that there is a time and a context for using AI. Not everything needs to be done with AI, because there are proven methods that have been refined over years of trial and error, such as open source tools and the trained eye of many hackers. That kind of process cannot be replicated by any LLM, at least for now.

Is AI useless for bug bounty?

Not at all. It is a fundamental assistant and, in some cases, it can find valid vulnerabilities. I have had good results using AI to analyze code, both to identify interesting paths and possible cases of information disclosure, and also to detect IDORs. In these scenarios, AI speeds up and optimizes the work. These are vulnerabilities you could find with enough time and dedication, but that you can discover much faster with the help of AI.

Not everything is the same

As I write this, there are several open source repositories and commercial tools that promise to perform pentesting and bug bounty almost completely unattended, that is, without human intervention. When I come across this kind of solution, I always ask myself two questions:

Which model do they actually work well with?
Do they really require no user input to get started?

The answers to these questions make it clear that they are not truly unattended solutions. Their behavior can vary enormously depending on the prompt and the context information the user provides before starting the process. In addition, a more powerful model usually produces better research and PoCs, but it also means a higher token cost.

This is because not all LLMs are the same. Each model has different capabilities and levels of power, and understanding which model to use is essential for good testing. I have had excellent results with the most recent versions of Anthropic's Sonnet and Opus.

You also do not always need the most advanced model for every task. It is key to be critical and really understand what you are asking of an AI automation: the flow, the level of complexity, the tasks involved, and the tool access it will need.

Before you start, keep these basic points in mind:

Clearly define the flow the automation must follow
Give the most specific guidance possible and evaluate which LLM you are using
Choose the right model for each task

No model will do the work exactly like you, but if you make these decisions consciously, the result can come quite close.

Today, using AI in bug bounty works as an amplifier of already proven personal skills and methodologies, not as a substitute. Even AI-based solutions that rank on different bug bounty platforms have acknowledged that the human eye is still necessary at different stages of the reporting flow.

Embrace AI, but always keep the human factor.

Happy hacking, powered by AI!!!
