Outlining the risks associated with running AI systems has become a high priority for the tech industry as many companies race to integrate AI into their stacks and product offerings. 

Artificial intelligence now underpins everything from code copilots to autonomous incident‑response bots. Yet the very features that make generative‑AI technologies so powerful—creative reasoning, open‑ended dialog, and agentic planning—simultaneously expand the attack surface for adversarial inputs and sensitive‑information disclosure. 

Security professionals must therefore treat each model, plug‑in, and dataset as part of a living supply chain, subject to resource‑heavy operations, data‑breach liabilities, and adversarial attacks that traditional AppSec tooling rarely detects. 

The Open Worldwide Application Security Project (OWASP) released its first Top 10 for Large Language Model Applications last year, highlighting the following security vulnerabilities for LLMs.

This is part of a wider project aiming to educate developers, designers, architects, managers, and organizations about potential security risks.

 

OWASP Top 10 for Large Language Model Applications

Prompt injection

Prompt injection occurs when a threat actor inserts malicious instructions into the prompt of a gen AI model. The model processes the instructions within the prompt, potentially executing the malicious instruction, performing unintended or unauthorized actions. Direct prompt injections are attacks in which the malicious instructions are included in the prompt text, indirect prompt injections are those in which the AI accepts input from other sources, such as a file, image, website, or third-party plugin. In the latter case, a threat actor can include the malicious instructions in the external sources, leading the AI to become a “confused deputy,” granting the attacker the ability to influence the user or other systems accessible to the LLM. 

In February 2023 researchers fed a series of adversarial inputs to Microsoft’s Bing Chat (codename “Sydney”), forcing it to reveal its hidden system prompt, developer emails, and safety rules. The attack proved that a single prompt injection can override guardrails, trigger sensitive-information disclosure, and even steer follow-on answers toward malware links.

Level of risk: High, any LLM that accepts free-form text is vulnerable until every input and output is rigorously validated.

LLM Mitigations

  • Heuristic models for intent validation – Pre-screen prompts with a lightweight classifier that rejects or rewrites suspicious instructions.
  • Strict output-escaping – Sanitize markdown/HTML, escape code blocks, and apply allow-lists before responses reach the renderer or downstream service.
  • Retrieval-augmented guardrails – Route queries through a middleware layer that injects vetted context and strips unsafe directives before forwarding them to the LLM.

Insecure output handling

Insecure output handling, just like general web security, is the practice of mishandling or failing to filter the output generated by web applications before presenting it to users. Especially if the output of the AI is fed to another system, failing to securely handle output can lead to security vulnerabilities and risks, such as XSS, SSRF, privilege escalation, and remote-code execution. 

An LLM that hallucinates SQL like DROP TABLE customers can pass that string to an automation layer, contaminating the production database. This output-integrity attack turns benign hallucination into a poisoned state: corrupted records, phantom transactions, or leaked PII. Always treat model responses as untrusted until syntax- checked, policy-scored, and replay-tested in a sandbox.

Model denial of service

Model denial of service (DoS) is when malicious actors disrupt an AI program by overwhelming it with a high volume of requests or exploiting vulnerabilities in its design. Crafted prompts that demand exhaustive chain-of-thought or multi-modal reasoning can trigger resource-heavy operations—spiking GPU memory, throttling inference, and making the API non-responsive. Because usage often bills per token, a prompt-level DoS drains both capacity and budget before ops dashboards even flag an incident. In the case of generative AI, for example, a malicious actor can overwhelm it with too many queries or send queries that use a lot of compute power. Like a DoS on traditional web services, this attack can slow down an AI model and its supporting infrastructure or bring it to a complete halt for legitimate users.

Insecure plugin design

Many “one-click” LLM plugins forward user input to an upstream fine-tuned checkpoint. If that parent model is later hijacked, the entire plugin inherits the compromise—a classic transfer-learning attack. Since the plugin executes under the host app’s OAuth scope and auto-retries on failure, attackers gain privilege escalation and persistence while static-analysis tools see only trusted traffic. 

Plugins for LLMs can suffer from weak input validation and access control mechanisms, creating risks for the AI. Attackers can exploit faulty plugins and cause severe damage by, for example, executing malicious code.

Excessive agency

Agentic AI systems combine a reasoning core with autonomous function calls—database writes, shell commands, payment APIs, to achieve multi-step goals. Trouble begins when those calls are over-permissioned. A simple task like “summarize Q4 reports” can morph into covert data exfiltration: the agent chains file-reads, compresses proprietary docs, and pushes them to an attacker’s URL, all under the guise of success. Enforce least-privilege policies, real-time audit hooks, and kill-switches to contain run-away agents.

Excessive agency is a vulnerability that arises when a developer grants an LLM is granted too much autonomy or decision-making power. In this case, the AI may have the power to interface with other systems and take actions without sufficient safeguards or constraints in place. Excessive agency enables potentially damaging actions to be performed in response to unexpected, ambiguous, or malicious outputs from the LLM.

Sensitive information disclosure

When prompted with certain inputs, an LLM may inadvertently reveal confidential data, such user data, company data, or training data, in their responses. These disclosures are serious privacy violations and security breaches.

Pro-Tip: Before committing AI-generated source code to Git, run secret-scanning hooks. LLMs trained on private repos may splice company secrets—API keys, salts, proprietary algorithms—into regenerated snippets. Once that code lands in a public branch, mirrors and dependency scrapers archive it forever. Strip identifying comments, rotate exposed credentials, and require a security-professional review on every AI-assisted pull request.

Training data poisoning

LLMs are trained and fine-tuned on large amounts of training data. A threat actor can modify a dataset to affect the AI’s responses and ability to produce correct output before the model training is complete. Attackers seed open datasets with stealth back-door phrases that flip model behaviour on cue—e.g., outputting competitor credentials when they see “sunset-orange.” Because samples appear legitimate, this data poisoning attack often survives QA and silently ships to production. This risk is higher when a third-party dataset is used to train a model, which is common given the difficulty in creating and obtaining large amounts of good data.

Overreliance

There are risks associated with excessively relying on an LLM, which can produce false information.When teams treat model output as gospel, a single hallucination can ripple through ticketing, billing, and analytics. Level of risk: Medium, but climbing as agentic chains trigger autonomous follow-ups. Implement confidence scores, human-in-the-loop gates, and rollback plans to blunt cascading failures.  An example of overreliance is when a developer uses insecure or faulty code suggested by an LLM without additional oversight.

Model theft

Through a model inversion attack, adversaries query your public endpoint, capture gradients, and reconstruct sensitive slices of the training set—medical images, payroll records, even passwords. Hard caps on query windows aren’t enough; add differential-privacy noise, watermark responses, and monitor for abnormal input patterns. 

Model theft can occur when someone gains unauthorized access to proprietary LLMs. They might copy or steal the models, leading to financial losses for the company, weakening their competitive position, and potentially exposing sensitive data.

Supply chain vulnerabilities

Adding third-party datasets, plugins, and pretrained models to an AI application come with certain risks.  Most teams import checkpoints from public hubs, inviting AI supply chain attacks where a poisoned “minor-version” ships hidden malware weights. Enforce security protocols such as cryptographic model manifests, SBOM-style attestations, and reproducible-training proofs. Only after hashes, dependencies, and dataset lineage are verified should the model move to staging or prod. These components are susceptible to many of the aforementioned risks, and you have significantly less control over how they are built and assessed.

 

Security protocols across the AI lifecycle

1. Threat-model early

Before a single line of code is written, map the entire attack surface: adversarial attacks that manipulate prompts or embeddings, data-poisoning threats hidden in public datasets, and model-inversion risks that reconstruct training records from gradients. Document who can tamper with data, where third-party checkpoints enter the pipeline, and which secrets (tokens, IP, PII) a breach could expose. A living threat model arms architects and product owners with a clear “blast-radius” view before design decisions harden.

2. Harden development

During build and fine-tune stages, default to least-privilege access. Spin up a sandbox fine-tuning environment isolated from production keys; grant engineers short-lived tokens rather than blanket S3 rights. In the CI/CD lane, embed lightweight heuristic models that auto-reject payloads resembling jailbreaks or prompt injections. Every pull request should trigger automated retraining in a quarantined namespace and run static-analysis checks on newly introduced model cards, config files, and dependencies.

3. Test for output integrity

Shift security left but also deep: before a model ever touches customer data, run pre-production gauntlets. Use OWASP AI fuzzers to hammer the endpoint with malformed prompts that probe for model skewing and output-integrity attacks. Capture hallucinations, policy violations, and confidence metrics in structured logs; gate deployment on minimum precision-and-recall thresholds. Treat failures like unit-test blockers—no green build, no merge.

4. Monitor agentic workflows

Once live, autonomous chains amplify risk. Instrument your runtime with distributed tracing that ties every LLM call to its downstream side effects. Causal graphs expose rogue actions in agentic AI systems, such as unauthorized file writes or network calls. Set budget limits on token counts and GPU time to throttle resource-heavy operations that could trigger denial-of-service or runaway cloud bills. Real-time anomaly detectors should escalate deviations faster than human eyes on dashboards.

5. Incident response

When the inevitable happens, treat model incidents like data breaches. Snapshot request/response logs, rotate secrets, revoke OAuth tokens, and quarantine the compromised workload. Retrain from a clean checkpoint, then purge compromised embeddings from vector stores or cache layers to prevent memory-based resurfacing. Publish a post-mortem that maps root cause to updated runbooks so the same exploit—be it prompt injection, data poisoning, or inversion replay—can’t strike twice.

Prioritizing AI vulnerabilities

Generative and agentic AI are no longer emerging curiosities; they are production software. Treat them accordingly. By aligning development practices to the OWASP AI Top 10, adopting supply‑chain attestations, and operationalizing the security protocols outlined above, organizations can shrink the blast radius of the next model‑inversion or data‑poisoning attack—even as heuristic models, transfer learning, and ever‑larger LLMs accelerate innovation.

To help organizations prioritize these vulnerabilities, the Bugcrowd VRT is an open-source, industry-standard taxonomy that aligns customers and hackers on a common set of risk priority ratings for commonly seen vulnerabilities and edge cases. 

In December, we updated the VRT to include AI security-related vulnerabilities. This is timely, considering the recent government regulation. These updates overlap with the OWASP Top 10 for Large Language Model Applications. 
To learn more on security vulnerabilities in AI, see our Ultimate Guide to AI security.