AI equips organizations with tremendous power to transform the way we serve our users and manage our business, but with it also comes a host of security vulnerabilities. In Executive Order 14410 on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, President Biden called on all government agencies to perform AI red team safety tests, highlighting just how important this issue is.

The unique characteristics of AI, such as its ability to learn and adapt, introduce security challenges that require specialized testing techniques and considerations. Therefore, by proactively carrying out robust AI pen testing and identifying potential vulnerabilities before they can cause harm, organizations can take full advantage of AI without compromising security. 

In this article, we peel back the layers of some of the threats facing organizations running AI models and highlight pen testing as one of the best ways to ensure that our implementations are secure and reliable.

 

Security threats facing AI systems

Outlining the risks associated with running AI systems has become a high priority for the tech industry as many companies race to integrate AI into their stacks and product offerings. The Open Worldwide Application Security Project (OWASP) released its first Top 10 for Large Language Model Applications last year, highlighting common security vulnerabilities for LLMs. We did a breakdown of these vulnerabilities in our last blog post

To learn more on security vulnerabilities in AI, see our Ultimate Guide to AI security

 

Penetration testing 

The primary goal of pen testing is to simulate real-world attacks by identifying and exploiting security vulnerabilities in a system. During a pen test, certified ethical hackers, also known as penetration testers, attempt to circumvent security controls and gain unauthorized access to systems or sensitive data using techniques similar to those employed by real attackers. 

Pen testing is generally broken down into phases. The process begins with pre-engagement activities, through which the testing team reviews the organization’s goals to determine the best testing strategy. The next phase involves reconnaissance and planning, during which testers gather information about a system, or in our case a system with AI capabilities, to identify potential vulnerabilities. Pentesters then use this information in the vulnerability mapping phase, aka threat modeling, to consider various exploitation scenarios, such as adversarial attacks, data poisoning, model inversion attacks, or privacy breaches.

In the final exploitation phase, hackers take everything they’ve learned during the planning and research phases and attack the AI. The testers seek to manipulate the AI’s inputs, outputs, or decision-making processes while avoiding detection. When the testing is complete, the pen tester will set the environment back to its original state. An AI model that learns over time, like an LLM, may need a reset to forget any bizarre inputs it may have received during testing, an issue we’ll explore later. Finally, the pentesters will write up a report about their findings, and potentially include some recommendations.

Red teaming

Compared to traditional pen testing, red-team testing takes a more holistic approach. Hackers on red teams simulate real-world scenarios and attack the entire system instead of checking off a list  of individual tasks. Red-team exercises can include scenario-based testing to evaluate how AI systems respond to specific manipulation techniques and attack scenarios, such as prompt injections, output manipulation, excessive scope abuses, and jailbreaks. These red team tests can be far more elaborate and time-consuming to execute than pen tests and, as a result, may uncover additional weaknesses.

To learn more, see What is Penetration Testing and The Ultimate Guide to Penetration Testing.

 

Going on the attack: Pen testing an AI

Pen testing an AI, especially an LLM, is more challenging than traditional pen testing. There are many unknown ways an AI application can attempt to solve a problem and inadvertently cause a security breach. These tests require that hackers have broad security knowledge, creativity, and thorough planning and follow-through abilities. The following list of testing approaches is not exhaustive, nor are they mutually exclusive.

Security controls evaluation

In this approach, pen testers assess the effectiveness of security controls, such as access controls, encryption, and anomaly detection mechanisms, ensuring an LLM doesn’t have privileges beyond what’s necessary. For instance, a tester can try to make an AI application make changes to a database without confirming with a user, which would be an example of excessive autonomy. An AI application can be required to authenticate before accessing data, and even then it should only be given access to the minimum needed data.

A system is only as strong as its weakest link, so testers need to assess the vulnerabilities in an AI system’s plugins and extensions as well. In addition to checking for excessive permissions in third-party packages and plugins, hackers can craft requests that exploit vulnerabilities in a plugin’s design, such as accepting configuration strings instead of parameters. 

Pentesters also test the strength of authentication mechanisms by attempting to authenticate using various methods, such as username/password, multi-factor authentication (MFA), or OAuth tokens. Similar to traditional software, an AI system has privileged accounts for administrators, developers, or other high-level users who can access certain functionalities or data. Because these accounts may have access to sensitive data, configuration settings, or control over the AI system’s behavior, hackers will look for vulnerabilities like credential stuffing or weak password policies. 

Input verification

Input verification tests an AI system’s ability to handle malicious, unexpected, or malformed inputs to ensure robustness and reliability. The most prominent example of a related vulnerability is prompt injection; hackers attempt to craft a variety of inputs to query an LLM to find cases where inputs are inadequately filtered or checked. Because the number and configuration of inputs are infinite, and the results are often indeterministic, there are many inputs to test. 

Pentesters may check how an AI application handles extremely large or complex inputs, such as those with nested structures or a high degree of recursion, which could be used to execute a DoS attack. They also check for the sanitization of inputs, ensuring that input data undergoes a filtering process that neutralizes any potentially malicious content. Examples of such inputs are raw code that includes SQL injections or text with various escape characters. Input verification also tests that inputs are both filtered correctly before they are passed to plugins and that the plugins themselves are handling the inputs correctly to prevent attacks like local command execution and remote code execution. For example, let’s say a plugin is meant to run a simple command like “ls” to list the contents of a directory. If an AI model and the plugin suffer from improper input handling, an attacker can manipulate the input and inject additional commands, such as `ls; rm -rf /`, which would delete all the contents of the folder in which the command is executed if they have the right permissions.

As far as infrastructure, testers also assess an LLM’s resilience to a high volume of inputs. Hackers can load test a system by inputting an exceedingly large number of requests to simulate a DoS attack to identify DoS thresholds and resource consumption limits.

Output validation

The output of an AI can tell hackers about how the output is being handled, as well as what’s happening to the input. Some tests have a short feedback loop; a tester assesses a single response to an input. Other output verification tests require a pen tester to send many requests to trigger flawed responses and discover vulnerabilities. When examining the output of an LLM, pen testers can assess it for proper encoding, filtering, and sanitization, to check for insecure output handling like XSS. Testers may conduct fuzzing testing on the AI or plugins with malformed inputs to discover exploitable flaws like RCE or configuration override issues. During such tests, they flood the system with a large number of randomly generated inputs to identify potential crashes, memory leaks, or security vulnerabilities, and then they analyze the AI’s output, behavior, and error-handling mechanisms when faced with fuzzed inputs.

In addition to checking that the formatting of the output is handled correctly, testers can also examine the content of outputs. Attacks like historical distortion can cause an AI model to output false information. Researchers have successfully prompted Bing Chat to deny that Albert Einstein won a Nobel Prize, for example. Source blocking, similarly, is when an AI application with search access is prompted to ignore content from given sources, such as omitting anything published by the New York Times. Tests like these may require a longer testing duration and involve the generation of a lot of inputs in an attempt to influence the AI’s behavior and to then test a variety of outputs based on new inputs. 

Social engineering

LLMs are not the only ones that can be tricked into breaching security. Red teams can employ social engineering techniques to manipulate or deceive human users into revealing sensitive information or granting unauthorized access to AI systems. Pen testers may craft phishing emails, engage in phone-based pretexting, or present as an authority or colleague to get humans to give them access to sensitive information. Thorough testers might create fake websites and realistic phishing campaigns to lure employees into inputting passwords or clicking on dangerous links. Social engineering is about seeing if humans are educated on security practices and ensuring that the right policies and procedures are in place.

Insider threat simulation

Insider attacks can happen when security controls fail and a threat actor gets access to a system, but they can also occur when malicious insiders attempt to exploit their privileges or credentials. Pen testers will check that internal users don’t have unchecked privilege to training data, non-anonymized user data, or model internals to prevent data leaks and model theft. Even legitimate users should only have the minimum access rights required to perform their jobs. In addition, hackers will confirm that an AI model’s parameters and architecture are encrypted to prevent attackers from reverse-engineering or replicating the model, even if the attackers have internal access. This helps identify gaps in access controls, monitoring mechanisms, authentication mechanisms, and insider threat detection capabilities.

AI vs. AI

With the advancements in AI, can’t we just use AI to test AI? AI is changing the security industry, allowing hackers to quickly generate inputs and scenarios and verify outputs. Automated model-based red teaming is a method that doesn’t involve any human intervention. This technique uses three models: an attacker, a target, and a judge model. The attacker, by using a high-quality classifier, can efficiently generate jailbreaks for the target generative model using the classifier as a reward function. Some AIs can also test themselves; for example, an AI application can fuzz test by generating inputs, querying itself, and analyzing the output. 

However, AI cannot be responsible for all of our testing. AI systems, especially LLMs, are complex and often opaque in their decision-making processes. Pen testing requires a deep understanding of a system’s architecture, which might be challenging for an AI to grasp fully. AI may lack the contextual understanding necessary to identify subtle security vulnerabilities or understand the implications of certain actions within an AI system. 

 

Approaching pen testing

Companies often hire pentesters to go through as part of standard procedures to check off security requirements. Yet, only covering the basics won’t cut it when it comes to the threats that AI poses. Effective vulnerability assessments generally involve using multiple approaches to test a system. For instance, to evaluate if an AI application will leak sensitive information, testers will likely send a variety of inputs to prompt it to access data. Thus, these testers protect you and your systems by assessing the various ways security control systems can fail and output can be corrupted.

The best way to pen test your AI product is by engaging the help of specialized pen testers who understand the complexities of AI systems and are ready to take a multi-pronged approach to your system’s security. Bugcrowd recently launched an AI pen testing solution based on methodologies from the OWASP top 10. Bugcrowd understands the complexity required to test AI, so it has developed its own LLM testing methodology to cover as many attack techniques across the entire application surface in a comprehensive, organized process. Additionally, because AI is non-deterministic, vulnerabilities may be difficult to reproduce. Bugcrowd’s triage team has developed specialized processes for validating these kinds of vulnerabilities.

Connect with top hackers sourced from the global hacker community. Bugcrowd is a leader in crowdsourced security that outpaces threat actors in the changing world of AI.