The rapid evolution of general-purpose AI (GPAI) systems has brought remarkable advances, but also significant risks. Flaws in these systems—ranging from harmful outputs to systemic vulnerabilities—are not just technical issues; they are societal ones. The stakes are high, and yet the infrastructure to identify, report, and mitigate these flaws remains underdeveloped.

Today, we’re urging AI developers to prioritize the needs of third-party, independent researchers and hackers who uncover flaws in AI systems. I joined 33 other experts in fields spanning machine learning, law, security, social science, and policy in co-authoring a new paper, In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI. The paper proposes actionable solutions and advocates for a transformative approach to AI flaw reporting, including standardized protections for researchers, robust reporting infrastructure, and coordinated disclosure mechanisms. These elements are essential to empower the researcher community and to ensure AI systems are safer, more transparent, and more accountable.

Decades of experience in software security make it clear that the AI community must adopt lessons from non-AI vulnerability disclosure to create a safer, more accountable ecosystem. At the heart of this effort is the concept of safe harbor: a legal and procedural framework that protects good-faith hackers while enabling organizations to address flaws responsibly. Without it, we risk stifling the very research that could make AI systems safer for everyone.


The case for safe harbor in AI flaw reporting

Safe harbor is not a new concept. In the software security world, it has been a cornerstone of vulnerability disclosure programs, providing legal protections for hackers who act in good faith to identify and report flaws. This framework has been instrumental in fostering collaboration between hackers and organizations, enabling the discovery and remediation of countless vulnerabilities.

However, the AI landscape presents unique challenges. Unlike traditional software, AI systems are often opaque, with complex supply chains and transferable flaws that can impact multiple stakeholders. Without clear guidelines and protections, hackers face significant barriers to reporting flaws, including legal threats, reputational risks, and a lack of standardized processes.

As I’ve said before, the ambiguity of existing laws and the lack of established protocols for good-faith security testing have sometimes resulted in legal threats, wrongful criminal prosecution, and even jail time for ethical hackers working to improve global security. This chilling effect is already evident in the AI space, where hackers often choose to remain silent or disclose flaws informally, bypassing the organizations that could address them.

To address these issues, we need to establish a safe harbor framework tailored to AI. This includes clear rules of engagement, liability protections for researchers, and standardized reporting processes that ensure flaws are addressed without exposing researchers to undue risk.


Lessons from non-AI vulnerability disclosure

The software security community has spent decades refining the art and science of vulnerability disclosure. Initiatives like Disclose.io, which I co-founded, have demonstrated the value of standardization in creating safe harbor for hackers and organizations alike. By providing clear guidelines and legal language, Disclose.io has helped make vulnerability disclosure the “new normal” for many companies.

These lessons are directly applicable to AI:

  1. Standardized reporting
    Just as Disclose.io provides a template for vulnerability disclosure policies, the AI community needs a standardized AI Flaw Report schema. This schema should include key details such as system information, timestamps, flaw descriptions, and reproducibility steps. By standardizing the reporting process, we can reduce ambiguity and ensure that flaws are triaged and addressed efficiently. (A minimal sketch of what such a schema could look like follows this list.)
  2. Clear scope and rules of engagement
    One of the biggest challenges in vulnerability disclosure is defining the scope of what hackers can test. In AI, this is even more complex due to the interconnected nature of GPAI systems. Providers must clearly define the boundaries of acceptable research and offer liability exceptions for researchers who operate within these boundaries. As the blog post notes, “broad readings of providers’ terms of service can deter flaw discovery as terms may prohibit reverse engineering or automatic data collection.” This is a problem we’ve seen before in software security, and it’s one that safe harbor policies can address.
  3. Centralized coordination
    The proposed Disclosure Coordination Center is a brilliant idea that builds on the concept of coordinated vulnerability disclosure (CVD). In traditional software security, CVD has been critical in managing multi-stakeholder vulnerabilities, ensuring that all affected parties are informed and able to mitigate risks before public disclosure. For AI, a centralized coordination body could play a similar role, routing flaw reports to the appropriate stakeholders and managing disclosure timelines to balance transparency with risk mitigation.
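
To make the first point concrete, here is a minimal sketch of what a standardized AI Flaw Report could look like, expressed as a Python dataclass. The field names and example values are illustrative assumptions drawn from the elements listed above (system information, timestamps, flaw descriptions, reproducibility steps); they are not the schema proposed in the paper.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class AIFlawReport:
    """Illustrative fields for a standardized AI flaw report.

    These field names are hypothetical; a real schema would be defined
    by the community and the coordination bodies described in the paper.
    """
    system_name: str                   # model or product identifier
    system_version: str                # version or API snapshot that was tested
    reported_at: datetime              # timestamp of the report
    flaw_description: str              # what the flaw is and why it matters
    flaw_type: str                     # e.g. "security", "safety", "trustworthiness"
    reproduction_steps: List[str]      # prompts, inputs, or steps to reproduce
    observed_impact: str               # severity and affected stakeholders
    reporter_contact: Optional[str] = None        # optional, to allow anonymous reports
    attachments: List[str] = field(default_factory=list)  # logs, transcripts, screenshots

# Purely illustrative example of a filled-out report:
report = AIFlawReport(
    system_name="example-gpai-model",
    system_version="2025-01-15",
    reported_at=datetime.now(timezone.utc),
    flaw_description="Model reveals its system prompt under a crafted input.",
    flaw_type="security",
    reproduction_steps=["Send crafted prompt", "Observe leaked instructions in the output"],
    observed_impact="Prompt-injection path affecting downstream deployers.",
)
```

Even a simple shared structure like this makes reports machine-readable, easier to triage, and easier to route between the stakeholders a coordination body would serve.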


Building the infrastructure for AI flaw reporting

To make safe harbor a reality for AI, we need more than just legal protections—we need the right infrastructure. This includes:

  1. Flaw bounties
    Platforms like Bugcrowd have proven the value of incentivizing vulnerability discovery through bug bounties. AI providers should adopt similar programs, expanding their scope to include not just security flaws, but also issues related to safety, trustworthiness, and ethical considerations. This aligns with the blog post’s recommendation to “consider a broader scope to include safety and trustworthiness flaws as well.”
  2. Reporting interfaces
    A dedicated interface for flaw reporting is essential. Email addresses are not enough—hackers need a secure, user-friendly platform that supports anonymous submissions, ongoing communication, and efficient triage. This is an area where existing platforms like Bugcrowd can play a significant role, leveraging their expertise in vulnerability management to support the unique needs of AI flaw reporting. (A sketch of what such an interface could look like follows this list.)
  3. Community engagement
    Safe harbor is not just a legal framework—it’s a cultural shift. Organizations must actively engage with the research community, demonstrating their commitment to transparency and collaboration. This includes publishing responsible disclosure policies, participating in industry initiatives, and recognizing the contributions of hackers who help make their systems safer.
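
As a companion to the second point above, here is a minimal sketch of what a dedicated reporting interface could look like, assuming a Python service built with FastAPI. The route, fields, and tracking-ID behavior are hypothetical; a production platform would add encrypted storage, a triage workflow, and two-way communication behind the endpoint.

```python
# Minimal sketch of a flaw-reporting endpoint, assuming FastAPI and Pydantic
# are available. Names and behavior are illustrative, not a reference design.
from typing import List, Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class FlawSubmission(BaseModel):
    system_name: str
    flaw_description: str
    reproduction_steps: List[str]
    reporter_contact: Optional[str] = None  # omit to submit anonymously

@app.post("/flaw-reports")
def submit_flaw(report: FlawSubmission) -> dict:
    # In a real platform this would persist the report, open a triage ticket,
    # and notify the provider's response team.
    tracking_id = "FLAW-0001"  # placeholder; a real system would generate one
    return {"tracking_id": tracking_id, "status": "received"}
```

The key design choice is that the reporter receives a tracking ID without being forced to identify themselves, preserving the option of anonymous, ongoing communication throughout triage and remediation.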


The path forward

The challenges of AI flaw reporting are significant, but they are not insurmountable. By learning from the successes and failures of non-AI vulnerability disclosure, we can build a system that empowers researchers, protects organizations, and ultimately makes AI safer for everyone.

Safe harbor is the foundation of this effort. It’s about creating an environment where researchers can operate without fear, where organizations can address flaws without hesitation, and where the public can trust that AI systems are being held to the highest standards of safety and accountability.

As the blog post rightly concludes, “Coordinated flaw reporting infrastructure coupled with protections for third-party AI researchers can help reshape the future of AI evaluation.” This is not just an opportunity—it’s an imperative.


Further reading and resources

By leveraging these resources and building on the lessons of the past, we can create a future where AI flaw reporting is not just possible, but effective, impactful, and safe for all involved. Let’s get to work.