In March 2026, we made a set of enforcement changes to Bugcrowd’s submission pipeline. The trigger was a three-week stretch where the triage queue grew by 334%, almost entirely from low-quality submissions—thin evidence, templated write-ups, and no real validation.
Internally, we started calling the pattern sloptimism: reports generated quickly and hopefully, where the author trusts the LLM more than the underlying evidence.
This post is about why that happened—and why the same phenomenon is now echoing up across security, open source, and even academia.
AI isn’t the perpetrator of bad reports. Rather, it changed the economics of any human-validated system. Convincing content got cheap to produce, and checking whether it’s actually correct did not. This isn’t just a bug bounty problem—it’s happening in science, the law, and everywhere we rely on human validation.
This single shift is enough to break any system built on manual triage.
Bug bounty platforms are just one place where we're feeling the sloptimism pain. After the spike, we put in four concrete controls, which I'll return to below. These weren't policy tweaks; they were the first layer of spam control.
If you talk to maintainers, you will hear the same story. Open-source projects are getting reports that look legitimate, even well-researched, on the surface. They have references, impact descriptions, and even suggested patches. But the polish is only surface-deep, and the reports collapse under inspection. Maintainers are spending real time proving that bugs don't exist.
Daniel Stenberg, maintainer of curl, has described a steady stream of AI-generated vulnerability reports as “death by a thousand slops.” In early 2026, the curl project received 20 such submissions, none of which identified an actual vulnerability. And curl isn't alone; maintainers across the open-source ecosystem are raising similar concerns.
Academia is seeing its own version of this. Researchers estimate that 2–6% of citations in new conference submissions are hallucinated by AI. Estimates vary, of course, but even single-digit percentages are enough to matter. Reviewers generally assume cited work exists and use it to assess novelty and correctness. They will typically check the main references, but even a few seemingly innocuous fabricated citations can snowball as papers cite each other over time. Venues are now adding AI slop detection to their paper review process, but as an insider, I can tell you that the current Band-Aid solutions won't hold for long.
Lawyers are running into the same failure mode: documents that look correct but cite completely hallucinated precedent. The canonical example is the 2023 case of Mata v. Avianca, in which attorneys submitted a brief generated with ChatGPT that cited multiple non-existent court cases. Courts have since issued guidance, but there simply isn't enough human bandwidth to check everything.
Across these very different ecosystems, the problem has the same root: generation got cheap, and validation didn't.
For years, the system worked because writing a convincing vulnerability report was hard.
You had to understand the target, find a genuine flaw, build a working proof of concept, and write it all up clearly.
Now, anyone can generate a report that looks realistic for the price of only a few tokens. It will have the right structure, the right language, and even the right tone. What it won’t necessarily have is a real bug behind it.
In science, we assumed you had to read the relevant literature, run the experiments, and cite only work you had actually read and verified.
Now you can write any sentence and ask an LLM to fish for evidence that supports your claim, and when it finds none, the model may hallucinate whatever you need just to fulfill the request. The law is no different.
This means the bottleneck has moved. It’s no longer report generation—it’s validation.
And that’s the variable that matters. If it’s cheap to generate and expensive to validate, volume wins—always.
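To make the asymmetry concrete, here is a back-of-envelope sketch in Python. Every number in it (cost per report, minutes per review, queue size) is hypothetical, chosen only to show the shape of the problem, not drawn from real platform data.

```python
# Illustrative cost asymmetry: generating plausible reports is cheap,
# validating them is not. All figures below are made up for intuition.

def generation_cost(n_reports: int, dollars_per_report: float) -> float:
    """Sender's cost to produce n plausible-looking reports with an LLM."""
    return n_reports * dollars_per_report

def validation_hours(n_reports: int, hours_per_review: float) -> float:
    """Defender's cost: every report consumes human review time,
    whether or not it contains a real bug."""
    return n_reports * hours_per_review

# A flood of 1,000 templated reports at $0.05 apiece costs the sender $50...
flood = generation_cost(1_000, 0.05)
# ...but at 30 minutes of triage each, it consumes 500 human hours.
triage = validation_hours(1_000, 0.5)
```

At those (invented) rates, fifty dollars of generation buys five hundred hours of someone else's validation time, which is the whole argument in two numbers.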
Modern-day sloptimism mirrors the email spam of the late 1990s. Spam nearly overtook email because it buried the messages that were legitimate.
Email recovered, but not because we got better at reading messages. It recovered because we layered defenses: content filters, sender reputation systems, authentication standards such as SPF, DKIM, and DMARC, and legal pressure on bulk senders.
It took a decade to reach a workable equilibrium.
Security disclosure is now at the start of that same transition, but this problem will be harder to solve. With spam, we found classifiers good enough that 99% of it never reaches your inbox, and the remaining 1% is usually easy to identify manually. A hallucinated vulnerability report takes significantly more work to rule out.
This reminds me of something from complexity theory. We usually assume an NP-like world: a valid result is hard to produce but easy to check. AI flips that assumption on its head: a valid-looking result is now easy to produce but hard to check.
A few things actually seem to make a dent, and they are not surprising.
Identity. At some point, you need to know who is submitting. Without that, rate limits and enforcement don’t stick.
Reputation. Past signal matters. Researchers who consistently submit valid findings should move faster through triage; new accounts should not. The academic community has floated similar ideas: to submit a paper, you must have an email address tied to a known researcher, university, or institution. I'm on the OpenReview security review board, and we know this isn't enough; we're actively working to improve it.
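One simple way to turn submission history into a usable signal is a smoothed acceptance rate. The sketch below is illustrative only; the prior values are invented, and no platform I know of scores researchers with exactly this formula.

```python
def reputation(valid: int, invalid: int,
               prior_valid: int = 1, prior_total: int = 4) -> float:
    """Laplace-smoothed acceptance rate in [0, 1].

    The prior means a brand-new account starts at 0.25 and must earn
    trust through validated findings, while a long streak of invalid
    submissions drags the score down. Prior values here are made up.
    """
    return (valid + prior_valid) / (valid + invalid + prior_total)
```

The smoothing matters: a fresh account with one lucky valid report shouldn't instantly outrank a researcher with forty validated findings.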
Rate limits tied to reputation. This is where things get practical. High-signal contributors get freedom. At Bugcrowd, we rate limit submissions from new researchers. Top-tier conferences are doing the same, such as the seven-paper maximum per author at the tier-1 ACM CCS security conference.
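Tying the two together can be as simple as mapping a reputation score onto a per-day cap. The caps below are invented numbers, not Bugcrowd policy or any conference's actual limit.

```python
def daily_submission_cap(reputation_score: float,
                         base_cap: int = 2, max_cap: int = 20) -> int:
    """Map a reputation score in [0, 1] to a per-day submission cap.

    New or low-signal accounts sit at base_cap; consistently valid
    researchers approach max_cap. Both caps are illustrative.
    """
    score = min(max(reputation_score, 0.0), 1.0)  # clamp to [0, 1]
    return base_cap + int(score * (max_cap - base_cap))
```

A flood of fresh accounts then tops out at a couple of reports a day each, while established researchers are never the ones being throttled.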
Pre-submission validation. This is the big one. If a report doesn't include a reproducible POC, or can't be validated automatically, it shouldn't consume human attention. In academia, and I would hope in law, automated tools are now combing through reference lists to check that each DOI resolves to a real publication.
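As a sketch of what that tooling can look like, the check below uses a DOI-shaped regular expression based on Crossref's recommended matching pattern. Passing it only means a string is shaped like a DOI; a real pipeline would also resolve each one against doi.org or the Crossref API to confirm the cited work exists.

```python
import re

# Syntactic check only: Crossref's recommended pattern for modern DOIs.
# A matching string may still point at nothing; resolution is a second step.
DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:a-z0-9]+$", re.IGNORECASE)

def looks_like_doi(ref: str) -> bool:
    """True if the reference string is at least shaped like a DOI."""
    return bool(DOI_RE.match(ref.strip()))

def flag_for_human_review(refs: list[str]) -> list[str]:
    """Return the references that fail even the cheap syntactic check."""
    return [r for r in refs if not looks_like_doi(r)]
```

Even this trivial first pass pushes obviously malformed or free-text citations to a human before a reviewer wastes time hunting for a paper that was never published.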
AI triage. Used correctly, this helps absorb volume—not by trying to detect “AI-written” reports but by prioritizing, deduplicating, and routing submissions so that humans can spend time where it matters. At Bugcrowd, this is already part of how we scale triage under load.
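Deduplication, one of the cheapest of those wins, can be sketched with word-shingle overlap. This is a toy version: the 0.6 threshold is arbitrary, and a production system would use MinHash or embeddings to scale past a small queue.

```python
def shingles(text: str, k: int = 3) -> set:
    """Overlapping k-word shingles of a report body."""
    words = text.lower().split()
    if len(words) <= k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Set overlap in [0, 1]; 0 when either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_likely_duplicate(new_report: str, queue: list,
                        threshold: float = 0.6) -> bool:
    """Flag a submission that heavily overlaps something already queued."""
    new_sh = shingles(new_report)
    return any(jaccard(new_sh, shingles(seen)) >= threshold for seen in queue)
```

Templated slop tends to reuse the same phrasing wholesale, so even this crude similarity check collapses much of the volume before a human sees it.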
Non-technical pressure. We’re not aware of anywhere that has really leaned on this yet, but I think it’s coming. Reputation, visibility, and consequences outside a platform can change behavior in ways filters can’t. Over time, expect more experimentation with incentives and disincentives—how identity, reputation, and accountability show up publicly—not just how submissions are filtered.
At the same time, one thing that doesn't work is simply rejecting any submission that uses AI. Detection is too unreliable, with far too many false positives, to determine precisely whether something is AI-generated. And a blanket ban would be a mistake anyway: some of the best research now uses AI effectively. The goal isn't to ban tools; it's to filter out volume without substance.
This shift cuts both ways, and legitimate researchers recognize the problem and want to help. As a community, we've put in endless hours building a foundation of responsible disclosure, and no one wants AI slop to shut down responsible bug bounty programs. If you're submitting reports, validate your findings before you hit send, include a reproducible proof of concept, and never submit in volume.
Right now, the easiest way to get filtered out is to look like volume.
The easiest way to stand out? Make sure the proof signal is clear and validated.
This isn’t a temporary spike. It’s what happens when the cost of generating plausible technical content drops by orders of magnitude.
Every system built on human validation has to adjust. Human judgment is a strength; the creative power of humans is unmatched. But it's also a weakness when attackers try to flood the zone, whether in security, academia, law, or any other field.
Regardless of its strengths and weaknesses, AI is here to stay. What’s important is that we stay ahead of attackers trying to flood the system.