What happens to research when AI starts solving problems we thought were central?

Blog summary

  • AI is automating core security tasks—bug finding, triage, exploit sketching, patch drafting—making individual bug discovery a weak research contribution on its own.
  • Drawing on Richard Hamming, the author argues research must shift from single bugs to classes of bugs, evidence standards, and generalizable methods others can build on.
  • Four work categories emerge: work AI will commoditize, work AI makes newly tractable, work AI makes more urgent, and work AI structurally cannot do.
  • The field should study AI failures as first-class research: what agents consistently miss, where they hallucinate exploitability, and when human judgment is irreplaceable.
  • Like cryptography’s shift from break/fix cycles to hardness-based foundations, AI security research needs a similar theoretical grounding around limits, not just capabilities.
  • The researcher’s role shifts from execution to choosing the right questions, defining evidence standards, and generalizing findings into durable knowledge.

If AI substantially handles parts of vulnerability discovery, then ‘finding the next traditional security bug’ stops being the important research problem. The important problem moves one level up. The hard part is both identifying the new problems and letting go of the ones now solved.

What do great researchers do differently? This is a question I puzzle over constantly. One of the great things about being a professor at Carnegie Mellon University (CMU) is that I get to interact with great researchers every day. Some just keep hitting home runs. How? 

What comes next, now that AI is changing computer security so quickly? What would a great researcher start working on? My friend Rachel Dzombak at CMU reminded me about a paper on this problem by Richard Hamming. If you’ve not read it, it’s worth the diversion to check it out.

The parallel is not subtle

The parallel between Hamming’s discussion and what is happening in AI and security is almost too clean. Hamming was not really discussing productivity hacks. He was asking a harder question: How do you notice when the important problems in your field are changing, and how do you make yourself move toward them before the safe work eats your career?

Those are the questions security research must grapple with now.

For decades, a large part of software security has been organized around finding bugs: find the memory corruption, find the kernel bug, and figure out how to patch it autonomously. That work mattered, and it still does. But AI is starting to put real pressure on the assumption that finding individual bugs is hard and expensive.

Maybe this is not true everywhere. But the direction is hard to ignore.

So what do we do if a problem now has a solution? What was it like to be a physicist working in the time of Newton? What did they do when they realized that Newton had solved a great many problems? Or Einstein? I think we’re living in one of those times now.

The wrong conclusion is: AI solved bug finding, so security research is dead. Physics continued to develop after Newton. And after Einstein. Cybersecurity work will also continue after Claude and ChatGPT. 

The better conclusion is: AI may solve enough of the old workflow to uncover new important research problems. 

Hamming’s useful test: Consequence plus an attack

One of the best parts of Hamming’s paper is his definition of an important problem. An important problem is not merely one with big consequences. For example, bringing about time travel would be hugely consequential. But, in his framing, such goals do not represent important research problems because there is no reasonable angle of attack. You could weave a good story, but there isn’t a place for science to start.

Hamming proposed a sharper, more useful standard: the problem has to matter, and there has to be some plausible way to make progress.

How does this relate to AI and security? AI has not made software safe. It has not removed the need for taste, judgment, or deep technical work. Simply ask anyone who has just filled up their context window. 

However, AI has created a plausible path of attack for problems that used to feel stubbornly manual: scalable vulnerability discovery, harness generation, crash triage, variant analysis, exploit sketching, patch drafting, and cross-codebase reasoning.

So the Hamming question becomes less comfortable:

If AI has created a plausible attack path for formerly unresolved problems, what should a serious security researcher work on now?

Hint: The answer is not what was prestigious 10 years ago, not the work I got my PhD in, and not what lets us keep using the same mental furniture. 

Bug finding is now just plumbing

This is easier to write than to accept. A lot of us built our taste, reputation, and frankly, some of our identity around being good at this work. So if AI starts automating it, the hard part is not only technical—it is emotional.

This is the first uncomfortable truth. If AI systems become good at finding certain classes of bugs, then finding bugs in that class no longer counts as first-class research. Sure, the work may still matter operationally, but the novelty is gone.

Hamming was once asked to solve a numerical problem. He realized the deeper opportunity was not merely to arrive at an answer. It was to show that a digital computer could beat an analog computer on its own ground and do it in a way that produced a reusable method. He changed the problem and made the work matter more.

Security needs to move the same way.

Old frame vs new frame

Old framing | More durable framing
Can we find this bug? | Can we continuously discover, validate, prioritize, patch, and explain classes of bugs?
Can we generate a proof of vulnerability? | Can we define what evidence is sufficient for a machine-verifiable vulnerability claim?
Can AI find a bug? | What is hard for an AI to reason about, and can we predict what it will be poor at?
Can we rank this CVE? | Can we reason about attacker paths across all signals in code, cloud, identity, and assets to accurately characterize operational risk?

The research move is to stop treating the single bug as the unit of thought. The better unit is the class, workflow, evidence standard, decision, or system that lets other people build on top of the result.

Hamming says the essence of science is cumulative. That is an important warning for security. A beautiful exploit against one target may be a brilliant artifact, but it may not accumulate. A method that changes how a class of security decisions gets made can accumulate. It becomes a platform for other work.

The uncomfortable sorting exercise

I keep coming back to four categories. They overlap, and the fourth is different from the others, but they help me think.

1. Work AI may commoditize

Some work will become less publishable unless it is attached to a deeper idea. It’s not useless; it’s just not a novel idea or important finding.

Finding a bug, by itself, is becoming a weak claim. While not unimportant, it’s simply not enough to justify academic research without something much larger underneath. 

We have all seen the pattern: This bug sat in the code for X years; therefore, the method must be important. While this is sometimes true, it is often just a way of making a single find sound like a general result. As AI gets better, this move will get harder to defend. A paper cannot simply say, “We used AI and found bugs.” It has to say what capability was unlocked, where it failed, and why the result should change how the field works. 

2. Work AI makes newly tractable

Some work becomes more interesting because AI lowers the cost of exploration.

You can now put 10 papers into AI and ask it what the unsolved questions and good lines of attack are. This is where AI becomes a microscope. It lets us ask empirical questions that were previously too expensive to pose or required too much human effort to execute well.  

Some may discount this as AI-generated work, but I disagree with such a sweeping statement. I think the distinction is whether the human is providing real taste and direction while the AI acts as a research assistant, reviewer, and scale multiplier all at once.

3. Work AI makes more urgent

Some work becomes urgent because AI creates new risk.

How we can safely evaluate autonomous offensive systems is a big one. However, if you look one level deeper, the question becomes how you iterate assuming there are bugs. This issue hasn’t gone away. A big problem I don’t see enough discussion of is AI-generated patches. Why aren’t they accepted today? How do we build a proper reward function that allows us to create patches a human would accept? We need this if defense is ever to operate at the same speed as offense.

We can also turn existing practice problems into research problems. AI slop has threatened every area. What evidence should a machine produce before we believe it, say for a discovered vulnerability? Can we define something like proof-carrying AI vulnerability reports, like we had proof-carrying code in the 2000s? Can we measure the reduction in attacker effort resulting from AI rather than simply count bugs found? 

These questions are bigger than ‘can we find bug X?’ Answers to these questions are also harder to fake or generate from a single prompt. 
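
To make the proof-carrying idea a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a proposed standard: the `VulnReport` fields, the `verify` function, and the toy `parse_length_prefixed` target are all hypothetical names invented for this example. The point it shows is that a report carries its own trigger, and the consumer accepts the claim only after re-running that trigger and seeing the claimed failure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VulnReport:
    """Hypothetical report format: a claim plus the evidence needed to check it."""
    target_name: str
    trigger: bytes            # input the reporter says triggers the bug
    expected_error: type      # failure class the reporter claims to observe
    assumptions: tuple = ()   # environmental assumptions, stated explicitly


def verify(report: VulnReport, target: Callable[[bytes], object]) -> bool:
    """Accept the claim only if re-running the trigger reproduces the claimed failure."""
    try:
        target(report.trigger)
    except report.expected_error:
        return True
    except Exception:
        return False   # it fails, but not in the way the report claims
    return False       # no failure at all, so the claim is unsupported


def parse_length_prefixed(data: bytes) -> bytes:
    """Toy target: the first byte is a length, the rest is the payload (no bounds check)."""
    length, payload = data[0], data[1:]
    return bytes([payload[i] for i in range(length)])


# A report that carries a working trigger, and one that "hallucinates" a bug.
real = VulnReport("parse_length_prefixed", b"\x05ab", IndexError)
hallucinated = VulnReport("parse_length_prefixed", b"\x02ab", IndexError)

print(verify(real, parse_length_prefixed))          # True
print(verify(hallucinated, parse_length_prefixed))  # False
```

The design choice worth noticing is that the checker never trusts the reporter, human or machine. A real system would need sandboxing, richer failure oracles than a Python exception type, and a way to check the stated assumptions, none of which this sketch attempts.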

4. What can’t AI solve?

The fourth bucket is the most interesting to me right now: What can’t AI solve? 

Currently, the field is mostly focused on using AI to get new results. Can it find bugs? Can it generate exploits? Can it patch code? Can it run a workflow faster than a human? Those are useful questions and have been since 2010. But very few people are seriously discussing what was missed, and even fewer are asking whether there are structural reasons AI may keep missing certain things.

This matters because mature security fields eventually stop being organized around examples and start being organized around hardness. Cryptography is the obvious example. Early crypto was often break/fix. Someone proposed a cipher, and then someone else broke it. Then the field matured by asking a deeper question: What can we build if we know certain problems are hard? What are the limits? For example, we know perfect obfuscation has a negative theoretical result, but steganography has a positive result.

AI/security needs a version of that question.

We should study AI failures as first-class research objects. What kinds of bugs do agents miss even after more training? What assumptions cause them to hallucinate exploitability? What environment details break their reasoning? Where do they produce patches that pass a test but fail maintainer judgment? Where do they need formal evidence, runtime context, or a human with taste? Can a small, deterministic change in the code, build, prompt, or environment reliably throw an agent off trajectory?

The goal is not to prove AI is weak. The goal is to map the boundary. If we can predict where AI succeeds and fails, then we have something much more valuable than another demo—we have the beginning of a theory for AI-native security.
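
One way to start mapping that boundary is to check whether an agent's verdict survives changes that should not matter. The Python sketch below is a toy under stated assumptions: `brittle_agent` is a hypothetical keyword-matching stand-in for a real agent, and the two perturbations (renaming an identifier, prepending a comment) are chosen because they preserve the meaning of this particular snippet. None of these names come from an existing tool.

```python
import re
from typing import Callable

Agent = Callable[[str], bool]        # hypothetical: source text -> "vulnerable?" verdict
Perturbation = Callable[[str], str]  # rewrite that should not change the code's meaning


def rename_identifier(old: str, new: str) -> Perturbation:
    """Rename whole-word occurrences of an identifier."""
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    return lambda src: pattern.sub(new, src)


def add_comment(text: str) -> Perturbation:
    """Prepend a C-style comment to the source."""
    return lambda src: f"/* {text} */\n{src}"


def unstable_perturbations(agent: Agent, source: str,
                           perturbations: dict[str, Perturbation]) -> list[str]:
    """Return the names of perturbations that flip the agent's verdict."""
    baseline = agent(source)
    return [name for name, perturb in perturbations.items()
            if agent(perturb(source)) != baseline]


def brittle_agent(src: str) -> bool:
    """Stand-in agent that only recognizes the bug when the buffer is named 'buf'."""
    return "strcpy(" in src and "buf" in src


SOURCE = "void f(char *s) { char buf[8]; strcpy(buf, s); }"
PERTURBATIONS = {
    "rename_buf": rename_identifier("buf", "scratch"),
    "misleading_comment": add_comment("audited: safe"),
}

print(unstable_perturbations(brittle_agent, SOURCE, PERTURBATIONS))  # ['rename_buf']
```

A list of verdict-flipping perturbations is a small, repeatable measurement of where an agent's reasoning breaks. With a real agent the verdicts would be noisy, so each perturbation would need repeated trials rather than a single comparison.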

The researcher’s job changes, but it does not go away

This is probably the most important point for people who worry that AI will make security research less meaningful.

Computers did not make science less scientific. They changed what counts as scientific work. Hamming saw that early. He spent time asking how computers would change science, how they would change Bell Labs, and where the opportunities would be. That was the right level of questioning.

AI is doing something similar in security. Agents run fuzzers. Agents bisect patches. Agents synthesize exploits. Agents sometimes weaponize. Agents test patches. Agents run the same experiment across thousands of targets. This is the world we had been working toward, and it looks like our efforts are finally bearing fruit.

In this world, the human researcher is not the person turning the crank. It’s bittersweet, as many of us, including me, have spent years building up the skills to do these tasks ourselves. Instead, the researcher becomes the person who chooses the right question, designs the environment, defines the evidence standard, spots the artifact, and generalizes the result into knowledge.

That is not smaller work. It is different work.

The danger is not that AI does the work humans had to do. The danger is that we keep pretending the part that AI has solved is the research going forward. 

What Hamming would probably tell security researchers

I obviously do not know what Hamming would say, but let me make some predictions based on the nature of his talk. 

He would probably say to stop hiding behind luck. If AI is changing the field, then being early is partly luck and partly preparation. The people who have already thought through the important problems will see an opening first.  

If you need inspiration, there are some people who seem to repeatedly spot the next abstraction. In academia, Dawn Song (binary analysis -> agentic AI early) and Dan Boneh (constantly finding new crypto primitives for trust) are both obvious examples to me. When they say something, I just assume it’s correct. I’d also put Nicholas Carlini on the list for adversarial AI and model evaluation.

Where I predict future research is headed

If I were turning my predictions into a concrete agenda for AI and security research, I would organize my thoughts under a few themes.

  1. Evidence, not assertions: Build systems that produce structured evidence for vulnerability claims: triggers, reachability, exploitability, affected versions, environmental assumptions, patch status, and confidence. AI lets us kick off larger-scale experiments with less human effort, but we still need the human to carefully curate and analyze the evidence.
  2. Benchmarks that resist cheating: Design environments where agents must actually discover, reason, and validate. Managers know you improve what you measure, so we need the proper measurements. 
  3. Attack-path reasoning: Move from isolated findings to chains of weakness across code, assets, identity, cloud, and business context. This is going to be especially hard in an academic setting because we often don’t have access to this data, but it is a critical next step.  
  4. Patch and remediation science: Treat remediation as a research problem, not cleanup. AI today tends to produce shallow patches that fix surface problems rather than address root causes. The key question isn’t whether the crash went away; it’s whether the curl maintainer would accept it no questions asked. If you can pass that bar, you’ve solved a big problem. (Or maybe, and this is a guess, AI-generated patches will become the norm for AI-generated code, but human-curated code holds out.)
  5. What is hard: A model is a huge compression function that takes in the world’s data and represents it in a finite space. The No Free Lunch theorem has to apply somewhere; the trick is predicting where. Are there things that are always going to be hard fundamentally, where if you can flip a few bits, then the trained agent does poorly? Right now, everyone is focusing on what AI can do. Just as important is what it cannot do. I look to historical crypto as the big signpost here, where for years, crypto was break/fix. However, in the last 50 years, the focus has shifted toward building on a strong foundation based on what is hard.

The bottom line

AI does not make security research obsolete, but it does change the level at which serious research has to operate. The truth is this change is probably hardest for those who have been in the field the longest because it also requires an emotional divestment from things we’ve been doing a long time. 

As bug finding becomes easier, we need to move up a level. The field should not ask only ‘Can AI find the bugs we used to find?’ The better question is:

What should security research become when finding bugs is no longer interesting or when even writing bug-free code is now possible?

This question is bigger, rougher, and more controversial. That’s exactly why it’s a good question.