What happens to research when AI starts solving problems we thought were central?
If AI substantially handles parts of vulnerability discovery, then ‘finding the next traditional security bug’ stops being the important research problem. The important problem moves one level up. The hard part is both identifying the new problems and letting go of the ones that are now solved.
What do great researchers do differently? This is a question I puzzle over constantly. One of the great things about being a professor at Carnegie Mellon University (CMU) is that I get to interact with great researchers every day. Some just keep hitting home runs. How?
What comes next, now that AI is changing computer security so quickly? What would a great researcher start working on? My friend Rachel Dzombak at CMU reminded me about a paper on this problem by Richard Hamming. If you’ve not read it, it’s worth the diversion.
The parallel between Hamming’s discussion and what is happening in AI and security is almost too clean. Hamming was not really discussing productivity hacks. He was asking a harder question: How do you notice when the important problems in your field are changing, and how do you make yourself move toward them before the safe work eats your career?
Those are the questions security research must grapple with now.
For decades, a large part of software security has been organized around finding bugs: find the memory corruption, find the kernel bug, and figure out how to patch it autonomously. That work mattered, and it still does. But AI is starting to put real pressure on the assumption that finding individual bugs is hard and expensive.
Maybe this is not true everywhere. But the direction is hard to ignore.
So what do we do if a problem now has a solution? What was it like to be a physicist working in the time of Newton? What did they do when they realized that Newton had solved a great many problems? Or Einstein? I think we’re living in one of those times now.
The wrong conclusion is: AI solved bug finding, so security research is dead. Physics continued to develop after Newton. And after Einstein. Cybersecurity work will also continue after Claude and ChatGPT. The better conclusion is: AI may solve enough of the old workflow to uncover new important research problems.
One of the best parts of Hamming’s paper is his definition of an important problem. An important problem is not merely one with big consequences. For example, bringing about time travel would be hugely consequential. But, in his framing, problems like that do not qualify as important research problems because there is no reasonable angle of attack. You could weave a good story, but there isn’t a place for science to start.
Hamming proposed a sharper, more useful standard: the problem has to matter, and there has to be some plausible way to make progress.
How does this relate to AI and security? AI has not made software safe. It has not removed the need for taste, judgment, or deep technical work. Simply ask anyone who has just filled up their context window.
However, AI has created a plausible path of attack for problems that used to feel stubbornly manual: scalable vulnerability discovery, harness generation, crash triage, variant analysis, exploit sketching, patch drafting, and cross-codebase reasoning.
So the Hamming question becomes less comfortable:
If AI has created a plausible attack path for formerly unresolved problems, what should a serious security researcher work on now?
Hint: The answer is not what was prestigious 10 years ago, not the work I got my PhD in, and not what lets us keep using the same mental furniture.
This is easier to write than to accept. A lot of us built our taste, reputation, and frankly, some of our identity around being good at this work. So if AI starts automating it, the hard part is not only technical—it is emotional.
This is the first uncomfortable truth. If AI systems become good at finding certain classes of bugs, then finding bugs in those classes is no longer first-class research. Sure, the work may still matter operationally, but the novelty is gone.
Hamming was once asked to solve a numerical problem. He realized the deeper opportunity was not merely to arrive at an answer. It was to show that a digital computer could beat an analog computer on its own ground and do it in a way that produced a reusable method. He changed the problem and made the work matter more.
Security needs to move the same way.
The research move is to stop treating the single bug as the unit of thought. The better unit is the class, workflow, evidence standard, decision, or system that lets other people build on top of the result.
Hamming says the essence of science is cumulative. That is an important warning for security. A beautiful exploit against one target may be a brilliant artifact, but it may not accumulate. A method that changes how a class of security decisions gets made can accumulate. It becomes a platform for other work.
I keep coming back to four categories. They overlap, and the fourth is different from the others, but they help me think.
Some work will become less publishable unless it is attached to a deeper idea. It’s not useless; it’s just not a novel idea or important finding.
Finding a bug, by itself, is becoming a weak claim. While not unimportant, it’s simply not enough to justify academic research without something much larger underneath.
We have all seen the pattern: This bug sat in the code for X years; therefore, the method must be important. While this is sometimes true, it is often just a way of making a single find sound like a general result. As AI gets better, this move will get harder to defend. A paper cannot simply say, “We used AI and found bugs.” It has to say what capability was unlocked, where it failed, and why the result should change how the field works.
Some work becomes more interesting because AI lowers the cost of exploration.
You can now put 10 papers into AI and ask it what the unsolved questions and good lines of attack are. This is where AI becomes a microscope. It lets us ask empirical questions that were previously too expensive to pose or required too much human effort to execute well.
Some may discount this as AI-generated work, but I disagree with such a sweeping dismissal. I think the distinction is whether the human is providing real taste and direction while the AI acts as a research assistant, reviewer, and scale multiplier all at once.
Some work becomes urgent because AI creates new risk.
How to safely evaluate autonomous offensive systems is a big one. However, if you look one level deeper, the question becomes how you iterate assuming there are bugs. This issue hasn’t gone away. A big problem I don’t see enough discussion on is AI-generated patches. Why aren’t they accepted today? How do we build a proper reward function that allows us to create patches a human would accept? We need this if defense is ever to operate at the same speed as offense.
We can also turn existing practice problems into research problems. AI slop has threatened every area. What evidence should a machine produce before we believe it, say, for a discovered vulnerability? Can we define something like proof-carrying AI vulnerability reports, much as we had proof-carrying code starting in the late 1990s? Can we measure the reduction in attacker effort resulting from AI rather than simply count bugs found?
These questions are bigger than ‘can we find bug X?’ Answers to these questions are also harder to fake or generate from a single prompt.
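To make the evidence question concrete, here is a minimal sketch of a report that carries its own checkable evidence. Everything in it is an assumption for illustration: the `VulnReport` fields, the `check_report` function, and the `run_target` hook are hypothetical names, not an existing format or tool. Replaying a reproducer is also a much weaker guarantee than the formal proofs in proof-carrying code; the sketch only shows the shape of the idea, which is that the consumer re-checks the claim rather than trusting the producer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VulnReport:
    """Hypothetical report format: a claim plus evidence a checker can replay."""
    target: str           # identifier of the program under test
    claim: str            # human-readable statement of the bug
    reproducer: bytes     # input that allegedly triggers the bug
    expected_signal: str  # observable the checker should see


def check_report(report: VulnReport, run_target: Callable[[str, bytes], str]) -> bool:
    """Accept the report only if replaying the reproducer yields the claimed signal.

    run_target is an assumed hook that executes the target on an input and
    returns whatever the sandbox observed (sanitizer output, exit status, ...).
    """
    observed = run_target(report.target, report.reproducer)
    return report.expected_signal in observed


# Toy stand-in for a sandboxed runner; a real one would execute the target.
def fake_runner(target: str, data: bytes) -> str:
    return "heap-buffer-overflow" if len(data) > 8 else "ok"


grounded = VulnReport("parser", "overflow on long input", b"A" * 16, "heap-buffer-overflow")
hallucinated = VulnReport("parser", "overflow on short input", b"A", "heap-buffer-overflow")

# The checker does not care how confident the report sounds, only whether it replays.
accepted = [check_report(r, fake_runner) for r in (grounded, hallucinated)]
```

The design choice worth noticing is that the checker is small and dumb. The report can come from any model or prompt, and the trust sits in the replay step rather than in the author of the report.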
The fourth bucket is the most interesting to me right now: What can’t AI solve?
Currently, the field is mostly focused on using AI to get new results. Can it find bugs? Can it generate exploits? Can it patch code? Can it run a workflow faster than a human? Those are useful questions and have been since 2010. But very few people are seriously discussing what was missed, and even fewer are asking whether there are structural reasons AI may keep missing certain things.
This matters because mature security fields eventually stop being organized around examples and start being organized around hardness. Cryptography is the obvious example. Early crypto was often break/fix. Someone proposed a cipher, and then someone else broke it. Then the field matured by asking a deeper question: What can we build if we know certain problems are hard? What are the limits? For example, we know perfect obfuscation has a negative theoretical result, but steganography has a positive result.
AI/security needs a version of that question.
We should study AI failures as first-class research objects. What kinds of bugs do agents miss even after more training? What assumptions cause them to hallucinate exploitability? What environment details break their reasoning? Where do they produce patches that pass a test but fail maintainer judgment? Where do they need formal evidence, runtime context, or a human with taste? Can a small, deterministic change in the code, build, prompt, or environment reliably throw an agent off trajectory?
The goal is not to prove AI is weak. The goal is to map the boundary. If we can predict where AI succeeds and fails, then we have something much more valuable than another demo—we have the beginning of a theory for AI-native security.
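One way to start mapping that boundary is to measure how stable an agent’s verdict is under small, deterministic, behavior-preserving changes. The sketch below is a toy under stated assumptions: `Agent`, `Perturbation`, and `stability` are hypothetical names, and `keyword_agent` is a deliberately naive stand-in for a real system, chosen so the failure is easy to see.

```python
from typing import Callable, Iterable

Agent = Callable[[str], bool]        # hypothetical: source text -> "is it vulnerable?"
Perturbation = Callable[[str], str]  # hypothetical: behavior-preserving rewrite


def stability(agent: Agent, source: str, perturbations: Iterable[Perturbation]) -> float:
    """Fraction of perturbed variants on which the agent keeps its original verdict."""
    baseline = agent(source)
    variants = [p(source) for p in perturbations]
    if not variants:
        raise ValueError("need at least one perturbation")
    unchanged = sum(agent(v) == baseline for v in variants)
    return unchanged / len(variants)


# Toy agent that pattern-matches on a function name; stands in for a real system.
def keyword_agent(source: str) -> bool:
    return "strcpy(" in source


# Deterministic rewrites that leave program behavior unchanged.
def add_comment(source: str) -> str:
    return "/* reviewed */\n" + source


def alias_call(source: str) -> str:
    # Route the call through a macro alias; the compiled program is the same.
    return "#define copy strcpy\n" + source.replace("strcpy(", "copy(")


code = "void f(char *d, const char *s) { strcpy(d, s); }"
score = stability(keyword_agent, code, [add_comment, alias_call])
```

Here the toy agent survives the added comment but loses the bug once the call is aliased, so half of the variants flip its verdict. A real study would swap in an actual agent and a much larger family of rewrites, and the interesting result would be which families of change predict failure.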
This is probably the most important point for people who worry that AI will make security research less meaningful.
Computers did not make science less scientific. They changed what counts as scientific work. Hamming saw that early. He spent time asking how computers would change science, how they would change Bell Labs, and where the opportunities would be. That was the right level of questioning.
AI is doing something similar in security. Agents run fuzzers. Agents bisect patches. Agents synthesize exploits. Agents sometimes weaponize. Agents test patches. Agents run the same experiment across thousands of targets. This is the world we had been working toward, and it looks like our efforts are finally bearing fruit.
In this world, the human researcher is not the person turning the crank. It’s bittersweet, as many of us, including me, have spent years building up the skills to do these tasks ourselves. Instead, the researcher becomes the person who chooses the right question, designs the environment, defines the evidence standard, spots the artifact, and generalizes the result into knowledge.
That is not smaller work. It is different work.
The danger is not that AI does the work humans had to do. The danger is that we keep pretending the part that AI has solved is the research going forward.
I obviously do not know what Hamming would say, but let me make some predictions based on the nature of his talk.
He would probably say to stop hiding behind luck. If AI is changing the field, then being early is partly luck and partly preparation. The people who have already thought through the important problems will see an opening first.
If you need inspiration, there are some people who seem to repeatedly spot the next abstraction. In academia, Dawn Song (who moved early from binary analysis to agentic AI) and Dan Boneh (constantly finding new crypto primitives for trust) are both obvious examples to me. When they say something, I just assume it’s correct. I’d also put Nicholas Carlini on the list for adversarial AI and model evaluation.
If I were turning my predictions into a concrete agenda for AI and security research, I would organize my thoughts under a few themes.
AI does not make security research obsolete, but it does change the level at which serious research has to operate. The truth is this change is probably hardest for those who have been in the field the longest because it also requires an emotional divestment from things we’ve been doing for a long time.
As bug finding becomes easier, we need to move up a level. The field should not ask only ‘Can AI find the bugs we used to find?’ The better question is:
What should security research become when finding bugs is no longer interesting, or even when writing bug-free code becomes possible?
This question is bigger, rougher, and more controversial. That’s exactly why it’s a good question.