Everyone’s arguing about whether AI “works” in security.
Wrong question.
The right question is: are you building workflows that let it work, or are you pasting code into ChatGPT and calling it a day?
I’ve spent the past year building CRIA (Code Risk Intelligence Agent) — a structured, multi-stage pipeline for security code review. No magic prompts. No zero-shot miracles. Just constrained workflows where each stage feeds the next with focused, architectural context.
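To make "each stage feeds the next" concrete, here is a minimal sketch of what such a staged pipeline could look like. The names, data shapes, and decomposition strategy are my illustration of the pattern, not CRIA's actual code:

```python
from dataclasses import dataclass, field

# Illustrative staged review pipeline: each stage sees only the focused
# context produced by the stage before it, never the whole codebase at once.
@dataclass
class StageResult:
    summary: str                      # running architectural summary
    findings: list = field(default_factory=list)

def decompose(codebase: dict) -> list:
    """Split the codebase into reviewable units (here: one unit per file)."""
    return [{"path": p, "source": s} for p, s in codebase.items()]

def build_context(unit: dict, prior: StageResult) -> str:
    """Fold the previous stage's summary into this unit's prompt context."""
    return f"Architecture so far: {prior.summary}\nFile under review: {unit['path']}"

def run_pipeline(codebase: dict, analyze) -> StageResult:
    """Run the analysis callable over each unit, threading results forward."""
    result = StageResult(summary="(no prior context)")
    for unit in decompose(codebase):
        context = build_context(unit, result)
        result = analyze(unit, context, result)   # the model call lives here
    return result
```

The point of the shape is the constraint: `analyze` never receives more than one unit plus an accumulated summary, which is what keeps each model call focused.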
It was producing results that consistently outperformed every SAST tool I’ve used professionally. But I didn’t trust it. Good results on client codebases could just be the LLM getting lucky — or worse, pattern-matching against something it saw during training. If I was going to stake a claim on this thing working, I needed to prove it properly.
The Benchmark
I took a vulnerable version of a widely used, enterprise-grade identity and access management platform and targeted a known 2024 CVE — an authorization bypass buried deep in the API layer. The kind of vulnerability that SAST tools miss because it’s not about a single bad line of code. It’s about missing logic across multiple files and request flows.

Then I stripped every identifying marker: no version numbers, no CVE references, no project name, no README, no hints. If CRIA found it, it had to find it by actually understanding the code — not by pattern-matching against training data.
I ran it two ways:
- A single open-source model — 27B parameters, running locally on one GPU
- A multi-model pipeline routing between fast, balanced, and deep-reasoning models
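For the second configuration, the routing between tiers can be as simple as a heuristic over how much reasoning a task is expected to need. A sketch under my own assumptions (tier names and the heuristic are illustrative, not CRIA's actual routing logic):

```python
# Hypothetical model tiers; any three models of increasing capability fit here.
MODEL_TIERS = {
    "fast": "small-instruct-model",       # triage, summarisation
    "balanced": "mid-size-general-model", # single-file review passes
    "deep": "large-reasoning-model",      # cross-file authorization/logic flaws
}

def route(task_kind: str, files_involved: int) -> str:
    """Pick a model tier for one analysis task."""
    if task_kind == "triage":
        return MODEL_TIERS["fast"]
    if files_involved > 1 or task_kind == "logic":
        # Multi-file request flows (like the CVE above) need deep reasoning.
        return MODEL_TIERS["deep"]
    return MODEL_TIERS["balanced"]
```

The interesting result in this benchmark is that the routing barely mattered: the single local 27B model, given the same constrained workflow, reached the same finding.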
Both found it. Both identified the exact files and the missing authorization checks, and both generated the correct fix.
That was the validation. The pipeline wasn’t getting lucky. The architecture — the way it decomposes a codebase, builds context, constrains each analysis stage — was doing exactly what it was designed to do.
The Part I Didn’t Mention
When I posted those benchmark results on LinkedIn, I left something out.
While I was writing that post, CRIA was already mid-scan on the latest release of the same open-source project. Not the old vulnerable version. The current one.
It flagged findings.
On Friday, I submitted a responsible disclosure report to the project’s security team. Fifty minutes later, they responded — acknowledging that the reported issue potentially warrants a new CVE, pending internal confirmation of the root cause.

First external responsible disclosure. First potential CVE from the tool.
Honest Assessment
I’m not going to pretend this is a finished story. The security team rightly pointed out that static analysis findings need working exploits to cross the finish line — and that’s a fair bar. Building proof-of-concept exploits is next on the list.
Static analysis, whether it’s human-driven or LLM-driven, finds potential issues. Turning those into confirmed vulnerabilities takes additional work. That’s not a limitation of the tool — that’s how security research works.
The Actual Point
The difference between “AI doesn’t work for security” and “AI just found a real vulnerability” isn’t the model. It’s the scaffolding you build around it.
A 27-billion-parameter model running on a single consumer GPU found the same authorization bypass that a multi-model cloud pipeline found. The model barely mattered. The workflow did all the heavy lifting: decomposing the codebase into reviewable units, building architectural context, constraining each analysis pass to a focused scope.
If you want consistent results from AI in application security, stop asking it to be omniscient. Start building workflows that make it focused.
CRIA is still early. But the direction is clear, and the use case is real.
Ankit Prateek helps organizations move from reactive security to continuous, developer-integrated security systems. He has built enterprise risk assessment frameworks, asset inventories, and risk management programs from the ground up across multiple organizations. His tooling work spans LLM-powered code analysis (CRIA) and a cloud security assessment suite — both built on the same principle: structured workflows over raw model intelligence. Risk management for strategy, asset visibility for coverage, AI for scale.