The Future of Pentesting Is Not a Scanner, and It’s Not a Human Either
- Trevor Baines
- Feb 19
Why we built the infrastructure behind continuous testing — and what it says about where application security is headed.
Arcane Security Engineering Blog
February 2026
The Two Camps That Defined — and Limited — Our Industry
If you’ve spent any time in application security, you’ve seen the same debate play out a hundred times. Somebody on the engineering side says “we should just automate our security testing,” and somebody on the security side says “automated scanners miss everything that matters.” They’re both right. And they’re both wrong.
Camp 1: Automated scanners. They’re fast. They scale. They’re great at catching infrastructure-level issues — misconfigurations, missing headers, known CVEs. But they’re noisy. They produce mountains of false positives. And they’re completely blind to business logic flaws, which tend to be the vulnerabilities that actually get exploited in the real world.
Camp 2: Manual penetration testers. They’re thorough. They think creatively. They understand context in ways no automated tool can. But they’re slow, expensive, and don’t scale. A great pentester might spend two weeks with your application and produce a phenomenal report — and then your dev team ships 47 commits the following Monday and the report is already stale.
But here’s the part nobody talks about: even within Camp 1, there’s a massive problem that goes beyond just missing business logic. The scanners themselves are drowning their operators in noise.
The Dirty Secret of Security Scanners: The Noise Problem
Run Burp Suite against a modern web application. You’ll get back dozens, sometimes hundreds, of findings. Now sit down and actually triage them. You’ll discover that a significant percentage are false positives, informational noise, or findings that are technically accurate but carry zero real-world risk.
The same thing happens with cloud security. Run ScoutSuite against an AWS account and you’ll get flagged for default VPC security groups in 15 regions — but only 2 of those regions have any running resources. You’ll get flagged for an open SSH security group that isn’t attached to a single EC2 instance. Technically a finding. Practically meaningless.
Some real examples from our own scanning:
Burp flags “Input returned in response (reflected)” on a .css file. That’s not XSS. It’s a stylesheet.
Burp flags “Frameable response (potential Clickjacking)” on an API endpoint that serves JSON. You can’t clickjack a JSON response.
ScoutSuite flags an IAM user without MFA — but the user has no console access. It’s an API-only service account.
ScoutSuite flags S3 bucket policy issues, but public access is fully blocked at the account level. The findings are moot.
This is the noise that security engineers spend hours manually triaging after every scan. It’s the reason automated scanning has a credibility problem. Not because the scanners are bad — Burp Suite and ScoutSuite are both excellent tools — but because nobody has built the intelligence layer on top of them to separate signal from noise automatically.
Until now.
The Thesis: AI-Augmented Human Judgment, Running Continuously
At Arcane Security, our bet is that the future of pentesting isn’t about choosing between automation and humans. It’s about building infrastructure that lets them work together in a continuous loop, each doing what they’re best at, with AI helping prioritize, correlate, and reduce noise between the two.
Here’s what that looks like in practice:
Automated discovery runs continuously. Web applications and cloud configurations are scanned on a recurring schedule, catching the infrastructure-level issues scanners are genuinely good at finding. Not once a year during a pentest engagement. Always watching.
Intelligent triage eliminates the noise — false positive reduction through pattern-based rules and active cross-reference validation, so the findings that reach a human are the ones that actually matter.
Human pentesters validate and go deeper — focusing their time on real issues, chaining vulnerabilities together, and finding the attack paths that no scanner would ever see.
AI accelerates everything in between: prioritizing findings by exploitability, correlating vulnerabilities across scans and over time, and eventually validating findings autonomously through agentic workflows.
This isn’t a theory. It’s the architecture we’ve built. And it’s called Project Theta.
Why We Built Our Own Security Scanning Platform
If the vision is continuous automated discovery feeding into intelligent triage feeding into human validation, the first thing you need is a scanning platform you actually control. One that covers both web applications and cloud infrastructure. One that you can orchestrate, schedule, and integrate into your own workflow. And critically, one with a false positive reduction layer baked in — not bolted on as an afterthought.
We looked at what was available. Nothing fit.
The Landscape We Evaluated
Enterprise SaaS platforms like Qualys, Rapid7, and Tenable are powerful, but they’re built for compliance teams and SOCs. They produce dashboards for CISOs. They don’t integrate naturally into a pentester’s workflow, and they don’t give you any control over false positive reduction logic. The licensing models don’t align with continuous testing across many client targets either.
Open-source DAST tools are getting better, but they lack the scanning depth of Burp Suite Professional. Burp’s engine has been refined by PortSwigger’s research team for years, and there’s a reason it’s the industry standard.
Burp Suite and ScoutSuite themselves are both excellent, but they’re standalone tools. Burp is a desktop application built for a single pentester working in a GUI. ScoutSuite is a CLI tool that dumps findings to a file. Neither was built to be orchestrated programmatically, run as part of a larger platform, or combined under a unified API. And neither has any built-in intelligence for separating real findings from noise.
So we built the infrastructure to run them the way we needed them to run.
Project Theta: A Unified Security Scanning Platform
Project Theta is a Python-based platform that provides a unified REST API and CLI wrapping two types of security scanning under one roof:
Web Application DAST — powered by Burp Suite Professional running headlessly on cloud infrastructure
Cloud Configuration Security Reviews — powered by ScoutSuite (NCC Group’s open-source tool) for AWS, Azure, and GCP
On top of both scanners sits the key innovation: a false positive reduction layer that uses two complementary approaches to automatically separate signal from noise before findings ever reach a human.
At the operator level, you type a command and Theta handles everything else. Scope configuration, crawling, auditing, report generation — all automated, all hands-off.
The Architecture
Theta runs on a two-server architecture deployed on AWS EC2, both in the same VPC:
| Component | Role |
|---|---|
| Server 1: Theta Client + API | Runs the Python package: FastAPI REST API server (port 8080), Theta CLI, and ScoutSuite subprocess orchestration. This is the brain: it dispatches scans, manages scan lifecycle, runs false positive reduction, and serves the unified API. |
| Server 2: Burp Server | Runs Burp Suite Professional headlessly via systemd, exposing Burp’s official REST API (port 1337) for scan control and a custom Python Flask adapter (port 8090) for scope management and reporting. |
A single POST /scan endpoint dispatches to either Burp or ScoutSuite based on a scanner_type field. From the consumer’s perspective — whether that’s a SaaS frontend, a CI/CD pipeline, or a pentester using the CLI — it’s one API for all security scanning.
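The dispatch step can be sketched in a few lines. This is a hypothetical illustration, not Theta’s code. The names `ScanRequest`, `dispatch`, the host name `burp-server.internal`, and the default report directory are invented for the example. The Burp handler targets the `/v0.1/scan` endpoint and `urls` field of Burp Suite Professional’s REST API as we understand it; if an API key is configured, the key becomes part of the path. Check the ScoutSuite flags against `scout --help` for your installed version. The handlers only build the request or command and do not send anything, so the routing logic stays visible.

```python
import json
import urllib.request
from dataclasses import dataclass, field

# Hypothetical internal address for Server 2; the post only says both servers share a VPC.
BURP_API_BASE = "http://burp-server.internal:1337/v0.1"


@dataclass
class ScanRequest:
    scanner_type: str                      # "burp" or "scoutsuite"
    target: str                            # a URL for Burp, a provider ("aws", "azure", "gcp") for ScoutSuite
    options: dict = field(default_factory=dict)


def build_burp_request(req: ScanRequest) -> urllib.request.Request:
    """Build (but do not send) a POST to Burp's REST API to start a scan of one URL."""
    body = json.dumps({"urls": [req.target]}).encode("utf-8")
    return urllib.request.Request(
        f"{BURP_API_BASE}/scan",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_scoutsuite_command(req: ScanRequest) -> list:
    """Build the argv for a ScoutSuite CLI run against one cloud provider."""
    report_dir = req.options.get("report_dir", "scoutsuite-report")
    return ["scout", req.target, "--no-browser", "--report-dir", report_dir]


# One entry per backend, keyed by the scanner_type field of POST /scan.
HANDLERS = {
    "burp": build_burp_request,
    "scoutsuite": build_scoutsuite_command,
}


def dispatch(req: ScanRequest):
    """Route a POST /scan body to the right backend based on scanner_type."""
    handler = HANDLERS.get(req.scanner_type)
    if handler is None:
        raise ValueError(f"unsupported scanner_type: {req.scanner_type!r}")
    return handler(req)
```

Because the handlers live in a dictionary, supporting a third scanner means adding one entry. The endpoint itself does not need another `if` branch.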
The Unified API
The REST API is built on FastAPI and designed around a clean, consistent pattern: start a scan, poll for status, retrieve results, then optionally run false positive reduction on those results.
| Endpoint | Purpose |
|---|---|
| POST /scan | Start a scan (Burp or ScoutSuite, selected by scanner_type) |
| GET /scan/{id}/status | Poll scan status (works for both scanner types) |
| GET /scan/{id}/report | Download the HTML scan report |
| GET /scan/{type}/{id}/findings | Retrieve structured findings as JSON |
| POST /scan/{type}/{id}/runbook | Apply false positive runbook rules to findings |
| POST /scan/scoutsuite/{id}/validate | Run active cross-reference validators on cloud findings |
These capabilities move scanning from a manual task to a repeatable workflow.
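From the consumer’s side, the start, poll, and retrieve pattern reduces to a small loop. The sketch below is hypothetical. The client object and its `start_scan`, `status`, and `apply_runbook` methods stand in for HTTP calls to the endpoints in the table. The terminal status values `succeeded` and `failed` are assumed, not taken from Theta’s API, and the default interval and timeout are arbitrary.

```python
import time
from typing import Callable

# Assumed terminal states; Theta's real status vocabulary may differ.
TERMINAL_STATES = {"succeeded", "failed"}


def wait_for_scan(get_status: Callable[[str], str], scan_id: str,
                  interval: float = 30.0, timeout: float = 6 * 3600,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll GET /scan/{id}/status until the scan reaches a terminal state."""
    waited = 0.0
    while True:
        status = get_status(scan_id)
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"scan {scan_id} still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval


def run_workflow(client, request: dict, **poll_kwargs) -> list:
    """Start a scan, wait for it, then apply the runbook to its findings."""
    scan_id = client.start_scan(request)                      # POST /scan
    if wait_for_scan(client.status, scan_id, **poll_kwargs) != "succeeded":
        raise RuntimeError(f"scan {scan_id} did not succeed")
    # POST /scan/{type}/{id}/runbook, assumed here to return the annotated findings
    return client.apply_runbook(request["scanner_type"], scan_id)
```

Injecting `sleep` keeps the loop testable without real waiting. It also lets a CI job tune the polling cadence without touching the loop itself.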
The Intelligence Layer: False Positive Reduction
This is the part of Theta we’re most proud of, and the part that makes the biggest practical difference. Scanning is a solved problem — Burp and ScoutSuite are both excellent at finding things. The unsolved problem is figuring out which of those things actually matter. Theta attacks this with two complementary approaches.
Approach 1: Runbooks
Runbooks are static JSON rule files that encode patterns of known noise. They’re built from hard-won experience — the patterns you learn after triaging thousands of scanner findings and seeing the same false positives over and over again.
For Burp, the runbook uses field matching with glob patterns. Fields within a rule are AND’d; rules across the file are OR’d. Some examples from our default ruleset:
“Input returned in response” on .css or .js files — not real XSS, just how stylesheets and scripts work
“Frameable response (potential Clickjacking)” on /api/ endpoints — API endpoints serve JSON, not frameable HTML
“Strict transport security not enforced” on http://* — you can’t enforce HSTS without TLS
All findings with severity=information AND confidence=tentative — pure noise by definition
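The rule semantics (fields AND’d within a rule, rules OR’d across the file) map directly onto Python’s `fnmatch` module. The rule format below is a hypothetical stand-in for Theta’s JSON runbooks: each rule has a `match` dictionary of field-to-glob pairs plus a `reason`. It is populated with the default rules listed above. The finding field names (`name`, `url`, `severity`, `confidence`) are assumptions. Matching findings are flagged, never dropped.

```python
import fnmatch

# Hypothetical runbook encoding the default rules above. Each rule's "match"
# maps a finding field to a glob pattern.
RUNBOOK = [
    {"match": {"name": "Input returned in response*", "url": "*.css"},
     "reason": "Reflection in a stylesheet is not XSS"},
    {"match": {"name": "Input returned in response*", "url": "*.js"},
     "reason": "Reflection in a script file is not XSS"},
    {"match": {"name": "Frameable response*", "url": "*/api/*"},
     "reason": "API endpoints serve JSON, not frameable HTML"},
    {"match": {"name": "Strict transport security not enforced", "url": "http://*"},
     "reason": "HSTS cannot be enforced without TLS"},
    {"match": {"severity": "information", "confidence": "tentative"},
     "reason": "Informational and tentative"},
]


def rule_matches(rule: dict, finding: dict) -> bool:
    """AND: every field pattern in the rule must match the finding."""
    return all(
        fnmatch.fnmatchcase(str(finding.get(field, "")), pattern)
        for field, pattern in rule["match"].items()
    )


def apply_runbook(findings: list, rules: list) -> list:
    """OR: the first matching rule flags the finding. Nothing is deleted."""
    annotated = []
    for finding in findings:
        hit = next((rule for rule in rules if rule_matches(rule, finding)), None)
        annotated.append({
            **finding,
            "flagged": hit is not None,
            "flag_reason": hit["reason"] if hit else None,
        })
    return annotated
```

Because fields are AND’d, the “.css or .js” case becomes two separate rules. The OR happens at the file level.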
For ScoutSuite, runbooks use ScoutSuite’s native exceptions format, organized by service and rule. You can add an exception for an entire rule or for specific resources within a rule. Critically, runbooks never delete findings; they flag them. The consumer decides how to handle flagged items, and the data is always preserved.
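The ScoutSuite side follows the same flag-don’t-delete principle. This sketch does not use ScoutSuite’s exact file format. Instead it assumes a simplified, hypothetical exceptions shape: service, then rule, then a list of excepted resource paths, with `"*"` meaning the whole rule. The rule identifiers are illustrative. Findings are modeled loosely on the per-rule lists of flagged resource paths in ScoutSuite’s results.

```python
import copy

# Hypothetical, simplified exceptions shape: service -> rule -> excepted resource
# paths, where "*" excepts every item of the rule. Rule IDs are illustrative.
EXCEPTIONS = {
    "ec2": {"ec2-default-security-group-with-rules": ["*"]},
    "iam": {"iam-user-without-mfa": ["iam.users.svc-deploy"]},
}


def flag_exceptions(findings: dict, exceptions: dict) -> dict:
    """Return a copy of findings where excepted items are flagged, not removed.

    findings: service -> rule -> {"items": [resource paths], ...}
    """
    result = copy.deepcopy(findings)          # never mutate the scan data itself
    for service, rules in result.items():
        for rule_id, finding in rules.items():
            excepted = exceptions.get(service, {}).get(rule_id, [])
            items = finding.get("items", [])
            if "*" in excepted:
                finding["excepted_items"] = list(items)
            else:
                finding["excepted_items"] = [i for i in items if i in excepted]
    return result
```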
Approach 2: Active Validators
This is where things get interesting. Runbooks handle the obvious, pattern-based noise. But some false positives can’t be caught by pattern matching alone — you need to actually look at the context of the finding and cross-reference it against other data to determine if it’s real.
Theta’s active validators do exactly this. They’re functions that analyze the data ScoutSuite already collected during the scan, without making any additional live API calls, to determine whether a finding is genuine or noise. This makes them fast, safe, and deterministic.
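A validator can be as small as one function. It takes the resource data the scan already collected and returns a verdict with a reason. This sketch of the IAM-user-without-MFA check is hypothetical: the `Verdict` type, the rule identifier, and the `LoginProfile` field name are assumptions about how the collected IAM data is shaped. They are not Theta’s or ScoutSuite’s actual schema. The function makes no AWS calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    status: str   # "confirmed" or "likely_false_positive"
    reason: str   # human-readable explanation shown to the pentester


def validate_iam_user_without_mfa(user: dict) -> Verdict:
    """Judge an 'IAM user without MFA' finding from already-collected data only."""
    name = user.get("name", "<unknown user>")
    # Hypothetical field: present only when the user has a console password.
    if not user.get("LoginProfile"):
        return Verdict(
            "likely_false_positive",
            f"{name} has no console login profile; it is an API-only account",
        )
    return Verdict("confirmed", f"{name} can sign in to the console without MFA")


# Several rules can share one validator; this rule ID is illustrative.
VALIDATORS = {"iam-user-without-mfa": validate_iam_user_without_mfa}
```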
We currently have 7 validators covering 24 ScoutSuite rules:
| Validator | What It Checks | Example False Positive Caught |
|---|---|---|
| Security Group Open Ports | Is the security group attached to any instance or ENI? | Open SSH on an SG not attached to anything; zero risk |
| Default VPC / Security Group | Does the VPC have any running resources? | Default SG rules in 15 empty regions; noise |
| IAM User Without MFA | Does the user have console access? | API-only service account with no login profile |
| IAM Unused Credentials | Cross-references credential reports for last-used dates | Adds concrete context: when keys/passwords were last used |
| IAM Key Rotation | Are the flagged keys active or inactive? | No active keys; a cleanup issue, not an active risk |
| S3 Public Access | Is public access blocked at bucket or account level? | Public access fully blocked, so the S3 findings are moot |
| VPC / Subnet Findings | Does the region have any compute resources? | 15 empty regions flagged, only 2 with real resources |
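The security-group check works the same way. In this hypothetical sketch, a region’s collected data holds a `network_interfaces` mapping. Each entry lists its attached groups under `Groups` / `GroupId`, the shape used by EC2’s own DescribeNetworkInterfaces response; ScoutSuite’s stored layout may differ. Checking ENIs rather than only instances also catches groups attached through interfaces created for other resources, such as load balancers or Lambda functions in a VPC.

```python
def attached_group_ids(region: dict) -> set:
    """Collect every security group ID referenced by a network interface in the region."""
    ids = set()
    for eni in region.get("network_interfaces", {}).values():
        for group in eni.get("Groups", []):
            ids.add(group["GroupId"])
    return ids


def validate_sg_open_ports(region: dict, group_id: str) -> dict:
    """An open port only matters if something is actually behind the group."""
    if group_id in attached_group_ids(region):
        return {"verdict": "confirmed",
                "reason": f"{group_id} is attached to at least one network interface"}
    return {"verdict": "likely_false_positive",
            "reason": f"{group_id} is not attached to any instance or ENI"}
```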
The validators traverse ScoutSuite’s nested data structures (services, regions, VPCs, instances, ENIs, security groups) using dot-path resolution. Each finding gets a verdict (confirmed or likely false positive) and a human-readable reason explaining why. In a recent validation run, the validators sorted 45 raw cloud findings into 22 confirmed issues and 2 likely false positives. The remaining 21 fell into categories we haven’t built validators for yet.
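Dot-path resolution is what lets a validator jump from a flagged resource to its surroundings. In this sketch, `resolve` walks nested dictionaries (and list indices) along a dotted path, such as a ScoutSuite-style `services.ec2.regions.us-east-1.vpcs...` path. `ancestor` trims trailing segments so a check can climb from a security group to the VPC that contains it. Both helpers are hypothetical. The sketch assumes no path segment itself contains a dot.

```python
from typing import Any


def resolve(data: Any, path: str) -> Any:
    """Follow a dot-separated path through nested dicts (and list indices)."""
    node = data
    for key in path.split("."):
        if isinstance(node, dict):
            node = node[key]
        elif isinstance(node, list):
            node = node[int(key)]
        else:
            raise KeyError(f"cannot descend into {type(node).__name__} at {key!r} in {path!r}")
    return node


def ancestor(path: str, levels: int) -> str:
    """Drop the last `levels` segments, e.g. from a security group up to its VPC."""
    parts = path.split(".")
    if not 0 <= levels < len(parts):
        raise ValueError(f"cannot climb {levels} levels from {path!r}")
    return ".".join(parts[: len(parts) - levels])
```

For a flagged security group, a validator could call `ancestor(item, 2)` to reach the enclosing VPC. It could then inspect that VPC’s instances from the same collected data.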
That’s the kind of triage that used to take a security engineer hours. Theta does it in seconds.
The Hard Problems Nobody Talks About
Building Theta sounded manageable on paper. In practice, we ran into challenges that don’t show up in any documentation.
Running Burp headlessly in the cloud. Burp Suite Professional is a GUI application. Running it on a headless EC2 instance required license activation via X11 forwarding, followed by headless operation via systemd. None of this is documented. We built the operational playbook from scratch.
The Java compatibility wall. The most popular open-source Burp REST API wrapper is a Java Spring Boot application with a Java 9+ URLClassLoader incompatibility that silently breaks Burp’s license validation. Rather than fighting the Java ecosystem, we built a lightweight Python Flask adapter — same API surface, none of the compatibility issues.
ScoutSuite as a subprocess. ScoutSuite is built to be run from the command line rather than imported as a Python library, so Theta orchestrates it via subprocess with background threading for long-running scans. Cloud credentials are passed through environment variables, not CLI arguments, to prevent exposure in process listings. Small detail, but the kind of thing that separates a proof of concept from production-grade tooling.
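The subprocess pattern can be sketched with the standard library alone. `ScanJob` is a hypothetical name. A real command would be a ScoutSuite invocation such as `scout aws --no-browser --report-dir <dir>`; check the flags against your installed version. The credentials are the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` variables that boto3-based tools read. The job runs on a daemon thread, so an API handler can return immediately and let clients poll `status`.

```python
from __future__ import annotations

import os
import subprocess
import threading
from dataclasses import dataclass, field


@dataclass
class ScanJob:
    command: list[str]                      # e.g. ["scout", "aws", "--no-browser", ...] (assumed)
    credentials: dict[str, str]             # e.g. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
    status: str = "queued"
    returncode: int | None = None
    output: str = ""
    _thread: threading.Thread | None = field(default=None, repr=False)

    def start(self) -> None:
        """Launch the scan on a background thread and return immediately."""
        self.status = "running"
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Credentials are merged into the child's environment instead of being
        # passed as arguments, so they never appear on the command line.
        env = {**os.environ, **self.credentials}
        try:
            proc = subprocess.run(self.command, env=env, capture_output=True, text=True)
        except OSError as exc:              # e.g. the executable is not installed
            self.output = str(exc)
            self.status = "failed"
            return
        self.returncode = proc.returncode
        self.output = proc.stdout + proc.stderr
        self.status = "succeeded" if proc.returncode == 0 else "failed"

    def wait(self, timeout: float | None = None) -> str:
        """Block until the job finishes (or the timeout passes) and return its status."""
        if self._thread is not None:
            self._thread.join(timeout)
        return self.status
```

Environment variables keep secrets out of the argv shown in ordinary process listings. On Linux, however, the same user and root can still read them through `ps e` or `/proc/<pid>/environ`. This is a hardening step, not a secrets boundary.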
Abstracting two scanners and three APIs into one experience. Burp’s official API and its extension-based API have different capabilities. ScoutSuite has its own output format. Getting all of this to behave as a single, coherent platform with a unified API, consistent findings format, and a shared false positive reduction framework required careful design at every layer.
First Blood: The Proof of Concept
On February 10, 2026, Theta completed its first end-to-end automated web application scan against scanme.nmap.org:
| Metric | Result |
|---|---|
| Crawl Requests | 48 |
| Unique Locations Discovered | 7 |
| Audit Requests Performed | 4,082 |
| Vulnerabilities Found | 15 (1 Low, 14 Informational) |
| Key Findings | Unencrypted comms, Clickjacking, Missing charset, Input reflection |
| Report | HTML, auto-generated on completion |
The findings themselves aren’t the story; scanme.nmap.org is a public test host. The value of this scan is that the entire pipeline ran automatically, from command invocation to report generation, without human intervention. Now scale that to real client applications running nightly, with cloud config scans running alongside and findings flowing through the false positive reduction layer before a pentester ever sees them. That is the operating model we are building toward.
Where This Is Going
Theta today covers Phases 1 through 3 of our roadmap: core Burp integration, the unified REST API, and ScoutSuite cloud scanning with runbooks and active validators. Here’s what’s next:
Phase 4 — AI-Powered Validation. This is the next frontier. Using AI models for agentic vulnerability confirmation — automatically testing whether a reflected XSS finding actually executes, whether a SQL injection payload actually returns data, whether a cloud misconfiguration is actually exploitable in context. The active validators we built for ScoutSuite are the manual version of this. AI lets us do it at a level of sophistication and scale that rule-based systems can’t reach. As scan data accumulates and is validated by human pentesters, we build a feedback loop that allows prioritization models to learn from confirmed exploitability, false positives, and chained findings across environments. This is where the AI and human collaboration model becomes measurable and operational.
Phase 5 — Production Hardening. Queue-based scan management with Redis or RabbitMQ, multi-tenant support, persistent storage in PostgreSQL, Docker containerization, Kubernetes deployment, and structured observability. This is where Theta goes from a powerful internal platform to production-grade infrastructure that can run at real scale.
The Bigger Argument
Project Theta is the scanning foundation behind our continuous testing model. But the reason we’re writing about it isn’t to talk about scanning. It’s to make an argument about the future of this industry.
The companies that will lead the next era of security won’t be the ones with the best scanner or the biggest bench of pentesters. They’ll be the ones that figure out how to combine automation, intelligent triage, and human expertise into a single continuous pipeline — where each component amplifies the others and none of them operate in isolation.
Scanners are not going to replace pentesters. AI is not going to replace either of them. But the pentester who is augmented by AI, backed by continuous automated discovery, and freed from hours of manual triage by an intelligent false positive reduction layer is going to outperform everyone else — by a wide margin. That is not speculation. It is the natural outcome of combining scale with judgment.
Theta is how we’re building toward that future. And we’re just getting started.
—
Built by Arcane Security. For questions, collaboration, or to learn more about Project Theta, get in touch.