The Software Factory Is Going Dark. The Audit Trail Cannot.
Autonomous agents will soon write, test, and ship half your code. In a regulated institution, the advantage is not the coding agent. It is the supervisor that governs the floor and proves what it did.
Securing the Agentic Era. Article 19 · The New Stack
Most enterprise agentic initiatives are not failing because the models are weak. They are failing because they were never built to be governed.
And the pressure is climbing fast. This spring, Cognition’s Devin, an autonomous AI software engineer, crossed 492 million dollars in annualized revenue, with customers that are not startups: Goldman Sachs, Citi, and Santander among them. A rival, Factory, reached a 1.5 billion dollar valuation in a single round. The machines that write software are now writing it inside banks.
Gartner’s number is the one that should focus a board. By the end of this year, autonomous agents will write, test, or deploy close to half of all enterprise code. That is the autonomous figure: code an agent writes, tests, and merges on its own, not the AI-assisted coding that already fills most pull requests. Assisted code still has a person on the keystroke. Autonomous code moves the human off it, which is exactly what changes who is accountable. The same analysts expect more than four in ten agentic projects to be cancelled by 2027. The reason they give is not capability. It is inadequate risk controls.
The software factory is going dark, in the lights-out sense: work moving through the floor with fewer and fewer people standing over it. For most of the industry that is a productivity story. For a regulated institution it is not a tooling problem. It is a governance problem wearing a productivity disguise.
Lights-out only ever worked because the line proved every unit
Manufacturing went dark decades ago. Plants run unattended overnight, with no one on the floor. That was never possible because the robots were trusted. It was possible because the line instrumented itself. Every unit measured, every tolerance logged, every deviation flagged and stopped, so that in the morning you could prove what the factory made and that it stayed inside spec. Lights-out did not remove accountability. It moved accountability into the evidence the line produced on its own.
Software is now making the same move, and skipping the second half. We are racing to take people off the floor. We have not yet agreed that the floor has to prove what it did. Speed without that layer is just faster failure.
The market is funding the workers, not the supervisor
Look at where the capital and the attention have gone. Almost all of it is on the worker: faster, more autonomous coding agents that plan, write, test, review, and merge. They are impressive, and they are commoditizing quickly. There will be many capable ones, from many providers, and a serious enterprise will run several at once.
A second category is forming around governance, and it is welcome, but it is early and shaped for a different buyer. Some of it is security tooling that scans for the new class of agent risks, a useful taxonomy that only appeared at the end of last year. Some of it is a control surface bolted to a single platform, helpful if your entire estate lives inside that platform, and silent about everything that does not. The category is real. It is not yet built for the institution that has to answer to a regulator.
What almost no one is building is the part a bank actually needs. An independent supervisor that governs the whole floor, works across whichever coding agents you run, and produces evidence an examiner will accept.
Isolation is not governance
The most common answer right now is the sandbox. Give every agent its own sealed environment, a micro virtual machine with its own filesystem and network, and let it work without touching anything that matters. This is real progress, and it is necessary. The industry has been honest that ordinary containers are no longer enough for autonomous agents, and the isolated runtimes that shipped this year are the right substrate.
But isolation answers exactly one question: can this agent damage the host? It does not answer the questions a regulator asks. Did the agent stay inside policy while it worked? Who, or what, authorized the change it merged to production? What data did it touch, and what evidence of any of it survives after the environment is destroyed? A sandbox contains the blast radius. It does not govern the work, and it does not keep the receipts. Containment and control are different disciplines, and only one of them is examiner-ready.
What a governed floor actually does
Strip away the vocabulary and a governed software factory has a short, demanding job description. It constrains what each autonomous worker is allowed to do while it is acting, not after, so a sensitive action is stopped at the gate rather than noted in hindsight. It watches outcomes, not just outputs. It carries identity and authorization onto every change, so a merge into a production system can be traced to who or what approved it and under what policy. It produces a tamper-evident record of what ran, what changed, and what a human signed off on, one that holds up after the agent and its sandbox are gone. It keeps people on the exceptions, the decisions that matter, instead of on the keystrokes. And it does all of this independently of which coding agent did the work, so the institution can adopt the best worker available this quarter without rebuilding its controls the next.
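As a minimal sketch of two items in that job description, the gate that decides before an action runs and the tamper-evident record that outlives the sandbox, the Python below pairs a policy check with a hash-chained ledger. Every name in it (`SENSITIVE_ACTIONS`, `EvidenceLedger`, `gate`, the agent and policy identifiers) is hypothetical and chosen for illustration. It is not a description of any vendor's product, and a real control would add timestamps, signatures, and durable storage.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical policy: actions that need a named human approver before they run.
SENSITIVE_ACTIONS = {"merge_to_production", "read_customer_data"}
GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(record: dict, prev_hash: str) -> str:
    """Hash an entry's record together with the hash of the entry before it."""
    body = {"record": record, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class EvidenceLedger:
    """Append-only log in which each entry commits to the previous entry's hash."""
    entries: list = field(default_factory=list)

    def append(self, record: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        entry = {
            "record": record,
            "prev_hash": prev_hash,
            "hash": _digest(record, prev_hash),
        }
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; an edited or reordered entry breaks it."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _digest(entry["record"], entry["prev_hash"]):
                return False
            prev_hash = entry["hash"]
        return True


def gate(ledger, agent_id, action, policy_id, approver=None) -> bool:
    """Decide before the action runs, and record the decision either way."""
    allowed = action not in SENSITIVE_ACTIONS or approver is not None
    ledger.append({
        "agent_id": agent_id,    # which worker asked, whatever vendor it came from
        "action": action,
        "policy_id": policy_id,  # the policy the decision was made under
        "approver": approver,    # the accountable human, if one signed off
        "allowed": allowed,
    })
    return allowed


ledger = EvidenceLedger()
gate(ledger, "agent-a", "run_tests", "POL-1")                              # routine, allowed
gate(ledger, "agent-a", "merge_to_production", "POL-1")                    # no approver, blocked
gate(ledger, "agent-b", "merge_to_production", "POL-1", approver="j.doe")  # approved, allowed
```

The gate takes an agent identifier rather than an agent type, so the same check and the same ledger apply whichever worker is on the floor. The chain only makes tampering detectable, and only if the most recent hash is anchored somewhere the agents and their operators cannot rewrite. That anchoring is assumed here, not shown.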
That last property is the one the platform answers cannot offer. A supervisor that only governs its own vendor’s agents is not governing the institution. It is governing one corner of it.
Why this lands on regulated institutions first
A consumer app shipping agent-written code at speed is a productivity win. A bank doing the same thing is a supervised activity. Model risk management built for credit and market models is becoming the floor for AI, not the ceiling, and the supervisory direction is already on paper. Canada’s banking regulator has finalized model-risk expectations that reach AI models and take effect in 2027. Europe’s high-risk regime is moving on its own timeline toward the same destination: documented risk management, record-keeping, human oversight, and proof. None of these regimes will accept “the agent did it” as an answer. They will ask the institution to show what the agent did, that it stayed inside approved boundaries, and that a human was accountable for the result.
This is the real meaning of that cancellation number. The projects that die in regulated shops will not die because the coding agents were weak. They will die at the first audit the institution cannot pass.
There is a name for this
The series has circled this discipline for months. Applied to the software factory, it is the same standing loop. Not a one-time security gate before launch, but assurance that holds every day the floor is running, including the morning a new coding agent is swapped in and the old one swapped out, with identity, authorization, and the audit trail intact across the change. There is a name for it: Continuous Agentic Assurance.
The worker commoditizes. The governed floor is the moat.
The autonomous coding agents will keep getting better and cheaper, and within a year or two the choice of worker will matter far less than anyone selling one today would like. What will not commoditize is the ability to run a dark software factory and still prove, on any given day, what the machines did, under what policy, with what human accountability, across whichever agents happened to do the work.
That proving layer is the durable advantage, and it is the one almost no one is building for the institutions that need it most. Lights-out is coming to software whether the controls are ready or not. The factories that win in regulated industries will be the ones that went dark and kept every receipt.
Continuous Agentic Assurance
iTmethods builds the Trust Layer for enterprise AI, with select regulated enterprises. Reign delivers Continuous Agentic Assurance: the gateway, model-risk validation, evidence ledger, and assurance packs that let an institution run any model, run any coding agent, swap either under pressure, and prove, on any given day, what its AI did and that it stayed inside the lines. Built for the Chief Risk Officer, the Chief Audit Executive, and the audit committee.
Paul Goldman is Founder and CEO of iTmethods, where his team helps enterprises build and govern AI-native platforms, from model and agent control planes to the evidence and continuity that regulated industries require. He writes weekly on AI governance in the agentic era. Building the Trust Layer for Enterprise AI at itmethods.com.
Related reading
- The New Standard for AI Trust Is Here. The Runtime Layer Is Not. (June 18, 2026)
- Canada’s Sovereign AI Stack Has One Layer Left to Build (June 17, 2026)
- Three Days. One Export Order. A Frontier Model Gone. (June 16, 2026)
Sources
- Cognition, Devin annualized revenue and enterprise customers, reported by TechCrunch, May 27, 2026
- Factory, Series C at a 1.5 billion dollar valuation, TechCrunch, April 16, 2026
- Gartner, on enterprise agent adoption and proportional agent governance, May 2026
- OWASP, Top 10 for Agentic Applications, December 2025
- Docker, on micro virtual machine sandboxes for AI agents, 2026
- OSFI, Guideline E-23 Model Risk Management (final, September 2025; effective May 1, 2027)
- European Commission, EU AI Act timeline for high-risk systems
Paul Goldman
CEO, iTmethods
Creator of Reign and Forge. The platform and operational substrate for AI governance in regulated industries. Previously published "MCP Is Exploding. Your Governance Isn’t Ready."