AI Governance

Inside OpenAI, the Platform Team Is the New Trust Boundary

OpenAI's own Data Infrastructure Engineering lead just described, in plain English, why agents need governance separate from the workload. She used the word 'adversarial.' She was describing OpenAI's own engineers' agents touching OpenAI's own platform. The vendor side of the wall has the same problem the regulated enterprise has. The regulated enterprise just has more witnesses.

Paul Goldman, CEO, iTmethods

May 28, 2026 · 9 min read

Securing the Agentic Era. Article 16 · AI Governance · Thursday Bridge

On Tuesday I published the framework for what is happening to enterprise AI in 2026. The trust boundary has moved customer-side. Six vendor moves in twenty-two days, three breaches, one threat actor, all pointing the same direction.

The day before I wrote it, on May 25, a different writer published a different piece. Nate’s Newsletter dropped a long-form interview with Emma, who leads Data Infrastructure Engineering at OpenAI. Emma was not making my argument. She was describing her own operational reality, from inside the company that made two of the six customer-side moves my framework was built around.

We were not coordinated. We were converging.

What Emma said, in plain English, from inside her own employer’s platform team, is the strongest external validation I have seen for the architecture I have been arguing for over the past twelve months. She used a word for it I have been careful not to use: she said the agents touching her platform have become “almost adversarial.”

She is talking about OpenAI’s own engineers’ agents. Inside OpenAI’s own perimeter. Against OpenAI’s own data infrastructure.

If it is happening there, it is happening everywhere. The regulated enterprise just has more witnesses.

What Emma actually said

Emma’s team runs the analytics, streaming, ML infrastructure, training-data pipelines, and event bus that “every other team eventually depends on” at OpenAI. Her vantage point is the bottom of a very large stack. The interesting thing about that vantage point is that she sees what happens when everyone above her starts shipping faster.

In the last six months, OpenAI’s app teams have moved from “artisanal software engineering” (her words) to large fractions of their work being agent-driven. Codex has become enough of a colleague that some release processes now run end-to-end without human intervention. One example she described: a training-data export job, launched by an engineer who then went to sleep, was blocked in the middle of the night. The agent did not wait for a human to triage. It traversed four or five internal systems on its own, located a small bug three layers deep, patched a workaround, and finished the job by morning. The engineer woke up to a completed task. No conversation required.

That is what acceleration looks like at OpenAI right now.

The other example she gave is the one I want to dwell on. A user “vibe-coded” their way to flipping a feature flag they did not understand. The flag was not theirs to flip. It took down the entire Kafka cluster.

She did not soften the framing. She said the agents on top of her platform are “almost adversarial,” not because they intend harm, but because they are highly goal-directed, they will reach for any tool that gets them to the goal, and they will route around internal APIs that were never supposed to be exposed, change interfaces that other teams depend on, and produce code that her platform team is then responsible for running.

“There’s almost like a transfer of responsibility. There’s more code being created really fast and the teams are responsible for ultimately running it successfully and making sure it’s not breaking. They’re inheriting a lot of the burden.”

This is what the vendor side of the wall feels like in May 2026. From the inside.

The architecture she is describing is the third pillar

Emma did not name what she is building. She apologized in the interview for not having a good word for it. She called it “defense in depth,” and listed the pieces: encoded skills, agent.md files for institutional knowledge, autonomous code review specialized per team, autonomous operations underneath the code-review layer, isolated sandboxes for testing, and an evals library used to discover when a new model is finally good enough to trust with a new class of action.

She was, sentence by sentence, describing what I have been calling the third pillar of governed AI. AI Agent Operations is the continuous practice of operating an agent estate inside an environment that has to answer for the actions agents take on its behalf. It is what monitoring and assurance were not built for, and it is the discipline that is going to define vendor selection for the next two years.

Emma is building it for OpenAI. We are building it with Reign for the regulated enterprise. The architectural shape is the same.

The most important sentence she said, and the one I want to put on the wall, was this:

“I do think the incentives will always be somewhat misaligned. That’s why we have people who write code and people who review code, and they’re separate people. I think there should be a multi-agent architecture for this kind of thing.”

That is the architectural argument for why the policy decision point cannot be embedded in the agent doing the work. It has to be a separate agent, operating under a separate authority, looking at the work product through a different lens, encoded with the institution’s specialized knowledge of what can and cannot happen on its perimeter. That separate agent, that is what Reign is. That is what the Trust Layer for Enterprise AI is. Emma made the case for it in one paragraph, from inside OpenAI, without ever using the words.
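The separation argued above can be sketched in a few lines. This is a minimal illustration and not Reign's implementation: the names `ToolCall`, `PolicyDecisionPoint`, and `WorkerAgent`, the flag names, and the single deny rule are all hypothetical. What it shows is the structure: the worker proposes an action, and a separately held authority decides and records the evidence.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """An action a worker agent wants to take (hypothetical shape)."""
    agent_id: str
    tool: str
    target: str


@dataclass
class PolicyDecisionPoint:
    """Separate authority that holds the rules and the evidence log.

    Assumption: in a real deployment this would run under its own
    identity and process; here the separation is only an object boundary.
    """
    protected_targets: frozenset
    evidence: list = field(default_factory=list)

    def decide(self, call: ToolCall) -> bool:
        allowed = call.target not in self.protected_targets
        # Record every decision, allowed or denied, as evidence.
        self.evidence.append((call.agent_id, call.tool, call.target, allowed))
        return allowed


class WorkerAgent:
    """Goal-directed worker: it proposes actions but does not approve them."""

    def __init__(self, agent_id: str, pdp: PolicyDecisionPoint):
        self.agent_id = agent_id
        self._pdp = pdp

    def act(self, tool: str, target: str) -> str:
        call = ToolCall(self.agent_id, tool, target)
        if not self._pdp.decide(call):
            return f"denied: {tool} on {target}"
        return f"executed: {tool} on {target}"


pdp = PolicyDecisionPoint(protected_targets=frozenset({"kafka.cluster.flag"}))
worker = WorkerAgent("worker-1", pdp)
results = [
    worker.act("flip_flag", "kafka.cluster.flag"),    # not the worker's flag
    worker.act("flip_flag", "my_team.feature.flag"),  # the worker's own flag
]
```

The design choice worth noticing is that the evidence log lives with the decision point and not with the worker, so the record of what was attempted does not depend on the agent that attempted it.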

The “scaling laws” line: three sentences that explain everything

About forty minutes into the interview, Emma said one thing that is going to end up in a hundred decks before the end of the quarter:

“The scaling laws of the upper layers are AI scaling laws, and the lower layers are human scaling laws. That’s not sustainable.”

This is the underlying physics of the problem. The application layer scales at the rate of agent output. The platform and governance layer scales at the rate of human review, human runbooks, and human incident response. The gap widens every week. Every additional unit of acceleration above the platform produces an additional unit of unscoped operational risk below it. You cannot close that gap by hiring. You can only close it by upgrading the platform layer’s own scaling discipline to be agentic itself: autonomous review, autonomous remediation, autonomous evidence capture, with policy authority that does not sit inside the workload.
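One way to make the gap concrete, under simplifying assumptions that are mine and not Emma's: suppose agent-driven change volume starts at a_0 changes per week and grows at a rate g per week, while human review capacity stays flat at h changes per week. The unreviewed backlog B after T weeks is then a geometric sum. The symbols and numbers are illustrative only.

```latex
B(T) = \sum_{t=0}^{T-1} \bigl( a_0 (1+g)^t - h \bigr)
     = a_0 \left( \frac{(1+g)^T - 1}{g} - T \right)
     \quad \text{when } h = a_0,\ g > 0 .
```

With g = 0.10 and T = 26 weeks, (1.1)^26 is about 11.92, so B is roughly 83 times a_0. In this toy model, a platform team whose review capacity exactly matched demand at the start is about 83 weeks of review work behind after half a year, which is why hiring alone does not close the gap.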

This is the entire investment thesis behind AI Agent Operations as the third pillar. Emma said it in three sentences. From inside OpenAI.

Why this matters more for the regulated enterprise than for OpenAI

Read Emma’s interview straight through and you will notice something strange. The stakes she describes are real, but they are bounded. A Kafka cluster goes down for an afternoon. A feature flag flips when it should not. A user types “I don’t even know what Flink is” into the support channel. These are operational headaches at the world’s most consequential AI lab.

If the same thing happens inside a Domestic Systemically Important Bank, the operational headache becomes an SR 26-2 governance gap, a DORA Article 19 ICT incident, an EU AI Act high-risk system finding, an OSFI E-23 audit issue, a 21 CFR Part 11 evidence problem, a PCI DSS finding, or a state insurance regulator inquiry. The accountability chain has more witnesses.

The same architectural problem produces different consequences depending on which side of the wall it lands on. OpenAI can absorb the Kafka outage and ship a postmortem. The bank cannot.

This is the bridge from Emma’s reality to the regulated enterprise’s reality. Emma is describing the conditions under which adversarial agent behavior is already happening, at scale, accidentally, inside the perimeter, from goal-directed agents that have no malicious intent. Now run the same conditions through an institution whose ledger of accountability has a regulator at the top of it. The architecture either holds, or the institution does.

Four things to take from this

If I were running a platform team, an architecture review, or a vendor selection conversation in the next thirty days, here is what I would extract from Emma’s interview and put to work immediately.

One. The agent that writes code and the agent that reviews code should never be the same agent. Their incentives are misaligned by construction. Treat the policy decision point as a separate identity, with separate authority, capturing separate evidence. Ask every vendor on your short list how their architecture separates these two roles.

Two. Capture institutional knowledge in encoded skills, runbooks, and agent-readable policy. Emma is doing this at OpenAI. Reign is doing this for the regulated enterprise. The institutions that do it will scale. The institutions that do not will absorb the burden of every adversarial-by-accident agent action their app teams produce.

Three. Build a private eval suite, even if it is janky. Emma’s exact words. A Notion doc with inputs and expected outputs is enough to start. The discipline you are building is not the suite. It is the muscle of knowing, every time a new model drops, whether the model has crossed the threshold for a new class of autonomous action on your platform.
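As a sketch of how small that starting point can be: the cases below are the list of inputs and expected outputs, and the threshold check is the question of whether a model has crossed the line. Everything here is hypothetical, including `EvalCase`, `pass_rate`, the 0.95 threshold, the example prompts, and `toy_model`, which stands in for a call to whatever model you are evaluating.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str


def pass_rate(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases where the model output matches the expected answer."""
    if not cases:
        raise ValueError("eval suite is empty")
    passed = sum(1 for c in cases if model(c.prompt).strip() == c.expected)
    return passed / len(cases)


def cleared_for_action(rate: float, threshold: float = 0.95) -> bool:
    """Assumed rule: a model is trusted with the action class above a pass rate."""
    return rate >= threshold


CASES = [
    EvalCase("Is flag 'kafka.broker.quota' owned by team 'apps'?", "no"),
    EvalCase("Is flag 'apps.new_ui' owned by team 'apps'?", "yes"),
    EvalCase("May an agent restart a shared cluster without approval?", "no"),
]


def toy_model(prompt: str) -> str:
    # Stand-in for a real model call: says "yes" only for the apps-owned flag.
    return "yes" if "apps.new_ui" in prompt else "no"


rate = pass_rate(toy_model, CASES)
```

Re-running `pass_rate` against the same `CASES` each time a new model ships is the habit described above; the suite can stay janky as long as the cases reflect actions that matter on your platform.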

Four. Topology portability is the architectural commitment of the decade. The same identity, the same policy decision point, the same evidence pipeline, the same runtime enforcement, working identically across SaaS, dedicated cloud, customer cloud, and air-gapped. Tuesday’s flagship made this case for the trust boundary. Emma’s interview made the case from the other side of the wall. Both sides are converging.
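One simple way to check that "the same policy" really is the same everywhere is to fingerprint the policy bundle and compare across topologies. The sketch below is an assumption about how such a check could look, not a description of any product: `POLICY_BUNDLE`, its fields, and the `drifted` helper are hypothetical.

```python
import hashlib
import json

TOPOLOGIES = ("saas", "dedicated_cloud", "customer_cloud", "air_gapped")

# Hypothetical policy bundle; the field names are illustrative only.
POLICY_BUNDLE = {
    "version": "example-1",
    "deny_tools": ["flip_flag"],
    "protected_targets": ["kafka.cluster.flag"],
}


def fingerprint(bundle: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted(deployed: dict[str, dict], reference: dict) -> list[str]:
    """Return the topologies whose deployed policy differs from the reference."""
    want = fingerprint(reference)
    return [name for name, bundle in deployed.items() if fingerprint(bundle) != want]


deployed = {name: dict(POLICY_BUNDLE) for name in TOPOLOGIES}
# Simulate drift: the air-gapped copy is missing a protected target.
deployed["air_gapped"] = {**POLICY_BUNDLE, "protected_targets": []}

drift = drifted(deployed, POLICY_BUNDLE)
```

In this example only the air-gapped deployment is reported, which is the kind of silent divergence a portability commitment has to rule out.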

What we are doing about it

Reign is the policy decision point in the operational path of every model invocation and every tool call, sitting above the workload, capturing regulator-grade evidence continuously. Forge is the managed infrastructure layer that operates regulated workloads under the same governance, the same evidence pipeline, and the same operating model regardless of where the bytes sit. The four-topology operating model is the architectural commitment. The full continuous remediation capability inside AI Agent Operations is in active development with design partners in regulated industries. We are honest about what is shipped and what is in flight.

What we are not doing is asking the same agent that writes the code to review the code. Emma argued against that on her employer’s behalf. We are arguing against it on the regulated enterprise’s behalf. The architecture is the same. The witnesses are different.

The bottom line

The strongest case for the trust boundary thesis this week was not in Tuesday’s flagship. It was in an interview that dropped the day before, with OpenAI’s own Data Infrastructure Engineering lead, who described, without naming what it is, the architecture we have been building Reign to be.

Adversarial-by-accident agents are already inside OpenAI. They are running into the same wall that, for two years, I have been arguing the regulated enterprise has to put around itself. Emma’s team has options. They can iterate, push back releases, refuse to launch features, and write the next runbook themselves. The regulated enterprise has fewer options. The regulator decides the timetable, and the regulator does not care which cloud the action emanated from.

If the vendor side of the wall is already this hard, the customer side of the wall is not optional architecture. It is the only sustainable scaling law.

Tuesday’s flagship called this a tale of two cities. The cities are converging. Two days after Tuesday, we have a witness from inside the city that defined the cloud-native AI model in 2022, telling us in plain English that the architecture has to change.

The architecture is changing. The question for every vendor on your shortlist, and every governance team in your institution, is whether they are building toward the convergence or away from it.

I know which side I am building from.

Building the trust layer for enterprise AI

Reign is the separate-agent policy decision point in the operational path of every model invocation and every tool call. Forge runs the managed substrate beneath. Same identity, same evidence pipeline, same operating model across SaaS, dedicated cloud, customer cloud, and air-gapped. Talk to engineering.

Paul Goldman is Founder and CEO of iTmethods. He has spent 21 years building managed infrastructure for regulated enterprises and writes weekly on AI governance in the agentic era. Building the Trust Layer for Enterprise AI at itmethods.com.

Creator of Reign and Forge. The platform and operational substrate for AI governance in regulated industries. Previously published "MCP Is Exploding. Your Governance Isn’t Ready."
