What DevOps Taught Me About AI Governance

#devops #aigovernance #platformengineering #mlops

Originally published at devopsdiary.blog

The teams adopting AI coding tools the fastest are the same teams that would never deploy to production without a pipeline.

I've been watching this for two years. These are engineers who pushed back on manual deployments, built approval gates and rollback runbooks, and spent months getting GitOps through architecture review. Those same engineers are committing AI-generated code with no review policy, no acceptable use boundary, and no way to answer "what did this tool actually do to our codebase?"

The governance instincts are there. They've just been turned off for AI.

DevOps gave me a set of instincts I didn't appreciate until I started watching AI adoption.

The most visceral one is blast radius. Before you ship anything, you ask: what's the worst this can do, and how do you contain it? Feature flags, canary deployments, rollback runbooks: all of them exist because shipping without blast radius thinking isn't engineering. It's gambling with production. Sitting right next to it is auditability. In a regulated environment, "it worked" isn't an acceptable answer to "what happened?" You need to know who approved what, when, under what conditions, and what the rollback path was. Not bureaucracy for its own sake. That's what lets you recover without a three-week forensic investigation.

AI tools write code. That code goes into production. The blast radius question barely gets asked. The auditability trail ends at the commit. The model, the prompt, the context: gone.
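Keeping that context doesn't take much. Here's a rough sketch of one way to do it, assuming a team agrees to record it as git-style trailers at the end of the commit message. The trailer keys `AI-Model` and `AI-Prompt-SHA256`, the model name, and the reviewer are all made up for illustration; none of this is a standard.

```python
import hashlib


def ai_trailers(model, prompt, reviewer):
    """Build git-style trailer lines recording how a change was generated.

    The trailer keys are hypothetical; pick names your team agrees on.
    The prompt is stored as a hash so the commit log does not expose its text;
    the full prompt has to be kept somewhere the hash can be looked up.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return [
        f"AI-Model: {model}",
        f"AI-Prompt-SHA256: {digest}",
        f"Reviewed-by: {reviewer}",
    ]


def with_trailers(message, trailers):
    """Append trailers to a commit message, separated by a blank line."""
    return message.rstrip() + "\n\n" + "\n".join(trailers) + "\n"


# Example values only; the model name and reviewer are placeholders.
msg = with_trailers(
    "Add retry to payment client",
    ai_trailers(
        "example-model-v1",
        "Add retries with backoff to PaymentClient",
        "Jane Doe <jane@example.com>",
    ),
)
print(msg)
```

Because the trailers live in the commit itself, they travel with the history, and git can read them back later (for example with `git log --format='%(trailers)'`).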

Then there's measurement. In November 2023, I built a dashboard that showed teams exactly how slow their pipelines were. Some of them hated it. Not because the data was wrong. Because visible things require a response, and these teams had spent months not responding. That friction was the point. You can't govern what you can't see.

Hardly anyone is measuring how AI tools are affecting their delivery pipeline. Not throughput, not defect rates, not review time for AI-generated PRs, not acceptance rates. The data exists in theory. Almost nobody's collecting it.
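Collecting it doesn't have to be elaborate. Here's a minimal sketch, assuming you can export pull request records from your Git host and tag which ones were AI-assisted. The `PullRequest` fields are hypothetical, and how you decide that a PR "caused a defect" is your call, not something this code settles.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PullRequest:
    # Hypothetical fields; map them from whatever your Git host exports.
    ai_assisted: bool      # e.g. derived from a PR label or commit trailer
    merged: bool
    review_hours: float    # time from opened to final approval or close
    caused_defect: bool    # linked to a later bug or rollback


def summarize(prs):
    """Return acceptance rate, defect rate, and median review time for a list of PRs."""
    if not prs:
        return None
    merged = [p for p in prs if p.merged]
    return {
        "count": len(prs),
        "acceptance_rate": len(merged) / len(prs),
        # Defect rate is measured over merged PRs only.
        "defect_rate": sum(p.caused_defect for p in merged) / len(merged) if merged else 0.0,
        "median_review_hours": median(p.review_hours for p in prs),
    }


def compare(prs):
    """Summarize AI-assisted PRs and all other PRs side by side."""
    return {
        "ai_assisted": summarize([p for p in prs if p.ai_assisted]),
        "other": summarize([p for p in prs if not p.ai_assisted]),
    }
```

The side-by-side split matters more than the absolute numbers: a defect rate means little until you can compare it with the baseline for everything else your team ships.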

The translation from DevOps to AI governance is straightforward on paper:

DevOps instinct	AI governance equivalent
Blast radius analysis before deploy	Scope controls on what the AI tool can touch
Approval chains and auditability	Model, prompt, and context captured in the commit trail
Pipeline measurement	AI delivery metrics: acceptance rate, defect rate, review time
Rollback runbook	Policy to constrain or disable a tool when it misbehaves

The table isn't the hard part. The decision to build it is.

These instincts didn't come from theory. They came from watching what happens when they're actually applied.

Getting GitOps through enterprise architecture review at Edward Jones took six months. Six months of presentations, security questions, "can you come back next cycle," and conversations with architects who needed to understand blast radius before they'd sign off. The process felt slow. It was slow. But I understood why it existed. A new deployment methodology touching hundreds of production pipelines warrants that kind of scrutiny.

GitHub Copilot arrived a year later. Teams were using it in production code within weeks of the pilots starting. No architecture review. No acceptable use policy. No measurement framework. Just "the demos land and teams want it."

GitOps got six months of scrutiny. Copilot got a pilot program and a Teams channel.

The difference is cultural. GitOps looked like infrastructure, so infrastructure governance applied. Copilot looked like a developer tool, so it went through the same review path as a new IDE plugin: essentially none.

AI coding tools write production code. That makes them infrastructure. The governance posture should match.

Platform engineers already know how to do this.

Apply blast radius thinking to AI: scope what the tool can touch, define what it can't, build the controls before you need them instead of after something breaks. Track auditability: capture the model, the prompt, the constraints, and the review that happened before the code shipped. Not for compliance theater. For the forensic investigation you'll eventually need. Measure: instrument the AI delivery pipeline the same way you'd instrument anything else. Two months of your own data will tell you more than any vendor benchmark.
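The first of those, scoping what the tool can touch, can start as a simple path check. This is a sketch under stated assumptions, not a finished control: the allow and deny patterns below are invented for illustration, and it assumes you already know which changes were AI-assisted.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: paths an AI tool may change, and paths it must never change.
# Deny rules win over allow rules. Note that "*" here also matches "/".
ALLOWED = ["src/*", "tests/*", "docs/*"]
DENIED = ["src/payments/*", "infra/*", "*.tf", ".github/workflows/*"]


def out_of_scope(changed_paths, allowed=ALLOWED, denied=DENIED):
    """Return the changed paths that fall outside the AI tool's allowed scope."""
    violations = []
    for path in changed_paths:
        is_denied = any(fnmatchcase(path, pattern) for pattern in denied)
        is_allowed = any(fnmatchcase(path, pattern) for pattern in allowed)
        if is_denied or not is_allowed:
            violations.append(path)
    return violations


changed = [
    "src/app/main.py",
    "src/payments/charge.py",
    "infra/main.tf",
    "README.md",
    "tests/test_main.py",
]
print(out_of_scope(changed))
```

In practice the list of changed paths could come from something like `git diff --name-only` in CI, with a non-empty result failing the check or routing the PR to a human owner.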

None of this requires new tooling to start. It requires someone in the organization to decide that AI-generated code is production code, and production code gets governed.

That sentence is the whole shift. Everything else follows from it.

I spent most of 2024 watching the governance gap widen while the tooling race ran ahead of it. Every week there was a new agent framework, a new coding assistant, a new benchmark claiming another percentage point on SWE-bench. There were very few conversations about what any of this looks like when it's been in your codebase for 18 months and something goes wrong.

AIEOS started from that frustration. The instincts built into it (blast radius, auditability, measurement, approval chains, rollback) are DevOps instincts. They translate directly. Most organizations already have engineers who understand all of this. What's missing is the decision to apply it.

That decision is the one most teams haven't made yet.
