Hard Limits Are Governance

No file over 700 lines. Subagent concurrency capped by policy, not vibes. Numbers are how the system says no on your behalf at 3am.

No file over 700 lines. That is not a style preference, it is a rule an agent checks before it lets a change through, and it does not bend for money-critical code either. Say it as a guideline and every agent interprets it differently by Friday. Say it as a number and there is nothing left to interpret.

A Stripe billing handler hit 1,214 lines and got split into seven modules, none over the ceiling. A billing API file hit 980 lines and became seven cohesive files, the smallest 31 lines, the largest 284. An SDK barrel hit 1,142 lines and became a thin re-export plus nine modules underneath it. A UI component hit 823 lines and came out the other side as a 74-line shell, a 610-line controller hook, and a 293-line presentational component. Every one of those numbers is real, every refactor kept the exact behavior of the file it replaced, and every single output file landed under 700.

The number itself is almost arbitrary. What is not arbitrary is that it is a number at all. "Keep files clean" is a request an agent can interpret four different ways depending on its mood that session. "Under 700 lines" is a fact an agent can check with `wc -l` and a fact I can check the same way, and neither of us has to argue about whether a given file feels too big. A monolith that mixes nine unrelated concerns in one file is not just hard for a human to review — it is the file every agent working in that repo has to load into context before touching anything nearby, and it is the file two agents working in parallel are guaranteed to collide on. The ceiling is not aesthetic. It is a concurrency policy wearing a code-style costume.
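A minimal sketch of that check in Python, for anyone who wants it as a gate rather than a one-off `wc -l`. The extension list, the function names, and the choice to let a file of exactly 700 lines pass are my assumptions for illustration, not details of the rule as it is actually enforced.

```python
from pathlib import Path
from typing import List, Tuple

MAX_LINES = 700  # the ceiling quoted in the text


def count_lines(path: Path) -> int:
    """Count newline characters, which is what `wc -l` reports."""
    with path.open("rb") as f:
        return sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(65536), b""))


def files_over_limit(
    root: Path,
    suffixes: Tuple[str, ...] = (".py", ".ts", ".tsx"),  # assumed set of source extensions
    limit: int = MAX_LINES,
) -> List[Tuple[Path, int]]:
    """Return (path, line_count) for every matching file above the limit."""
    offenders = []
    # Note: this walks everything under root, vendored directories included.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            n = count_lines(path)
            if n > limit:  # assumption: exactly 700 lines still passes
                offenders.append((path, n))
    return offenders
```

A pre-merge hook could call `files_over_limit(Path("."))` and fail the change whenever the returned list is non-empty, so the agent and I are both reading the same answer.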

Concurrency itself gets the same treatment, and I have changed the actual number more times than I care to admit. The global rule at one point capped subagent fan-out at two running in parallel. I raised it to six. Then, mid-incident, staring at a machine that clearly had headroom to spare, I told an agent to ignore the six-agent rule I had written myself days earlier and run more. That is a real failure mode of hard limits — the person who set the number gets annoyed at their own number under pressure — and the fix was not to delete the rule, it was to write the exception into the rule. The policy now states explicitly that if I say bypass the cap, the agent bypasses the cap. The override is not a workaround anymore. It is a clause.

Six later became sixteen on a machine that could take it, and both numbers live in the same rules file as everything else: not a comment, not a Slack message from three weeks ago that nobody can find, but a line of policy an agent reads before it decides how many of itself to spawn. When I wanted a large mechanical rename to run without hammering rate limits, I set the cap low on purpose: six disjoint agents editing non-overlapping files, deliberately conservative, because the task was wide and shallow and did not need sixteen agents stepping on each other's rate-limit budget. The number is not always the ceiling. Sometimes it is a lower, task-specific cap I set beneath the ceiling because the ceiling would have been the wrong tool for that particular job. And when a read-only architecture review genuinely needed more hands than the standing cap allowed, the exception was explicit and scoped, stated as "exceeding the previous cap is authorized for this task": not a quiet workaround but a recorded exception with a reason attached to it.
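The three cases in that paragraph (standing cap, deliberately lower task cap, explicit scoped exception) can be sketched as one small resolver. This is an illustration in Python, not the rules file itself: `ConcurrencyPolicy`, `effective_cap`, and the requirement that an exception carry a non-empty reason are names and choices I am assuming here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConcurrencyPolicy:
    standing_cap: int = 16           # the current number quoted in the text
    allow_owner_bypass: bool = True  # the override written in as a clause


def effective_cap(
    policy: ConcurrencyPolicy,
    task_cap: Optional[int] = None,
    authorized_exception: Optional[int] = None,
    reason: str = "",
) -> int:
    """Resolve how many subagents a task may run in parallel."""
    if authorized_exception is not None:
        if not policy.allow_owner_bypass:
            raise PermissionError("policy does not permit exceeding the cap")
        if not reason.strip():
            # Assumption: an exception without a recorded reason is rejected.
            raise ValueError("an exception must be recorded with a reason")
        return authorized_exception
    if task_cap is not None:
        if task_cap < 1:
            raise ValueError("task cap must be at least 1")
        # A task can ask for less than the standing cap, never more.
        return min(task_cap, policy.standing_cap)
    return policy.standing_cap
```

With the defaults, the wide mechanical rename would resolve to `effective_cap(policy, task_cap=6)`, and the architecture review would pass `authorized_exception` together with its reason.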

Getting the number wrong in the conservative direction is its own kind of failure, and it is just as real as getting it wrong by being too permissive. A CI suite of 692 files ran with concurrency capped at one, dead serial, because at some point that felt like the safe setting. It turned a test run into a 60-plus-minute wait, every single time, on every single push. Raising the cap back to a real number — the platform default of 20 — cut the same suite down to five to ten minutes with zero new failures, because the isolation the low cap was supposedly protecting was never actually at risk. A limit that exists out of caution instead of evidence is not governance. It is superstition with a config flag.

Not every limit is a plain integer either. Some are ranges, clamped on both ends. Our dispatch tool computes how long to wait before hitting Enter after pasting a prompt into an agent's pane — a delay based on word count and character count — but the formula is wrapped in a floor and a ceiling: never under 150 milliseconds, never over 4,000, regardless of how short or how long the prompt is. A QA tool that fans work out across sandboxes bounds its worker count to a range, one to twelve, and a request body to a size, one mebibyte, so a caller cannot accidentally ask for unlimited parallelism or send a payload that chokes the service on the other end. None of these ranges were derived from a formula proving they are optimal. They were derived from watching what broke below the floor and what broke above the ceiling, then writing both numbers down so nobody has to relearn the lesson by breaking it again.
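Clamped ranges are short enough to show whole. In the Python sketch below, the bounds (150 and 4,000 milliseconds, one to twelve workers, one mebibyte) come from the text; the delay coefficients are invented placeholders, since the real formula is not given here, and clamping an out-of-range worker request rather than rejecting it is also my assumption.

```python
MIN_DELAY_MS = 150
MAX_DELAY_MS = 4_000
MIN_WORKERS = 1
MAX_WORKERS = 12
MAX_BODY_BYTES = 1024 * 1024  # one mebibyte


def clamp(value: int, low: int, high: int) -> int:
    """Pin value into the closed range [low, high]."""
    return max(low, min(high, value))


def enter_delay_ms(prompt: str) -> int:
    """Delay before pressing Enter, from word and character counts."""
    # Hypothetical coefficients; only the floor and ceiling are from the text.
    raw = 100 + 20 * len(prompt.split()) + len(prompt) // 2
    return clamp(raw, MIN_DELAY_MS, MAX_DELAY_MS)


def worker_count(requested: int) -> int:
    """Bound a caller's requested parallelism to the allowed range."""
    return clamp(requested, MIN_WORKERS, MAX_WORKERS)


def check_body(body: bytes) -> bytes:
    """Reject request bodies larger than the size limit."""
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"body is {len(body)} bytes, limit is {MAX_BODY_BYTES}")
    return body
```

Whatever the formula produces, the caller only ever sees a value inside the range, which is the whole point of writing both ends down.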

The same instinct shows up in a place I did not expect it to: a UI block registry — the finite set of block types a canvas is allowed to render — capped at sixteen. Not because sixteen is a magic number, but because an unbounded registry is an invitation for every feature request to become a new block type nobody consolidates, until the canvas is a museum of one-off components that all do roughly the same thing. Auditing that registry became its own standing task, because a cap only works as governance if someone occasionally checks that what is inside it still deserves to be there. That audit was not a one-off either — it sat as one line item on a tracking task alongside five unrelated fixes, the same list any of them would show up on, because a cap that never gets re-examined just becomes sixteen slots of accumulated debt with a number stapled to the front.
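A capped registry is a few lines of code once the cap is a number. The sketch below is a Python illustration of the idea, not our canvas code: `BlockRegistry`, `RegistryFullError`, and the renderer signature are hypothetical names, and only the cap of sixteen comes from the text.

```python
from typing import Callable, Dict, List

MAX_BLOCK_TYPES = 16  # the cap quoted in the text


class RegistryFullError(RuntimeError):
    """Raised when a new block type would exceed the cap."""


class BlockRegistry:
    def __init__(self, cap: int = MAX_BLOCK_TYPES) -> None:
        self._cap = cap
        self._renderers: Dict[str, Callable[[dict], str]] = {}

    def register(self, name: str, renderer: Callable[[dict], str]) -> None:
        """Add a block type, refusing duplicates and anything past the cap."""
        if name in self._renderers:
            raise ValueError(f"block type {name!r} is already registered")
        if len(self._renderers) >= self._cap:
            raise RegistryFullError(
                f"registry holds {self._cap} block types; consolidate one before adding {name!r}"
            )
        self._renderers[name] = renderer

    def audit(self) -> List[str]:
        """List registered block types so a reviewer can ask which still earn a slot."""
        return sorted(self._renderers)
```

The error message does the governance work: the seventeenth block type does not get added quietly, it forces a conversation about which of the existing sixteen should go.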

Other limits are not numeric at all, they are binary lists, and the number that matters is zero exceptions. The exec path an agent uses to run shell commands refuses a fixed set of destructive patterns outright: formatting a filesystem, a fork bomb, curl piped straight into bash, anything that rewrites a file under ~/.ssh. No agent gets to argue its way past that list with a sufficiently good reason, because the entire premise of the list is that a sufficiently good-sounding reason is exactly how a destructive command gets executed by something that does not actually understand the consequences. A hard limit that can be argued down is not a hard limit. It is a suggestion with better formatting.
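For illustration, here is roughly what such a list can look like in Python. The regular expressions are my own crude approximations of the four categories named above, not the patterns our exec path actually uses, and a real denylist would need far more care around quoting, aliases, and indirection.

```python
import re
from typing import Optional

# Each entry: (label, pattern). Patterns are illustrative approximations only.
DENY_PATTERNS = [
    ("format filesystem", re.compile(r"\bmkfs(\.\w+)?\b")),
    ("fork bomb", re.compile(r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:")),
    ("curl piped to shell", re.compile(r"\bcurl\b[^|;&]*\|\s*(sudo\s+)?(ba)?sh\b")),
    (
        "write under ~/.ssh",
        re.compile(r"(>>?|\btee\b|\bcp\b|\bmv\b|\brm\b|\bsed\s+-i\b).*(~|\$HOME)/\.ssh/"),
    ),
]


def refusal_reason(command: str) -> Optional[str]:
    """Return the label of the first matching deny pattern, or None if allowed."""
    for label, pattern in DENY_PATTERNS:
        if pattern.search(command):
            return label
    return None


def guard(command: str) -> str:
    """Raise before execution if the command matches the denylist."""
    reason = refusal_reason(command)
    if reason is not None:
        # No override parameter on purpose: the list has zero exceptions.
        raise PermissionError(f"refused ({reason}): {command}")
    return command
```

Note what is missing from `guard`: there is no `force=True`, no reason field, no escalation path. The absence is the design.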

Some limits are not about a count at all, they are about refusing a middle setting. MCP mutations across our tools are off by default, and turning them on requires typing an exact confirmation string rather than answering yes to a prompt — because a yes/no gate is something an agent can talk itself past in the flow of a task, and an exact string it has to reproduce character-for-character is not. This is the same shape as a stance I have argued in public more than once: either you sandbox agents running on your machine properly, or you accept the risk of running them unsandboxed. There is no partial safety. A hard limit is the code version of that same refusal — not a dial you can nudge a little past comfortable, a wall you either clear the explicit condition for or you do not pass at all.
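The exact-string gate is simple enough to sketch. In the Python below, the confirmation phrase and the `MutationGate` class are hypothetical stand-ins; the parts taken from the text are that mutations start off and that only a character-for-character match turns them on.

```python
CONFIRMATION_PHRASE = "I understand this enables MCP mutations"  # hypothetical phrase


class MutationGate:
    def __init__(self, phrase: str = CONFIRMATION_PHRASE) -> None:
        self._phrase = phrase
        self._enabled = False  # off by default

    @property
    def enabled(self) -> bool:
        return self._enabled

    def confirm(self, typed: str) -> bool:
        """Enable mutations only on an exact match: no trimming, no case folding."""
        matched = typed == self._phrase
        if matched:
            self._enabled = True
        return matched

    def require_enabled(self) -> None:
        """Call at the top of every mutating operation."""
        if not self._enabled:
            raise PermissionError("MCP mutations are disabled")
```

There is deliberately no `confirm(True)` and no normalization of the input: "yes", a lowercase variant, or the right phrase with a trailing space all fail the same way.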

Heartbeats and stamina work the same way, applied to time instead of to a count. An agent that has to check in on a schedule and gets paused the moment it misses one cannot quietly run away with a task for six hours unsupervised. An agent that burns through actions faster than its stamina budget allows gets slowed down before the burn becomes a runaway loop draining a rate limit or a wallet. Both are biological metaphors for the same underlying fact: agents will use all the room you give them, so the room has to have a wall, and the wall works better as a number than as a feeling.
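Both mechanisms reduce to a little arithmetic on timestamps. The Python sketch below is one way to model them, under assumptions of my own: the class names, the refill-over-time stamina model (a token bucket), and the rule that a paused agent stays paused until something outside it resumes it. Time is passed in explicitly so the behavior is easy to test.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    interval_s: float  # maximum allowed gap between check-ins
    last_seen: float   # timestamp of the last check-in
    paused: bool = False

    def beat(self, now: float) -> None:
        """Record a check-in. A paused agent cannot un-pause itself."""
        if not self.paused:
            self.last_seen = now

    def check(self, now: float) -> bool:
        """Pause the agent if it missed its window; return True if still running."""
        if now - self.last_seen > self.interval_s:
            self.paused = True
        return not self.paused


@dataclass
class Stamina:
    capacity: float      # maximum stored budget
    refill_per_s: float  # budget regained per second
    level: float         # current budget
    updated: float       # timestamp of the last refill

    def try_spend(self, cost: float, now: float) -> bool:
        """Refill for elapsed time, then spend if the budget allows it."""
        elapsed = max(0.0, now - self.updated)
        self.level = min(self.capacity, self.level + elapsed * self.refill_per_s)
        self.updated = now
        if cost > self.level:
            return False  # caller should slow down and retry later
        self.level -= cost
        return True
```

A supervisor loop would call `check` on a timer and `try_spend` before each action; neither call needs a human awake to return `False`.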

The counterintuitive part is that all of this discipline is what makes real throughput possible instead of what slows it down. Agents that talk to each other and coordinate through a bounded set of caps and file-size ceilings do not collide as often, which is the actual reason a fleet of them can push tens of thousands of lines of committed, working code in a day without turning into a merge-conflict factory. The limits are not the tax on speed. They are the reason speed does not collapse into chaos the moment you add a second agent to the room.

None of these numbers are permanent and none of them were right on the first guess. 700 lines might be 600 next year. Sixteen subagents might become thirty on a bigger machine. The concurrency default that fixed a 60-minute CI run might need retuning the day the test suite doubles. What stays constant is the shape: a hard number, written down, checkable by a script, changeable through the same rules file everyone reads instead of through a one-off conversation nobody archives. A limit that lives only in someone's memory is not governance, it is a mood, and moods do not hold at 3am when an agent is three levels deep in a delegation tree and the person who could have said no is asleep.

That is the actual argument for hard limits over guidelines. A guideline needs a human paying attention to enforce it. A number needs nothing — the agent hits it, the check fails, the run stops, and nobody had to be awake for any of that to work correctly. Agents will happily generate infinite complexity, spawn infinite subagents, and run infinitely long if nothing stops them, not out of malice, just because nothing told them where the edge was. Numbers are how you tell them, in a language that does not need to be re-explained every session and does not care whether you are watching.

Write the number down. Make it checkable. Let it say no so you do not have to.
