The Org Chart Lives in tmux

01-06-26

Named machines, named agents, a coordinator that never writes code, and a hard cap on how many can run at once. Not a metaphor. An actual operating structure.

Every repository gets a tmux session. Every session name matches its window name. Every machine has a name — spark01, spark02, apple01, apple03, and however many more the registry grows to. None of that is decoration. It is the addressing scheme that makes it possible to say "work on this" to a specific pane, on a specific machine, and have that instruction land on the thing it was meant for instead of on whatever happened to be running there last.
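The addressing scheme is concrete enough to sketch. The version below assumes each fleet machine is reachable over SSH by its fleet name, and that the session and its window are both named after the repository, so the tmux target is `repo:repo`. `build_send_command` and the repository name `billing-api` are hypothetical; `tmux send-keys -t <target> ... Enter` is standard tmux.

```python
import shlex


def build_send_command(machine: str, repo: str, instruction: str) -> list[str]:
    """Build an ssh + tmux command that types `instruction` into the pane for `repo`.

    Assumes (hypothetically) that the session and its only window share the
    repository's name, so the tmux target is "<repo>:<repo>".
    """
    target = f"{repo}:{repo}"
    # ssh joins its trailing arguments into one remote shell string,
    # so each tmux argument is quoted individually before joining.
    remote = " ".join(
        shlex.quote(part)
        for part in ["tmux", "send-keys", "-t", target, instruction, "Enter"]
    )
    return ["ssh", machine, remote]
```

The function only builds the command. Passing the result to `subprocess.run(cmd, check=True)` would deliver it, and because the target is derived from names rather than from "whatever pane was active last", the instruction lands on the pane it was meant for or fails loudly.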

Agents get names too, and the naming is not whimsy; it is collision avoidance. Before touching a single task, an agent checks which identities are already active and picks an unused one from a fixed pool (Roman emperors for one fleet, Greek gods for another, scientists and philosophers for a third), then registers that identity across every coordination tool it will use: the task tracker, the message channel, the memory store. Only after that registration does it look at what it is supposed to do. Skip that step and you get two agents both quietly claiming to be the same worker, writing to the same task, and neither noticing the other exists.
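A minimal sketch of the claim-then-register step, assuming each coordination tool can be modeled as a set of registered names. The pool contents and the helper names `claim_identity` and `register_everywhere` are illustrative, not the fleet's actual lists or tooling.

```python
# Hypothetical pools: only their themes come from the article, not the names.
POOLS = {
    "emperors": ["augustus", "tiberius", "claudius", "vespasian", "trajan", "hadrian"],
    "gods": ["zeus", "hera", "athena", "apollo", "hermes", "hephaestus"],
    "thinkers": ["curie", "darwin", "hypatia", "kant", "noether", "turing"],
}


def claim_identity(pool: str, active: set[str]) -> str:
    """Return the first name in `pool` that no running agent is using."""
    for name in POOLS[pool]:
        if name not in active:
            return name
    raise RuntimeError(f"identity pool {pool!r} is exhausted")


def register_everywhere(name: str, registries: dict[str, set[str]]) -> None:
    """Register `name` in every coordination tool, or in none of them.

    All registries are checked before any is written, so a clash never
    leaves a half-registered identity behind.
    """
    clashes = [tool for tool, names in registries.items() if name in names]
    if clashes:
        raise RuntimeError(f"{name!r} already registered in {', '.join(clashes)}")
    for names in registries.values():
        names.add(name)
```

With in-memory sets there is still a window between checking and registering in which two agents could pick the same name; against real tools the registration itself would need to be an atomic claim, so the second agent gets the error instead of silently sharing an identity.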

The clearest demonstration of why any of this matters happened during what got called the Codebase Freeze. Eight agents were mid-sprint on one codebase, each with an explicit, narrow lane: one held the lead role plus git and deploy, and nothing else; one owned features; one owned the server; one owned the public app; one did browser QA; one did tests; one did general development; one did regression. Deploys started failing anyway. Parallel agents were independently "improving" shared type files without committing the code that consumed those types in the same change, so a coordinator identity broadcast a freeze to all eight at once: stop refactoring, do only read-only analysis and additive tests, and wait for an explicit "clear" from the one agent holding deploy authority. It is the classic failure of many agents on one repo, and the fix was not a smarter type-checker. It was an order with a name on it.
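The freeze order has a simple shape: a flag, a short list of still-permitted actions, and exactly one identity allowed to lift it. A sketch under those assumptions, where the action labels and the agent name `augustus` are hypothetical:

```python
from dataclasses import dataclass

# Action kinds still permitted while a freeze is in force (labels are illustrative).
ALLOWED_DURING_FREEZE = {"read_only_analysis", "add_test"}


@dataclass
class Freeze:
    deploy_owner: str  # the single agent allowed to lift the freeze
    active: bool = False
    reason: str = ""

    def broadcast(self, reason: str) -> None:
        """Put every agent under the freeze at once."""
        self.active = True
        self.reason = reason

    def permits(self, action: str) -> bool:
        """Anything goes when unfrozen; only the allowlist while frozen."""
        return not self.active or action in ALLOWED_DURING_FREEZE

    def clear(self, by: str) -> None:
        """Only the deploy owner's explicit "clear" ends the freeze."""
        if by != self.deploy_owner:
            raise PermissionError(f"only {self.deploy_owner} can clear the freeze")
        self.active = False
        self.reason = ""
```

The point of `clear` raising for anyone else is the "name on it": a worker that decides on its own that things look fine cannot end the freeze.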

The same coordinator caught something a static check never would have: one of the eight had been spinning on the identical rate-limit error for over 30 minutes, retrying the same failing call in a loop that looked, from the outside, like active work. The intervention was direct — stop, mark the task blocked with a note explaining why, move to a different pending task that does not need a live server, and do not stop working entirely just because one path is blocked. That is "never trust an agent self-report" happening in real time, enforced not by the human watching a dashboard but by another layer of the fleet whose entire job is to watch worker output and un-stick whatever is quietly failing.
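Spotting that kind of spin does not need anything clever, only a record of what the pane keeps printing. A sketch, assuming error lines have already been scraped from pane output (for example with `tmux capture-pane -p`) into timestamped pairs; `spinning_on` and its thresholds are hypothetical, with 30 minutes taken from the incident above.

```python
from datetime import datetime, timedelta
from typing import Optional


def spinning_on(
    events: list[tuple[datetime, str]],
    now: datetime,
    threshold: timedelta = timedelta(minutes=30),
    stale_after: timedelta = timedelta(minutes=5),
) -> Optional[str]:
    """Return the error text if a pane has repeated one error for longer than `threshold`.

    `events` is a chronologically ordered list of (timestamp, error_text)
    pairs. A run only counts if its latest occurrence is recent, i.e. the
    agent is still retrying rather than having moved on.
    """
    if not events or now - events[-1][0] > stale_after:
        return None
    last_text = events[-1][1]
    run_start = events[-1][0]
    # Walk backwards while the error is identical to find where the run began.
    for ts, text in reversed(events):
        if text != last_text:
            break
        run_start = ts
    return last_text if now - run_start > threshold else None
```

A non-`None` result is the trigger for the intervention described above: mark the task blocked with the returned error as the note, then reassign the agent to pending work that does not depend on the failing call.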

Getting the headcount right turned out to matter more than getting the agents smart. A full exhaustive rename across a codebase got dispatched as 24 agents at once — 20 edit shards plus 4 verification agents — and every single one of the 24 died on its first message: rate limited, zero tokens spent, no real model turn ever happened. The fleet's own failure log was misleading about why, reporting a downstream symptom instead of the actual cause. The fix was not a better retry strategy. It was fewer agents: the same rename, resubmitted as six disjoint agents editing non-overlapping files, explicitly framed as low concurrency to avoid the exact rate limit that had just killed the first attempt. All six finished cleanly, real tokens spent, real commits landed, in under two hours. Twenty-four instantly self-destructed. Six succeeded completely. Somewhere between those two numbers is the actual ceiling for how many agents that account tier can run at once — and the only way to find it was to hit it and watch two dozen agents die on message one.
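The successful resubmission combined two ideas: disjoint shards, so no two agents touch the same file, and a hard cap on how many run at once. A sketch of both, assuming each agent can be represented as an async `worker` callable (hypothetical). The default cap of six mirrors what worked for that account tier in this one incident; it is not a general recommendation.

```python
import asyncio


def shard_files(paths: list[str], n: int) -> list[list[str]]:
    """Split paths into at most n disjoint shards so no two agents edit the same file."""
    if n < 1:
        raise ValueError("need at least one shard")
    shards: list[list[str]] = [[] for _ in range(n)]
    for i, path in enumerate(sorted(set(paths))):
        shards[i % n].append(path)
    return [s for s in shards if s]


async def run_shards(shards, worker, max_concurrent: int = 6):
    """Run one worker per shard, never more than max_concurrent at the same time."""
    gate = asyncio.Semaphore(max_concurrent)

    async def guarded(shard):
        async with gate:  # waits here once the cap is reached
            return await worker(shard)

    # gather() returns results in the same order as the shards.
    return await asyncio.gather(*(guarded(s) for s in shards))
```

The semaphore matters even when the shard count equals the cap: if someone later splits the job into 24 pieces, the extra shards queue instead of all hitting the rate limit on their first message.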

Machine identity has to be right for any of this addressing scheme to mean anything, and it is easier to get wrong than it sounds. One fleet's own status bar reported the wrong machine name in its footer (spark01 displaying spark02), which sounds cosmetic until you remember that cross-machine coordination depends on a human or a script reading that label to decide where an instruction should go. The obvious fix, pulling the hostname from the operating system directly, turned out to be wrong too: checking that approach against every machine in the fleet revealed that one of them reports its literal system hostname as something generic and unhelpful, so switching to it would have just relocated the same hardcoding one layer down instead of removing it. The actual fix had to fall back through multiple sources and was verified against all five machines with a before-and-after table, not just on the one machine where the bug was first noticed. A fleet where machine identity can silently drift is a fleet where "check spark02" and "fix it on spark01" stop being reliable instructions.
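The article does not say which sources the real fix consulted, so the sketch below invents plausible ones: an environment variable, a per-machine file, and only then the OS hostname, rejected if it is generic. The variable name `FLEET_MACHINE_NAME`, the file path, and the generic-hostname list are all hypothetical.

```python
import os
import socket
from pathlib import Path
from typing import Mapping, Optional

# Illustrative hostnames that say nothing about which fleet machine this is.
GENERIC_HOSTNAMES = {"localhost", "ubuntu", "debian", "raspberrypi"}


def resolve_machine_name(
    env: Optional[Mapping[str, str]] = None,
    config_path: Optional[Path] = None,
    hostname: Optional[str] = None,
) -> str:
    """Return this machine's fleet name, trying explicit sources before the OS hostname."""
    env = os.environ if env is None else env
    if config_path is None:
        config_path = Path.home() / ".config" / "fleet" / "machine-name"  # hypothetical path

    # 1. Explicit override (hypothetical variable name).
    name = env.get("FLEET_MACHINE_NAME", "").strip()
    if name:
        return name
    # 2. A per-machine file written once when the machine joined the fleet.
    if config_path.is_file():
        name = config_path.read_text().strip()
        if name:
            return name
    # 3. The short OS hostname, but only if it is actually distinctive.
    raw = hostname if hostname is not None else socket.gethostname()
    host = raw.split(".")[0].lower()
    if host and host not in GENERIC_HOSTNAMES:
        return host
    raise RuntimeError("cannot determine fleet machine name; set FLEET_MACHINE_NAME")
```

Raising instead of returning the generic hostname is the design choice that matters: a wrong label in the status bar is worse than no label, because someone will act on it. Running this on every machine and tabulating the result before and after is the same verification the article describes.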

The tmux layer itself needs to survive being the single point of failure it obviously is. On one machine, the entire tmux server vanished without warning, taking 53 sessions and 75 windows with it in one instant — including 18 panes running live agents, some of them with days of accumulated work in their scrollback. No crash signal, no OOM kill, nothing in the system logs to explain it. What made this a recoverable incident instead of a catastrophe was a session snapshot taken 26 minutes earlier by a background auto-save. The entire layout got rebuilt from that snapshot instead of from memory of what 53 sessions were supposed to contain. The lesson generalizes past that one crash: the tool coordinating everything else needs the same durability discipline everything else gets held to. A tmux server is a process. Processes die. If the only record of your fleet's layout lives inside the process that just died, you have built exactly the single point of failure the rest of the architecture was designed to avoid everywhere else.
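The layout half of that recovery can be sketched with plain tmux commands: `list-windows -a -F` with format variables to save, and `new-session` / `new-window` to rebuild. This is an illustration, not the auto-save tool the fleet actually used; it restores sessions, window names, and working directories, but not running agents or scrollback. `take_snapshot` and `restore_commands` are hypothetical names.

```python
import subprocess

# One line per window: session, index, window name, active pane's directory.
FORMAT = "#{session_name}\t#{window_index}\t#{window_name}\t#{pane_current_path}"


def take_snapshot() -> str:
    """Capture the current layout as text (needs a running tmux server).

    Run this on a timer and write the result outside tmux, so the record
    survives the server it describes.
    """
    return subprocess.run(
        ["tmux", "list-windows", "-a", "-F", FORMAT],
        check=True, capture_output=True, text=True,
    ).stdout


def restore_commands(snapshot: str) -> list[list[str]]:
    """Turn a snapshot back into tmux commands that recreate sessions and windows."""
    commands: list[list[str]] = []
    seen: set[str] = set()
    for line in snapshot.splitlines():
        if not line.strip():
            continue
        session, _index, window, path = line.split("\t")
        if session not in seen:
            # The first window of each session is created with the session itself.
            seen.add(session)
            commands.append(["tmux", "new-session", "-d", "-s", session, "-n", window, "-c", path])
        else:
            commands.append(["tmux", "new-window", "-t", session, "-n", window, "-c", path])
    return commands
```

Each returned command can be passed to `subprocess.run`. Keeping restore as a pure function of the saved text means it can be checked against a snapshot before the day it is actually needed.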

Respecting another agent's work is not a courtesy in this setup; it is a structural requirement, and the failure mode when it breaks is almost funny in how literal it is. A standing coordinator's own tmux window, the one running the process responsible for watching the rest of the fleet, got quietly repurposed. Someone else's unrelated fix work for a completely different tool ended up running in that same pane, and the coordinator sat silently stalled for the better part of an hour with nobody noticing, because from the outside an occupied pane looks exactly like a working one. Two scheduled checks that coordinator was supposed to fire came and went overdue with nothing watching for them. This is the same failure the public conversation about agents deleting each other's files is pointing at, just at the level of infrastructure instead of files: coordination between agents is crucial, and respecting the space another agent is actively using is step one, whether that space is a source file or a terminal pane.
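Both symptoms here are checkable from outside the pane: what program is actually in the foreground, and whether scheduled checks are firing. A sketch, assuming the coordinator's expected foreground command is known (the `python3` used below is a placeholder; an agent CLI may show up under another process name) and that last-fired times are recorded somewhere. `pane_command`, `overdue_checks`, and `audit_coordinator` are hypothetical helpers; `#{pane_current_command}` is a real tmux format variable.

```python
import subprocess
from datetime import datetime, timedelta


def pane_command(target: str) -> str:
    """Ask tmux which program is currently in the foreground of a pane."""
    return subprocess.run(
        ["tmux", "display-message", "-p", "-t", target, "#{pane_current_command}"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()


def overdue_checks(last_fired: dict[str, datetime], every: dict[str, timedelta],
                   now: datetime, grace: timedelta = timedelta(minutes=5)) -> list[str]:
    """Names of scheduled checks that are more than `grace` past their interval."""
    return sorted(
        name for name, interval in every.items()
        if now - last_fired.get(name, datetime.min) > interval + grace
    )


def audit_coordinator(current_command: str, expected: str,
                      last_fired: dict[str, datetime], every: dict[str, timedelta],
                      now: datetime) -> list[str]:
    """List problems; an empty list means the pane holds the right process and is on schedule."""
    problems = []
    if current_command != expected:
        problems.append(f"pane is running {current_command!r}, expected {expected!r}")
    problems += [f"check {name!r} is overdue" for name in overdue_checks(last_fired, every, now)]
    return problems
```

In use, `current_command` would come from `pane_command("coordinator:coordinator")` or whatever the real target is. The overdue check is what catches the quiet case: an occupied pane looks busy, but a missed schedule does not.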

Standing duties get assigned the same way a human manager assigns an on-call rotation, and they stick. One agent got told, mid-conversation, to take ownership of loop oversight for its machine going forward — check the scheduled infra, security, and health tiers, then post exactly one digest to a shared channel, with an explicit instruction not to spam the channel with more than that. It was not a one-off request. It was a standing duty, handed to a specific named agent, expected to persist across sessions without being re-explained every time. A fleet that only knows how to execute one-shot instructions cannot hold that kind of ongoing responsibility. One that assigns durable ownership to a named identity can.
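The "exactly one digest" constraint can be made structural rather than left to the agent's restraint. A sketch, assuming each tier check yields a pass/fail plus a note and that a cycle has some stable identifier; `TierResult`, `build_digest`, `DigestPoster`, and the `post` callable are hypothetical. The in-memory set below does not survive a restart, so a standing duty that persists across sessions would need to store posted cycle ids on disk or in the memory store.

```python
from dataclasses import dataclass


@dataclass
class TierResult:
    tier: str  # e.g. "infra", "security", "health"
    ok: bool
    note: str = ""


def build_digest(machine: str, results: list[TierResult]) -> str:
    """Fold every tier's result into one message instead of one post per check."""
    failing = [r for r in results if not r.ok]
    header = f"[{machine}] loop digest: {len(results) - len(failing)}/{len(results)} tiers ok"
    lines = [header] + [f"- {r.tier}: {r.note or 'failed'}" for r in failing]
    return "\n".join(lines)


class DigestPoster:
    """Posts at most one digest per cycle id, however many times it is called."""

    def __init__(self, post):
        self._post = post  # hypothetical callable that sends text to the shared channel
        self._posted: set[str] = set()  # in memory only; persist this for real use

    def post_once(self, cycle_id: str, text: str) -> bool:
        if cycle_id in self._posted:
            return False
        self._posted.add(cycle_id)
        self._post(text)
        return True
```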

Even physical logistics get folded into the same fleet structure when a machine cannot do something on its own. A headless machine with no interactive browser cannot complete an OAuth login by itself, so the login code gets relayed by hand across the fleet — an authorization URL opened on a different, browser-capable machine, the resulting code copied back, the same pattern repeated for whichever service needs a human to click through a consent screen. It is a manual bridge, not an elegant one, but it is still part of the addressing scheme: the fleet knows which of its machines can see a browser and routes the parts of the job that need one there, instead of pretending every machine can do everything.
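"Knowing which machine can see a browser" amounts to capability-based routing. A sketch, assuming each machine advertises a set of capability tags; the `Machine` type, the `browser` tag, and `plan_oauth_relay` are hypothetical, and the returned steps describe the manual relay rather than automating it.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    capabilities: set[str] = field(default_factory=set)


def route(needs: set[str], machines: list[Machine]) -> Machine:
    """Pick the first machine that has every capability a step needs."""
    for m in machines:
        if needs <= m.capabilities:
            return m
    raise LookupError(f"no machine offers {sorted(needs)}")


def plan_oauth_relay(login_on: Machine, machines: list[Machine]) -> list[tuple[str, str]]:
    """Split an OAuth login into (machine, action) steps, sending the consent screen to a browser."""
    if "browser" in login_on.capabilities:
        return [(login_on.name, "complete login in local browser")]
    opener = route({"browser"}, machines)
    return [
        (login_on.name, "start login and print the authorization URL"),
        (opener.name, "open the URL, approve the consent screen, copy the code"),
        (login_on.name, "paste the code to finish the login"),
    ]
```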

Hygiene across all of this needs its own watchdog, because agents left running for hours will generate side effects nobody explicitly asked for. A scheduled guard checking one repository's working tree found ten separate nested project clones sitting in the repo root, six gigabytes total, every one of them created the same afternoon by a batch of parallel investigation agents that had each been given its own full checkout to work in. The guard correctly read the timestamps and file shapes as active work rather than abandoned rot and left it alone, but it flagged the disk footprint and the risk of a careless `git add .` picking up six gigabytes of someone else's clone by accident. Nobody was watching that disk fill up in real time. The loop was.
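A guard like that can be sketched with the standard library alone: find directories in the repo root that contain their own `.git`, total their size, and judge "active" by the newest file modification time. The six-hour activity window and the function names are assumptions; like the guard in the story, this only reports and never deletes.

```python
import os
import time
from pathlib import Path
from typing import Optional


def scan_tree(path: Path) -> tuple[int, float]:
    """Total size in bytes and newest modification time of files under `path`."""
    total, newest = 0, path.stat().st_mtime
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished mid-scan; agents may still be working
            total += st.st_size
            newest = max(newest, st.st_mtime)
    return total, newest


def find_nested_clones(repo_root: Path, active_window_s: float = 6 * 3600,
                       now: Optional[float] = None) -> list[tuple[str, int, bool]]:
    """Report (name, bytes, looks_active) for git checkouts nested in repo_root. Never deletes."""
    now = time.time() if now is None else now
    found = []
    for child in sorted(Path(repo_root).iterdir()):
        if child.name == ".git" or not child.is_dir() or not (child / ".git").exists():
            continue
        size, newest = scan_tree(child)
        found.append((child.name, size, now - newest < active_window_s))
    return found
```

Splitting "how big" from "how recent" mirrors the judgment in the story: recent clones are left alone but still flagged for their footprint, and old, large ones become candidates for a human decision.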

All of this sits inside a structure that looks less like a script and more like an org chart. At the top, a human who does not type code directly into any of these panes. Below that, a layer whose entire job is oversight — sweeping every pane on a schedule, classifying each one as working, idle-and-done, stuck, errored, or blocked on account usage, and never accepting a worker's own claim of "done" as sufficient evidence. Below that, coordinators scoped to one repository or one initiative, forbidden by standing rule from writing product code themselves — their output is delegation, verification, and a report, not a diff. Below that, the workers who actually touch files, each holding one narrow lane, each registered under one name, each expected to report status rather than go silent, because silence is not a valid status in a system where the only way to know an agent is stuck is for something else to notice.
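The oversight layer's sweep boils down to a classifier over each pane. A sketch, assuming the sweep can read the tail of a pane's output, knows how long that output has been unchanged, and can tell whether the pane is sitting at an input prompt. The marker strings and the 30-minute stuck threshold are hypothetical heuristics, not what any real agent CLI prints.

```python
from enum import Enum


class PaneStatus(Enum):
    WORKING = "working"
    IDLE_DONE = "idle-and-done"
    STUCK = "stuck"
    ERRORED = "errored"
    USAGE_BLOCKED = "blocked on account usage"


def classify(tail: str, seconds_since_change: float, at_prompt: bool,
             stuck_after_s: float = 1800) -> PaneStatus:
    """Classify a pane from its recent output. Marker strings are illustrative."""
    text = tail.lower()
    # Usage limits are checked first because their messages often also say "error".
    if "usage limit" in text or "rate limit" in text:
        return PaneStatus.USAGE_BLOCKED
    if "traceback" in text or "error:" in text:
        return PaneStatus.ERRORED
    if at_prompt:
        return PaneStatus.IDLE_DONE
    if seconds_since_change > stuck_after_s:
        return PaneStatus.STUCK
    return PaneStatus.WORKING


def accept_done(status: PaneStatus, verified: bool) -> bool:
    """A worker's own "done" is not an input here: only independent verification counts."""
    return status is PaneStatus.IDLE_DONE and verified
```

Note that `accept_done` takes no argument for what the worker claims. Idle-and-done only becomes done once the sweep has checked something concrete, such as the commit existing or the tests passing.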

None of the individual pieces here are exotic. Named sessions, named machines, named agents, a coordinator layer that watches instead of codes, and a headcount that gets tuned by hitting its ceiling and backing off — every one of those is a small, boring decision on its own. What they add up to is a fleet that can survive a rate limit, a vanished tmux server, a hardcoded hostname, and one agent quietly wandering into another's space, because every one of those failure modes was hit once, named, and fixed at the layer where it actually lives instead of papered over with a retry. Scale a fleet of agents far enough and you stop needing better agents. You need better addressing.
