Systems

Big Systems Thinking

Big Systems Thinking

Tasks, memory, conversations, knowledge — each gets its own CLI-owned store. Ad hoc markdown is banned, and silence is not a valid status.

Every durable fact about this operation lives in a CLI-owned store. Tasks live in todos. Memory lives in mementos. Agent-to-agent chatter lives in conversations. Durable notes live in knowledge. Nothing durable lives in a markdown file some agent decided to drop in a home directory because writing a stray file felt faster than calling the right tool. That rule sounds pedantic until you watch what happens the one time it gets skipped.

A prompt bar bug in one of our terminal apps is the cleanest case I have. It broke once, got fixed, and the fix got posted to the conversations store specifically so it would be findable later — not filed away in someone's memory, not left as a comment in a pull request nobody rereads, but written into the system whose entire job is being searchable. Months later the same bug came back. The instruction to fix it a second time was not just fix it. It was go read how we fixed it last time first. The record existed. It got consulted. That is the entire point of a system of record: not that failures never repeat, but that when they do, the second fix takes five minutes of reading instead of another hour of rediscovery.

A feedback-triage loop shows the other side of the same discipline: what a system of record is for when the news is bad. For two straight weeks, every single run of that loop hit the identical wall — an authentication scope that could not reach the admin endpoint it needed, a stale vault credential nobody had rotated. Instead of re-investigating from zero eight separate times, the loop pulled up a saved memory of the exact diagnosis and short-circuited straight to the same conclusion: still blocked, nothing changed, here is why. It also refused an available shortcut — a live production credential that would have unblocked it immediately — because that credential was not sanctioned for an unattended process. By day fourteen its own note read plainly: production feedback now untriaged for over two weeks. That sentence sitting in the record, honest and searchable, is worth more than a green checkmark would have been. A system of record's job is not to make you look good. It is to make the actual state of the world impossible to lose.

Silence is not a valid status, and I mean that as an operating rule, not a saying. A coordinator caught a worker mid-fleet spinning on the exact same rate-limit error for thirty straight minutes, retrying a call that was never going to succeed, and intervened directly: you have been on this for thirty-plus minutes with the same failure, mark it blocked with a note, move to something that does not need a live server, never stop working, pick a different task now. Nobody was watching that pane personally. The coordinator caught it because the worker's actual state was visible in a shared store instead of trapped inside a terminal nobody happened to be looking at. A worker that goes quiet while technically still running is not making progress. It is producing silence, and silence does not get to pass as a status just because the process is still alive.

That same demand for a checkable state is why every agent has to register itself before it is allowed to touch anything. At the start of a session, before reading a single unread message or picking up a single task, an agent logs into the task system, picks an identity, and announces itself into the shared store. That is not ceremony. It is the difference between a fleet where I can ask who is doing what right now and get a real answer, and a fleet where the honest answer is I have no idea, check the panes yourself.

Memory gets the same structural treatment instead of being left as a vague sense that an agent learned something. Every session distills down into a rollout summary written under three fixed headings — user preferences, reusable knowledge, and failures and how to do differently — filed into the memory store instead of evaporating when the session closes. A lesson that lives only in a closed session is not a lesson. It is a thing that happened once. The same heading structure every time is what makes a hundred of those summaries searchable as one body of institutional memory instead of a hundred unrelated diary entries.

Even self-organization gets routed through the record instead of happening informally in someone's head. The instruction that set up our automation layer was explicit about this: every scheduled skill has to post its work into a shared space, a dedicated area gets created specifically for that category of work, and the agents figure out the division of labor themselves from inside it. Nobody assigns loops to agents by hand. The space itself, populated with everyone's actual postings, is the coordination mechanism. Self-organization only works when there is a shared, durable place to organize around — an idea nobody can act on without a record to look at.

A system of record is also not exempt from the same durability standard it enforces on everyone else, and I have watched that lesson get proven the hard way. The database underneath our own orchestration layer started throwing lock errors under the load of its own fleet writing to it concurrently — the tool built to remember everything briefly could not keep up with the volume of agents trying to tell it something at once. A system of record that cannot survive its own write load is not a system of record. It is a bottleneck wearing a record's job title, and the fix was the same standard applied one layer down: durable, concurrent-safe storage for the thing that stores everything else.

Standing duties get the same treatment, and the mechanism is almost insultingly simple: an artifact or it did not happen. When an agent is handed ongoing ownership of something — watch this subsystem, post one digest, do not spam the channel — the follow-up question is always the same. Did you actually post it. If yes, give me the message ID. A claim of I posted it is worth nothing on its own. A message ID pointing at an actual message in an actual store is the only version of that claim I accept, because the first one is a self-report and the second one is evidence.

A watchdog loop turned up something nobody had asked it to look for and still it went straight into the record instead of getting quietly cleaned up: roughly six gigabytes of untracked working clones, ten separate repos, all created the same afternoon by agents mid-investigation. The loop did not delete them and did not stay silent about them either. It logged exactly what it found, classified it correctly as active work rather than rot, and flagged the disk footprint as a real risk worth watching. That is what a system of record is supposed to do with anything unexpected: write it down accurately, in a place other agents and I can both find it, rather than silently fixing it, silently ignoring it, or silently forgetting it existed by tomorrow.

Ad hoc markdown fails at exactly the things a CLI-owned store is built for. A file dropped under a home directory has no owner, no schema, no way to be queried except by remembering its path and grepping it by hand. Nobody logs into it. Nothing else in the fleet knows it exists unless someone happens to mention it out loud. It cannot be checked programmatically by a loop, cannot be searched across machines, and it rots the instant the one person who knew about it stops thinking about it. A knowledge CLI has commands, a workspace model, and an actual API surface other agents and tools can call. A stray README does not compete with that. It just quietly becomes the one fact nobody else in the system can see.

This is also why the durable stores are plural instead of one giant catch-all. Tasks are not memories. Memories are not conversations. Conversations are not knowledge. Each domain gets its own CLI because each one answers a different question — what needs doing, what have we learned, what did we say to each other, what do we know — and collapsing them into one undifferentiated wiki is how you end up back at ad hoc markdown with extra steps, just with a nicer folder name.

Even the systems of record get checked against reality instead of being trusted by default. A scheduler that fires loops on a timer proves it is actually alive with the cheapest possible test: schedule a run that must reply with one exact string at one exact minute, then check the transcript for that literal string. If the canary is late or missing, the clock the entire record-keeping apparatus depends on is lying, and every timestamp downstream of it is now suspect. A record is only as trustworthy as the clock that stamps it, so the clock gets tested too, on purpose, the same way everything else does.

None of this is a ban on thinking in prose. I dictate walls of intent constantly, in messy run-on paragraphs, and that is fine, because thinking out loud is not the same claim as remembering something durably. The rule is narrower and harder to argue with: once a fact needs to survive past the conversation that produced it, it moves into a store built to hold it, indexed, ownable, queryable by name. A stray file in a home directory is not a record. It is a thought that never finished moving.

None of this is about distrust dressed up as process. It is about making the fleet's actual state something other than a rumor. A human running one team can walk the floor and ask people how it's going. A fleet running across five machines around the clock cannot be walked. The only way to know what is actually happening is to make every durable fact land somewhere queryable, by rule, every time, with no exceptions for the task that felt too small to bother logging.

I have watched what happens when this discipline slips even once — a fix that lived only in someone's head instead of the record, a bug that came back because nobody could find how it was solved the first time. The fix is never clever. It is always the same boring instruction: write it down in the system that is built to remember, not the one that happens to be open.

Silence is not a status. An untracked file is not a record. If it did not land in a store somebody else can query, it did not happen.

← Back to the articles

Newsletter

What we shipped, what broke,
and what we learned