Systems

SQLite under a predictable directory, daemons that survive restarts, idempotency keys so a retried loop can't create duplicates. Agents crash. The state does not get to.
Agents crash. Sessions die mid-task. Machines reboot for a kernel update nobody scheduled. Rate limits cut a run off mid-sentence. None of that is a special case in a fleet of coding agents — it is Tuesday. So the rule for every package I publish is that the substrate does not get to lose data because a process ended.
The default is SQLite under a predictable directory: `~/.hasna/<package>` for almost everything, `~/.codewith` for the coding CLI itself. Not a cloud database, not a remote API call that fails the moment Wi-Fi hiccups — a file on disk, local-first, that works exactly the same on a laptop with no internet as it does on a machine wired straight into a datacenter. Cloud sync is opt-in, layered on top, never the only copy of the truth. I have close to 100 of these local SQLite stores running across a handful of machines right now, and the fact that I can name most of them off the top of my head — one holds threads, one holds goals, one holds sessions — is itself the point. A predictable file in a predictable place beats a clever abstraction every time you actually need to go find your data at 1am. The coding CLI's own thread store holds over 3,100 threads going back a month. Its goals store holds 480 native goals. A separate sessions database holds more than 2,100 Claude sessions and 621 Codex sessions, with well over 100,000 individual messages between them. Three plain SQLite files, none of them a managed cloud product, all of them queryable with the same `sqlite3` binary that ships on every machine I own.
Daemons built on top of that SQLite layer have to survive restarts, and "survive" is a specific, testable claim, not a hope. One fix that shipped read exactly like this in the commit log: survive SQLITE_BUSY and BUSY_SNAPSHOT contention instead of failing the run. That is the unglamorous reality of local-first storage — two processes occasionally reach for the same file at the same time, and the correct behavior is to retry and continue, not to crash and lose whatever the run was in the middle of doing. A daemon that cannot tolerate its own database being briefly busy is not a daemon. It is a script that happens to keep running until the first coincidence.
Every run gets recorded, not just the successful ones. Our loops daemon logs every scheduled execution, including the ones that failed, because a system that only remembers its wins cannot tell you why a nightly job silently stopped firing three weeks ago. The value of durable local state is not the happy path — it is being able to answer "what actually happened at 3am on Tuesday" without asking an agent to remember, because agents do not remember, and self-reports about what happened are exactly the thing you should trust least.
Idempotency keys are what make retries safe instead of dangerous, and the clearest version of this I have shipped lives inside a billing system, where getting it wrong costs actual money. A Stripe webhook can legitimately fire twice for the same event — networks retry, so does Stripe — and the fix is not "try to avoid duplicate webhooks," it is a dedupe key on the event id with a conflict clause that does nothing on a repeat: the second delivery finds the row already there and quietly no-ops. Credit grants carry their own dedupe keys shaped around the thing they are granting — a monthly-grant key tied to the invoice, a top-up key tied to the payment intent — so replaying the exact same event twice produces the exact same ledger state, not a double credit. Idempotency here is not a nice-to-have abstraction. It is the difference between a webhook retry and a customer getting free money because a network blip fired the same event twice.
Fingerprint upserts do the same job for less financially dangerous, more frequent cases: a task-tracking loop that runs every fifteen minutes should not create a new duplicate task every fifteen minutes just because the underlying condition it is watching for is still true. The fix is a stable fingerprint computed from the content of the task, and an upsert against that fingerprint instead of a blind insert. Run the loop a thousand times against an unchanged condition and you get one task, updated in place a thousand times, not a thousand tasks. That is the whole trick, and it is the trick that makes loops safe to run unattended in the first place — a loop that cannot tell "still the same problem" from "a new problem that looks the same" will flood its own task list within a day.
The same discipline shows up in our workflow engine's design doc, stated almost as a warning to future contributors: external systems should not write rows into that daemon's SQLite database directly, no matter how tempting the shortcut looks. The stable contract is an idempotent CLI or SDK upsert, keyed by an idempotency key and a spec hash, with dry-run, preflight, and commit modes layered in front of it so a caller can check what would happen before it happens. Once you let one caller reach around the contract and write rows directly, every guarantee the rest of the system depends on — the fingerprints, the dedupe, the safe retries — stops being guaranteed for anyone.
Test isolation rides on the same substrate, and it is not optional the way it sometimes gets treated. Every package that reads its state directory from `~/.hasna/<package>` also reads an environment variable override — `<PACKAGE>_DATA_DIR` — specifically so a test run points at a throwaway SQLite file instead of the real one. I have chased a CI failure that reproduced locally but not on the runner, and the actual fix was forcing the test environment's home directory explicitly rather than trusting whatever the runner happened to default to — because a test suite that quietly shares state with anything else, even accidentally, stops being a test suite and becomes a source of flaky failures that look like product bugs. An env-overridable state directory is a small thing to add to a CLI. It is the difference between a test that proves something and a test that occasionally lies to you depending on what ran before it.
Scheduled work adds one more constraint on top of all this: a daemon has to be certain it is the only copy of itself running, or two copies will both grab the same due job and run it twice. The fix is a pidfile-backed single-instance guard — the daemon checks for its own pidfile on start, and refuses to start a second copy if one is already alive. Combine that with SQLite state and you get a scheduler that can be killed, restarted, moved to a different machine, and resumed without either losing a scheduled run or duplicating one. Machines reboot. The schedule should not care.
Agent-to-agent messaging needs the exact same guarantees a human's inbox needs, and it gets them from the same substrate. The coding CLI's cross-session mailbox — the mechanism that lets one agent message another agent in a completely different session, not just its own spawned subagents — is backed by tables for messages, delivery attempts, and dead letters, sitting in the same local SQLite file as everything else. A message that cannot be delivered does not vanish. It lands in a dead-letter table where something can retry it or a human can go look at why it stalled. Treat agent-to-agent communication as fire-and-forget and you get the same failure mode email had in the 1990s before delivery receipts existed: messages that silently disappear, and nobody finds out until the task that depended on them never happens.
One of our backup tools shipped in a single session and is the cleanest proof that this stack holds together end to end: bootstrapped, validated against real AWS S3 and RDS accounts with read-only audits so it never touched production data while proving itself, published to npm, and installed on two machines the same day. Nothing about that flow required a staging environment or a multi-week rollout plan, because the durability guarantees were already sitting in the substrate — the tool only had to be correct once, and the local-first daemon underneath it did the rest.
Durability extends to testability in a direction that is easy to miss: a tool that depends on a live API key to run its own test suite is a tool that cannot be verified offline, and offline verification is exactly what you want when the thing under test manages your own machines. Our multi-agent framework's budget governor falls back to a deterministic offline model the moment credentials are missing, specifically so the whole system runs and tests without ever touching a real provider. Nothing in that fallback path is hardcoded to a happy case — it exists so a contributor with no API keys at all can still prove the state machine behaves correctly, which is the same local-first instinct applied to trust instead of to storage: do not make correctness depend on something outside your own machine if you can help it.
None of this is exotic. It is closer to plumbing than architecture, and that is deliberate — the interesting part of an agent system should be what the agent decides, not whether the storage underneath it survives a crash. Models are already probabilistic: the same prompt can produce a materially different result twice in a row, and that is the nature of the technology, not a bug to be designed away. Which is exactly why the layer underneath the model cannot also be probabilistic. If the model's judgment already varies run to run, the state it writes to, retries against, and resumes from has to be the one part of the system that behaves exactly the same way every single time.
Local-first is not a nostalgia trip back to files-on-disk for its own sake, and it is not an argument against the cloud either — plenty of these tools sync upward once the local copy is safe. It is the fastest way to get a guarantee a network call cannot make on its own: the data is there because it never depended on anything else being there first. Add durability, idempotency, and an override for tests, and you get a substrate that can absorb crashes, restarts, retries, and reboots without anyone noticing from the outside — which is the only way an agent fleet gets to run unattended at 3am and still be trustworthy at 9am.