AI

Building My Own AGI Lab

Building My Own AGI Lab

An AGI lab of one builder, a fleet of agents, and close to 100 open-source building blocks. Not a metaphor — a floor plan.

I run Hasna as an AGI lab of one builder and a fleet of AI coding agents. Not a metaphor. The tmux sessions are the lab, the agents are the staff, and the building blocks are close to 100 open-source repositories I ship under Apache 2.0.

People hear "AGI lab" and picture a research campus with badge readers and a compute cluster the size of a warehouse. Mine is a laptop, a handful of named machines, and agents that never stop working. No campus, no cafeteria, no headcount slide for an investor deck — just a delegation tree with me at the top and a fleet underneath that does not sleep.

Here is the moment it stopped being a joke to me. I posted this in April, half mocking the industry's favorite phrase: AGI achieved internally, I'm running 8 agents and 1 god agent to build my app, they all work in tmux, they all have their own tasks, messages, memory, and browser, outside of Claude, shared between each other. They communicate, they do tasks, and they watch over each other's work. I wrote it as a bit. Then I actually looked at what was running on my screen and it stopped being a bit.

Eight agents plus one coordinator, each with its own task list, its own memory, its own browser, in separate tmux panes, none of them touching a keyboard I own. That is not a metaphor for a lab. That is a lab, and it fits on one screen.

The machines have names because a lab needs a floor plan. spark01, spark02, apple03, apple06, remote-spark06 — a small fleet connected over Tailscale, each one running its own agents against its own repos, all reachable from wherever I happen to be. I run sixteen or seventeen subscription profiles across that fleet at once, because a coding subscription is not a seat you sit in, it is a unit of compute, and a fleet needs enough units that no agent ever sits idle waiting on someone else's rate limit.

That is the part people miss when they picture a solo AI operation as one guy with one chat window open. Compute is the leash. The size of your lab is the number of agents you can keep fed at once, and feeding them means machines, profiles, and a scheduler that treats a twenty-dollar subscription as an allocatable resource, not a tool you open when you feel like coding that day.

The primitives are the other half of the lab, and they came before most of the products did. Close to 100 open-source repositories under @hasna/* — todos, knowledge, conversations, mementos, secrets, dispatch, loops, machines, browser, economy — each one a CLI, an MCP server, and an SDK wrapped around the same core engine. I call them building blocks for AGI because that is the literal job they do: when I need a new product, I do not start from a blank file. I assemble.

The naming is not decoration, it is a taxonomy. Package names are plural nouns — todos, secrets, mementos, machines, sandboxes, guardrails — and the rule behind the naming is one noun, one domain, one package that owns it. A lab where every tool has an obvious name and a single owner is a lab you can actually navigate at 2am. A junk drawer of cleverly named internal modules is not.

Alumia is the proof that the primitives pay off. When the team sat down to rebuild it for what turned out to be the fourth time, the instruction to the fleet was never "write a chat app." It was to use our own SDKs without depending on their internal data, because all of it is like legos, interconnected, so we do not have to rebuild everything into Alumia from scratch. The task manager, the memory layer, the agent-to-agent messaging, the secrets vault — all of it already existed as its own package, tested, published, running quietly on other machines. Alumia ended up being mostly wiring, and wiring is fast. Rebuilding a task manager from zero every time you start a new product is not ambition. It is amnesia.

None of this makes me smarter than the model. It makes the lab bigger than one context window. A frontier model with no memory of yesterday and no access to a fleet is a very capable intern who forgets your name every morning. The lab is the part that remembers, schedules, and hands that intern the right tool at the right moment, a hundred times a day, across five machines, without me typing the same instruction twice.

Division of labor inside the lab is explicit, down to the names. Codewith's own explorer subagents are named after scientists and philosophers — Tesla, Lovelace, Arendt, Hume, Poincare, Chandrasekhar, Helmholtz, Singer — and each one gets exactly one job: research, or design, or verification, never all three. One agent, one job is not a slogan I put on a slide. It is how you keep a fleet from turning into eight versions of the same generalist stepping on each other's files.

The lab also has an ops floor, and it looks nothing like a dashboard. Agents get named — mostly Roman, Marcus, Cicero, Seneca, Cato — and a Telegram bridge lets them post into a group the way a real team posts into Slack, complete with voice notes read out in per-agent voices when I am away from a screen. When I hand an agent a standing duty — watch this subsystem, post one digest, do not spam the group — the follow-up question is always the same: did you actually post it? Reply with the message ID. A lab where the staff cannot prove they showed up is not a lab. It is a rumor.

The lab keeps its own books too. Every dollar gets tracked — cost per account, per agent, per machine, reconciled against the actual billing sources, not estimated after the fact. That sounds like accounting because it is. Token economics is an operating discipline in the pre-AGI era, and a lab that cannot tell you what yesterday cost is not a lab, it is a hobby with an invoice arriving later.

Quality control runs on a sweep, not a vibe. A recurring fleet check classifies every pane on every machine into one of five states — working, idle-done, stuck, errored, or blocked on account usage — and the standing instruction attached to it is five words: verify everything, never trust an agent self-report. A pane that says it finished and a pane that actually finished are two different facts, and a real lab does not confuse them just because both look the same from across the room.

I do not write much code myself anymore, and I say that as a fact, not a boast. Someone asked me recently how I actually run things day to day, and the honest answer was: I never run code on my main computer. I use it as an orchestrator and code on the others. My job is deciding what gets built, dispatching it to the right agent on the right machine, and reading the evidence when it comes back. That is a different job from the one I had five years ago, and it is the job running an AGI lab of one actually requires.

The delegation tree only holds together because coordinators do not touch product code. That is not a courtesy, it is architecture. The moment a coordinator starts editing files instead of dispatching them, you lose the one thing a solo lab cannot survive without: a clean line between the layer that decides and the layer that executes. I enforce that line the boring way, in writing, synced across every machine — coordinators delegate, verify, and report, and nothing else. When a real architecture question came up this year — whether a piece of infrastructure should live as its own product or fold into an existing one — I did not settle it over coffee. I dispatched twelve read-only reviewers across two model tiers, six roles each, and made every one of them post the strongest argument against their own recommendation before the group got to vote. That is not a meeting. That is a lab running an experiment on its own decision.

The lab also has to defend itself, because open source cuts both ways. One of our own npm packages got hijacked by someone who was not us, and we reclaimed it a patch release later. The lesson did not stay a war story — it became a standing loop that checks for supply-chain attacks before anything else is allowed to update. A lab that ships in the open has to run security like a lab, not like a hobbyist crossing fingers on every install.

170-plus billion tokens is the number I give people when they ask how real any of this is. Not because the count itself is impressive — token totals are like frequent flyer miles, mildly interesting and mostly beside the point — but because it is the only honest proxy for how much of this fleet has actually run, rather than merely been designed on a whiteboard. Plans that never execute do not spend tokens. Mine have spent more tokens this year than most funded startups will spend in five.

What actually separates a lab of one from a lab of two hundred people is not ambition. It is the ratio of judgment to execution. A frontier model decides what a fix should look like exactly once. A cheap model, a loop, or a piece of deterministic code then executes that decision a thousand times without me anywhere near the room. Smart model writes the plan, cheap model or plain code runs it. That ratio is the entire economics of operating something real on one person's attention and one person's card on file.

I do not think this stays a curiosity limited to people who already run infrastructure companies. AGI is not going to be as abundant or as evenly distributed as the keynote slides suggest, and it is not going to arrive pre-assembled as a subscription you inherit for free, sized to your needs. Somebody still has to wire the primitives together, name the machines, keep the fleet fed, and decide what gets built next. It is up to each of us to build our own personal AGI, not to wait for someone else's lab to hand us a finished one. Building your own AGI lab is not a privilege reserved for the well-funded. It is available to anyone with a laptop, a stack of open building blocks, and the discipline to let the agents do the typing while you do the deciding.

I built mine with a laptop, five machines, and a stubborn habit of publishing the primitives instead of hoarding them.

Are you building yours?

← Back to the articles

Newsletter

What we shipped, what broke,
and what we learned