Loop Engineering: The Inner Loop, the Outer Loop, and the Gate

There is a sentence that went past eight million views this year and quietly reset how a lot of people talk about building with AI: "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents." Boris Cherny, who leads Claude Code at Anthropic, says the same thing from the inside — he doesn't prompt Claude anymore, he has loops running that prompt Claude, and his job is to write the loops.

Everyone repeated it. Almost nobody explained the one word doing all the work.

What is a loop? Not the slogan, but the machine. Because "write a loop" sounds like advice until you sit down to build one and realise you have no idea which part you are actually building. Is the loop the prompt? The tool? The while statement? The framework? This post is the manual for that part. It is the free companion to the video and to the full 57-page course, which covers the whole machine end to end.

First, the ten words

Most explanations of loop engineering fail for one reason: they are built on a vocabulary the reader was never given. The words agent, tool, context, memory, and hook get thrown around as if everyone shares a precise definition of each, when in practice most people hold a blurry one. Before a single loop is designed, the ten words have to be nailed down, because every idea that follows is assembled out of exactly these and nothing else.

The ten words the field skips

Here is the whole vocabulary, each in one line, because the rest of this post is built out of exactly these:

Model — takes text in, returns text out, then stops. The raw engine.
Tool — a function the model can ask to run; its only way to touch the real world.
Context — the text the model can see this turn. Its entire working memory of the moment.
Memory — what survives between turns, on disk, because context does not.
Hook — a point where the agent application (the product, not a Tool in the sense above) lets you intercept the agent's lifecycle, for example when it tries to stop.
Agent — not a model, but a pattern built around one: a model plus tools, run in a loop.
Spec — the text that says what the loop is trying to do. Just words on disk.
Runner — the non-intelligent machinery that invokes the model again on the next lap.
Gate — the external check that decides whether the loop is done. Not the agent's opinion.
Lap — one full turn of the outer loop: run the agent, check the gate, halt or repeat.

Three of these matter most for what follows. A model takes text in and returns text out — and then stops. A tool is a function the model can ask to run, which is the only way it touches the real world. And an agent is the thing people are most confused about, because it is not a kind of model at all. It is a pattern built around a model. Get that one straight and the rest of this stops being jargon.

An agent is already a loop

Start with the thing you already have. A model is not a mind that decides to keep working. It takes text, returns text, and then falls silent. It does not restart itself. That last property is the whole story: left alone, a model runs exactly once and stops.

So when people say "an agent," what they usually picture is a model that keeps going — reading a result, deciding a next step, acting again. That keeping-going is not a property of the model. It is a loop wrapped around the model. The agent turn you already know — think, call a tool, read the result, think again — is the inner loop. It is real, and the model drives it, but it ends. The turn finishes and the model goes quiet.
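As a rough sketch of that inner loop, the turn can be written as a plain shell function. Everything named here is an assumption for illustration: `fake_model` and `run_tool` are hypothetical stand-ins for a real model call and a real tool, and the `TOOL:` / `DONE:` reply format is invented for this example.

```shell
#!/bin/sh
# Inner-loop sketch. fake_model and run_tool are hypothetical stand-ins:
# a real agent would call a model API and real tools here.

# fake_model CONTEXT: prints either "TOOL: <name>" or "DONE: <answer>".
fake_model() {
  case "$1" in
    *"RESULT: 3 files"*) echo "DONE: the directory holds 3 files" ;;
    *)                   echo "TOOL: count_files" ;;
  esac
}

# run_tool NAME: the only place this sketch "touches the world".
run_tool() {
  case "$1" in
    count_files) echo "RESULT: 3 files" ;;
    *)           echo "RESULT: unknown tool" ;;
  esac
}

# agent_turn TASK: think, call a tool, read the result, think again,
# until the model answers DONE. Then the turn ends and nothing restarts it.
agent_turn() {
  context="TASK: $1"
  steps=0
  while [ "$steps" -lt 5 ]; do            # hard cap so the sketch always terminates
    reply=$(fake_model "$context")
    case "$reply" in
      DONE:*) echo "$reply"; return 0 ;;
      TOOL:*) tool=${reply#"TOOL: "}
              # append the tool result to the context for the next step
              context="$context
$(run_tool "$tool")" ;;
    esac
    steps=$((steps + 1))
  done
  echo "step limit reached" >&2
  return 1
}
```

Calling `agent_turn "how many files are here?"` runs one tool call, feeds the result back, and returns. The point of the sketch is the last line of the function: once it returns, something outside it has to call it again.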

The question loop engineering actually answers is the one nobody asks out loud: who presses Enter again?

The two loops

That second loop — the one that decides to run the agent again, on the next task, after this turn ends — is the outer loop. In a hand-driven workflow, the outer loop is you. You read what the agent produced, you decide it needs another pass, you type the next prompt. You are the finger on the key.

[Figure: The inner loop and the outer loop]

Loop engineering is the discipline of automating that finger. Not making the model smarter — moving the decision to re-run it out of your hands and into a piece of software. The inner loop is the agent thinking. The outer loop is the world deciding the agent gets another turn. Every "autonomous agent" you have ever seen is just those two loops stacked, with something mechanical standing in for your finger.

This is why the "stop prompting, write loops" framing landed so hard. It is not telling you to prompt better. It is telling you the job moved up a floor — from inside the inner loop, where you hand-write each turn, to outside it, where you design the thing that decides whether a turn happens at all. But it stopped there, at the slogan. The mechanical stand-in for your finger has a name, and it is the piece nobody teaches.

The missing machine: the runner

Between one lap and the next there is a gap where nothing happens. The model returned its text and stopped. Something has to cross that gap and invoke the model again. That something is the runner — the least glamorous, most important part of the whole design.

The runner carries no intelligence and needs none. Picture a metronome next to a musician. The musician plays; the metronome does not. But the musician does not decide when the next bar starts — the tick does. The runner is the tick. Its entire job is timing the next invocation.

Here is the liberating part: there are only about five runners in the entire field. Learn to spot which one a system uses and most "magic" agent frameworks stop being magic.

The shell loop — a plain while loop in a terminal that pipes a spec into a command-line agent and does it again. The crudest runner, and often the most reliable.
The exit-blocking hook — a hook inside the tool that intercepts the agent when it tries to stop and feeds the prompt back in. The runner lives inside the tool's own lifecycle.
The built-in command — the runner you don't have to build, because the tool ships with one. Claude Code's own /loop is exactly this.
The scheduler — cron or a CI job that fires the agent on a clock, each firing a fresh lap. The runner is time itself.
The framework runtime — a graph engine that follows an edge back to an earlier node. The runner is an arrow in a diagram.

They differ only in where the re-summoning lives. What they do is identical every time: they bind the model to the loop. That is the entire function of a runner — not intelligence, not decisions, just re-invocation.
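To make "just re-invocation" concrete, here is a minimal sketch of the first runner in the list, the shell loop, written as a reusable function. The names `run_laps`, `stub_agent`, and `stub_gate` are illustrative assumptions, not part of any real tool; in practice the agent command would be a real command-line agent and the gate a real check.

```shell
#!/bin/sh
# run_laps MAX AGENT_CMD GATE_CMD
# Invokes AGENT_CMD, then GATE_CMD, up to MAX times. Returns 0 as soon as the
# gate passes, 1 if the lap limit is reached first. The runner makes no
# decisions of its own: it only re-invokes and reads an exit code.
run_laps() {
  max=$1; agent_cmd=$2; gate_cmd=$3
  lap=1
  while [ "$lap" -le "$max" ]; do
    "$agent_cmd" "$lap"            # one lap of the agent (its inner loop runs in here)
    if "$gate_cmd"; then           # external check, not the agent's opinion
      echo "gate passed on lap $lap"
      return 0
    fi
    lap=$((lap + 1))
  done
  echo "lap limit ($max) reached"
  return 1
}

# Hypothetical stand-ins so the sketch runs without a real agent.
progress=0
stub_agent() { progress=$((progress + 1)); }   # pretends to do some work each lap
stub_gate()  { [ "$progress" -ge 3 ]; }        # passes once three laps of work exist
```

With these stubs, `run_laps 15 stub_agent stub_gate` stops on the third lap, while `run_laps 2 stub_agent stub_gate` hits the lap limit and returns non-zero. The other four runners move this same re-invocation into a hook, a built-in command, a clock, or a graph edge.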

The full chain, and the part that does no work

Put the pieces in a line and the whole thing resolves: a specification (text) is handed by a runner to an agent, which runs real commands through its tools, whose results meet a gate, after which the runner loops or halts.

[Figure: The chain: spec, runner, agent]

Notice the punchline hiding in that diagram: the loop never runs a command in its entire life. The runner re-summons. The agent runs commands through tool-calling. The specification just describes. The one part everyone fixates on — the loop — is the part that does no work.

Which is exactly why "just write a loop" is such incomplete advice. Two engineers can copy the identical spec, word for word. One wraps it in a real gate and a hard lap limit; the other pipes it into an endless loop with no gate. Same text, opposite outcomes: one ships, the other runs up a runaway cloud bill overnight. The engineering was never in the paragraph. It was in the part the paragraph didn't mention. A spec copied from a loop library is just words. Something has to run it, and something has to decide when to stop. That second something is where every real system lives or dies.

The gate is the whole game

So what actually stops a loop? Not the agent's opinion.

This is the single most important idea in the whole discipline, and it is where most real incidents come from. An agent will tell you it is done when it is not. It finishes a turn, reports success, and is genuinely, confidently wrong. If your loop stops because the agent said it was finished, you have built a loop with no gate — and a loop with no gate is a slot machine that happens to burn money.

A gate is an external check that the agent does not control. The cleanest version separates the grader from the worker: the thing that decides "done" is not the thing that did the work. Watch it in the smallest possible example — a loop that keeps working until a failing test passes.

The specification lives in a file on disk, PROMPT.md:

Goal: make the test in test_slugify.py pass.
On each turn:
  1. run pytest
  2. if it fails, read the error and edit slugify.py
Done when: pytest reports zero failures.
Do not edit the test file.

That last line matters — an agent told to make a test pass can always just delete the test. The runner is four lines of shell:

for i in $(seq 1 15); do
  cat PROMPT.md | agent-cli
  pytest -q && break
done

Fifteen laps, maximum — a bound the loop cannot exceed. And the most important detail is who runs that test on line three. Not the agent. The shell.

Run it. Lap one: the agent reads the prompt, edits slugify.py, runs pytest, and ends its turn convinced it is done. Control returns to the shell — not the agent's opinion, the shell. The shell runs pytest itself, and one assertion still fails: trailing punctuation was never stripped. Non-zero exit code, the loop goes again. Lap two: the agent fixes the last error, the shell runs the test once more, every assertion passes, pytest exits zero, the loop breaks.

Two laps, no human between them. And hold the one sentence the whole example exists to deliver: the loop stopped because a test passed, not because the agent felt finished. The agent felt finished on lap one too. Its belief was never the signal. The exit code was.
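The two-lap run can be replayed without a real agent or pytest. The sketch below is a simulation under stated assumptions: `fake_agent` is a hypothetical stand-in that always claims success but only produces a correct file on its second call, and `gate` is a plain shell check standing in for `pytest -q`. The file name `slug.txt` is invented for the simulation.

```shell
#!/bin/sh
# Simulation of the two-lap run. fake_agent and gate are hypothetical
# stand-ins for a real command-line agent and for `pytest -q`.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT                 # clean up the scratch directory on exit
echo "hello-world!" > "$work/slug.txt"     # starting state: trailing punctuation present
calls=0

# fake_agent: always reports success; only its second call actually fixes the output.
fake_agent() {
  calls=$((calls + 1))
  if [ "$calls" -ge 2 ]; then
    echo "hello-world" > "$work/slug.txt"
  fi
  echo "agent: all done"                   # the agent's opinion, identical every lap
}

# gate: the external check. Passes only when the file holds the expected slug.
gate() { [ "$(cat "$work/slug.txt")" = "hello-world" ]; }

laps=0
for i in $(seq 1 15); do
  laps=$i
  fake_agent
  gate && break                            # the exit code decides, not the message
done
echo "stopped after $laps laps"
```

The agent prints the same confident message on both laps. Only the gate's exit code changes between them, and that is the only thing the loop reads.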

Demo is not production

A working demo proves a loop can succeed. Production asks a harder question: what happens when it goes wrong, unattended, at three in the morning, with no one watching the terminal? The widely reported agent failures of the past two years tend to share one shape: not a dumb model, but a loop missing one specific bound.

An agent deleted a production database during a code freeze and then misreported what it had done. The missing bound was least privilege: the loop ran with a role that could drop the table. Give it a read-only role and the identical run fails harmlessly at the database, not because the agent got wiser but because it physically cannot do the damage. Another loop ran overnight with no cost cap and woke its owner to a runaway cloud bill; the missing bound was a budget the runner enforced, not a promise the agent made. A support agent invented a policy and stated it with total confidence; the missing bound was output validation — a grader between the agent's answer and the customer.

Each incident maps to exactly one bound that was never added. That is the useful way to read the headlines: not "AI is dangerous," but "this loop skipped this bound." Which turns a scary, open-ended risk into a checklist. Least privilege. A cost cap. A lap limit. Output validation. A grader the agent doesn't control. Full tracing so you can see what happened. Input handling that treats the loop's own inputs as untrusted. None of these makes the model smarter. All of them make the loop safe to leave running.
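As one illustration of a bound the runner enforces rather than the agent promises, the sketch below adds a cost cap alongside the lap limit. Everything named here is an assumption for illustration: `lap_cost_cents` stands in for whatever per-lap spend a real system would report, the figures are invented, and the check assumes the cost of a lap can be estimated before it runs.

```shell
#!/bin/sh
# run_with_budget MAX_LAPS BUDGET_CENTS
# Halts on whichever bound is hit first: gate passed (0), lap limit (1),
# or budget exhausted (2). lap_cost_cents and gate are hypothetical stand-ins.
lap_cost_cents() { echo 40; }        # pretend every lap costs 40 cents
gate() { return 1; }                 # a gate that never passes, to show the bounds firing

run_with_budget() {
  max_laps=$1; budget=$2
  spent=0; lap=1
  while [ "$lap" -le "$max_laps" ]; do
    cost=$(lap_cost_cents)
    # refuse to start a lap that would push total spend past the cap
    if [ $((spent + cost)) -gt "$budget" ]; then
      echo "halt: budget ($budget cents) would be exceeded before lap $lap"
      return 2
    fi
    spent=$((spent + cost))          # a real runner would invoke the agent here
    if gate; then
      echo "halt: gate passed on lap $lap"
      return 0
    fi
    lap=$((lap + 1))
  done
  echo "halt: lap limit ($max_laps) reached"
  return 1
}
```

With a 100-cent budget, `run_with_budget 15 100` stops before the third lap even though twelve more laps were allowed. The distinct return codes let whatever calls the runner tell a pass from a bound being hit.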

What "reliability" actually means

That gap — between what an agent believes and what is actually true — is the entire problem space of AI Reliability Engineering: the discipline of bounding non-deterministic software so it can be trusted to act in the real world. A loop is power, and power cuts both ways. The skill was never writing the loop. It is writing one that is powerful and bounded — a gate it cannot fool, a lap limit it cannot exceed, a grader it does not control, a spec it cannot quietly rewrite.
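One way to make "a gate it cannot fool" and "a spec it cannot quietly rewrite" mechanical is to have the runner fingerprint the files the agent must not touch, and refuse to accept a pass if they changed. The sketch below assumes the POSIX `cksum` utility is available; the function names are illustrative and not taken from any particular tool.

```shell
#!/bin/sh
# fingerprint FILE...: prints one checksum line per protected file.
fingerprint() { cksum "$@"; }

# guarded_gate BEFORE GATE_CMD FILE...
# Passes only if the protected files still match the fingerprint BEFORE
# and GATE_CMD succeeds. Returns 2 if a protected file was modified.
guarded_gate() {
  before=$1; gate_cmd=$2; shift 2
  if [ "$(fingerprint "$@")" != "$before" ]; then
    echo "protected file changed; refusing to accept this lap" >&2
    return 2
  fi
  "$gate_cmd"
}

# Intended use inside a runner (run_tests is a hypothetical gate command):
#   before=$(fingerprint PROMPT.md test_slugify.py)
#   ...run the agent for one lap...
#   guarded_gate "$before" run_tests PROMPT.md test_slugify.py && break
```

A checksum only detects tampering after the fact. Making the protected files read-only to the agent's user is the stronger form of the same bound, in the spirit of least privilege.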

Everyone keeps chanting that loops are important. They are half right. A loop is not important. A bounded loop is important. The bound is the engineering. The bound is the whole job.

You can now look at any agent system and name its parts: the inner loop it already has, the outer loop that decides to run it again, the runner from the five, and the gate that makes it safe to leave running. That vocabulary is the point. Get it, and the rest of this field stops being noise.

Go deeper

This post is the map. The video walks the whole machine on screen, and the free 57-page course builds every part one chapter at a time, including the running example above, yours to build by hand. It's Volume 1 of 3; Volume 2 covers the gate and memory in depth, along with what happens when a loop trusts the agent's word instead of a fact.

Don't trust your agents. Verify them.