Loops All the Way Up: What Loop Engineering and Harness Engineering Actually Replace

2026-06-10

Last updated 2026-06-10

There's a loop you've been running for two years without naming it. The agent does a chunk of work. You read what came back. You decide whether it's any good and what should happen next. You type the next instruction. The agent does another chunk. That cycle, check then decide then re-prompt, is a loop, and the thing running it is you.

Once you can see that loop, the two terms everyone keeps repeating in 2026 stop sounding like competing buzzwords. Harness engineering and loop engineering go after two different loops. One makes the agent's own loop run longer. The other comes for yours.

Three loops, nested

It's loops all the way up. Three of them, stacked, each wrapping the one inside it.

The inner loop, ReAct. Inside a single run, the agent reasons, calls a tool, observes the result, and reasons again. This is the agent's own loop, the one it turns through by itself across one task.
The harness. Not a loop you run, but the machinery that decides how far the inner loop gets before it stalls: tools, state, context management, hooks, recovery.
The outer loop, you. When a run finishes or gets stuck, control comes back to a human who checks the output, sets the next goal, and starts another run. This is the loop nobody named.
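The three layers above can be sketched in a few lines. This is a toy in Python, not any real framework: `run_agent`, `outer_loop`, `one_shot_policy`, and `scripted_reviewer` are all made-up names, and `max_steps` is a crude stand-in for everything a harness does to bound a run.

```python
from typing import Callable, Dict, List, Optional, Tuple

# All names here are hypothetical; this is an illustration, not a real framework.
Step = Tuple[str, str]  # (tool name, tool input)
Policy = Callable[[str, List[str]], Optional[Step]]
Tools = Dict[str, Callable[[str], str]]


def run_agent(goal: str, policy: Policy, tools: Tools, max_steps: int = 8) -> List[str]:
    """Inner loop: reason, call a tool, observe, repeat.

    max_steps stands in for the harness: it bounds how far one run gets.
    """
    observations: List[str] = []
    for _ in range(max_steps):
        step = policy(goal, observations)   # reason: choose the next action
        if step is None:                    # the policy considers the run finished
            break
        tool_name, tool_input = step
        observations.append(tools[tool_name](tool_input))  # act, then observe
    return observations


def outer_loop(first_goal: str,
               next_goal: Callable[[str, List[str]], Optional[str]],
               policy: Policy,
               tools: Tools,
               max_runs: int = 5) -> List[Tuple[str, List[str]]]:
    """Outer loop: check the result, decide the next goal, start another run.

    next_goal is the seat a human has been sitting in.
    """
    history: List[Tuple[str, List[str]]] = []
    goal: Optional[str] = first_goal
    for _ in range(max_runs):
        if goal is None:
            break
        result = run_agent(goal, policy, tools)
        history.append((goal, result))
        goal = next_goal(goal, result)
    return history


# Toy stand-ins so the sketch runs end to end.
def one_shot_policy(goal: str, observations: List[str]) -> Optional[Step]:
    return ("echo", goal) if not observations else None


def scripted_reviewer(goal: str, result: List[str]) -> Optional[str]:
    return {"draft": "polish"}.get(goal)  # after "draft" ask for "polish", then stop


history = outer_loop("draft", scripted_reviewer, one_shot_policy,
                     {"echo": lambda text: text.upper()})
```

The point of the shape: `next_goal` is just a function argument. Whether a person or a system fills it is the whole difference between the manual outer loop and an engineered one.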

Hold onto that picture, because the whole question is which loop each idea is trying to change.

Figure: Three nested loops. ReAct → harness → loop engineering, and where the human goes. Labels: inner loop (the agent); harness (one run, further); loop engineering (the outer loop).

The inner loop is the agent's own. The harness decides how far one run gets. Loop engineering automates the outer loop, the one that used to be you, and folds your moment-to-moment checking into a separate verifier.

Harness engineering goes after the inner loop

A raw model isn't an agent. It becomes one when you wrap it in scaffolding, the harness. Viv Trivedy's one-liner captures the shape of it: Agent = Model + Harness; if you're not the model, you're the harness. Harness engineering is the discipline of treating that scaffolding as a real artifact: any time the agent makes a mistake, you change the environment so that mistake can't happen again, then tighten from there.

A harness is basically everything that isn't the model:

A system prompt and rulebook (think AGENTS.md or CLAUDE.md) that gets injected every turn.
Tools, skills, and connectors, plus the descriptions that tell the model when to reach for each one.
Context policies: compaction, offloading big tool outputs, revealing instructions only when a task needs them, all to fight context rot.
A sandbox to run code safely and let the agent verify its own work.
Hooks that enforce rules deterministically: block a destructive command, run the tests after an edit, require approval before a push.
Observability: logs, traces, and cost and latency metering.
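To make the "hooks that enforce rules deterministically" item concrete, here is a minimal sketch of a pre-tool hook that blocks a destructive shell command before it runs. It's plain Python, not the hook API of Claude Code or any other tool; `pre_tool_hook`, `HookDecision`, and the deny-list patterns are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class HookDecision:
    allowed: bool
    reason: str = ""


# Hypothetical deny-list; a real harness would tune and extend these patterns.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bgit\s+push\b.*--force"),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
]


def pre_tool_hook(tool_name: str, command: str) -> HookDecision:
    """Runs before every tool call; the model cannot talk its way past it."""
    if tool_name != "shell":
        return HookDecision(True)          # only shell commands are screened here
    for pattern in DESTRUCTIVE_PATTERNS:
        if pattern.search(command):
            return HookDecision(False, f"blocked by pattern: {pattern.pattern}")
    return HookDecision(True)
```

The design choice worth noticing is that the rule lives in code, not in the prompt. A line in a rulebook asks the model to behave; a hook doesn't give it the option.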

One foundational result keeps showing up: the same model in a better harness performs dramatically better. Teams have moved a coding agent from mid-pack to top-five on a benchmark by changing the harness alone. The model was never the only variable.

Look at what that actually buys you. A better harness lets a single run go further: more steps, more coherence, fewer face-plants, before the agent has to hand control back to the human loop. It stretches how far one push gets and how reliably it gets there. Anthropic's engineering team puts the sharp version of this well: every harness component encodes an assumption about what the model can't do on its own, so as models improve, the scaffolding doesn't disappear. It moves to wherever the new ceiling sits.

What harness engineering does not do is take you out of the outer loop. However good the harness, the run ends eventually, and something has to look at the result and say "good, now do this next." For two years that something was a person.

Loop engineering comes for the outer loop

Loop engineering, a phrase Addy Osmani popularized and Peter Steinberger and Claude Code's Boris Cherny echoed, automates that loop, the one you've been running by hand. Instead of checking and re-prompting after every chunk, you define a recursive goal once and build a system that drives the agent until the goal is actually met.

And it's more than auto-typing the next prompt. Your loop was never only "decide the next step." It also did the quieter job of noticing there was work to do. So a real outer loop has to take over discovery and triage too: a scheduled or event-driven trigger surfaces what needs doing, the loop dispatches it, and only the leftovers land back in your inbox. You stop being the scheduler, the dispatcher, and the turn-by-turn driver all at once.

Strip a working loop down and you find five moving parts plus a memory:

Automation: a scheduled or event-driven trigger that surfaces work. The heartbeat.
Isolation: worktrees or separate contexts so parallel agents don't collide.
Skills: project knowledge written down once, so the loop doesn't re-derive your conventions every cycle.
Connectors: access to the real tools and data the work actually touches.
Sub-agents: the maker-checker split, which is important enough that it gets its own section below.
Memory on disk: a file or a board that survives between runs, because the model forgets everything otherwise.
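The last item, memory on disk, is the easiest one to show. A minimal sketch, assuming a JSON file as the board; the file layout, the `status` values, and the function names are invented here, not taken from any particular tool.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def load_board(path: Path) -> dict:
    """Read the board, or start an empty one if no run has written it yet."""
    if not path.exists():
        return {"tasks": []}
    return json.loads(path.read_text(encoding="utf-8"))


def save_board(path: Path, board: dict) -> None:
    """Write to a temp file, then rename, so a crash mid-write can't corrupt the board."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(board, handle, indent=2)
    os.replace(tmp_name, path)  # replaces the old board in one step


def claim_next(path: Path) -> Optional[dict]:
    """Mark the first 'todo' task as 'in_progress' and return it, or None if nothing is left."""
    board = load_board(path)
    for task in board["tasks"]:
        if task["status"] == "todo":
            task["status"] = "in_progress"
            save_board(path, board)
            return task
    return None
```

Each run starts by calling `claim_next` and ends by writing its result back. The model remembers nothing between runs; the file does.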

None of this is hypothetical anymore. Claude Code ships /loop and /goal; OpenAI Codex ships Automations and its own /goal command. The primitives now live inside the tools you already use, which is part of why the term caught on when it did.

That's where the real advantage shifts. It goes from how good this one prompt is to how good the system is that writes and judges prompts for you. A well-designed outer loop multiplies a good engineer. A badly designed one multiplies a bad decision just as fast, with fewer eyes on it.

The catch: the human loop was also doing the checking

This is where the naive version of loop engineering falls apart. If you just delete the human and let the system re-prompt itself toward the goal, you haven't only removed the person who said "do this next." You've removed the person who said "wait, that last step is wrong." The outer loop was always two jobs wearing one hat: decide the next step, and verify the last one.

So a loop that actually works has to take verification inside itself. The standard move is the maker-checker split: a separate verifier agent grades the intermediate work and decides whether the goal is met. Pointedly not the agent that did the work, because a model grading its own output tends to skew positive. The verifier is what makes "the loop says it's done" mean anything. It's the same principle behind serious agent evaluation: keep generation and judgment apart, or you're just trusting a confident guess.
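Here is the maker-checker split as a minimal sketch. The maker and checker are passed in as separate callables, which is the whole point; in a real loop they'd be separate agents. `run_until_verified`, the toy maker, and the toy checker are hypothetical names for illustration.

```python
from typing import Callable, Dict, Tuple

Maker = Callable[[str, str], str]                  # (goal, feedback) -> work
Checker = Callable[[str, str], Tuple[bool, str]]   # (goal, work) -> (passed, feedback)


def run_until_verified(goal: str, maker: Maker, checker: Checker,
                       max_rounds: int = 3) -> Dict[str, object]:
    """Drive the maker until the checker passes the work or the round budget runs out."""
    feedback = ""
    work = ""
    for round_no in range(1, max_rounds + 1):
        work = maker(goal, feedback)               # generation
        passed, feedback = checker(goal, work)     # judgment, by someone else
        if passed:
            return {"status": "done", "rounds": round_no, "work": work}
    # Budget exhausted: hand back to a person instead of declaring victory.
    return {"status": "needs_human", "rounds": max_rounds,
            "work": work, "feedback": feedback}


# Toy stand-ins so the sketch runs.
def toy_maker(goal: str, feedback: str) -> str:
    return "summary" if not feedback else "summary with sources"


def toy_checker(goal: str, work: str) -> Tuple[bool, str]:
    if "sources" in work:
        return True, ""
    return False, "cite sources"
```

Note what happens when the rounds run out: the loop reports `needs_human` with the checker's last feedback attached. It never gets to call its own work done.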

Even with a verifier, the human doesn't fully leave the checking seat. You move out of the per-turn loop and up to two places:

Before the run, you design the loop. The recursive goal, the done-condition the verifier checks against, the cadence, and the approval gates for anything irreversible. Writing down what "done" means is where most scope drift gets caught.
After the run, you review what matters. Not every turn. The consequential, hard-to-reverse outputs. "Done" is a claim the loop makes; merging it is still your call.

So the honest one-liner isn't "the human only checks the final state." It's closer to this: loop engineering takes you out of driving every turn and moves you up to designing the loop and reviewing its consequential outputs, while a verifier agent handles the moment-to-moment checking you used to do by hand.

What this looks like on a phone: Memex

This isn't just vocabulary to us. Memex has run a version of the outer loop on-device since before either term was trending: a local-first, no-backend app where a multi-agent system turns your records into structured cards, extracts knowledge, and surfaces insights, all on your phone. The harness is dart_agent_core, our open-source Dart framework that implements the full ReAct loop plus tool use, sessions persisted with FileStateStorage, context compression, and recovery. The twist is the constraint: no server, no shell, no CI runner. The outer loop has to survive the app being killed and the phone restarting, which is exactly why state-on-disk and event-driven triggers are load-bearing here, not nice-to-haves. (More on all of this in the engineering behind Memex.)

That outer loop runs on events, not chat. You don't check a card and then tell the system what to do next. A record event fires through a global event bus to whichever agent subscribed to it, with the trigger, working directory, and skills living in a CustomAgentConfig on disk. The Super Agent orchestrates specialized workers (Card, PKM, Insight, Comment, Memory) and delegates through a delegate_task tool, each in isolated state. No human types the next prompt.
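The event-driven shape is simple enough to sketch. This is Python for illustration, not Memex's Dart code and not the dart_agent_core API; `EventBus`, the `record_created` event name, and the handlers are all assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], str]


class EventBus:
    """Routes an event to every handler subscribed to its type."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> List[str]:
        # .get avoids creating empty entries for event types nobody subscribed to.
        return [handler(payload) for handler in self._subscribers.get(event_type, [])]


# Hypothetical agents, reduced to functions that report what they handled.
bus = EventBus()
bus.subscribe("record_created", lambda event: f"card agent handled {event['record_id']}")
bus.subscribe("record_created", lambda event: f"insight agent handled {event['record_id']}")
```

Publishing a `record_created` event here reaches both subscribers in subscription order, and an event nobody subscribed to reaches no one. The trigger is the record, not a chat message.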

And the done-condition is written in code, not taken on faith. When the Card Agent finishes, Memex runs CardRunCompletionEvidence: does the card exist on disk, does its fact_id match, is the status completed, does it have a title and UI configs, and was there a matching successful save_timeline_card call. Only then is the run complete; if not, the agent is re-prompted with the exact missing requirements and runs again, up to a retry cap. Add AgentController hooks that pause before destructive actions and a loop detector for repeated tool calls, and you have the verifier and the done-condition from earlier in this post, running in production on a phone.
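A done-condition like that can be sketched as a function that returns the list of unmet requirements. To be clear, this is a Python illustration of the checks named above, not the actual CardRunCompletionEvidence implementation; the field names and message wording are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CardEvidence:
    """What the loop can observe after a run. Field names are hypothetical."""
    card_on_disk: bool
    fact_id: Optional[str] = None
    status: Optional[str] = None
    title: Optional[str] = None
    ui_configs: List[dict] = field(default_factory=list)
    # fact_ids of save_timeline_card calls that returned success
    saved_fact_ids: List[str] = field(default_factory=list)


def missing_requirements(expected_fact_id: str, evidence: CardEvidence) -> List[str]:
    """Return every unmet requirement; an empty list means the run is complete."""
    missing: List[str] = []
    if not evidence.card_on_disk:
        missing.append("card does not exist on disk")
    if evidence.fact_id != expected_fact_id:
        missing.append("fact_id does not match")
    if evidence.status != "completed":
        missing.append("status is not completed")
    if not evidence.title:
        missing.append("title is missing")
    if not evidence.ui_configs:
        missing.append("UI configs are missing")
    if expected_fact_id not in evidence.saved_fact_ids:
        missing.append("no successful save_timeline_card call for this fact_id")
    return missing


def retry_prompt(missing: List[str]) -> str:
    """Turn the unmet requirements into the next instruction for the agent."""
    return "The run is not complete. Fix: " + "; ".join(missing)
```

Returning the full list instead of a bare pass/fail is what makes the re-prompt useful: the agent is told exactly what's missing, not just that it failed.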

The risks don't go away, they move up a layer

The problems that lived in the human loop don't vanish when you automate it. They move, and a couple of them get sharper.

Verification is still the hard part. You moved it from your own eyes to a verifier agent plus your review of the consequential outputs. If the verifier is weak, the loop ships wrong work with confidence, faster than the old human loop ever could.
Comprehension debt piles up. The faster the loop produces changes you didn't make by hand, the wider the gap between what exists and what you understand.
Cost scales with the loop, not the task. A scheduled loop with a verifier after every turn and several sub-agents burns tokens whether or not it finds much. Start slow, watch the bill, and speed up the cadence only once it's producing work you keep.
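The cost point is easy to see with back-of-envelope arithmetic. Every number below is a made-up placeholder, not a measurement; the formula assumes the verifier runs once per turn and each sub-agent has a fixed per-run cost.

```python
def tokens_per_run(turns: int,
                   maker_tokens_per_turn: int,
                   verifier_tokens_per_turn: int,
                   sub_agents: int,
                   tokens_per_sub_agent: int) -> int:
    """Tokens one run burns: maker plus verifier on every turn, plus each sub-agent once."""
    per_turn = maker_tokens_per_turn + verifier_tokens_per_turn
    return turns * per_turn + sub_agents * tokens_per_sub_agent


def tokens_per_day(runs_per_day: int, per_run: int) -> int:
    """The schedule multiplies the per-run cost, whether or not a run finds work."""
    return runs_per_day * per_run


# Placeholder inputs for illustration only.
per_run = tokens_per_run(turns=10, maker_tokens_per_turn=2000,
                         verifier_tokens_per_turn=500,
                         sub_agents=3, tokens_per_sub_agent=1500)
every_four_hours = tokens_per_day(6, per_run)
hourly = tokens_per_day(24, per_run)
```

With these placeholders a run costs 29,500 tokens, so six runs a day is 177,000 and an hourly schedule is 708,000. The task didn't get four times bigger; the schedule did.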

The takeaway isn't "automate the human loop and walk away." It's that the job changed shape. Harness engineering makes one run reliable. Loop engineering makes a reliable run repeat without you driving it. Both pay off only if the verification you used to do by hand is faithfully rebuilt inside the loop, and only if you stay the engineer who designed it instead of the one who just presses go.

FAQ: loops, harnesses, and what gets replaced

What is the 'human loop' in agentic coding?

It's the loop you run without naming it: the agent finishes a chunk of work, you check it, you decide the next goal, and you re-prompt. For two years this human-driven outer loop was the default way to use a coding agent. Loop engineering automates it, so you define the goal once and a system drives the agent instead of you typing the next step every time.

How do harness engineering and loop engineering divide the work?

They go after different loops. Harness engineering works on the inner loop, making a single agent run go further and stay coherent before it has to hand back to you, using tools, state, context policies, hooks, and recovery. Loop engineering works on the outer loop, replacing you as the thing that checks the result and decides what to prompt next. A strong harness lengthens each push; a loop removes you from between the pushes.

If loop engineering replaces the human, who does the checking?

Not nobody, and that's the catch. The human loop was never just 'decide the next step,' it was also 'verify the last step.' So a real loop has to take verification inside itself: a separate verifier agent grades intermediate work and decides whether the goal is met, because the agent that did the work skews positive when grading itself. The human doesn't leave verification entirely; you move to reviewing consequential or irreversible outputs and to designing the done-condition.

So what does the human actually do in a loop-engineered setup?

Two things, both shifted upstream. Before the run, you design the loop: the recursive goal, the done-condition, the verifier, the cadence, and the approval gates. After the run, you review the consequential and irreversible results, not every turn. The advantage moves from how good your prompt is to how good the system is that prompts and verifies for you.

Does Memex use this pattern?

Yes, scoped to a phone. Memex's custom agents are event-driven: a record event triggers an agent through a global event bus and a local task executor, with no human typing the next prompt, so the outer loop is automated. Verification is embedded as controller hooks, which can pause for confirmation before destructive actions, and a loop detector. State persists to disk so the loop survives the app being killed. The human stays at the consequential gates.