Agent Harness

Type
Concept
Published
2026-05-01
Aliases
harness, agent loop, scaffolding
Brief definition

The deterministic code that wraps a language model and turns it into an autonomous agent — handling retries, tool dispatch, memory, and context shaping so that the model itself can focus narrowly on next-step reasoning.

What it is

A bare LLM is a single-shot text predictor. A useful agent — one that runs for hours, recovers from tool errors, manages a finite Context Window, and produces reliable work — needs a substantial layer of non-model code surrounding it. That wrapping layer is the harness.

Rohit Verma makes the scale concrete by reverse-engineering Claude Code itself: the published source contains 331 modules, including an 823-line retry system. The contrast he draws is sharp: most teams shipping agents wrap the model in a try/catch and call it done, while production-grade harnesses encode hundreds of behavioural decisions the model would otherwise have to rediscover on every run.
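To make "retry system" concrete, here is a minimal sketch of the kind of policy such a layer encodes: exponential backoff with jitter for transient failures, immediate propagation for everything else. This is an illustrative pattern, not Claude Code's actual implementation; the exception classes, attempt counts, and delays are all assumptions.

```python
import random
import time

# Error types treated as transient (retryable) — an assumption for this
# sketch; a real harness would classify provider-specific errors here.
TRANSIENT = (TimeoutError, ConnectionError)

def with_retries(call, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Run call(), retrying transient failures with backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TRANSIENT:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the failure
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))
```

The point of the sketch is the asymmetry: the model never sees any of this. Whether a tool call failed once or four times is a fact the harness absorbs before composing the next prompt.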

A harness typically owns: tool-call dispatch and timeout handling, retry policies for transient failures, conversation compaction and sub-agent forking, memory read/write strategy, hook execution, and the token-budget arithmetic that decides what gets included in each model call. Boris Cherny, who created Claude Code at Anthropic, has described nine patterns that waste 73% of users’ tokens — CLAUDE.md bloat, re-reading old chat history, forgotten hooks — all of which are choices the harness makes before the model reads a single character of the user’s prompt. The harness, in other words, determines whether token usage is principled or accidental.
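The token-budget arithmetic mentioned above can be sketched as follows. This is a hypothetical illustration, not any particular harness's code: fixed costs (system prompt, tool schemas) come off the top of the context window, a reserve is held back for the model's reply, and conversation history is dropped oldest-first to fit what remains. All names and numbers are assumptions.

```python
def fit_history(history, context_window, system_tokens, tool_tokens,
                reply_reserve):
    """Select which history messages fit the remaining token budget.

    history: list of (message, token_count) pairs, newest last.
    Returns (kept_messages, tokens_used).
    """
    # Everything that must be in the prompt regardless of history.
    budget = context_window - system_tokens - tool_tokens - reply_reserve
    kept, used = [], 0
    # Walk newest-to-oldest so recent turns are always preferred.
    for message, tokens in reversed(history):
        if used + tokens > budget:
            break
        kept.append((message, tokens))
        used += tokens
    return list(reversed(kept)), used
```

Every choice in this little function, such as which messages to prefer or when to compact instead of drop, is exactly the kind of behavioural decision the harness makes before the model reads a single character.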

Why it matters

Agent harnesses are the difference between a demo and a deployable product. Sarah Wooders on the Letta Code team illustrates the granularity at which harness design pays off: their recall sub-agent was rewritten from an isolated process into a fork of the main agent specifically to maximise prefix caching — a structural change that reduces cost without touching model weights. Ramachandran frames this as the bigger lesson: harnesses “are here to stay and memory isn’t an optional plugin” — they belong to the architecture of the agent, not its tooling.
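The prefix-caching rationale behind that fork can be illustrated with a toy model (this is not Letta's code; the message lists below are invented): provider-side prompt caches match on identical leading tokens, so a sub-agent forked from the main agent's message list reuses the entire shared prefix, while an isolated process with its own prompt matches nothing.

```python
def shared_prefix_tokens(a, b):
    """Count leading items two requests have in common (cache-hittable)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Hypothetical request contents, one item standing in for a token span.
main_request = ["<system>", "tool-schemas", "msg1", "msg2", "msg3"]
forked_recall = main_request + ["recall-instruction"]        # fork of main
isolated_recall = ["<recall-system>", "recall-instruction"]  # fresh process
```

Here `shared_prefix_tokens(main_request, forked_recall)` is the full length of the main request, while the isolated variant shares nothing, which is why the fork is cheaper despite sending more total tokens.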

For practitioners, the implication is that improvements at the harness layer compound faster than waiting for a new model. The harness is the part you actually control.