An architectural layer between application code and the model API that decouples business logic from any single provider, prompt format, or model version — turning model swaps and prompt changes into configuration rather than code rewrites.
What it is
The simplest version of an LLM application calls a model API directly: `client.complete(prompt)` lives inside the same function that handles the user’s request. Santiago Valdarrama describes this as the single largest architectural mistake practitioners make — and proposes the fix as “the simplest trick you can use to improve the architecture of whatever you are building”: don’t talk directly to a model. Add one intermediate layer between application code and the model.
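A minimal sketch of the coupled version, assuming a hypothetical client object (the names here are illustrative, not any real SDK):

```python
# Anti-pattern sketch: the model call lives inside the request handler,
# so the prompt text, provider, and model choice are all hard-wired
# into business logic. `FakeClient` stands in for a real provider SDK.
class FakeClient:
    def complete(self, prompt: str) -> str:
        return "stubbed completion"  # a real client would call a provider API

def handle_support_ticket(client, ticket_text: str) -> str:
    # Swapping providers or iterating on this prompt means editing
    # application code at every call site like this one.
    prompt = f"You are a support agent. Reply to this ticket:\n{ticket_text}"
    return client.complete(prompt)

print(handle_support_ticket(FakeClient(), "My login fails."))
```

Every handler written this way must change whenever the prompt, the provider, or the model version does.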
That layer typically owns prompt assembly (combining system instructions, retrieved context, user input), provider routing (Claude vs GPT-4 vs an open model, chosen by task or cost), version pinning, response parsing and validation, and observability hooks. The application calls something like `agent.respond(intent, payload)`; the layer below handles every detail of how that maps onto a specific model call. When the model API changes, when a new provider becomes preferable, or when prompts need iterating, only the layer changes — application logic stays untouched.
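The responsibilities listed above can be sketched as a single class. This is an illustrative outline, not an implementation from the source: `Agent`, `ModelCall`, the route table, and the stub providers are all assumed names, and real provider calls would replace the lambdas.

```python
# Hypothetical indirection layer: one seam between application code
# and model APIs. Routing, version pinning, prompt assembly, and
# response parsing all live here, not in business logic.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelCall:
    provider: str  # e.g. "anthropic" or "openai"
    model: str     # pinned model version
    prompt: str

class Agent:
    def __init__(self, providers: dict[str, Callable[[ModelCall], str]]):
        self.providers = providers
        # Routing table: intent -> (provider, pinned model, prompt template).
        # Changing a model or prompt is a config edit, not a code rewrite.
        self.routes = {
            "summarize": ("anthropic", "claude-sonnet-4", "Summarize:\n{payload}"),
            "classify":  ("openai", "gpt-4-0613", "Label this text:\n{payload}"),
        }

    def respond(self, intent: str, payload: str) -> str:
        provider, model, template = self.routes[intent]
        call = ModelCall(provider, model, template.format(payload=payload))
        raw = self.providers[provider](call)  # provider routing
        return raw.strip()                    # parsing/validation hook

# Application code never mentions a provider, prompt, or model version:
agent = Agent({"anthropic": lambda c: f"[{c.model}] ok",
               "openai":    lambda c: f"[{c.model}] ok"})
print(agent.respond("summarize", "long document text"))
```

Swapping GPT-4 for Claude on the "classify" intent would mean editing one tuple in the route table; `agent.respond` call sites throughout the application are unaffected.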
The same architectural impulse explains why earlier monolithic LLM tooling has aged poorly. Sam Hogan catalogues a long list — “RAG, GraphRAG, Multi Agent Orchestration, ReAct frameworks, prompt management tools, LLMOps tooling, eval tools, gateways, fine-tuning libraries” — that he argues were obsoleted by the agentic-coding shift. The frameworks that bound applications tightly to their assumptions broke when the assumptions changed; lightweight indirection survived.
Why it matters
The indirection layer is the practical mechanism behind the harness mindset. It is also the difference between a prototype that ships and a system that adapts: a model call buried in business logic costs hours to migrate; the same call behind an interface costs minutes. For teams running production agents, indirection is what makes provider competition usable rather than disruptive.
Related concepts
- Agent Harness
- Claude Code
- AI Wiki