The dominant architecture of modern LLMs and the structural source of their reliability problems.
What it is
An auto-regressive network produces output sequentially. At each step it predicts the next element — a token, a pixel, a value — by conditioning on every element it has produced so far. The previous prediction is fed back as input to the next prediction, and so on, until the sequence terminates.
Nearly every modern transformer-based generative LLM is auto-regressive at inference time: each next token is sampled from a distribution conditioned on the prompt plus everything the model has already generated. Auto-regressive image generators, audio models, and time-series predictors share the same recursive pattern.
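The loop described above can be sketched in a few lines. This is a minimal illustration, not how any particular LLM is implemented: `TOY_MODEL`, `next_token_distribution`, and `generate` are hypothetical names, and the toy lookup table conditions only on the last token, whereas a real model conditions on the whole context.

```python
import random

# Hypothetical toy "model": maps the most recent token to a next-token
# distribution. A real LLM would compute this from the entire context.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sleeps": 0.7, "</s>": 0.3},
    "dog": {"sleeps": 0.7, "</s>": 0.3},
    "sleeps": {"</s>": 1.0},
}


def next_token_distribution(context):
    """Return a {token: probability} dict given the tokens so far."""
    return TOY_MODEL[context[-1]]


def generate(prompt, max_new_tokens=10, seed=0):
    """Sample one token at a time, feeding each one back as input."""
    rng = random.Random(seed)
    context = list(prompt)
    for _ in range(max_new_tokens):
        dist = next_token_distribution(context)
        tokens, weights = zip(*dist.items())
        token = rng.choices(tokens, weights=weights, k=1)[0]
        context.append(token)  # the prediction becomes part of the next input
        if token == "</s>":  # the sequence terminates on the end marker
            break
    return context
```

Calling `generate(["<s>"])` returns the prompt followed by the sampled tokens, ending with the `</s>` marker. The only state carried between steps is the growing `context` list, which is the feedback path the text describes.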
Why it matters
The recursive structure is what gives auto-regressive networks their generality (they can produce sequences of arbitrary length over arbitrary domains), but it is also the structural source of their reliability problems. Each output becomes part of the input to every later step, so small prediction errors compound across the sequence; computer scientist Leslie Valiant named this the accumulation-of-errors problem decades ago.
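A back-of-the-envelope calculation shows how quickly compounding bites. The sketch below assumes each step goes wrong independently with the same probability and that a single error is never recovered. Both are simplifying assumptions made here for illustration, not claims from the text, and real models violate them in both directions. The function name `p_all_steps_correct` is hypothetical.

```python
def p_all_steps_correct(per_step_error, num_steps):
    """Probability that no step errs, ASSUMING independent, identical errors."""
    return (1.0 - per_step_error) ** num_steps


# A 1% per-step error rate looks harmless until the sequence gets long.
for n in (10, 100, 1000):
    print(n, p_all_steps_correct(0.01, n))
```

Under these assumptions, a 1% per-step error rate leaves roughly a 90% chance of a clean 10-step sequence, about 37% at 100 steps, and well under 0.01% at 1000 steps.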
When chain-of-thought prompting produces synthetic intermediate reasoning, it does so by sampling from the same auto-regressive network whose drift is the original problem, so the intermediate steps are exposed to the same compounding. Even a small input perturbation can produce divergent reasoning paths. Hallucination is the visible consequence: an unbounded sampler running without a verifier will eventually go off-piste, and there is no architectural mechanism inside an auto-regressive network to stop it.
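To make the "without a verifier" point concrete, here is a minimal sketch of a check that lives outside the sampler. Everything in it is a hypothetical stand-in: `noisy_sampler` imitates a model that is sometimes wrong, and `verifier` is a trivial symbolic check. In this toy the verifier already knows the answer, which is deliberately simplistic; the point is only that the accept/reject decision is made by something other than the sampler.

```python
import random


def noisy_sampler(a, b, rng, error_rate=0.3):
    """Hypothetical stand-in for a model: returns a + b, sometimes off by one."""
    answer = a + b
    if rng.random() < error_rate:
        answer += rng.choice([-1, 1])
    return answer


def verifier(a, b, answer):
    """External symbolic check that does not depend on the sampler."""
    return answer == a + b


def sample_until_verified(a, b, max_attempts=20, seed=0):
    """Resample until the verifier accepts, or give up after max_attempts."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        candidate = noisy_sampler(a, b, rng)
        if verifier(a, b, candidate):
            return candidate, attempt
    return None, max_attempts  # nothing verified: report failure, not a guess
```

The design choice worth noting is the failure branch: when no candidate passes, the loop returns `None` instead of the last sample, so an unverified answer never reaches the caller.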
The neurosymbolic argument starts here. The reliability gap is structural, not anecdotal, and cannot be closed by scaling the same architecture.