The LLM Knowledge Base — Karpathy's Wiki Compilation Pattern

Type: Article
Published: 2026-04-29
Aliases: LLM wiki, personal AI knowledge base, Karpathy wiki workflow
A scientist of the future records experiments with a tiny camera fitted with a universal-focus lens. The small square in the eyeglass at the left sights the object.
Alfred D. Crimi's illustration from "As We May Think" (Life, September 1945). Vannevar Bush's essay described the memex, a machine that could compile, cross-reference, and trail-link a personal library; the LLM Knowledge Base pattern is the first architecture that delivers it.
Summary

Andrej Karpathy’s “LLM Knowledge Bases” workflow turns an LLM from a chat partner into a wiki compiler. You drop raw sources into a folder; the model writes, indexes, and lints a markdown wiki you can later query. The pattern works because simple file reading over an LLM-maintained index beats fancy RAG at small-to-medium scale, and because the wiki itself becomes the agent’s externalised long-term memory.

Overview

In April 2026, Andrej Karpathy described a workflow he had been quietly using for personal research: instead of asking an LLM to manipulate code, he was using it to manipulate knowledge. Source documents — papers, articles, repos, datasets, images — go into a raw/ directory. An LLM then incrementally “compiles” them into a wiki of markdown files: summaries, categorised concepts, articles, and backlinks. Obsidian acts as the IDE. The LLM rarely loses track of what it has already written, because it owns the entire wiki and rewrites it as new material arrives.

Once the wiki reaches a certain size — Karpathy reports ~100 articles and 400,000 words on one research topic — it becomes a queryable second brain. Complex questions get answered by an agent that reads the relevant articles, cross-references them, and renders the output as new markdown, slides, or matplotlib figures, which then get filed back into the wiki to enrich future queries.

The pattern matters because it is the architecture this very AI Wiki is built on. It also represents a quietly important shift in how teams are thinking about agent memory: instead of stuffing everything into ever-larger context windows, give the agent a clean, inspectable, self-maintained file system and let it query its own files.

Key Concepts

The Karpathy workflow

Karpathy breaks the workflow into five stages:

  1. Ingest. Web articles get clipped into markdown using the Obsidian Web Clipper extension; papers, repos, and datasets go into raw/. Related images are downloaded locally so the LLM can reference them directly.
  2. Compile. An LLM reads raw/ and incrementally writes the wiki: summaries of every source, concept entries, articles, and backlinks. The LLM owns the wiki — Karpathy says he “rarely touches it directly.” (A minimal sketch of this folder convention follows the list.)
  3. View. Obsidian is the frontend. Marp plugins render slides; standard markdown handles the rest.
  4. Query. Once the wiki is large, the LLM agent answers complex questions by reading its own articles. Karpathy expected to need “fancy RAG” but found that the LLM’s auto-maintained index files and brief summaries were enough at the ~400,000-word scale.
  5. Lint. Periodic LLM-driven “health checks” find inconsistencies, impute missing data via web search, and suggest new article candidates — incremental cleanup rather than a one-shot pipeline.
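
The skeleton of the workflow is just a folder convention plus a compile loop. Below is a minimal sketch in Python of the "what still needs compiling" step, assuming a raw/ folder of sources, a wiki/ folder of markdown notes, and a one-note-per-source naming convention; the folder names and the convention are illustrative assumptions, not part of Karpathy's published setup.

```python
from pathlib import Path

RAW = Path("raw")    # clipped articles, papers, repos, datasets (assumed layout)
WIKI = Path("wiki")  # the LLM-maintained markdown notes (assumed layout)

def compile_queue():
    """Return raw sources that have no corresponding wiki note yet."""
    existing = {p.stem for p in WIKI.glob("*.md")}
    pending = []
    for src in sorted(RAW.iterdir()):
        if src.name.startswith("."):
            continue  # skip hidden files such as .DS_Store
        # Assumed convention: one summary note per raw source, named after it.
        if src.stem not in existing:
            pending.append(src)
    return pending

if __name__ == "__main__":
    for src in compile_queue():
        # In the actual workflow an LLM agent would now read `src`, write
        # wiki/<name>.md, and update the index; here we only list the work.
        print(f"needs compiling: {src.name}")
```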

The architectural insight, which Charly Wargnier emphasises in his breakdown, is that the agent maintains its own memory layer rather than being given a memory layer. The LLM compiles its own indexes, lints its own data, and routes its own Q&A. The user dumps raw sources and asks questions; everything in between is the model’s job.

Why this beats stuffing context

Context windows are getting larger, but they remain finite, expensive, and lossy. As the context window article argues, every long-running session ends up fighting the same constraints: tokens consumed by old material, slow recall under pressure, degraded reasoning when the buffer fills.

A markdown wiki sidesteps the problem. The agent does not need to hold the entire knowledge base in its context — it needs to hold the index of the knowledge base, then read individual articles on demand. Kevin Nguyen, describing the open-source ByteRover system that explicitly implements this pattern, reports 50–70% token savings versus dumping documents directly into prompts. The mechanism is tiered retrieval: only the chunks the agent actually needs are loaded.
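
A minimal sketch of that tiered lookup, assuming the LLM maintains an index.md with one line per article ("- article-name: one-line summary") and one markdown file per article: the query step loads only the index, picks the most relevant articles, and reads just those files. Naive keyword overlap stands in for the model's own judgement, and the index format is an assumption.

```python
from pathlib import Path

WIKI = Path("wiki")  # assumed layout: wiki/index.md plus wiki/<article>.md files

def load_index():
    """Parse the assumed index format: '- article-name: one-line summary' per line."""
    entries = []
    for line in (WIKI / "index.md").read_text().splitlines():
        line = line.lstrip("- ").strip()
        if ":" in line:
            name, summary = line.split(":", 1)
            entries.append((name.strip(), summary.strip()))
    return entries

def answer_context(query, k=3):
    """Load only the k most relevant articles into context, never the whole wiki."""
    words = set(query.lower().split())
    scored = sorted(
        load_index(),
        key=lambda entry: len(words & set(entry[1].lower().split())),
        reverse=True,
    )
    # Keyword overlap is a stand-in for the agent's own judgement: in the
    # actual pattern the LLM reads the index and decides which notes to open.
    return [(name, (WIKI / f"{name}.md").read_text()) for name, _ in scored[:k]]
```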

This also aligns with the empirical finding from God of Prompt that retrieval drives accuracy far more than write strategy: swapping retrieval approaches shifts accuracy by around 20 points, while changing how memories are written shifts it by only 3–8. Once the agent has clean file organisation and the ability to query its own indexes, the question of how memories were written matters much less than whether the right ones can be found.

Sleep-time compute

A related idea is Sleep-time Compute: the agent does work between user turns. shira summarises a 2025 paper showing that offline reasoning between interactions delivered roughly a 5× reduction in test-time compute and accuracy gains of up to 18%. The model anticipates likely next questions from the existing context and pre-computes answers, much like the wiki-compilation pattern pre-computes summaries and indexes before any query is asked.

The Karpathy workflow is sleep-time compute writ large. The wiki itself is the pre-computed answer cache. New raw material triggers a fresh compile pass. Linting passes find dead links and inconsistencies before the user notices. By the time a query arrives, the agent has already done most of the synthesis work — it just needs to assemble the answer from the existing structure.
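
One of those lint passes is easy to make concrete: finding Obsidian-style [[wikilinks]] that point at notes which do not exist yet, each of which is either a dead link to repair or a candidate for a new article. A minimal sketch, assuming one markdown file per article under wiki/:

```python
import re
from pathlib import Path

WIKI = Path("wiki")
# Matches [[Target]], [[Target|alias]], and [[Target#heading]]; captures Target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def dead_links():
    """Map each missing link target to the notes that reference it."""
    notes = {p.stem for p in WIKI.glob("*.md")}
    missing = {}
    for note in WIKI.glob("*.md"):
        for target in WIKILINK.findall(note.read_text()):
            target = target.strip()
            if target not in notes:
                missing.setdefault(target, []).append(note.name)
    return missing

if __name__ == "__main__":
    for target, sources in sorted(dead_links().items()):
        # Each entry is either a link to repair or a new-article candidate.
        print(f"[[{target}]] referenced by {', '.join(sources)} but has no note")
```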

Practical applications

The Karpathy pattern is particularly well suited to legal and academic research:

  • Bounded domains scale well. A research project on a specific topic — Northern Ireland data sovereignty law, AI pedagogy in legal education, a single ongoing case — produces a corpus that comfortably fits the “wiki + LLM” pattern without hitting Semantic Collapse limits.
  • Inspectable. Every claim in the wiki is traceable to a source in raw/. Unlike a vector store, you can read what the agent wrote and correct it.
  • Cheap and portable. No vector database, no embedding pipeline, no infrastructure. Markdown files in a folder. Version-controllable. Syncable. Searchable by ordinary tools.
  • Compounding over time. Each query that produces new output gets filed back into the wiki, enriching future queries. The system gets better as it gets used.

The trade-off is scale. At Karpathy’s reported size (~100 articles, ~400k words), naive file reading works. At a million articles, you need genuine retrieval infrastructure. The pattern is designed for personal and small-team knowledge bases, not enterprise corpora.

Limitations and open questions

  • Quality control. The LLM writing the wiki is also the LLM you are about to query. Errors compound: a misstated summary becomes the source for a future article. Periodic linting helps but does not fully solve this.
  • Source provenance. Wikis written by LLMs are easy to read but hard to audit. Every claim should link back to a raw/ source, but enforcing that requires discipline (or hooks; see the sketch after this list).
  • Open versus closed. Karpathy’s setup is one-person. ByteRover and similar projects are exploring multi-user wikis where multiple agents and humans co-edit, which raises new questions about merge conflicts and authoritative versions.
  • Synthetic data and fine-tuning. Karpathy hints at the natural extension: as the wiki grows, fine-tune the model on it so the knowledge lives in weights rather than files. The pattern stops being “LLM querying wiki” and becomes “LLM trained on its own wiki.” This is unexplored territory at the personal-knowledge scale.
  • Tooling immaturity. The pattern is currently held together by Obsidian, custom prompts, and shell scripts. Karpathy’s own framing — “an incredible new product instead of a hacky collection of scripts” — suggests this is where the gap is.
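
The source-provenance point above is the most mechanisable of these. A blunt pre-commit-style check, under the same assumed raw/ and wiki/ layout as earlier, flags any article that never mentions a file from raw/; filename matching is crude, but it catches articles with no source reference at all.

```python
from pathlib import Path

RAW = Path("raw")    # original sources (assumed layout)
WIKI = Path("wiki")  # LLM-written articles (assumed layout)

def unsourced_articles():
    """Flag wiki articles that never reference any file under raw/."""
    raw_names = {p.name for p in RAW.iterdir()}
    flagged = []
    for note in WIKI.glob("*.md"):
        text = note.read_text()
        if not any(name in text for name in raw_names):
            flagged.append(note.name)
    return flagged

if __name__ == "__main__":
    for name in unsourced_articles():
        print(f"no raw/ source cited: {name}")
```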

Sources

  • @karpathy — Original LLM Knowledge Bases workflow description
  • @DataChaz — Self-improving second brain framing; agent owning its own memory layer
  • @kevinnguyendn — ByteRover paper, tiered retrieval, 50–70% token savings
  • @shiraeis — Sleep-time compute, 5× test-time reduction and 18% accuracy gains
  • @godofprompt — Retrieval drives 20-point accuracy swings; write strategy only 3–8 points