Compile-Time Knowledge: A Technical Argument Against RAG
RAG is a search engine wearing a language model as a hat. Trail compiles knowledge the way a brain consolidates memory — and the architectural difference shows up in latency, accuracy, provenance, and cost.
If you have built a retrieval-augmented generation system, you know the architecture intimately: an embedding model, a vector store, a top-k retrieval call, a context-window stuffing step, an LLM completion. Maybe a re-ranker. Maybe a small graph of MCP tools around it. The pipeline is mature, the libraries are good, and the pattern works for a class of problems.
This article is about the class of problems where it does not work — and about the architectural alternative Trail implements, which we describe as compile-time knowledge.
The short version is this: RAG performs work at query time. Trail performs work at ingest time. That single difference cascades through every other property of the system — latency, accuracy, provenance, cost, accumulation, and the size of the corpus you can usefully reason over. We will walk through each of these.
We will also explain why this distinction is not novel. Brains do compile-time knowledge work. They have done it for hundreds of millions of years. RAG, despite the surface-level analogy to memory retrieval, is structurally closer to a search engine than to a cognitive system. The reason this matters is practical: certain workloads accumulate value over time only if the system itself accumulates structure over time, and RAG does not.
The mechanics of RAG, briefly
A standard RAG pipeline:
query → embed(query) → vector_store.search(top_k) →
rerank(results) → format_context(results) →
llm.complete(query + context) → response
Every component here is stateless across queries. The vector store contains document chunks, but those chunks are not connected to one another in any meaningful semantic structure — they are connected only by approximate proximity in embedding space. The LLM never accumulates understanding across queries. The context window is the only memory, and it dies when the request returns.
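To make the statelessness concrete, here is a toy version of that pipeline in Python. The embedding, vector store, and completion step are deliberately crude stand-ins (bag-of-words vectors, a linear scan, a prompt string in place of an LLM call), not any real SDK — the point is only that every step runs at query time and nothing persists between calls:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a bag-of-words vector. A real system uses a learned model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    def __init__(self):
        # Chunks sit side by side; nothing links one chunk to another.
        self.chunks = []

    def upsert(self, text: str):
        self.chunks.append((embed(text), text))

    def search(self, query_vec: Counter, top_k: int):
        scored = sorted(self.chunks, key=lambda c: cosine(query_vec, c[0]), reverse=True)
        return [text for _, text in scored[:top_k]]

def rag_answer(store: VectorStore, query: str, top_k: int = 3) -> str:
    # Every step here runs at query time; nothing persists across calls.
    context = store.search(embed(query), top_k)
    prompt = f"{query}\n\nContext:\n" + "\n".join(context)
    return prompt  # a real system would pass this to llm.complete(...)

store = VectorStore()
store.upsert("cortisol rises under chronic stress")
store.upsert("the memex stored trails on microfilm")
answer = rag_answer(store, "what happens to cortisol under stress?", top_k=1)
```

Note that the `VectorStore` holds only isolated `(vector, text)` pairs — the structural point the next paragraph makes.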
When the corpus grows, RAG's behavior degrades in predictable ways:
- Retrieval precision drops. With more chunks, semantic neighbors become noisier. Top-k retrieval starts surfacing irrelevant material. The standard fix is hybrid search (BM25 + vector) and re-ranking, which adds latency and cost.
- Context window pressure increases. More candidates means harder filtering decisions. The system either retrieves too few (missing relevant material) or too many (diluting the LLM's attention).
- Contradictions get retrieved together. If two sources disagree on a fact, RAG happily retrieves both and asks the LLM to reconcile them at query time, with no awareness that the contradiction exists or has been seen before.
- Source updates do not propagate. If a source is corrected, the chunks remain in the vector store. There is no concept of "this older fact is now stale because the source it came from has been updated."
These are not bugs in any specific RAG implementation. They are consequences of doing knowledge work at query time, on raw fragments, with no persistent compiled artifact.
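The contradiction case is easy to demonstrate. In this sketch, word overlap stands in for vector similarity (a deliberate simplification; the sentences and scoring are illustrative): two chunks that directly disagree are the two nearest neighbors of the query, and nothing in the pipeline notices the conflict.

```python
def overlap_score(query: str, chunk: str) -> float:
    # Toy stand-in for vector similarity: Jaccard overlap of word sets.
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q | c)

chunks = [
    "aspirin is recommended for primary prevention",
    "aspirin is no longer recommended for primary prevention",
    "the memex stored trails on microfilm",
]
query = "is aspirin recommended for primary prevention"

# Both contradictory chunks rank highest; the conflict is invisible to retrieval.
top_2 = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)[:2]
```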
The mechanics of Trail's compile-time architecture
Trail flips the pipeline. Work happens at ingest time. Queries read from a compiled artifact:
ingest:
source → extract → chunk →
propose_candidate(kind, payload, confidence) →
queue → policy_check →
[auto_approve | curator_review] →
compile_to_wiki(touches 5-15 pages) →
emit_wiki_event(payload, prev_event_id)
query:
query → identify_relevant_pages →
read_wiki(pages) →
synthesize_with_citations → response
The unit of persistent storage is a wiki page, not a chunk. Wiki pages are markdown documents with structured [[wiki-link]] references to other pages, and stable {#claim-xx} anchors for individual claims. Each wiki page is the result of compiling N source chunks into a coherent narrative, with provenance attached to each claim.
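Because the links and anchors are explicit text, extracting them is plain parsing, not inference. A minimal sketch, assuming the page format described above (the sample page content and regexes are illustrative):

```python
import re

PAGE = """# Cortisol Regulation

Chronic stress elevates baseline cortisol {#claim-01}, which in turn
disrupts [[sleep-architecture]] {#claim-02}. See [[treatment-protocols]]
for interventions.
"""

# [[wiki-link]] references and {#claim-xx} anchors, per the page format above.
WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")
CLAIM_ANCHOR = re.compile(r"\{#(claim-[\w-]+)\}")

links = WIKI_LINK.findall(PAGE)      # pages this page points to
claims = CLAIM_ANCHOR.findall(PAGE)  # stable anchors for individual claims
```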
The compile step is where the work happens. When a new source enters the system, the compiler:
- Identifies which existing wiki pages are relevant
- Extracts new claims from the source
- Cross-references them against existing claims
- Detects contradictions explicitly
- Proposes updates to affected pages
- Routes proposals through the Curation Queue (auto-approved or human-reviewed based on policy)
- Writes the resulting wiki page change as an event with full payload
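The steps above can be sketched as a small control flow. Everything here is a stand-in, not Trail's actual API: the candidate shape, the fixed confidence, the hard-coded page selection, and the threshold are all illustrative.

```python
def ingest(chunk: str, wiki: dict, events: list, queue: list, threshold: float = 0.9):
    # 1. Propose a candidate with a kind, payload, and confidence.
    cand = {"kind": "new_claim", "payload": chunk, "confidence": 0.95}

    # 2. Policy check: auto-approve high-confidence candidates, queue the rest.
    if cand["confidence"] < threshold:
        queue.append(cand)  # held for curator_review
        return

    # 3. Compile into the affected wiki page (page selection stubbed out;
    #    real compilation is a reasoning task, not string concatenation).
    page = "stress"
    wiki[page] = (wiki.get(page, "") + "\n" + chunk).strip()

    # 4. Emit a full-payload event chained to the previous one.
    prev_id = len(events) - 1 if events else None
    events.append({"page": page, "content": wiki[page], "prev_event_id": prev_id})

wiki, events, queue = {}, [], []
ingest("chronic stress elevates cortisol", wiki, events, queue)
ingest("elevated cortisol disrupts sleep", wiki, events, queue)
```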
This is more expensive than a RAG ingest. A RAG ingest is chunk → embed → upsert. A Trail ingest is closer to a small reasoning task per source. The trade-off is that the cost is paid once at ingest, not repeatedly at every query.
Where the costs actually live
A common objection: "compile-time work is just shifted cost — you pay the LLM either way." This is true in the trivial sense and false in the practical sense.
In RAG, every query incurs:
- Embedding cost on the query
- Vector search cost (cheap)
- Re-ranking cost (moderate, often LLM-based)
- LLM completion cost on query + N retrieved chunks
The LLM completion is the dominant cost, and it scales with N × chunk_size. A typical production RAG system pulls 10-20 chunks of 500-1000 tokens each, putting 5K-20K tokens of context into every completion. Multiply by query volume.
In Trail, every query incurs:
- Identify relevant wiki pages (small classification task)
- Read those pages (typically 2-5 pages, much denser than raw chunks)
- LLM completion on query + relevant pages
The compiled wiki pages are an order of magnitude denser per token than raw chunks. A wiki page that took 50 source chunks to compile might fit in 1500 tokens because the LLM did the integration work at compile time — removing redundancy, resolving contradictions, structuring the narrative.
The compile-time cost is paid once per source ingest, not once per query. For a corpus of 1000 sources answering 10K queries per month, the math favors compile-time architecture by a wide margin within weeks.
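The arithmetic can be sketched directly. All numbers here are assumptions for illustration (mid-range context sizes from above, an assumed per-source compile cost, a placeholder token price), not measurements:

```python
# Illustrative cost model; token counts and prices are assumptions, not measurements.
PRICE_PER_1K_TOKENS = 0.01

# RAG: ~15 chunks x ~700 tokens of context per query (mid-range of 5K-20K above).
rag_tokens_per_query = 15 * 700
# Trail: ~3 compiled pages x ~1500 tokens per query.
trail_tokens_per_query = 3 * 1500
# Trail pays extra at ingest: assume ~20K tokens of compile work per source.
trail_tokens_per_ingest = 20_000

sources, queries_per_month = 1000, 10_000

rag_monthly = queries_per_month * rag_tokens_per_query / 1000 * PRICE_PER_1K_TOKENS
trail_monthly = queries_per_month * trail_tokens_per_query / 1000 * PRICE_PER_1K_TOKENS
trail_ingest_once = sources * trail_tokens_per_ingest / 1000 * PRICE_PER_1K_TOKENS

# Months of query savings needed to repay the one-time compile cost:
break_even_months = trail_ingest_once / (rag_monthly - trail_monthly)
# Well under a month at these assumptions, consistent with "within weeks".
```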
Why this maps onto how brains work
The biological precedent for compile-time knowledge work is extensive and well-studied. Three mechanisms in particular map cleanly onto Trail's architecture.
Memory consolidation. Human memory is not unitary. The hippocampus encodes new experiences in a fast, high-bandwidth format. During sleep — particularly slow-wave sleep — these encodings are replayed in compressed form and transferred to the neocortex, where they are integrated with existing long-term memory structures. This is not a backup operation. It is an active integration: contradictions are resolved, patterns are extracted, schemas are updated. By morning, yesterday's experience has been compiled into the existing structure of what you know.
Trail's compile step is structurally analogous. Sources enter a fast queue (the curation candidates). A background process (auto-approval policy + scheduled lint) integrates them into the persistent wiki, resolving conflicts and updating cross-references. The wiki is the long-term store. Sources are the short-term encoding.
Synaptic plasticity. Knowledge in the brain is encoded in the strength of synaptic connections, not in the neurons themselves. A concept is a distributed pattern of connection weights across thousands or millions of neurons. Learning strengthens specific connections; forgetting weakens them. The Hebbian principle — "neurons that fire together, wire together" — operates at compile time, not query time. By the time you need to recall something, the connections are already in place.
In Trail, the equivalent is the wiki link graph. The strength of a connection between two wiki pages is encoded in how many times they cross-reference each other, how many sources support the connection, and how often the curator has approved that connection. This structure is built at compile time. Queries traverse it; they do not construct it.
Spreading activation. When you think of a concept, related concepts activate automatically — even ones you did not consciously search for. Think "hospital," and "doctor," "illness," "ambulance," "nurse" are primed before you are aware of them. This happens because the connection weights between related concepts are already in place. Activation spreads through the network in milliseconds.
RAG attempts to reproduce this with vector similarity search, but it does so at query time on raw chunks with no compiled structure. The result is approximate semantic neighborhood matching, not true associative activation. Trail's wiki link graph supports actual spreading activation because the connections are persistent and explicit. A query for "stress" doesn't just retrieve stress-related chunks; it can traverse to "cortisol regulation," "sleep disruption," and "treatment protocols" through compiled links.
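Because the link graph is persistent and explicit, that traversal is an ordinary bounded breadth-first search. A minimal sketch — the pages and edges below are a hypothetical compiled graph, not real Trail data:

```python
from collections import deque

# Hypothetical compiled link graph; pages and edges are illustrative.
LINKS = {
    "stress": ["cortisol-regulation", "sleep-disruption"],
    "cortisol-regulation": ["treatment-protocols"],
    "sleep-disruption": ["treatment-protocols"],
    "treatment-protocols": [],
}

def activate(start: str, depth: int = 2) -> set:
    # Spread activation outward along persistent links, up to a fixed depth.
    active, frontier = {start}, deque([(start, 0)])
    while frontier:
        page, d = frontier.popleft()
        if d == depth:
            continue
        for neighbor in LINKS.get(page, []):
            if neighbor not in active:
                active.add(neighbor)
                frontier.append((neighbor, d + 1))
    return active

related = activate("stress")
```

The traversal reaches "treatment-protocols" in two hops without it ever matching the query text — the associative behavior vector similarity only approximates.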
The four properties that change
Compile-time architecture changes four measurable properties of a knowledge system.
Accumulation. In RAG, the value of the system grows linearly with corpus size — and degrades as retrieval precision drops. In Trail, the value grows superlinearly because each new source can update existing wiki pages, strengthen existing links, and resolve previous contradictions. The tenth source improves the wiki in light of the previous nine. The hundredth source benefits from the integrated structure of the previous ninety-nine.
This is the property that matters most for use cases where knowledge should compound: clinical practice, legal research, scientific synthesis, organizational learning. RAG handles these poorly because it has no integration step. Trail handles them well because integration is the entire point of the architecture.
Provenance. RAG provenance is correlational: "the model said X, and we retrieved chunk Y, so probably X came from Y." Trail provenance is structural: every claim on every wiki page is linked to the specific source revisions that support it. When a source is updated, the affected pages are flagged for re-review automatically. When a claim is contested, the lineage is explicit.
For regulated domains — healthcare, legal, financial — this is not a nice-to-have. It is a baseline requirement that RAG cannot meet without bolting on a separate provenance layer that does the work Trail does natively.
Latency. Trail queries are faster because the work has already been done. A wiki page is denser than the equivalent raw chunks, so less context needs to be passed to the LLM. There is no re-ranking step. The retrieval step is identifying relevant pages, not searching a chunk space. Production Trail deployments typically see 30-60% latency reduction on equivalent queries compared to RAG baselines, and the gap grows as corpus size grows.
Cost. As described above, the cost calculus shifts from per-query to per-ingest. For high-query-volume use cases (which is most of them), this is a substantial reduction. The ingest cost is also more predictable — it is bounded by source volume, not user behavior.
What RAG is good for
To be fair: RAG is the right architecture for a class of workloads. If your corpus is large, frequently updated, and you primarily need lookup-style queries with no need for accumulated understanding, RAG works well. Customer support over product documentation, search across a code repository, Q&A over recent news — these are RAG-shaped problems.
The mistake is using RAG for problems that are not RAG-shaped: building a research wiki that should accumulate insight, maintaining clinical knowledge that should compound across patients, supporting expert practice where the depth of the integrated structure matters more than the speed of any individual lookup. These are compile-time problems being forced into a query-time architecture.
Trail is built for the second class. RAG remains the right answer for the first.
The 1945 reference, briefly
Vannevar Bush's 1945 essay As We May Think described a hypothetical machine called the memex — a desk-sized device with microfilm storage and projected screens that would let a researcher build associative trails through their personal knowledge corpus. Bush's central argument was that the human mind operates by association, not by hierarchical classification, and that information systems built on classification (the library, the card catalog) were structurally mismatched with how thinking actually works.
The microfilm reels in the base of Bush's memex desk — chosen as the visual mark for the Trail engine — were not a technical detail. They were the storage substrate for trails: sequences of frames linked in meaningful order, with annotations along the way. The Trail was the unit of compiled knowledge. The memex was the machine that compiled and held it.
Eighty years later, language models are the integration mechanism Bush could not build. The LLM Wiki pattern, articulated by Andrej Karpathy in late 2025, is the modern restatement of Bush's idea: use the language model as a continuous compiler of knowledge, not as a query-time oracle. The wiki is the artifact. The model maintains it.
Trail is the engine that does this work. The SaaS successor to memxcloud is the commercial product built on Trail.
Implementation notes
A few specifics about how Trail's compile-time architecture is implemented:
- The Curation Queue is the only write path to wiki documents. No code path bypasses it, including auto-approval. This guarantees that every wiki state change is auditable and reversible.
- Wiki events are full-payload, not deltas. Each event records the complete new content of the affected page, not just the change. This makes time-travel queries trivial (replay events up to a timestamp) and ensures schema migrations don't lose history.
- Stable claim anchors ({#claim-xx}) are emitted by the compiler at compile time and remain stable across re-compilations of the same source material. This makes it possible to add a first-class claims table later without re-parsing every wiki page.
- Source ↔ wiki is a bidirectional index. Sources know which wiki pages they affect; wiki pages know which sources support each claim. Cascading invalidation is a database query, not an inference step.
- Lint runs as a periodic background job. Stale page detection, contradiction surfacing, and gap analysis happen during quiet hours, not during user queries.
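The full-payload event design makes the time-travel claim concrete: reconstructing the wiki at any timestamp is a filter plus last-write-wins, with no delta replay. A minimal sketch with an invented in-memory event log (the event shape here is illustrative, not Trail's schema):

```python
# Illustrative full-payload event log: each event carries the complete
# new content of its page, not a diff against the previous version.
events = [
    {"ts": 1, "page": "stress", "content": "v1"},
    {"ts": 2, "page": "cortisol", "content": "v1"},
    {"ts": 3, "page": "stress", "content": "v2"},
]

def wiki_at(events: list, ts: int) -> dict:
    # Keep the last event per page at or before the timestamp.
    state = {}
    for e in events:
        if e["ts"] <= ts:
            state[e["page"]] = e["content"]
    return state
```

With delta events, the same query would require replaying and applying every change from the beginning; with full payloads it is a single pass.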
These are the structural commitments that make compile-time architecture work. They are not optional features bolted onto a RAG pipeline. They are the architecture.
Closing
The argument against RAG is not that it does not work. It is that it works for the wrong shape of problem if your goal is accumulating knowledge over time. A system that performs all of its reasoning at query time, on raw fragments, with no persistent compiled structure, is structurally a search engine — regardless of how sophisticated the language model attached to it.
The argument for compile-time knowledge is that integration, consolidation, and persistent structure are the actual value-producing operations in a knowledge system. Brains do this work continuously, mostly outside of conscious attention. Trail does it explicitly, in a queue-driven pipeline with full provenance.
The infrastructure cost is higher per source. The infrastructure cost is dramatically lower per query. The accumulated value, over time, is qualitatively different.
Bush saw this in 1945. Karpathy named it again in 2025. Trail implements it now.
Further reading
- Vannevar Bush, As We May Think — The Atlantic, July 1945
- Andrej Karpathy, LLM Wiki — described publicly in October 2025
- Donald Hebb, The Organization of Behavior, 1949
- For Trail's architectural specification and API, see trail.broberg.ai/docs