Context Loading
January 30, 2026
How you give agents access to information matters more than you think. The choice between passive context (always available) and active retrieval (fetched on demand) can mean the difference between 100% and 53% accuracy.
The Core Tradeoff
Passive context: Embed knowledge directly in the prompt or system context. The agent doesn't need to decide to look it up — it's already there.
Active retrieval: Give the agent tools to fetch information when needed. Skills, RAG, MCP servers, function calls.
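To make the contrast concrete, here's a minimal TypeScript sketch of both patterns. The message shape, the `AGENTS.md` path, and the `search_docs` tool are illustrative assumptions, not a specific provider's API.

```ts
import { readFileSync } from "node:fs";

type Message = { role: "system" | "user" | "assistant"; content: string };

// Passive context: the framework notes ride along on every request.
function passiveMessages(userQuestion: string): Message[] {
  const docs = readFileSync("AGENTS.md", "utf8"); // always present, no decision required
  return [
    { role: "system", content: `You are a coding agent.\n\n${docs}` },
    { role: "user", content: userQuestion },
  ];
}

// Active retrieval: the notes arrive only if the model chooses to call the tool.
const searchDocsTool = {
  name: "search_docs",
  description: "Look up framework documentation by keyword",
  parameters: {
    type: "object",
    properties: { query: { type: "string" } },
    required: ["query"],
  },
};

function activeMessages(userQuestion: string): Message[] {
  // Whether search_docs ever gets invoked is the model's call, and that
  // decision point is exactly where things go wrong.
  return [
    { role: "system", content: "You are a coding agent. Call search_docs when unsure." },
    { role: "user", content: userQuestion },
  ];
}
```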
Intuition says active retrieval should win. Why waste tokens on information the agent might not need? Let it pull only what's relevant.
But Vercel's research found the opposite.
LLMs Are Lazy
In Vercel's evals, agents with access to skills invoked them only 44% of the time. More than half the time, the agent had access to documentation that would help — and chose not to use it.
This isn't a bug. It's a known limitation of current models. Agents don't reliably use available tools, even when those tools would help.
The fix was counterintuitive: compress the docs and embed them directly. An 8KB compressed index in AGENTS.md achieved 100% accuracy, while skills maxed out at 79%.
The What Matters More Than The How
As Vercel put it: "Context that's always available beats context that requires a decision to access."
This applies broadly:
- AGENTS.md beats skills for framework knowledge
- Inline context beats RAG for critical information
- Smaller, always-present beats larger, on-demand
The cost is tokens. Passive context uses tokens every turn, whether needed or not. But if the information is frequently relevant, the reliability gain outweighs the cost.
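As a rough worked example (assuming ~4 characters per token, a heuristic rather than an exact tokenizer), here's what an always-on 8KB index costs:

```ts
// Back-of-the-envelope cost of always-on context.
// Assumes ~4 characters per token; real tokenizers will vary.
const indexBytes = 8 * 1024;                              // the 8KB compressed index
const tokensPerTurn = Math.ceil(indexBytes / 4);          // ~2,048 input tokens per turn
const turnsPerSession = 20;                               // assumed session length
const extraInputTokens = tokensPerTurn * turnsPerSession; // ~41,000 tokens per session

console.log({ tokensPerTurn, extraInputTokens });
```

If the agent consults that knowledge on most of those turns, the overhead buys reliability. If it would consult it once in fifty sessions, retrieval starts to look better.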
When Active Retrieval Makes Sense
Active retrieval isn't dead. It's still the right choice when:
- Scale: The knowledge base is too large to fit in context
- Freshness: Information changes frequently
- Specialization: Knowledge is only relevant for rare tasks
- Actions: The agent needs to do something, not just know something
MCP servers, function calls, and tools are still essential for agents that interact with the world. The insight is about knowledge, not capabilities.
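For the action case a tool is unavoidable: the agent has to call something. A minimal sketch, using a JSON-Schema-style tool shape (field names vary by provider) and a hypothetical deploy endpoint:

```ts
// A tool definition for an action, not for knowledge.
const deployTool = {
  name: "trigger_deploy",
  description: "Deploy the current branch to the preview environment",
  parameters: {
    type: "object",
    properties: {
      branch: { type: "string", description: "Git branch to deploy" },
    },
    required: ["branch"],
  },
};

// Hypothetical handler; the endpoint is a placeholder, not a real API.
async function handleToolCall(name: string, args: { branch: string }) {
  if (name === "trigger_deploy") {
    return fetch("https://deploy.example.internal/trigger", {
      method: "POST",
      body: JSON.stringify(args),
    });
  }
  throw new Error(`Unknown tool: ${name}`);
}
```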
Memory: The Third Dimension
Context loading isn't just prompt vs retrieval. There's also memory:
- Short-term memory: The conversation history. Coherence within a session.
- Long-term memory: Persisted across sessions. Vector DBs, knowledge graphs, files.
Most agents shine in-session, then restart from zero. No persistent identity = no true long-horizon growth.
The question becomes: if the agent restarts, is it still the same agent?
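A toy sketch of the long-term side, assuming notes persisted as plain strings in a local file (real systems often reach for vector DBs or knowledge graphs instead). The point is only that something survives the restart:

```ts
import { existsSync, readFileSync, writeFileSync } from "node:fs";

const MEMORY_PATH = "agent-memory.json"; // hypothetical location

// Long-term memory: persists across sessions.
function loadMemory(): string[] {
  return existsSync(MEMORY_PATH)
    ? JSON.parse(readFileSync(MEMORY_PATH, "utf8"))
    : [];
}

function remember(note: string): void {
  const memory = loadMemory();
  memory.push(note);
  writeFileSync(MEMORY_PATH, JSON.stringify(memory, null, 2));
}

// On restart, prior notes go back into context instead of starting from zero.
const priorNotes = loadMemory();
```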
Practical Guidelines
Embed directly when:
- Knowledge is needed on most turns
- Accuracy matters more than cost
- You can compress to ~10-50KB
- The agent shouldn't have to "decide" to use it
Use retrieval when:
- Knowledge base exceeds context limits
- Information is rarely needed
- Freshness matters
- You're okay with some retrieval failures
Hybrid approach (sketched below):
- Critical knowledge embedded (AGENTS.md)
- Extended knowledge in skills/tools
- Explicit instructions to use tools when embedded context is insufficient
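Putting that together, a minimal sketch of the hybrid layout (the `search_extended_docs` tool and file paths are assumptions, not a specific framework's API):

```ts
import { readFileSync } from "node:fs";

// Critical knowledge: embedded on every turn.
const criticalKnowledge = readFileSync("AGENTS.md", "utf8");

const systemPrompt = [
  "You are a coding agent for this repository.",
  criticalKnowledge,
  "If the embedded notes above do not cover a question,",
  "call search_extended_docs before guessing.",
].join("\n\n");

// Extended knowledge: behind a tool, fetched on demand.
const tools = [
  {
    name: "search_extended_docs",
    description: "Search the full documentation for topics not covered in the embedded notes",
    parameters: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
];
```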