Context Loading
January 30, 2026
How you give agents access to information matters more than you think. The choice between passive context (always available) and active retrieval (fetched on demand) can mean the difference between 100% and 53% accuracy.
The Core Tradeoff
Passive context: Embed knowledge directly in the prompt or system context. The agent doesn't need to decide to look it up — it's already there.
Active retrieval: Give the agent tools to fetch information when needed. Skills, RAG, MCP servers, function calls.
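To make the contrast concrete, here's a minimal TypeScript sketch of both patterns. The message shape, the `AGENTS.md` path, and the `search_docs` tool are illustrative assumptions, not a specific provider's API.

```ts
import { readFileSync } from "node:fs";

type Message = { role: "system" | "user" | "assistant"; content: string };

// Passive context: the framework notes ride along on every request.
function passiveMessages(userQuestion: string): Message[] {
  const docs = readFileSync("AGENTS.md", "utf8"); // always present, no decision required
  return [
    { role: "system", content: `You are a coding agent.\n\n${docs}` },
    { role: "user", content: userQuestion },
  ];
}

// Active retrieval: the notes arrive only if the model chooses to call the tool.
const searchDocsTool = {
  name: "search_docs",
  description: "Look up framework documentation by keyword",
  parameters: {
    type: "object",
    properties: { query: { type: "string" } },
    required: ["query"],
  },
};

function activeMessages(userQuestion: string): Message[] {
  // Whether search_docs ever gets invoked is the model's call, and that
  // decision point is exactly where things go wrong.
  return [
    { role: "system", content: "You are a coding agent. Call search_docs when unsure." },
    { role: "user", content: userQuestion },
  ];
}
```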
Intuition says active retrieval should win. Why waste tokens on information the agent might not need? Let it pull only what's relevant.
But Vercel's research found the opposite.
LLMs Are Lazy
In Vercel's evals, agents with access to skills invoked them only 44% of the time. More than half the time, the agent had access to documentation that would help — and chose not to use it.
This isn't a bug. It's a known limitation of current models. Agents don't reliably use available tools, even when those tools would help.
The fix was counterintuitive: compress the docs and embed them directly. An 8KB compressed index in AGENTS.md achieved 100% accuracy, while skills maxed out at 79%.
The What Matters More Than The How
As Vercel put it: "Context that's always available beats context that requires a decision to access."
This applies broadly:
- AGENTS.md beats skills for framework knowledge
- Inline context beats RAG for critical information
- Smaller, always-present beats larger, on-demand
The cost is tokens. Passive context uses tokens every turn, whether needed or not. But if the information is frequently relevant, the reliability gain outweighs the cost.
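As a rough worked example (assuming ~4 characters per token, a heuristic rather than an exact tokenizer), here's what an always-on 8KB index costs:

```ts
// Back-of-the-envelope cost of always-on context.
// Assumes ~4 characters per token; real tokenizers will vary.
const indexBytes = 8 * 1024;                              // the 8KB compressed index
const tokensPerTurn = Math.ceil(indexBytes / 4);          // ~2,048 input tokens per turn
const turnsPerSession = 20;                               // assumed session length
const extraInputTokens = tokensPerTurn * turnsPerSession; // ~41,000 tokens per session

console.log({ tokensPerTurn, extraInputTokens });
```

If the agent consults that knowledge on most of those turns, the overhead buys reliability. If it would consult it once in fifty sessions, retrieval starts to look better.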
When Active Retrieval Makes Sense
Active retrieval isn't dead. It's still the right choice when:
- Scale: The knowledge base is too large to fit in context
- Freshness: Information changes frequently
- Specialization: Knowledge is only relevant for rare tasks
- Actions: The agent needs to do something, not just know something
MCP servers, function calls, and tools are still essential for agents that interact with the world. The insight is about knowledge, not capabilities.
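For the action case a tool is unavoidable: the agent has to call something. A minimal sketch, using a JSON-Schema-style tool shape (field names vary by provider) and a hypothetical deploy endpoint:

```ts
// A tool definition for an action, not for knowledge.
const deployTool = {
  name: "trigger_deploy",
  description: "Deploy the current branch to the preview environment",
  parameters: {
    type: "object",
    properties: {
      branch: { type: "string", description: "Git branch to deploy" },
    },
    required: ["branch"],
  },
};

// Hypothetical handler; the endpoint is a placeholder, not a real API.
async function handleToolCall(name: string, args: { branch: string }) {
  if (name === "trigger_deploy") {
    return fetch("https://deploy.example.internal/trigger", {
      method: "POST",
      body: JSON.stringify(args),
    });
  }
  throw new Error(`Unknown tool: ${name}`);
}
```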
Memory: The Third Dimension
Context loading isn't just prompt vs retrieval. There's also memory:
- Short-term memory: The conversation history. Coherence within a session.
- Long-term memory: Persisted across sessions. Vector DBs, knowledge graphs, files.
Most agents shine in-session, then restart from zero. No persistent identity = no true long-horizon growth.
The question becomes: if the agent restarts, is it still the same agent?
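A toy sketch of the long-term side, assuming notes persisted as plain strings in a local file (real systems often reach for vector DBs or knowledge graphs instead). The point is only that something survives the restart:

```ts
import { existsSync, readFileSync, writeFileSync } from "node:fs";

const MEMORY_PATH = "agent-memory.json"; // hypothetical location

// Long-term memory: persists across sessions.
function loadMemory(): string[] {
  return existsSync(MEMORY_PATH)
    ? JSON.parse(readFileSync(MEMORY_PATH, "utf8"))
    : [];
}

function remember(note: string): void {
  const memory = loadMemory();
  memory.push(note);
  writeFileSync(MEMORY_PATH, JSON.stringify(memory, null, 2));
}

// On restart, prior notes go back into context instead of starting from zero.
const priorNotes = loadMemory();
```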
Practical Guidelines
Embed directly when:
- Knowledge is needed on most turns
- Accuracy matters more than cost
- You can compress to ~10-50KB
- The agent shouldn't have to "decide" to use it
Use retrieval when:
- Knowledge base exceeds context limits
- Information is rarely needed
- Freshness matters
- You're okay with some retrieval failures
Hybrid approach (sketched below):
- Critical knowledge embedded (AGENTS.md)
- Extended knowledge in skills/tools
- Explicit instructions to use tools when embedded context is insufficient
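Putting that together, a minimal sketch of the hybrid layout (the `search_extended_docs` tool and file paths are assumptions, not a specific framework's API):

```ts
import { readFileSync } from "node:fs";

// Critical knowledge: embedded on every turn.
const criticalKnowledge = readFileSync("AGENTS.md", "utf8");

const systemPrompt = [
  "You are a coding agent for this repository.",
  criticalKnowledge,
  "If the embedded notes above do not cover a question,",
  "call search_extended_docs before guessing.",
].join("\n\n");

// Extended knowledge: behind a tool, fetched on demand.
const tools = [
  {
    name: "search_extended_docs",
    description: "Search the full documentation for topics not covered in the embedded notes",
    parameters: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
];
```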