Context Engineering for OpenClaw Agents — Why External Memory Beats Bigger Context Windows
Context windows keep getting bigger. Claude offers 200K tokens. Gemini pushes past a million. The temptation is obvious: just shove everything into the system prompt and let the model sort it out.
If you’ve tried this with an OpenClaw agent, you already know why it doesn’t work. The agent gets slower. Responses drift. Costs balloon. And somehow, despite having all that context, it still forgets the thing you told it yesterday.
The problem isn’t context window size. It’s context engineering — deciding what goes in the prompt and when. And for agents that need to remember things across sessions, external memory beats brute-force context stuffing every time.
The Cost of Context Bloat
Let’s put real numbers on this. Say your OpenClaw agent loads these files on every session start:
- SOUL.md — 800 tokens
- AGENTS.md — 1,200 tokens
- MEMORY.md — 15,000 tokens (3 months of accumulated context)
- USER.md — 400 tokens
- Recent memory/*.md files — 8,000 tokens
That’s 25,400 tokens before the user says anything. Every session. Here’s what that costs:
Token costs (Claude Sonnet, input pricing at ~$3/MTok):
- Per session: ~$0.076
- 20 sessions/day: ~$1.52/day
- Monthly: ~$45.60
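The arithmetic is easy to reproduce. Here is a quick sketch of the calculation, assuming Claude Sonnet input pricing and a 30-day month:

```python
# Back-of-envelope check of the session-startup cost above.
# Assumptions: 25,400 startup tokens, $3 per million input tokens,
# 20 sessions/day, 30-day month.
STARTUP_TOKENS = 25_400
PRICE_PER_MTOK = 3.00
SESSIONS_PER_DAY = 20

per_session = STARTUP_TOKENS / 1_000_000 * PRICE_PER_MTOK
per_day = per_session * SESSIONS_PER_DAY
per_month = per_day * 30

print(f"per session: ${per_session:.3f}")  # ~$0.076
print(f"per day:     ${per_day:.2f}")      # ~$1.52
print(f"per month:   ${per_month:.2f}")    # ~$45.72 (the ~$45.60 above rounds the daily figure first)
```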
And that’s just the input. The model also processes all that context for every response, making each reply slower and more expensive.
But the real cost isn’t money — it’s quality degradation. Liu et al. (2023) demonstrated this in their paper “Lost in the Middle: How Language Models Use Long Contexts”: LLMs perform significantly worse at retrieving information placed in the middle of long prompts compared to information at the beginning or end. The effect is consistent across models and context lengths.
Your 15,000-token MEMORY.md contains maybe 500 tokens of information relevant to any given query. The other 14,500 tokens are noise that actively hurts performance.
What Context Engineering Actually Means
Context engineering is the practice of giving a model exactly the information it needs, when it needs it — no more, no less. For an OpenClaw agent, this means:
- Static context stays in the prompt (SOUL.md, AGENTS.md instructions, tool definitions)
- Dynamic context gets pulled on demand (user preferences, project details, past conversations)
The first category is small and constant. The second category grows forever and should live in an external memory system — not in a markdown file that gets loaded wholesale.
Here’s the difference in practice:
Before (everything in context):
Session starts → Load all files → 25K tokens in prompt → Ask model to find relevant bits
After (external memory):
Session starts → Load instructions (2K tokens) → User asks question →
Recall relevant memories (3-5 results, ~500 tokens) → Respond with focused context
The second approach uses 90% fewer tokens and gives the model better signal.
Making the Switch with MemoClaw
MemoClaw is a memory-as-a-service API designed for exactly this pattern. Instead of loading a file, your agent stores and recalls memories through semantic search.
Here’s what the workflow looks like:
Storing Context (Replace File Appends)
Instead of appending to MEMORY.md:
<!-- Old way: append to MEMORY.md -->
## 2026-03-07
- User prefers dark mode in all code examples
- Working on auth migration, using Postgres + Drizzle ORM
- User timezone is UTC-3
Your agent stores each piece individually:
memoclaw store "User prefers dark mode in all code examples" \
--importance 0.8 --tags "preference,ui" --namespace my-agent
memoclaw store "Working on auth migration using Postgres and Drizzle ORM" \
--importance 0.6 --tags "project,current" --namespace my-agent
memoclaw store "User timezone is UTC-3" \
--importance 0.75 --tags "personal" --namespace my-agent
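If your agent drives the CLI programmatically rather than typing commands by hand, one approach is a small helper that builds the argv list from structured fields. This is a sketch, not part of MemoClaw itself — it assumes only the flags shown above, and leaves execution to whatever shell tooling your agent already has:

```python
# Sketch: build a `memoclaw store` invocation from structured fields.
# Assumes the CLI flags shown above (--importance, --tags, --namespace).
# Executing the command (e.g. via subprocess.run) is left to the agent.
from typing import List

def build_store_command(text: str, importance: float,
                        tags: List[str], namespace: str) -> List[str]:
    """Return the argv list for storing a single memory."""
    return [
        "memoclaw", "store", text,
        "--importance", str(importance),
        "--tags", ",".join(tags),
        "--namespace", namespace,
    ]

cmd = build_store_command("User timezone is UTC-3", 0.75,
                          ["personal"], "my-agent")
print(cmd)  # ready to hand to subprocess.run(cmd) once the CLI is installed
```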
Recalling Context (Replace File Reads)
Instead of the agent reading the entire memory file and hoping to find relevant bits:
# Agent needs to help with a database question
memoclaw recall "database setup and ORM" --namespace my-agent
# Returns: "Working on auth migration using Postgres and Drizzle ORM"
# Agent needs to format a code snippet
memoclaw recall "code display preferences" --namespace my-agent
# Returns: "User prefers dark mode in all code examples"
The agent gets exactly what it needs. No scanning, no noise, no wasted tokens.
Before and After: A Real Session
Let’s trace through a realistic OpenClaw agent session both ways.
The Old Way
[Session start]
→ Read SOUL.md (800 tokens)
→ Read AGENTS.md (1,200 tokens)
→ Read MEMORY.md (15,000 tokens)
→ Read USER.md (400 tokens)
→ Read memory/2026-03-06.md (3,000 tokens)
→ Read memory/2026-03-07.md (2,000 tokens)
Total context: 22,400 tokens
[User asks: "Can you help me set up the new API endpoint?"]
→ Model processes 22,400 tokens of context
→ Finds relevant project info somewhere in MEMORY.md
→ Generates response
→ Total input: ~23,000 tokens
The New Way
[Session start]
→ Read SOUL.md (800 tokens)
→ Read AGENTS.md (1,200 tokens)
→ Read USER.md (400 tokens)
Total context: 2,400 tokens
[User asks: "Can you help me set up the new API endpoint?"]
→ memoclaw recall "API endpoint project setup" (returns 3 memories, ~400 tokens)
→ Model processes 2,400 + 400 + user message
→ Generates response with precise context
→ Total input: ~3,200 tokens
That’s 85% fewer input tokens. The response is faster, cheaper, and — because the model isn’t wading through 15,000 tokens of noise — more accurate.
The Semantic Advantage
Text files give you exact-match search at best. MemoClaw uses vector embeddings, which means your agent can recall by meaning, not by keyword.
Consider these stored memories:
"User works with React and Next.js for frontend development"
"The main project uses PostgreSQL 16 with pgvector extension"
"Team uses GitHub Actions for CI/CD, deploys to Railway"
Now the user asks about “setting up the deployment pipeline.” A file search for “deployment pipeline” would miss all three of these. Semantic recall surfaces the Railway/GitHub Actions memory because it understands the concept of deployment, not just the exact words.
This is the fundamental limitation of markdown-based memory: it only works when the agent can pattern-match on exact phrases. Real conversations don’t work that way.
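You can see the keyword-matching failure in a few lines. The substring search below stands in for grep-style file search; the hand-built synonym table is only a stand-in for what real vector embeddings learn automatically:

```python
# Toy illustration: exact keyword search vs. meaning-aware recall.
# The SYNONYMS map is a crude stand-in for embeddings; real semantic
# search needs no hand-built table.
memories = [
    "User works with React and Next.js for frontend development",
    "The main project uses PostgreSQL 16 with pgvector extension",
    "Team uses GitHub Actions for CI/CD, deploys to Railway",
]

def keyword_search(query, docs):
    """Grep-style: match only if the exact phrase appears."""
    return [d for d in docs if query.lower() in d.lower()]

def concept_search(query, docs, synonyms):
    """Expand query terms with related words, then match any of them."""
    terms = set(query.lower().split())
    for t in list(terms):
        terms |= synonyms.get(t, set())
    return [d for d in docs if terms & set(d.lower().split())]

SYNONYMS = {"deployment": {"deploys"}, "pipeline": {"deploys"}}

print(keyword_search("deployment pipeline", memories))  # [] -- exact match misses
print(concept_search("deployment pipeline", memories, SYNONYMS))
# the Railway/GitHub Actions memory surfaces via the deployment concept
```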
Updating Your OpenClaw Agent
Here’s the practical migration path:
1. Install MemoClaw
npm install -g memoclaw
Or install the OpenClaw skill:
clawhub install anajuliabit/memoclaw
2. Migrate Existing Memory
memoclaw migrate ~/openclaw/workspace/MEMORY.md --namespace my-agent
This parses your markdown file into individual memories with automatic importance scoring.
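The parse step is roughly sketchable. Assuming the date-header-plus-bullets format shown earlier (the real command's heuristics, including importance scoring, are MemoClaw's own):

```python
# Sketch of the parse step: split a MEMORY.md-style file into
# individual (date, text) memories. Importance scoring is omitted;
# the real `memoclaw migrate` applies its own heuristics.
import re

def parse_memory_file(markdown: str):
    memories = []
    current_date = None
    for line in markdown.splitlines():
        header = re.match(r"##\s+(\d{4}-\d{2}-\d{2})", line)
        if header:
            current_date = header.group(1)
        elif line.lstrip().startswith("- "):
            memories.append((current_date, line.lstrip()[2:].strip()))
    return memories

sample = """## 2026-03-07
- User prefers dark mode in all code examples
- User timezone is UTC-3
"""
print(parse_memory_file(sample))
# [('2026-03-07', 'User prefers dark mode in all code examples'),
#  ('2026-03-07', 'User timezone is UTC-3')]
```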
3. Update AGENTS.md
Remove the “read MEMORY.md on start” instruction. Replace with:
## Memory
Use MemoClaw for persistent memory (namespace: my-agent).
- DON'T preload all memories at session start
- DO recall relevant context when you need it for a specific task
- Store important new information immediately (don't wait for session end)
- Score importance: corrections=0.95, preferences=0.8, context=0.6, observations=0.3
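If the agent scores importance programmatically, the scale in that instruction maps naturally to a lookup. The categories and values come from the instruction above; how a given memory gets classified is up to your agent:

```python
# Encode the importance scale from the AGENTS.md instruction above.
# Classification of a memory into one of these categories is the
# agent's job; this only maps category -> score.
IMPORTANCE = {
    "correction": 0.95,
    "preference": 0.8,
    "context": 0.6,
    "observation": 0.3,
}

def importance_for(category: str) -> float:
    # Unknown categories fall back to the mid-range "context" score.
    return IMPORTANCE.get(category, IMPORTANCE["context"])

print(importance_for("preference"))  # 0.8
print(importance_for("anything"))    # 0.6
```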
4. Retire MEMORY.md
Keep a backup, stop writing to it, and let MemoClaw handle persistence going forward.
When File-Based Memory Still Makes Sense
External memory isn’t always the right answer:
- SOUL.md and AGENTS.md should stay as files. They’re small, rarely change, and define core behavior. Loading them every session is fine.
- Truly static reference data, like a list of team members or project repos, might be better as a small file than as hundreds of individual memories.
- Sensitive data shouldn’t go in any external service. Keep secrets in local, encrypted storage.
The sweet spot: instructions and identity stay local; dynamic knowledge goes to MemoClaw. If it grows over time, it belongs in external memory.
The Numbers
The exact savings depend on your usage patterns, but here’s a realistic comparison based on the session trace above:
| Metric | MEMORY.md | MemoClaw |
|---|---|---|
| Startup context | 15-25K tokens | 2-3K tokens |
| Per-query context added | 0 (already loaded) | 300-500 tokens |
| Monthly token cost (memory loading) | ~$45 (at 20 sessions/day, math above) | ~$5 (fewer input tokens) |
| Monthly MemoClaw API cost | $0 | $2-5 (at $0.005/call) |
| Net monthly savings | — | ~$35-40 |
The recall quality improvement is harder to quantify, but the “lost in the middle” research gives a clear signal: less noise in the prompt means the model finds and uses the right information more reliably. Semantic search lets you pull exactly what’s relevant instead of hoping the model notices it in a wall of text.
Start Small
You don’t have to migrate everything at once. Start with one agent:
- Install the CLI or skill
- Run memoclaw migrate on the biggest memory file
- Update that agent's instructions
- Watch recall quality for a week
The free tier gives you 100 API calls — enough to migrate and test without spending anything. After that, it’s $0.005 per store/recall. Set up USDC on Base when you’re ready to go beyond the free tier.
Bigger context windows are a crutch, not a solution. The agents that perform best are the ones that load less context but load the right context. If you want to try it: npm install -g memoclaw, migrate your biggest memory file, and compare the results after a week. The token savings and recall quality speak for themselves.