# The context window tax — measuring what MEMORY.md actually costs you per message
Every OpenClaw agent starts the same way: a MEMORY.md file in the workspace root. It’s simple, it works, and it’s eating your context window alive.
I wanted actual numbers on this, so I measured it.
## What the tax looks like
When your agent loads MEMORY.md at session start, the entire file gets injected into the context window. A modest MEMORY.md after a few months of use hits 3,000 to 5,000 tokens. A well-used one? 10,000+.
That’s not free. Every token in context costs money and displaces tokens that could carry actual conversation or reasoning.
Here’s a quick way to check your own damage:
```sh
# Rough token count (1 token ≈ 4 characters for English text)
wc -c ~/.openclaw/workspace/MEMORY.md | awk '{printf "~%.0f tokens\n", $1/4}'
```
A typical OpenClaw workspace after 3 months:
| File | Size | ~Tokens |
|---|---|---|
| MEMORY.md | 18 KB | 4,500 |
| AGENTS.md | 3 KB | 750 |
| SOUL.md | 1 KB | 250 |
| USER.md | 2 KB | 500 |
| Total context overhead | 24 KB | 6,000 |
That 6,000 tokens rides along with every single message. Exchange 40 messages in a session and you’ve burned 240,000 tokens on static context, most of it irrelevant to what you’re actually talking about.
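You can script the same chars/4 estimate across every injected file. A minimal sketch — the workspace path and file names mirror the table above and are assumptions; adjust them to your setup:

```python
from pathlib import Path

# Files OpenClaw injects at session start (assumed layout — adjust to yours)
WORKSPACE = Path.home() / ".openclaw" / "workspace"
FILES = ["MEMORY.md", "AGENTS.md", "SOUL.md", "USER.md"]

def estimate_tokens(path: Path) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return path.stat().st_size // 4 if path.exists() else 0

total = 0
for name in FILES:
    tokens = estimate_tokens(WORKSPACE / name)
    total += tokens
    print(f"{name:12} ~{tokens:,} tokens")
print(f"{'Total':12} ~{total:,} tokens")
```

Missing files count as zero, so the script is safe to run before you know exactly which files your agent loads.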
## How it compounds
The cost hits you in ways that aren’t immediately obvious.
**Direct cost.** You pay per input token. At Claude Sonnet pricing (~$3/M input tokens), 6,000 tokens across 40 messages is $0.72 per session just for static context overhead. That adds up if your agent runs all day.

**Displacement.** Those 6,000 tokens could hold ~4,500 words of actual conversation. Long sessions hit context limits faster, forcing summarization or dropping earlier messages.

**Attention dilution.** LLMs don’t weight all context equally. Dumping 4,500 tokens of loosely related memories means the model spreads attention across your grocery preferences and your deployment configuration. Retrieval quality drops, sometimes noticeably.

**It only gets worse.** MEMORY.md grows. Three months from now, it’s bigger. Six months? You’re either truncating it manually or burning even more tokens.
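To put numbers on the compounding, a quick back-of-envelope sketch. The pricing and per-session figures match the ones above; the sessions-per-day figure is an assumption for illustration:

```python
INPUT_PRICE_PER_M = 3.00    # USD per million input tokens (Sonnet-class pricing)
OVERHEAD_TOKENS = 6_000     # static context injected with every message
MESSAGES_PER_SESSION = 40
SESSIONS_PER_DAY = 5        # assumption: a moderately busy agent

tokens_per_session = OVERHEAD_TOKENS * MESSAGES_PER_SESSION
cost_per_session = tokens_per_session / 1_000_000 * INPUT_PRICE_PER_M
monthly = cost_per_session * SESSIONS_PER_DAY * 30

print(f"{tokens_per_session:,} tokens/session -> ${cost_per_session:.2f}/session")
print(f"~${monthly:.2f}/month on static context alone")
```

At five sessions a day, the $0.72 per session compounds to roughly $108 a month spent re-sending the same static files.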
## What selective retrieval looks like
Instead of loading everything, retrieve what’s relevant to the current conversation:
```sh
memoclaw recall "deployment configuration for the staging server"
```
A typical recall returns 3 to 5 relevant memories, maybe 200 to 400 tokens total. That’s over 90% less context than the full MEMORY.md dump.
The token math:
```text
MEMORY.md approach:
4,500 tokens × 40 messages = 180,000 tokens/session

Semantic recall:
~300 tokens × 40 messages = 12,000 tokens/session

Savings: 168,000 tokens/session (93% reduction)
```
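The same comparison, scripted so you can plug in your own file size and message count (the numbers below are the article's assumptions):

```python
def session_tokens(context_tokens: int, messages: int) -> int:
    """Static context that gets re-sent with every message in a session."""
    return context_tokens * messages

full_dump = session_tokens(4_500, 40)   # MEMORY.md loaded wholesale
recall = session_tokens(300, 40)        # targeted semantic recall
savings = full_dump - recall

print(f"Full dump: {full_dump:,} tokens/session")
print(f"Recall:    {recall:,} tokens/session")
print(f"Savings:   {savings:,} ({savings / full_dump:.0%})")
```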
## The relevance gap
Token savings matter, but the bigger win is relevance.
Your agent needs to remember your preferred deploy target. With MEMORY.md, it sifts through entries about your timezone, your pet’s name, last week’s meeting notes, and three different project contexts to find the one line about deploys.
With semantic search:
```sh
memoclaw recall "deploy target preference" --top 3
```
Returns:

```text
[0.94] "User prefers deploying to Railway for production, Fly.io for staging"
[0.87] "Always run migrations before deploying to staging"
[0.82] "Staging URL: staging.memoclaw.com"
```
Three results. All relevant. ~150 tokens total. The model gets exactly what it needs without the noise.
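For intuition, scores like those usually come from cosine similarity between embedding vectors: the query and each memory are embedded, and the closest vectors win. A toy sketch with made-up 3-dimensional vectors — real systems use learned embeddings with hundreds of dimensions, and this is illustrative, not MemoClaw's internals:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, ~0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.2]  # pretend embedding of "deploy target preference"
memories = {
    "prefers Railway for production": [0.8, 0.2, 0.1],
    "user's cat is named Biscuit":    [0.1, 0.9, 0.3],
}
ranked = sorted(memories, key=lambda m: cosine(query, memories[m]), reverse=True)
print(ranked[0])  # the deploy memory ranks first
```

The deploy-related vector points in nearly the same direction as the query, so it scores high; the cat memory points elsewhere and drops to the bottom, which is exactly why irrelevant entries stay out of context.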
## Making the switch
You don’t have to go cold turkey.
### 1. Install the CLI

```sh
npm install -g memoclaw
```
### 2. Migrate your existing MEMORY.md

```sh
memoclaw migrate ~/.openclaw/workspace/MEMORY.md
```
This parses your markdown, splits it into individual memories, generates embeddings, and stores them. One command.
### 3. Update your agent’s behavior
In your AGENTS.md, replace “read MEMORY.md” with recall:
```text
# Before
Every session: read MEMORY.md for context.

# After
When you need context about a topic, use memoclaw recall "<topic>"
to retrieve relevant memories instead of loading everything.
```
### 4. Keep MEMORY.md as a buffer (optional)
Some people keep a slim MEMORY.md for truly critical context (name, timezone, top 3 preferences) and use MemoClaw for everything else. This hybrid approach works. Just keep the file under 500 tokens.
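If you go hybrid, a small guard can warn you when the slim MEMORY.md drifts over budget. A sketch using the same chars/4 estimate; the path and 500-token threshold are the article's suggestions:

```python
from pathlib import Path

BUDGET_TOKENS = 500  # suggested ceiling for a slim MEMORY.md
memory = Path.home() / ".openclaw" / "workspace" / "MEMORY.md"

def over_budget(path: Path, budget: int = BUDGET_TOKENS) -> bool:
    """True if the file's rough token count (chars / 4) exceeds the budget."""
    return path.exists() and path.stat().st_size // 4 > budget

if over_budget(memory):
    print(f"MEMORY.md is over ~{BUDGET_TOKENS} tokens; migrate entries to memoclaw")
```

Drop it in a cron job or shell profile and the file stays slim without you thinking about it.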
## The numbers
| Metric | MEMORY.md | Semantic recall | Difference |
|---|---|---|---|
| Tokens per message | ~4,500 | ~300 | 93% less |
| Relevance | Low (full dump) | High (targeted) | Night and day |
| Scales with time | Gets worse | Stays constant | Sustainable |
| Setup effort | Zero | 5 minutes | One-time cost |
The context window tax grows every day your agent accumulates memories. Semantic recall isn’t about fancy technology. It’s about not wasting tokens on context your agent doesn’t need right now.
Your context window is expensive real estate. Stop filling it with everything and start filling it with what matters.
MemoClaw offers 100 free API calls to get started. No registration, no API keys.