# Why your agent is burning tokens on context it already knows
Every time your OpenClaw agent wakes up, it loads your MEMORY.md into the prompt. The whole thing. Every preference, every past decision, every lesson you wrote down six months ago about a project that shipped in November.
All of it. Every single turn.
## What this actually costs
A typical MEMORY.md grows to about 5KB within a few weeks. That’s roughly 1,500 tokens.
At $3 per million input tokens (Claude Sonnet pricing), that’s $0.0045 per turn just for memory. Sounds like nothing. But agents don’t do one turn — they do thousands.
| Usage | Turns/month | Memory tokens burned | Cost/month |
|---|---|---|---|
| Light | 1,000 | 1.5M | $4.50 |
| Moderate | 5,000 | 7.5M | $22.50 |
| Heavy | 15,000 | 22.5M | $67.50 |
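The table's figures follow from two constants; a quick sketch of the arithmetic, using the numbers from this article ($3 per million input tokens, ~1,500 memory tokens per turn):

```python
# Cost of re-loading a flat MEMORY.md on every turn.
# Constants taken from this article: ~1,500 tokens of memory per turn,
# $3 per million input tokens (the quoted Claude Sonnet input price).
MEMORY_TOKENS = 1_500
PRICE_PER_TOKEN = 3 / 1_000_000  # dollars per input token

def monthly_memory_cost(turns_per_month: int) -> float:
    """Dollars per month spent just on loading memory into the prompt."""
    return turns_per_month * MEMORY_TOKENS * PRICE_PER_TOKEN

for tier, turns in [("Light", 1_000), ("Moderate", 5_000), ("Heavy", 15_000)]:
    print(f"{tier}: ${monthly_memory_cost(turns):.2f}/month")
# Light: $4.50/month, Moderate: $22.50/month, Heavy: $67.50/month
```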
That’s memory alone. Not your actual work. Not tool calls. Just loading the same file over and over.
And most of that context is irrelevant. You ask the agent to check your calendar and it’s reading three paragraphs about your SSH server configuration.
## How MEMORY.md works (and where it breaks down)
OpenClaw’s AGENTS.md tells your agent to load MEMORY.md at the start of every main session. It’s a flat file: no structure, no filtering, no relevance scoring. The agent gets everything or nothing.
This worked fine when agents were simple and memories were short. But memories pile up. A few weeks in, your file looks like:
```markdown
## User preferences
- Prefers dark mode
- Timezone: UTC+1
- Uses neovim

## Project: API migration
- Switched from REST to GraphQL in January
- Database is on Neon
- Deployment via Railway

## SSH hosts
- home-server: 192.168.1.100
- staging: api-staging.example.com

## Lessons learned
- Don't run migrations during peak hours
- The calendar API rate-limits at 100 req/min
- Ana prefers bullet points over long paragraphs
```
When you ask “what meetings do I have tomorrow?” the agent loads all of this. The SSH hosts, the migration notes, the neovim preference. All of it costs tokens, none of it helps answer the question.
## What selective recall looks like
Instead of loading a flat file, you store memories in a service that can search by meaning. When the agent needs context, it asks for what’s relevant and gets back only that.
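As a toy illustration of "search by meaning," here's a minimal sketch that scores stored memories against a query and returns only the top hits. For self-containment it uses bag-of-words cosine similarity; a real service would use learned semantic embeddings, and every name below is illustrative:

```python
# Toy "selective recall": rank memories by similarity to the query,
# return only the top few. Real systems embed text with a learned model;
# word-overlap cosine similarity here just shows the shape of the idea.
import re
from collections import Counter
from math import sqrt

def vectorize(text: str) -> Counter:
    """Bag-of-words vector: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recall(query: str, memories: list[str], top_k: int = 2) -> list[str]:
    """Return only the top_k memories most relevant to the query."""
    q = vectorize(query)
    return sorted(memories, key=lambda m: cosine(q, vectorize(m)), reverse=True)[:top_k]

memories = [
    "Switched the API migration from REST to GraphQL in January",
    "home-server SSH host is 192.168.1.100",
    "Prefers dark mode and uses neovim",
    "Deployment for the API migration goes via Railway",
]
print(recall("What's the status of the API migration?", memories))
# Both "API migration" memories rank first; SSH and editor notes never load.
```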
Here’s the difference with MemoClaw installed as an OpenClaw skill:
Before — every turn loads everything:
```
[System prompt loads MEMORY.md: 1,500 tokens]
User: "What's the status of the API migration?"
Agent: *reads through SSH configs, editor preferences,
       calendar quirks to find the migration notes*
```
After — agent recalls what it needs:
```
User: "What's the status of the API migration?"
Agent calls: recall("API migration status")
→ Returns: 2 relevant memories, ~200 tokens
```
200 tokens instead of 1,500. Per turn. Just for the memory portion.
In practice, this cuts memory token usage by 60-80%. The agent fetches 2-5 relevant memories instead of loading 40+ unrelated ones.
## Setting it up
Install the MemoClaw skill on your OpenClaw agent:
```
openclaw skill install memoclaw
```
Migrate your existing memories:
```
memoclaw migrate --file ~/.openclaw/workspace/MEMORY.md
```
This parses your MEMORY.md, splits it into individual memories, and stores each one with semantic embeddings. Your agent can now search by meaning instead of loading by file.
After migration, your agent uses recall and store instead of reading and editing a markdown file. No more stale context in every turn.
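A sketch of what the turn changes to: instead of prepending the whole MEMORY.md, the agent injects only what recall returns. The function names, the in-memory stand-in store, and the prompt shape below are assumptions for illustration, not MemoClaw's actual API:

```python
# Hedged sketch of a post-migration turn. `memoclaw_recall` is a stand-in
# for the skill's recall tool; its name, signature, and return shape are
# assumptions. A real skill would call the MemoClaw API here.

def memoclaw_recall(query: str, limit: int = 5) -> list[str]:
    # Stand-in store, keyed by topic, instead of a real API call.
    store = {
        "api migration": [
            "Switched from REST to GraphQL in January",
            "Deployment via Railway",
        ],
    }
    hits = [m for topic, ms in store.items() if topic in query.lower() for m in ms]
    return hits[:limit]

def build_prompt(user_message: str) -> str:
    """Inject only recalled memories (~200 tokens) instead of the whole file."""
    relevant = memoclaw_recall(user_message)
    context = "\n".join(f"- {m}" for m in relevant)
    return f"Relevant memories:\n{context}\n\nUser: {user_message}"

print(build_prompt("What's the status of the API migration?"))
```

A calendar question under this sketch would recall nothing about the migration, so none of those tokens are spent.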
## The math with selective recall
Same usage tiers, but now the agent only pulls relevant memories:
| Usage | Turns/month | Memory tokens/turn | Cost/month | Savings |
|---|---|---|---|---|
| Light | 1,000 | ~300 | $0.90 | 80% |
| Moderate | 5,000 | ~300 | $4.50 | 80% |
| Heavy | 15,000 | ~300 | $13.50 | 80% |
MemoClaw recall calls cost $0.005 each. At 5,000 turns that’s $25 in API fees against about $18 in token savings ($22.50 down to $4.50), so a 5KB memory file sits near break-even at moderate usage. The real win is that recall cost stays flat no matter how large your memory grows, while a flat MEMORY.md only gets more expensive per turn.
Where this really matters is at scale. A 15KB MEMORY.md (not unusual after months of use) burns ~4,500 tokens per turn. That’s $202.50/month on a heavy-use agent, just for context loading. Selective recall keeps it at ~300 tokens regardless.
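Under this article's pricing assumptions ($3 per million input tokens, $0.005 per recall call, ~300 recalled tokens per turn), the crossover point is easy to compute; a small sketch:

```python
# Break-even sketch: at what MEMORY.md size does per-turn selective recall
# become cheaper than reloading the whole file? Numbers from this article.
PRICE_PER_TOKEN = 3 / 1_000_000   # dollars per input token
RECALL_FEE = 0.005                # dollars per recall API call
RECALLED_TOKENS = 300             # tokens injected per turn after recall

recall_cost_per_turn = RECALL_FEE + RECALLED_TOKENS * PRICE_PER_TOKEN
break_even_tokens = recall_cost_per_turn / PRICE_PER_TOKEN
# Convert back to file size using the article's 5KB ≈ 1,500 tokens ratio.
break_even_kb = break_even_tokens * 5 / 1500

print(f"recall costs ${recall_cost_per_turn:.4f} per turn")
print(f"break-even at ~{break_even_tokens:,.0f} memory tokens (~{break_even_kb:.1f}KB)")
# A ~6.6KB MEMORY.md is the crossover point; a 15KB file is well past it.
```

At the 5KB file from the opening example, recall is roughly a wash on cost; once the file passes the crossover size, recall wins on every turn, and the gap widens as the file grows.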
## What this doesn’t fix
A few honest caveats:
- Each memory is capped at 8,192 characters. If you’re storing entire documents, you want a RAG solution instead.
- You need a crypto wallet with USDC on Base for anything past the free tier (100 calls).
- Recall quality depends on how well you stored the memories. Bad input, bad output.
- There’s no real-time sync. Your agent calls the API when it needs context.
## So why keep paying for it?
MEMORY.md is a fine starting point. But it doesn’t scale. Every byte you add makes every turn more expensive, whether the agent needs that context or not.
Semantic recall flips it: store everything, fetch only what matters. Your agent’s context stays small, your costs stay predictable, and you stop paying to load the same SSH config 5,000 times a month.
Migration takes about two minutes.
MemoClaw — memory for AI agents. Docs · Install skill · CLI