Mar 9, 2026

Your MEMORY.md is eating your tokens

If your MEMORY.md is 3,000 tokens and you’re on Claude Sonnet (about $3 per million input tokens), you’re paying about $0.009 per message just to load memories. Not to use them. Not to think about them. Just to have them sitting in context.

$0.009 doesn’t sound like much. Multiply it by 50 messages a day: $0.45. Over a month: ~$13.50. That’s the memory tax — tokens spent on context your agent skimmed past to find the one line that actually mattered.

Here’s the thing: $13.50/month isn’t going to bankrupt anyone. But it’s $13.50 for something that gets worse over time, not better. And the token cost is only part of the problem.

How MEMORY.md works (and why it doesn’t scale)

Every OpenClaw agent follows roughly the same pattern. AGENTS.md says “read MEMORY.md at session start.” The agent loads the whole file into context. Every session. Every message.

When MEMORY.md is 20 lines, this is fine. After a month of daily use, it’s 2,000-4,000 tokens. I’ve seen files over 10,000.
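If you don’t know how big your own file is, a rough estimate is enough to see which bucket you’re in. The sketch below assumes about 4 characters per token, a common rule of thumb for English prose rather than a real tokenizer, so treat the result as a ballpark. The function names are made up for illustration.

```python
from pathlib import Path

# Assumption: ~4 characters per token. This is a rough heuristic for
# English text, not the model's actual tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a ballpark token count for a string, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def estimate_file_tokens(path: str) -> int:
    """Return a ballpark token count for a text file such as MEMORY.md."""
    return estimate_tokens(Path(path).expanduser().read_text(encoding="utf-8"))


# Example (the path is a placeholder, point it at your own file):
# print(estimate_file_tokens("~/path/to/MEMORY.md"))
```

For an exact number, use your model provider’s own token counting instead of this heuristic.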

Here’s a typical token budget for an active session:

System prompt:       ~500 tokens
AGENTS.md:           ~800 tokens
MEMORY.md:          ~3,000 tokens
USER.md:             ~200 tokens
Conversation so far: ~2,000 tokens

MEMORY.md is almost half the input. On any given message, maybe 5% of those memories are relevant to what the agent is actually doing.

Less room for the actual conversation. Less room for tool outputs. Less room for thinking.
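To check the “almost half” claim against the budget above, here is a minimal sketch that turns those approximate figures into shares of total input. The helper name is invented for this example.

```python
# Approximate token budget for an active session (figures from the list above).
budget = {
    "System prompt": 500,
    "AGENTS.md": 800,
    "MEMORY.md": 3000,
    "USER.md": 200,
    "Conversation so far": 2000,
}


def context_shares(budget: dict[str, int]) -> dict[str, float]:
    """Return each component's fraction of the total input tokens."""
    total = sum(budget.values())
    return {name: tokens / total for name, tokens in budget.items()}


shares = context_shares(budget)
for name, share in shares.items():
    print(f"{name:<20} {share:6.1%}")
```

With these numbers MEMORY.md comes out at about 46% of a 6,500-token input. If only 5% of it is relevant, that’s roughly 150 useful tokens out of 3,000.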

What selective recall looks like

MemoClaw stores memories as individual records with vector embeddings. Instead of loading everything, your agent queries for what’s relevant:

memoclaw recall "user's deployment preferences" --limit 3

Returns:

┌────┬───────────────────────────────────────┬────────────┬───────┐
│ ID │ Content                               │ Importance │ Score │
├────┼───────────────────────────────────────┼────────────┼───────┤
│ 42 │ Deploy to Railway, not Fly.io.        │ 0.8        │ 0.93  │
│ 15 │ Always use Dockerfile, never          │ 0.7        │ 0.87  │
│    │ buildpacks                            │            │       │
│ 73 │ Production branch is 'main', not      │ 0.5        │ 0.81  │
│    │ 'master'. Changed on 2026-02-10.      │            │       │
└────┴───────────────────────────────────────┴────────────┴───────┘

Three memories. ~80 tokens. Exactly what the agent needed for this conversation, nothing else.

The math

Let’s compare a realistic day. Your agent handles 40 messages, and about half need memory context.

MEMORY.md:

3,000 tokens loaded every message (whether needed or not)
40 messages × 3,000 tokens = 120,000 tokens/day on memory
Claude Sonnet at $3/M input: $0.36/day
Monthly: ~$10.80

MemoClaw:

20 recalls/day × $0.005 = $0.10/day
Each recall returns ~100-200 tokens instead of 3,000
Monthly recall cost: ~$3.00, plus about $0.27 for the recalled tokens themselves (20 recalls × ~150 tokens × 30 days at $3/M)

The savings scale with file size. A 6,000-token MEMORY.md doubles the waste. A 10,000-token file more than triples it. MemoClaw costs stay flat because you’re only pulling what’s relevant.
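Here’s the same comparison as a small sketch you can rerun with your own numbers. It uses the article’s figures ($3/M input tokens, $0.005 per recall) and assumes a 30-day month and ~150 tokens per recall. It also simplifies by counting recalled tokens only on the message that fetched them. The function names are illustrative.

```python
INPUT_USD_PER_TOKEN = 3 / 1_000_000  # Claude Sonnet input price used in this article
RECALL_FEE_USD = 0.005               # per-recall fee quoted in this article
DAYS_PER_MONTH = 30                  # assumption


def flat_file_monthly(file_tokens: int, messages_per_day: int = 40) -> float:
    """Monthly cost of loading the whole file on every message."""
    daily = file_tokens * messages_per_day * INPUT_USD_PER_TOKEN
    return daily * DAYS_PER_MONTH


def recall_monthly(recalls_per_day: int = 20, tokens_per_recall: int = 150) -> float:
    """Monthly cost of recalls: the per-call fee plus the tokens each recall returns."""
    fees = recalls_per_day * RECALL_FEE_USD
    tokens = recalls_per_day * tokens_per_recall * INPUT_USD_PER_TOKEN
    return (fees + tokens) * DAYS_PER_MONTH


for size in (3_000, 6_000, 10_000):
    print(f"{size:>6} tokens: flat file ${flat_file_monthly(size):.2f}/mo, "
          f"recall ${recall_monthly():.2f}/mo")
```

The flat-file column grows linearly with file size while the recall column doesn’t move, because recall cost depends on how often you query, not on how much you’ve stored.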

Speed, not just cost

More input tokens means longer time-to-first-token. If your agent has been getting slower over the weeks, check how big MEMORY.md has gotten. Shaving 2,800 tokens off every request won’t make responses instant, but it adds up — especially on models with higher latency per input token.

A MemoClaw recall takes ~200-400ms. That’s a network round trip the agent didn’t have before. But the model’s thinking time is measured in seconds, so the recall latency is usually invisible in practice.

Setting it up

Install the skill:

clawhub install anajuliabit/memoclaw

Update your AGENTS.md. Replace:

Read MEMORY.md at session start.

With:

Use memoclaw recall to fetch relevant context when needed.
Don't load all memories at once — query for what's relevant to the current task.

Your agent now has store and recall tools. It calls recall when it needs context, store when it learns something new. No bulk file loading.

Migrating existing memories

If you’ve got a MEMORY.md with useful stuff in it, you don’t have to start from scratch. MemoClaw can ingest markdown files:

memoclaw migrate ~/path/to/MEMORY.md

This parses the file into individual memories with importance scores and stores them. After migration, each memory is a separate searchable record instead of a line in a big file.

Review the results after migration. The parser does a reasonable job, but you’ll probably want to adjust some importance scores and clean up a few entries that didn’t split well.

The free tier

MemoClaw gives you 100 free API calls. Store and recall each cost one call. At 20 recalls/day, you’ll hit the limit in about 5 days, sooner if your agent is also storing new memories.

After that, pay-per-request with USDC on Base. No subscription. $0.005 per recall, $0.005 per store. The payment happens automatically via x402 — your agent’s wallet handles it.

memoclaw status

When to stick with MEMORY.md

If your file is under 500 tokens and isn’t growing, the file approach works fine. The overhead is small enough that the simplicity wins.

The crossover point is around 1,000-1,500 tokens. Below that, a flat file is simpler. Above that, you’re paying an increasing tax on every message for context your agent mostly ignores.
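One way to sanity-check that range is to solve for the file size where the per-message cost of loading MEMORY.md equals the average per-message cost of recalls. This is a back-of-the-envelope sketch, not MemoClaw’s own math: it assumes $3/M input tokens, $0.005 per recall, ~150 tokens returned per recall, and a hypothetical `recall_fraction` for how many messages actually trigger a recall.

```python
INPUT_USD_PER_TOKEN = 3 / 1_000_000  # Claude Sonnet input price used in this article
RECALL_FEE_USD = 0.005               # per-recall fee quoted in this article


def break_even_tokens(recall_fraction: float, tokens_per_recall: int = 150) -> float:
    """File size (in tokens) at which loading the file costs the same as recalling.

    Flat file cost per message:  file_tokens * price
    Recall cost per message:     recall_fraction * (fee + tokens_per_recall * price)
    Setting the two equal and solving for file_tokens gives the expression below.
    """
    per_recall = RECALL_FEE_USD + tokens_per_recall * INPUT_USD_PER_TOKEN
    return recall_fraction * per_recall / INPUT_USD_PER_TOKEN


for fraction in (0.5, 1.0):
    print(f"recall on {fraction:.0%} of messages: "
          f"break-even at ~{break_even_tokens(fraction):.0f} tokens")
```

Under these assumptions the break-even lands near 900 tokens when half your messages trigger a recall and near 1,800 when every message does, which brackets the range above. This only covers cost; it says nothing about latency or how well the agent uses the context.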

If your MEMORY.md is already past 3,000 tokens, you’re past the crossover. The longer you wait, the more tokens you burn.

Your agent’s context window is expensive real estate. MEMORY.md fills it with everything whether you need it or not. MemoClaw fills it with just what’s relevant. The difference shows up in your API bill and your response times.

Start free at memoclaw.com or install the OpenClaw skill.