# The context window tax — measuring what MEMORY.md actually costs you per message
Every OpenClaw agent starts the same way: a MEMORY.md file in the workspace root. It’s simple, it works, and it’s eating your context window alive.
I wanted actual numbers on this, so I measured it.
## What the tax looks like
When your agent loads MEMORY.md at session start, the entire file gets injected into the context window. A modest MEMORY.md after a few months of use hits 3,000 to 5,000 tokens. A well-used one? 10,000+.
That’s not free. Every token in context costs money and displaces tokens that could carry actual conversation or reasoning.
Here’s a quick way to check your own damage:
```sh
# Rough token count (1 token ≈ 4 characters for English text)
wc -c ~/.openclaw/workspace/MEMORY.md | awk '{printf "~%.0f tokens\n", $1/4}'
```
A typical OpenClaw workspace after 3 months:
| File | Size | ~Tokens |
|---|---|---|
| MEMORY.md | 18 KB | 4,500 |
| AGENTS.md | 3 KB | 750 |
| SOUL.md | 1 KB | 250 |
| USER.md | 2 KB | 500 |
| Total context overhead | 24 KB | 6,000 |
That 6,000 tokens rides along with every single message. Exchange 40 messages in a session and you’ve burned 240,000 tokens on static context, most of it irrelevant to what you’re actually talking about.
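You can script the same chars/4 estimate across every injected file. A minimal sketch — the workspace path and file names mirror the table above and are assumptions; adjust them to your setup:

```python
from pathlib import Path

# Files OpenClaw injects at session start (assumed layout — adjust to yours)
WORKSPACE = Path.home() / ".openclaw" / "workspace"
FILES = ["MEMORY.md", "AGENTS.md", "SOUL.md", "USER.md"]

def estimate_tokens(path: Path) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return path.stat().st_size // 4 if path.exists() else 0

total = 0
for name in FILES:
    tokens = estimate_tokens(WORKSPACE / name)
    total += tokens
    print(f"{name:12} ~{tokens:,} tokens")
print(f"{'Total':12} ~{total:,} tokens")
```

Missing files count as zero, so the script is safe to run before you know exactly which files your agent loads.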
## How it compounds
The cost hits you in ways that aren’t immediately obvious.
**Direct cost.** You pay per input token. At Claude Sonnet pricing (~$3/M input tokens), 6,000 tokens across 40 messages is $0.72 per session just for static context overhead. That adds up if your agent runs all day.

**Displacement.** Those 6,000 tokens could hold ~4,500 words of actual conversation. Long sessions hit context limits faster, forcing summarization or dropping earlier messages.

**Attention dilution.** LLMs don’t weight all context equally. Dumping 4,500 tokens of loosely related memories means the model spreads attention across your grocery preferences and your deployment configuration. Retrieval quality drops, sometimes noticeably.

**It only gets worse.** MEMORY.md grows. Three months from now, it’s bigger. Six months? You’re either truncating it manually or burning even more tokens.
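To put numbers on the compounding, a quick back-of-envelope sketch. The pricing and per-session figures match the ones above; the sessions-per-day figure is an assumption for illustration:

```python
INPUT_PRICE_PER_M = 3.00    # USD per million input tokens (Sonnet-class pricing)
OVERHEAD_TOKENS = 6_000     # static context injected with every message
MESSAGES_PER_SESSION = 40
SESSIONS_PER_DAY = 5        # assumption: a moderately busy agent

tokens_per_session = OVERHEAD_TOKENS * MESSAGES_PER_SESSION
cost_per_session = tokens_per_session / 1_000_000 * INPUT_PRICE_PER_M
monthly = cost_per_session * SESSIONS_PER_DAY * 30

print(f"{tokens_per_session:,} tokens/session -> ${cost_per_session:.2f}/session")
print(f"~${monthly:.2f}/month on static context alone")
```

At five sessions a day, the $0.72 per session compounds to roughly $108 a month spent re-sending the same static files.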
## What selective retrieval looks like
Instead of loading everything, retrieve what’s relevant to the current conversation:
```sh
memoclaw recall "deployment configuration for the staging server"
```
A typical recall returns 3 to 5 relevant memories, maybe 200 to 400 tokens total. That’s over 90% less context than the full MEMORY.md dump.
The token math:
```text
MEMORY.md approach:
4,500 tokens × 40 messages = 180,000 tokens/session

Semantic recall:
~300 tokens × 40 messages = 12,000 tokens/session

Savings: 168,000 tokens/session (93% reduction)
```
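The same comparison, scripted so you can plug in your own file size and message count (the numbers below are the article's assumptions):

```python
def session_tokens(context_tokens: int, messages: int) -> int:
    """Static context that gets re-sent with every message in a session."""
    return context_tokens * messages

full_dump = session_tokens(4_500, 40)   # MEMORY.md loaded wholesale
recall = session_tokens(300, 40)        # targeted semantic recall
savings = full_dump - recall

print(f"Full dump: {full_dump:,} tokens/session")
print(f"Recall:    {recall:,} tokens/session")
print(f"Savings:   {savings:,} ({savings / full_dump:.0%})")
```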
## The relevance gap
Token savings matter, but the bigger win is relevance.
Your agent needs to remember your preferred deploy target. With MEMORY.md, it sifts through entries about your timezone, your pet’s name, last week’s meeting notes, and three different project contexts to find the one line about deploys.
With semantic search:
```sh
memoclaw recall "deploy target preference" --top 3
```
Returns:

```text
[0.94] "User prefers deploying to Railway for production, Fly.io for staging"
[0.87] "Always run migrations before deploying to staging"
[0.82] "Staging URL: staging.memoclaw.com"
```
Three results. All relevant. ~150 tokens total. The model gets exactly what it needs without the noise.
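For intuition, scores like those usually come from cosine similarity between embedding vectors: the query and each memory are embedded, and the closest vectors win. A toy sketch with made-up 3-dimensional vectors — real systems use learned embeddings with hundreds of dimensions, and this is illustrative, not MemoClaw's internals:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, ~0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.2]  # pretend embedding of "deploy target preference"
memories = {
    "prefers Railway for production": [0.8, 0.2, 0.1],
    "user's cat is named Biscuit":    [0.1, 0.9, 0.3],
}
ranked = sorted(memories, key=lambda m: cosine(query, memories[m]), reverse=True)
print(ranked[0])  # the deploy memory ranks first
```

The deploy-related vector points in nearly the same direction as the query, so it scores high; the cat memory points elsewhere and drops to the bottom, which is exactly why irrelevant entries stay out of context.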
## Making the switch
You don’t have to go cold turkey.
### 1. Install the CLI

```sh
npm install -g memoclaw
```
### 2. Migrate your existing MEMORY.md

```sh
memoclaw migrate ~/.openclaw/workspace/MEMORY.md
```
This parses your markdown, splits it into individual memories, generates embeddings, and stores them. One command.
### 3. Update your agent’s behavior
In your AGENTS.md, replace “read MEMORY.md” with recall:
```text
# Before
Every session: read MEMORY.md for context.

# After
When you need context about a topic, use memoclaw recall "<topic>"
to retrieve relevant memories instead of loading everything.
```
### 4. Keep MEMORY.md as a buffer (optional)
Some people keep a slim MEMORY.md for truly critical context (name, timezone, top 3 preferences) and use MemoClaw for everything else. This hybrid approach works. Just keep the file under 500 tokens.
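If you go hybrid, a small guard can warn you when the slim MEMORY.md drifts over budget. A sketch using the same chars/4 estimate; the path and 500-token threshold are the article's suggestions:

```python
from pathlib import Path

BUDGET_TOKENS = 500  # suggested ceiling for a slim MEMORY.md
memory = Path.home() / ".openclaw" / "workspace" / "MEMORY.md"

def over_budget(path: Path, budget: int = BUDGET_TOKENS) -> bool:
    """True if the file's rough token count (chars / 4) exceeds the budget."""
    return path.exists() and path.stat().st_size // 4 > budget

if over_budget(memory):
    print(f"MEMORY.md is over ~{BUDGET_TOKENS} tokens; migrate entries to memoclaw")
```

Drop it in a cron job or shell profile and the file stays slim without you thinking about it.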
## The numbers
| Metric | MEMORY.md | Semantic recall | Difference |
|---|---|---|---|
| Tokens per message | ~4,500 | ~300 | 93% less |
| Relevance | Low (full dump) | High (targeted) | Night and day |
| Scales with time | Gets worse | Stays constant | Sustainable |
| Setup effort | Zero | 5 minutes | One-time cost |
The context window tax grows every day your agent accumulates memories. Semantic recall isn’t about fancy technology. It’s about not wasting tokens on context your agent doesn’t need right now.
Your context window is expensive real estate. Stop filling it with everything and start filling it with what matters.
MemoClaw offers 100 free API calls to get started. No registration, no API keys.