The case against MEMORY.md: when flat files stop scaling


You’ve probably got one. A MEMORY.md sitting in your OpenClaw workspace, growing a little fatter each session. Your agent reads it on startup, appends stuff at the end, and somehow this is supposed to be “persistent memory.”

It works. Until it doesn’t.

I hit the wall around the 800-line mark. My agent’s MEMORY.md had become a graveyard of outdated preferences, duplicated context, and things I’d corrected months ago that it kept re-reading anyway. Every session burned tokens loading the whole thing, and half of it was irrelevant to whatever I was actually doing.

If this sounds familiar, here’s what’s going wrong and what you can do about it.

The problem isn’t the file. It’s the access pattern.

MEMORY.md is a sequential text file. Your agent reads it top to bottom. That means:

  1. Everything gets loaded every time. Working on a Python project? Cool, your agent is also loading memories about your SSH config, your pizza preferences, and that one time you told it to stop using em dashes. Every single session.

  2. There’s no relevance filtering. Line 47 might be exactly what your agent needs right now, but it has no way to know that without reading all 600 other lines first. And by “reading,” I mean spending your tokens on it.

  3. Old memories never die. You corrected your agent’s behavior in March. The original wrong behavior is still in the file from February. Both get loaded. Your agent has to figure out which one is current, and it doesn’t always get that right.

  4. There’s a hard ceiling. Context windows are big now, but they’re not infinite. A 50KB MEMORY.md eats roughly 12,000 tokens before your agent even starts thinking about your actual request. That’s real money if you’re on a pay-per-token model, and real quality degradation even if you’re not.
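The 50KB figure above follows from a quick back-of-the-envelope check, using the common heuristic of roughly 4 characters per token for English text (real tokenizers vary by model):

```python
# Rough token cost of loading a flat memory file every session.
# Assumes the common ~4 characters-per-token heuristic for English text;
# actual counts depend on the model's tokenizer.

def estimate_tokens(size_bytes: int, chars_per_token: float = 4.0) -> int:
    """Approximate token count for a plain-text file of the given size."""
    return round(size_bytes / chars_per_token)

file_size = 50 * 1024  # a 50KB MEMORY.md
print(estimate_tokens(file_size))  # ~12,800 tokens before any real work
```

And that cost recurs every single session, whether or not any of those lines are relevant.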

“I’ll just organize it better”


I tried this. Sections, headers, date stamps, periodic cleanup. It helps for a while. But you’re fighting the fundamental issue: a flat file has no semantic structure. Your agent can’t search it. It can only read it.

You end up becoming a memory janitor. Every week you’re in there pruning stale entries, reorganizing sections, wondering if it’s safe to delete that note from October. This is maintenance work that your agent should be doing for you, not the other way around.

What semantic memory actually looks like

The fix is to stop treating agent memory as a document and start treating it as a database you can query. Store memories individually, tag them, score them by importance, and let your agent pull only what’s relevant to the current conversation.

Here’s what that looks like with MemoClaw:

# Store a memory with importance and tags
memoclaw store "Ana prefers dark mode in all editors" \
  --importance 0.7 \
  --tags preferences,editor

# Store a correction with high importance
memoclaw store "CORRECTION: Use spaces not tabs for Python files" \
  --importance 0.9 \
  --tags preferences,python,correction

When your agent needs context, it recalls by relevance instead of loading everything:

# Only pulls memories relevant to the query
memoclaw recall "What are Ana's Python coding preferences?"

That recall returns maybe 3-5 memories, scored by semantic similarity. Not 800 lines. Not your SSH config. Just the stuff that matters for this specific question.
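To make "scored by semantic similarity" concrete, here is a toy sketch of relevance-ranked recall: each memory carries an embedding vector, and recall ranks memories by cosine similarity to the query's embedding. The vectors below are made up for illustration; a real system would get them from a learned embedding model, and this is not MemoClaw's actual implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# (text, toy embedding) pairs -- vectors are invented for this example
memories = [
    ("Use spaces not tabs for Python files", [0.9, 0.1, 0.0]),
    ("Ana likes her coffee black",           [0.0, 0.2, 0.9]),
    ("Always run pytest before committing",  [0.8, 0.3, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding for "Python coding preferences"

ranked = sorted(memories, key=lambda m: cosine(m[1], query_vec), reverse=True)
top = [text for text, _ in ranked[:2]]
print(top)
```

The coffee memory scores near zero against a Python-preferences query, so it never enters the context at all, which is exactly what a flat file can't do.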

Importance scoring changes everything

The part that surprised me most was importance scoring. In a flat file, every line has equal weight. Your pizza topping preferences sit alongside your deployment credentials format, and your agent treats them the same.

With scored memories, your agent can store a casual preference at 0.3 and a critical correction at 0.9. When context is tight, the important stuff surfaces first.

# Low importance - nice to know
memoclaw store "Ana likes her coffee black" \
  --importance 0.2 --tags personal

# High importance - affects work output
memoclaw store "Always run tests before committing to main branch" \
  --importance 0.9 --tags workflow,git
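One way to picture "the important stuff surfaces first" is budget-aware context assembly: sort by importance and fill the context window until the token budget runs out. This is a hypothetical illustration of the idea, not MemoClaw's actual selection logic.

```python
# Fill a tight context window with the highest-importance memories first.
# memories: list of (text, importance, token_cost) tuples.

def select_for_context(memories, token_budget):
    chosen, used = [], 0
    # Highest importance first; stable sort preserves insertion order on ties
    for text, importance, cost in sorted(memories, key=lambda m: m[1], reverse=True):
        if used + cost <= token_budget:
            chosen.append(text)
            used += cost
    return chosen

memories = [
    ("Run tests before committing to main", 0.9, 12),
    ("Ana likes her coffee black",          0.2, 8),
    ("Use spaces not tabs in Python",       0.9, 10),
    ("Ana prefers dark mode",               0.7, 7),
]
print(select_for_context(memories, token_budget=30))
```

With a 30-token budget, the two 0.9-importance workflow rules and the 0.7 preference make the cut; the coffee note gets dropped first.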

Namespaces keep projects from bleeding into each other

Another thing that flat files can’t do: isolation. If you work on multiple projects, MEMORY.md becomes a soup of mixed context. Namespaces let you keep memories separate:

# Store in a project-specific namespace
memoclaw store "API uses JWT auth with RS256" \
  --namespace project-atlas \
  --tags auth,api

# Recall only searches within the namespace
memoclaw recall "How does auth work?" --namespace project-atlas

Your agent gets clean, focused context for each project without cross-contamination.
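Conceptually, a namespace is just an independent store that recall never reaches across. A minimal sketch (the `project-zephyr` name and keyword matching are invented for illustration; MemoClaw's storage layout and search will differ):

```python
from collections import defaultdict

# Each namespace is its own list; recall only ever searches one of them.
stores = defaultdict(list)

def store(text, namespace="default"):
    stores[namespace].append(text)

def recall(keyword, namespace="default"):
    # Toy substring match standing in for real semantic search
    return [m for m in stores[namespace] if keyword.lower() in m.lower()]

store("API uses JWT auth with RS256", namespace="project-atlas")
store("Auth is session-cookie based", namespace="project-zephyr")

print(recall("auth", namespace="project-atlas"))
```

The atlas query never sees the zephyr memory, so two projects with contradictory auth setups can coexist without confusing the agent.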

The migration takes about 10 minutes

If you’re sitting on an existing MEMORY.md, you don’t have to start from scratch. MemoClaw has a migrate command that parses your file, splits it into individual memories, and scores them:

memoclaw migrate ./MEMORY.md --namespace default

It’s not perfect. You’ll want to review the importance scores it assigns and adjust a few. But it gets you 80% of the way there in one command.
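If you're curious what a migration like this has to do, here is a rough sketch: split the file into one memory per bullet or line, skip headers and blanks, and guess an importance score from simple cues. This is illustrative only; `memoclaw migrate`'s real parsing and scoring heuristics aren't documented here.

```python
# Naive MEMORY.md splitter: one memory per non-header line, with a
# guessed importance score. A stand-in for whatever migrate really does.

def split_memories(text):
    memories = []
    for raw in text.split("\n"):
        line = raw.strip().lstrip("-* ").strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and markdown headers
        # Crude heuristic: corrections matter more than plain notes
        importance = 0.9 if "CORRECTION" in line.upper() else 0.5
        memories.append({"text": line, "importance": importance})
    return memories

sample = """# Preferences
- Ana prefers dark mode
- CORRECTION: use spaces not tabs
"""
for m in split_memories(sample):
    print(m)
```

Even this naive version shows why the result needs review: a heuristic can flag obvious corrections, but it can't know that your note from October is stale.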

What about cost?

Fair question. MEMORY.md is free (ignoring the token cost of loading it every session, which you’re already paying). MemoClaw gives you 100 free API calls to try it out. After that, it’s $0.005 per store or recall. Your wallet is your identity, no registration or API keys needed.

For context: if your agent stores 5 memories and recalls 10 per session, that’s $0.075 per session. Compare that to the token cost of loading a large MEMORY.md into context every time, and it’s roughly a wash. Except now your agent actually gets relevant context instead of everything.
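The per-session arithmetic, spelled out. The $0.005-per-call price is from this post; the flat-file comparison assumes an input price of about $0.01 per 1K tokens, which is a stand-in since actual model pricing varies widely:

```python
# MemoClaw side: priced per API call (figure from this post)
stores_per_session = 5
recalls_per_session = 10
price_per_call = 0.005
memoclaw_cost = (stores_per_session + recalls_per_session) * price_per_call
print(f"${memoclaw_cost:.3f}")  # $0.075 per session

# Flat-file side: ~50KB MEMORY.md loaded every session
flat_file_tokens = 12_000
price_per_1k_tokens = 0.01  # ASSUMED input price; check your model's rates
flat_file_cost = flat_file_tokens / 1000 * price_per_1k_tokens
print(f"${flat_file_cost:.3f}")
```

Under these assumptions the two come out within a factor of two of each other, which is the "roughly a wash" claim, with the difference that the recall-based context is actually relevant.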

When to stick with MEMORY.md

I’m not going to pretend flat files are always wrong. If your agent’s memory fits comfortably in under 200 lines and you only work on one project, MEMORY.md is fine. It’s simple, it’s local, and there’s nothing to set up.

The tipping point is when you notice your agent forgetting things it should know, or when you find yourself manually editing the file to keep it useful. That’s when the flat file model breaks down, and a queryable memory store starts earning its keep.

Getting started

Install the CLI and try a few stores and recalls against your existing workflow:

npm install -g memoclaw
memoclaw store "Testing semantic memory" --importance 0.5
memoclaw recall "semantic memory"

If you’re using OpenClaw, you can also install the skill from ClawHub and let your agent use it directly: https://clawhub.ai/anajuliabit/memoclaw

The docs have more detail on tags, namespaces, and batch operations: https://docs.memoclaw.com