Cognitive compression: why your agent should summarize memories, not replay transcripts


There are two ways to give an AI agent memory. The first is transcript replay: append the full conversation history to every prompt, growing the context window until you hit the limit or the model starts losing track of things buried in the middle. The second is compressed memory: distill each interaction into compact entries and retrieve only what’s relevant.

Most OpenClaw agents start with transcript replay because it’s the default. Your agent reads the last N messages, maybe some markdown files, and builds context from there. It works for short sessions. It falls apart over days and weeks.

This article is about why compressed memory works better in practice, what it costs, and how to set it up with MemoClaw.

The problem with replaying everything

Transcript replay has three failure modes that get worse over time.

Context window costs. Every token in your prompt costs money. If your agent replays the last 50 messages as context, you’re paying to process all of that text on every interaction. With GPT-4 class models, a 10,000-token conversation history at $0.01/1K input tokens costs $0.10 per message just for context. Do that 100 times a day and you’re spending $10/day on context alone.
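As a sanity check on that arithmetic, here is a back-of-the-envelope calculator. Both constants are the illustrative assumptions from the example above, not actual API prices:

```python
# Rough replay-cost estimate. HISTORY_TOKENS and PRICE_PER_1K_INPUT are
# the assumed numbers from the example above, not real published rates.
HISTORY_TOKENS = 10_000        # replayed conversation history per interaction
PRICE_PER_1K_INPUT = 0.01      # $ per 1K input tokens, GPT-4-class model

def daily_replay_cost(interactions_per_day: int) -> float:
    """Dollars per day spent re-processing old context."""
    per_interaction = HISTORY_TOKENS / 1000 * PRICE_PER_1K_INPUT
    return per_interaction * interactions_per_day

print(f"${daily_replay_cost(100):.2f}/day")  # matches the ~$10/day figure above
```

The point of writing it out: the cost is linear in both history length and interaction count, so a chatty agent with a long tail of transcript pays on both axes at once.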

The lost-in-the-middle problem. Research on long-context recall (Liu et al., “Lost in the Middle”, arXiv:2307.03172) shows that language models handle the beginning and end of long contexts well, but struggle with information buried in the middle. Your agent might ignore a correction you made three days ago because it’s sandwiched between 40 other messages.

Drift and contradiction. Long transcripts contain your old opinions, outdated decisions, and superseded instructions alongside current ones. The model has to figure out which version is current. It often guesses wrong.

The MEMORY.md pattern that many OpenClaw agents use is a step up from raw transcript replay. At least it’s curated. But it still eats context, has no semantic search, and requires you to maintain it by hand. Every byte of MEMORY.md gets loaded into your prompt whether it’s relevant to the current query or not.

What cognitive compression looks like

Instead of replaying history, a compressed memory approach works like this:

  1. Something worth remembering happens during a conversation
  2. The agent extracts the key information into a compact memory entry
  3. That entry gets stored with an importance score and tags
  4. On future interactions, the agent recalls only the memories relevant to the current context

The difference is retrieval vs. replay. With replay, you load everything and hope the model finds what it needs. With compressed memory, you search first and load only what matches.
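The search-then-load loop can be sketched in a few lines. This is a toy in-memory version, not MemoClaw’s implementation: real recall uses embeddings for semantic similarity, but keyword-and-tag overlap is enough to show the shape of the pattern:

```python
from dataclasses import dataclass

@dataclass
class Memory:
    content: str
    importance: float = 0.5
    tags: frozenset = frozenset()

class MemoryStore:
    """Toy search-then-load store (illustrative, not MemoClaw's API)."""

    def __init__(self):
        self._memories: list[Memory] = []

    def store(self, content: str, importance: float = 0.5, tags=()) -> None:
        self._memories.append(Memory(content, importance, frozenset(tags)))

    def recall(self, query: str, k: int = 3) -> list[Memory]:
        # Stand-in for semantic search: rank by word/tag overlap with the
        # query, weighted by importance, and load only the top k entries.
        terms = set(query.lower().split())
        def score(m: Memory) -> float:
            return len((set(m.content.lower().split()) | m.tags) & terms) * m.importance
        hits = [m for m in self._memories if score(m) > 0]
        return sorted(hits, key=score, reverse=True)[:k]
```

A recall for "editor preferences" surfaces only matching entries; everything else stays out of the prompt, which is the whole economy of the approach.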

In practice with MemoClaw, this means your agent stores something like:

memoclaw store "User prefers dark mode in all code editors" --importance 0.7 --tags preferences,editor

And later, when the topic comes up:

memoclaw recall "editor preferences"

The recall returns semantically similar memories ranked by relevance. Your agent gets “User prefers dark mode in all code editors” without loading hundreds of other unrelated memories into context.

Using extract for automatic compression

Storing memories manually works, but the real win is automating extraction. MemoClaw’s extract command takes a block of text (a conversation chunk, a session summary) and pulls out the memorable parts.

In your OpenClaw agent’s workflow, you might add extraction at the end of each session:

memoclaw extract "Session summary: User asked about deploying to Railway. Discussed environment variables, database connection strings, and the importance of not committing .env files. User mentioned they prefer Railway over Fly.io for simple projects. Corrected previous assumption that user uses Heroku."

Extract uses GPT-4o-mini to identify what’s worth storing from that text. It might produce two or three focused memory entries from that paragraph: the Railway preference, the Heroku correction, and the .env concern.
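The extraction step itself amounts to a prompt plus validation. The sketch below is hypothetical (the function name, prompt text, and JSON shape are my illustration, not MemoClaw internals); `llm_call` stands in for whatever GPT-4o-mini wrapper you use:

```python
import json

# Assumed prompt shape, not MemoClaw's actual prompt.
EXTRACT_INSTRUCTIONS = (
    "From the text below, list facts worth remembering long-term. "
    'Respond with a JSON array of objects with keys "content", '
    '"importance" (0-1), and "tags".'
)

def extract_memories(text: str, llm_call) -> list[dict]:
    """llm_call: any function mapping a prompt string to a completion string."""
    raw = llm_call(EXTRACT_INSTRUCTIONS + "\n\n" + text)
    entries = json.loads(raw)
    # Validate each entry so a malformed completion can't poison the store.
    return [e for e in entries
            if isinstance(e.get("content"), str)
            and isinstance(e.get("importance"), (int, float))
            and 0 <= e["importance"] <= 1]
```

The validation pass matters more than it looks: extraction runs unattended, so a single badly-formatted completion shouldn’t be able to write garbage into long-term memory.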

This costs $0.01 per call (GPT-4o-mini + embeddings). More expensive than a raw store ($0.005), but you get intelligent extraction rather than having to decide what’s worth keeping yourself.

The cost comparison

Some rough math.

Transcript replay approach:

  • Average conversation context: 5,000 tokens
  • Cost per interaction (input tokens on GPT-4): ~$0.05
  • 50 interactions/day: $2.50/day in context costs
  • Monthly: ~$75 just for pumping old context through the model

Compressed memory approach:

  • Store 5-10 memories per day: $0.025-$0.05/day
  • Recall 10-20 times per day: $0.05-$0.10/day
  • Weekly consolidation: $0.01/week
  • Monthly extract runs: ~$0.10
  • Monthly total: ~$3-5

That’s roughly 15-20x cheaper. The numbers vary with usage, but the direction is consistent. Compressed memory costs less because you’re storing small entries and retrieving only what’s relevant, instead of feeding the full history through the model every time.
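The monthly figures above can be reproduced directly. All prices are the estimates from this section, so treat them as assumptions rather than quoted rates:

```python
CALL_PRICE = 0.005   # assumed $ per store or recall call, from the estimate above
DAYS = 30

def monthly_replay(history_tokens: int, interactions_per_day: int,
                   price_per_1k: float = 0.01) -> float:
    """Monthly cost of re-sending the transcript as context."""
    return history_tokens / 1000 * price_per_1k * interactions_per_day * DAYS

def monthly_compressed(stores_per_day: float, recalls_per_day: float) -> float:
    """Monthly cost of small store/recall calls (extract/consolidate excluded)."""
    return (stores_per_day + recalls_per_day) * CALL_PRICE * DAYS

replay = monthly_replay(5000, 50)        # the ~$75/month replay figure
compressed = monthly_compressed(7, 15)   # mid-range usage, ~$3.30/month
```

Adding the weekly consolidation and extract runs nudges the compressed total toward the $3-5 range, which is where the rough 15-20x ratio comes from.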

The tradeoff: you lose full conversational context. Your agent won’t remember the exact phrasing you used three weeks ago, just the gist. For most use cases, that’s fine. For legal or compliance work where exact wording matters, it’s not. Know your use case.

Setting this up in OpenClaw

A practical setup for an OpenClaw agent using compressed memory:

1. In your AGENTS.md, point memory to MemoClaw:

## Memory
Do NOT use local markdown files for storing memories.
Use MemoClaw for ALL durable memory.

### How to remember things
- Store: `memoclaw store "what you learned" --importance 0.8 --tags tag1,tag2`
- Recall: `memoclaw recall "query"` before making assumptions
- Always recall before storing to avoid duplicates

2. Add a recurring extraction cron (every four hours here, as a stand-in for true session-end hooks):

openclaw cron add --cron "0 */4 * * *" \
  --message "Review recent conversation context. Use memoclaw extract to distill any new learnings, preferences, or corrections into stored memories." \
  --name "session-extraction"

3. Set up weekly consolidation:

openclaw cron add --cron "0 3 * * 0" \
  --message "Run memoclaw consolidate for all namespaces to merge duplicate and outdated memories." \
  --name "weekly-consolidation"

4. Scope recalls by namespace and tags when you can:

memoclaw recall "deployment preferences" --namespace work --tags infrastructure

Scoped recalls are faster and more accurate because they search a smaller set of memories.

Limitations worth knowing

MemoClaw stores text memories up to 8,192 characters each. Plenty for compressed entries, but you can’t dump an entire document into a single memory. If you need document-level retrieval, you want a RAG solution, not MemoClaw.

The free tier gives you 100 API calls per wallet. Enough to test the pattern, but a moderately active agent will burn through it in a week or two. After that, you’re on pay-per-use with USDC on Base.

Extract and consolidate both rely on GPT-4o-mini. It’s good at summarization but sometimes drops nuance or merges things that should stay separate. Review the output periodically, especially when you’re first setting things up.

Why this matters

The agents that work well over long periods have good memory hygiene. Transcript replay is the path of least resistance, and it works until it doesn’t. Usually right around the point where your agent starts contradicting itself or forgetting corrections you made weeks ago.

Compressed memory takes more setup, but the payoff is an agent that gets sharper over time instead of noisier. Recall stays fast and relevant. Costs stay predictable. And you stop wasting tokens on context your agent doesn’t need for the current conversation.

The research backs this up, the cost math backs this up, and if you’ve ever watched an agent hallucinate because it got confused by its own conversation history, your experience backs it up too.