The three-tier memory stack: working memory, session recall, and long-term storage
Your brain doesn’t dump everything into one giant pile. It runs a layered system — sensory input flows into working memory, important bits get consolidated into short-term recall, and the stuff that matters makes it to long-term storage. Your OpenClaw agent should work the same way.
Most agent builders treat memory as a single problem: either cram everything into the context window or bolt on a database and hope for the best. Neither works. What you need is a three-tier architecture where each layer has a specific job, specific constraints, and specific tools.
The Three Tiers
Tier 1: Working Memory (The Context Window)
This is what your agent can see right now. The current conversation, the system prompt, any files loaded into context. It’s fast, it’s immediate, and it’s expensive.
Every token in the context window costs compute. With a 128K-token window, space looks cheap until your agent loads MEMORY.md, three skill files, and a conversation history, and you’ve burned 40K tokens before the user even says hello.
Working memory should contain:
- The current conversation
- Active task context (what you’re doing right now)
- System prompt and persona
- A small set of relevant memories pulled from Tier 2
What it should NOT contain:
- Your agent’s entire history
- Every preference ever recorded
- Full project documentation
- Raw conversation logs from past sessions
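To see how fast Tier 1 fills up, here’s a rough budget check in Python. Only the 128K window and the 40K total come from the example above; the per-item token counts are made up for illustration, so swap in numbers from your own tokenizer.

```python
# Hypothetical token counts for what an agent loads at startup.
# None of these per-item numbers come from a real measurement.
CONTEXT_WINDOW = 128_000

startup_load = {
    "system prompt and persona": 2_000,
    "MEMORY.md": 12_000,
    "three skill files": 18_000,
    "conversation history": 8_000,
}


def startup_usage(load: dict[str, int], window: int) -> tuple[int, float]:
    """Return (tokens used, fraction of the window used) before the first user turn."""
    used = sum(load.values())
    return used, used / window


used, fraction = startup_usage(startup_load, CONTEXT_WINDOW)
print(f"{used:,} tokens loaded, {fraction:.1%} of the window")
```

Running a check like this against your real startup files is a quick way to spot which file should move out of Tier 1.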
Tier 2: Session Recall (MemoClaw Queries)
This is the bridge. During a conversation, your agent queries MemoClaw to pull relevant context on demand. Instead of pre-loading everything, you search for what matters when it matters.
# Agent needs to remember user's deployment preferences
memoclaw recall "deployment preferences" --limit 5
# Agent is working on the billing project — get project context
memoclaw recall "billing project architecture" --namespace projects --limit 3
Session recall costs $0.005 per query. That’s half a cent to pull exactly the memories your agent needs, instead of burning thousands of tokens loading everything into context.
The key insight: recall is cheap, context is expensive. You pay for a recall once, when you run it. Information loaded as static context costs tokens on every turn for the rest of the conversation, whether or not that turn needs it.
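If your agent shells out from Python, the recall commands above could be wrapped roughly like this. It’s a sketch that uses only the `recall` flags shown in this post (`--limit`, `--namespace`). The function names are hypothetical, and nothing here assumes what the CLI prints.

```python
import subprocess
from typing import Optional


def recall_args(query: str, limit: int = 3, namespace: Optional[str] = None) -> list[str]:
    """Build the argv for `memoclaw recall`, using only the flags shown above."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    args = ["memoclaw", "recall", query]
    if namespace is not None:
        args += ["--namespace", namespace]
    args += ["--limit", str(limit)]
    return args


def recall(query: str, limit: int = 3, namespace: Optional[str] = None) -> str:
    """Run the recall and return raw stdout. The output format is not assumed here."""
    result = subprocess.run(
        recall_args(query, limit, namespace),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


print(recall_args("billing project architecture", limit=3, namespace="projects"))
```

Keeping the argv builder separate from the call that runs it makes the query logic testable without the CLI installed.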
Tier 3: Long-Term Storage (The Full Memory Store)
Everything your agent has ever learned lives here. User preferences, project context, corrections, session summaries, learned patterns. This is the archive — searchable, tagged, namespaced, but never loaded wholesale into context.
# Store a session summary at end of conversation
memoclaw store "User prefers TypeScript over Python. Uses Vercel for deployment. \
Timezone is PST. Prefers concise responses over detailed explanations." \
--tags "preferences,user-profile" --importance 0.8
# Store a project decision
memoclaw store "Decided to use PostgreSQL instead of MongoDB for the billing service. \
Reason: need ACID transactions for payment processing." \
--tags "architecture,billing-project" --namespace projects --importance 0.8
Storage costs $0.005 per memory. Store what matters, tag it well, and let semantic search do the work later.
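Since every store is billed, it can help to validate a memory before sending it. Here’s a minimal Python sketch that builds the `store` command from the flags used above (`--tags`, `--namespace`, `--importance`). The function name and the validation rules are assumptions, not part of MemoClaw.

```python
from typing import Optional, Sequence


def store_args(text: str, tags: Sequence[str], importance: float,
               namespace: Optional[str] = None) -> list[str]:
    """Build the argv for `memoclaw store` with the flags used in the examples above."""
    if not text.strip():
        raise ValueError("refusing to store an empty memory")
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be between 0 and 1")
    args = ["memoclaw", "store", text, "--tags", ",".join(tags)]
    if namespace is not None:
        args += ["--namespace", namespace]
    args += ["--importance", str(importance)]
    return args


print(store_args(
    "Decided to use PostgreSQL instead of MongoDB for the billing service.",
    tags=["architecture", "billing-project"],
    importance=0.8,
    namespace="projects",
))
```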
Wiring It Together in OpenClaw
Here’s how to implement the three tiers in your agent’s AGENTS.md:
Step 1: Keep Working Memory Lean
In your AGENTS.md, resist the urge to load everything at startup:
## Every Session
1. Read `SOUL.md` — who you are
2. Read `USER.md` — who you're helping
3. Read `memory/today.md` — what happened recently
That's it. Don't load MEMORY.md wholesale. Query MemoClaw for what you need.
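If you want a guard against this creeping back in, a tiny lint over AGENTS.md is one option. This Python sketch assumes the startup checklist is a numbered list of "Read `file`" steps, as in the snippet above; the regex and function name are illustrative.

```python
import re

# Matches a numbered startup step that reads MEMORY.md, e.g. "4. Read `MEMORY.md`".
WHOLESALE_LOAD = re.compile(
    r"^\s*\d+\.\s+Read\s+`?MEMORY\.md`?",
    re.IGNORECASE | re.MULTILINE,
)


def loads_memory_wholesale(agents_md: str) -> bool:
    """True if the startup checklist tells the agent to read MEMORY.md."""
    return WHOLESALE_LOAD.search(agents_md) is not None


lean = """## Every Session
1. Read `SOUL.md`
2. Read `USER.md`
3. Read `memory/today.md`
"""
bloated = lean + "4. Read `MEMORY.md`\n"

print(loads_memory_wholesale(lean), loads_memory_wholesale(bloated))
```

The check only looks at numbered steps, so a sentence that merely mentions MEMORY.md won’t trip it.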
Step 2: Build Recall Triggers
Teach your agent when to query Tier 2. Common triggers:
- New topic introduced → recall related context
- User mentions a project → recall from that namespace
- Agent is uncertain → recall past corrections
- Task requires preferences → recall user profile
In practice, your agent’s instructions might include:
## Memory Protocol
When the user brings up a topic you haven't discussed this session:
1. Run `memoclaw recall "<topic>" --limit 3` to check for relevant context
2. If relevant memories exist, factor them into your response
3. Don't mention that you "looked this up" — just know it
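If you’d rather enforce the "new topic" trigger in code than in prompt instructions, it might look something like this in Python. The sketch assumes something upstream already extracts a topic string from the user’s message. `RecallTrigger` is a hypothetical helper, and it only builds the command shown in the protocol.

```python
from typing import Optional


class RecallTrigger:
    """Tracks topics seen this session and asks for a recall only on new ones."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.seen: set[str] = set()

    def on_topic(self, topic: str) -> Optional[list[str]]:
        """Return a `memoclaw recall` argv for a new topic, or None if already covered."""
        key = topic.strip().lower()
        if not key or key in self.seen:
            return None
        self.seen.add(key)
        return ["memoclaw", "recall", key, "--limit", str(self.limit)]


trigger = RecallTrigger()
first = trigger.on_topic("Auth middleware")    # new topic: returns a command
second = trigger.on_topic("auth middleware ")  # same topic again: returns None
print(first, second)
```

Deduplicating per session keeps the recall count, and the bill, tied to the number of distinct topics instead of the number of turns.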
Step 3: Store at Session End
The most important moment in memory management is the end of a conversation. That’s when you consolidate:
# Summarize what happened and store it
memoclaw store "Session 2026-03-10: Helped user refactor auth middleware. \
Key decisions: switched from JWT to session tokens, added rate limiting at 100 req/min. \
User was frustrated with the existing code quality — be gentler with feedback next time." \
--tags "session-summary,auth-project" --importance 0.7
# Store any new preferences discovered
memoclaw store "User prefers seeing test examples before implementation code" \
--tags "preferences,communication" --importance 0.8
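One way to make the end-of-session summary consistent is to assemble it from structured notes. The Python sketch below is only an illustration: `SessionNotes` and its fields are hypothetical, and the output mirrors the shape of the summary stored above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SessionNotes:
    day: date
    work: str
    decisions: list[str] = field(default_factory=list)
    feedback: str = ""

    def summary(self) -> str:
        """Render one memory text in the shape of the session summary above."""
        parts = [f"Session {self.day.isoformat()}: {self.work}."]
        if self.decisions:
            parts.append("Key decisions: " + ", ".join(self.decisions) + ".")
        if self.feedback:
            parts.append(self.feedback)
        return " ".join(parts)


notes = SessionNotes(
    day=date(2026, 3, 10),
    work="Helped user refactor auth middleware",
    decisions=["switched from JWT to session tokens",
               "added rate limiting at 100 req/min"],
)
command = ["memoclaw", "store", notes.summary(),
           "--tags", "session-summary,auth-project", "--importance", "0.7"]
print(command)
```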
Step 4: Use Importance Scoring to Separate Signal from Noise
Not all memories are equal. Importance scoring (0 to 1) determines what surfaces first in recall:
| Importance | What Goes Here |
|---|---|
| 0.9-1.0 | Core identity, critical corrections, immutable facts |
| 0.7-0.8 | User preferences, project decisions, session summaries |
| 0.4-0.6 | Useful context, patterns noticed, general knowledge |
| 0.1-0.3 | Nice-to-know, temporary context, experimental notes |
# Critical correction — always surface this
memoclaw store "NEVER deploy to production on Fridays — user's company policy" \
--importance 1.0 --tags "rules,deployment"
# Lock it so it can't be accidentally modified
memoclaw lock <id>
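To keep scores consistent across sessions, you could give each kind of memory a default score from the table. In this Python sketch the kind names, the exact defaults, and the lock-at-1.0 policy are all assumptions layered on top of the table; MemoClaw doesn’t define any of them.

```python
# Hypothetical memory kinds mapped to a default score inside each band of the table.
DEFAULT_IMPORTANCE = {
    "critical_correction": 1.0,
    "user_preference": 0.8,
    "session_summary": 0.7,
    "general_context": 0.5,
    "temporary_note": 0.2,
}


def importance_for(kind: str) -> float:
    """Look up a default score; unknown kinds fall back to the middle band."""
    return DEFAULT_IMPORTANCE.get(kind, 0.5)


def should_lock(importance: float) -> bool:
    """Assumed policy: lock only memories stored at the maximum score."""
    return importance >= 1.0


for kind in ("critical_correction", "session_summary", "something_else"):
    score = importance_for(kind)
    print(kind, score, "lock" if should_lock(score) else "no lock")
```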
The Economics
Let’s do the math for a typical day:
| Operation | Count | Cost Per | Total |
|---|---|---|---|
| Session recalls | 20 queries | $0.005 | $0.10 |
| End-of-session stores | 5 memories | $0.005 | $0.025 |
| Context queries | 2 deep lookups | $0.01 | $0.02 |
Total: $0.145/day for an agent with excellent memory.
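The total is easy to verify in a few lines of Python. The operation names, counts, and prices are copied from the table, and `Decimal` avoids floating-point rounding on the half-cent prices.

```python
from decimal import Decimal

# (operation, count per day, cost per call in USD), copied from the table above.
daily_operations = [
    ("Session recalls", 20, Decimal("0.005")),
    ("End-of-session stores", 5, Decimal("0.005")),
    ("Context queries", 2, Decimal("0.01")),
]


def daily_total(operations) -> Decimal:
    """Sum count * unit cost across all operations."""
    return sum((count * cost for _, count, cost in operations), Decimal("0"))


total = daily_total(daily_operations)
calls = sum(count for _, count, _ in daily_operations)
print(f"{calls} calls/day, ${total}/day")
```

Change the counts to match your own agent’s traffic and the same function gives you a daily estimate.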
Compare that to stuffing a 10K-token MEMORY.md into every conversation. Most model APIs bill input tokens on every request, so you pay for those 10K tokens again on every single turn. Over 50 turns in a day, that’s 500K tokens of repeated context. The three-tier approach queries only what’s needed, when it’s needed.
And remember: every wallet gets 100 free API calls. That’s enough to run the three-tier pattern for several days of light usage before any payment kicks in.
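Here’s the arithmetic behind that comparison as a quick Python check. Every input comes from the numbers above. The break-even price is derived, not quoted from any provider, and it ignores both the tokens that recalled memories add to context and any prompt caching, so treat it as a rough guide.

```python
MEMORY_FILE_TOKENS = 10_000   # size of MEMORY.md in the comparison above
TURNS_PER_DAY = 50
DAILY_MEMOCLAW_COST = 0.145   # USD per day, from the table
DAILY_CALLS = 20 + 5 + 2      # recalls + stores + context queries
FREE_CALLS = 100

repeated_tokens = MEMORY_FILE_TOKENS * TURNS_PER_DAY


def break_even_price_per_million(daily_cost: float, tokens: int) -> float:
    """Input-token price (USD per 1M tokens) at which both approaches cost the same."""
    return daily_cost / (tokens / 1_000_000)


price = break_even_price_per_million(DAILY_MEMOCLAW_COST, repeated_tokens)
free_days = FREE_CALLS / DAILY_CALLS

print(f"{repeated_tokens:,} repeated tokens/day")
print(f"break-even at about ${price:.2f} per million input tokens")
print(f"free tier covers about {free_days:.1f} days at this call volume")
```

If your model’s input price is above the break-even figure, the static MEMORY.md is the more expensive option under these assumptions.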
Common Mistakes
Loading everything into Tier 1. If your AGENTS.md says “read MEMORY.md every session,” you’re treating Tier 3 as Tier 1. Stop it.
Never storing to Tier 3. Some agents recall beautifully but never store. They’re consumers, not learners. Build store triggers into session-end routines.
Skipping Tier 2 entirely. Going straight from “I know nothing” to “let me load the entire database” misses the point. Recall should be surgical — 3-5 relevant memories, not a data dump.
Ignoring namespaces. If your agent works on multiple projects, namespace your memories. A recall for “database schema” shouldn’t return results from every project you’ve ever touched.
# Good: namespaced recall
memoclaw recall "database schema" --namespace billing-project --limit 3
# Bad: searching everything
memoclaw recall "database schema" --limit 3
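One way to make namespaced recall the default is to resolve the namespace from the user’s message before building the query. This Python sketch uses a keyword table that is entirely hypothetical; a real agent would likely have a better signal for which project is in play.

```python
from typing import Optional

# Hypothetical mapping from words a user might say to a MemoClaw namespace.
PROJECT_NAMESPACES = {
    "billing": "billing-project",
    "auth": "auth-project",
}


def namespace_for(message: str) -> Optional[str]:
    """Return the namespace of the first known project mentioned, if any."""
    words = message.lower().split()
    for keyword, namespace in PROJECT_NAMESPACES.items():
        if keyword in words:
            return namespace
    return None


def scoped_recall_args(message: str, query: str, limit: int = 3) -> list[str]:
    """Build a recall argv, adding --namespace only when a project is recognized."""
    args = ["memoclaw", "recall", query]
    namespace = namespace_for(message)
    if namespace is not None:
        args += ["--namespace", namespace]
    return args + ["--limit", str(limit)]


print(scoped_recall_args("What does the billing schema look like?", "database schema"))
```

When no project is recognized, the sketch falls back to an unscoped search, which is the case to watch for in your logs.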
The Mental Model
Think of it this way:
- Tier 1 (Working Memory) = Your desk. Only what you’re actively working on.
- Tier 2 (Session Recall) = Your filing cabinet. Walk over and grab what you need.
- Tier 3 (Long-Term Storage) = The warehouse. Everything’s there, indexed and searchable, but you don’t bring the whole warehouse to your desk.
The best agents are the ones that know when to reach for the filing cabinet and when to leave the warehouse alone. Build yours the same way.
MemoClaw is Memory-as-a-Service for AI agents. Store and recall memories with semantic search — no API keys, no registration, just a wallet. Start with 100 free calls at memoclaw.com.