Pre-loading memories: onboarding a new agent with institutional knowledge
You set up a new OpenClaw agent. It reads AGENTS.md, SOUL.md, USER.md. It has personality, it has instructions. What it doesn’t have is any idea what happened last week, what your codebase looks like, or that you hate tabs.
The cold-start problem is real. Every new agent is a blank slate with a job description. It knows what to do but not how you do it.
Most people solve this by stuffing AGENTS.md full of context. Coding conventions, project history, past decisions, client preferences. The file grows to 30KB, 50KB. The agent loads all of it every single session, whether it needs any of it or not.
There’s a better approach.
The AGENTS.md trap
AGENTS.md is great for instructions. It’s terrible for institutional knowledge.
Instructions are things like “always use Railway for deployments” or “commit messages follow conventional commits.” Short, directive, relevant every session.
Institutional knowledge is different. It’s the hundred decisions you made over six months. Why you chose Postgres over MongoDB. That the payments service has a known race condition on concurrent refunds. That client X prefers weekly updates on Mondays.
Putting all of this in AGENTS.md means your agent burns tokens reading about the payments race condition when it’s writing a blog post. The context window fills with irrelevant history and the agent has to sort signal from noise before it even starts working.
At 50KB, AGENTS.md is roughly 12,000 tokens (at the usual rough rate of ~4 characters per token). That’s loaded before every single turn. Over a hundred-turn session, your agent has read 1.2 million tokens of the same file. Most of those reads were wasted.
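The arithmetic behind those numbers, assuming the common rule of thumb of about four characters per token:

```python
# Rough token cost of loading a bloated AGENTS.md on every turn.
# Assumes ~4 characters per token; real tokenizers vary by model and content.
FILE_SIZE_BYTES = 50 * 1024   # a 50KB AGENTS.md
CHARS_PER_TOKEN = 4
TURNS = 100

tokens_per_turn = FILE_SIZE_BYTES // CHARS_PER_TOKEN
session_tokens = tokens_per_turn * TURNS

print(tokens_per_turn)  # 12800 -- "roughly 12,000" in round numbers
print(session_tokens)   # 1280000 -- about 1.2 million tokens of the same file
```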
memoclaw migrate
MemoClaw’s migrate command lets you bulk-import markdown files as memories with semantic embeddings. Instead of cramming everything into one file, you break knowledge into individual memories that get recalled only when relevant.
memoclaw migrate ./docs/decisions/ --tags onboarding,decisions --importance 0.7
This walks the directory, reads each markdown file, and stores the contents as separate memories. Each one gets embedded for semantic search. When your agent needs to know why you chose Postgres, it recalls that specific memory. When it’s writing a blog post, that memory stays out of the way.
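Conceptually, migrate is a walk-and-store loop. Here’s a minimal Python sketch of the shape — the function name and record fields are illustrative, not MemoClaw’s actual implementation, which also generates an embedding per memory:

```python
from pathlib import Path

def migrate(directory: str, tags: list[str], importance: float) -> list[dict]:
    """Walk a directory and turn each markdown file into one memory record.
    Illustrative stand-in: the real `memoclaw migrate` also embeds each
    memory for semantic search; here we just build the records."""
    memories = []
    for path in sorted(Path(directory).glob("*.md")):
        memories.append({
            "content": path.read_text(),
            "tags": tags,
            "importance": importance,
            "source": path.name,  # lets you trace a memory back to its file
        })
    return memories
```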
You can also migrate a single file:
memoclaw migrate ./LEGACY_CONTEXT.md --tags legacy,project-alpha --importance 0.6
The importance score matters here. Not all institutional knowledge is equal. Active project decisions should sit at 0.7-0.8. Historical context that rarely comes up can be 0.5-0.6. Corrections and hard-won lessons should be 0.9+.
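The tiers above are easy to encode as a lookup when you script an import — the exact cutoffs for the in-between values here are my own, not anything MemoClaw enforces:

```python
def tier(importance: float) -> str:
    """Bucket an importance score per the guidelines above.
    The boundary values are my own defaults, not tool-enforced rules."""
    if importance >= 0.9:
        return "correction / hard-won lesson"
    if importance >= 0.7:
        return "active project decision"
    if importance >= 0.5:
        return "historical context"
    return "low-priority background"
```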
Structuring pre-loaded memories
Dumping everything into one namespace with the same tags defeats the purpose. You want your agent to recall the right memories at the right time.
Here’s how I structure onboarding imports:
Project decisions (importance 0.7-0.8):
memoclaw store "Chose Railway over Fly.io for deployments because Railway handles monorepo builds natively. Decided 2026-01-15." \
--tags decisions,infrastructure --importance 0.7
Coding conventions (importance 0.8):
memoclaw store "All API responses use camelCase keys. snake_case in the DB, camelCase in JSON. No exceptions." \
--tags conventions,api --importance 0.8
Known issues (importance 0.7):
memoclaw store "Payments service has a race condition on concurrent refunds to the same order. Use distributed lock on order_id before processing." \
--tags bugs,payments,known-issues --importance 0.7
Client preferences (importance 0.8):
memoclaw store "Client X wants weekly status updates delivered Monday mornings. Prefers bullet points over prose. CC their PM on everything." \
--tags client-x,preferences --importance 0.8
The tags matter because your agent can filter recalls:
memoclaw recall "how should I handle refunds" --tags payments
This returns the race condition warning without pulling in unrelated memories about client preferences or deployment choices.
The 50KB file vs targeted recall
Let me make this concrete. Say your agent needs to fix a bug in the payments service.
With a 50KB AGENTS.md: The agent loads everything. Blog post guidelines, client communication preferences, frontend color palette decisions, deployment procedures, and somewhere in the middle, the note about the refund race condition. The model might catch it. It might not. Research on “lost in the middle” shows that LLMs retrieve information least reliably when it sits in the middle of a long context.
With pre-loaded memories: The agent describes the task, recalls relevant memories, and gets back the specific race condition warning plus any related payments context. Nothing about blog posts or color palettes. Focused, relevant context.
The token math is straightforward. A targeted recall might return 500-1000 tokens of relevant context. The 50KB file loads 12,000 tokens of everything. And the recall result is actually about what the agent is working on.
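Plugging in those figures, targeted recall is a 12-24x per-turn reduction:

```python
FULL_FILE_TOKENS = 12_000        # a 50KB AGENTS.md, loaded every turn
RECALL_TOKENS = (500, 1000)      # typical targeted-recall payload, low/high

worst_case_saving = FULL_FILE_TOKENS / RECALL_TOKENS[1]  # 12x fewer tokens
best_case_saving = FULL_FILE_TOKENS / RECALL_TOKENS[0]   # 24x fewer tokens
```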
A practical onboarding workflow
When I set up a new agent for an existing project, the process looks like this:
First, I write a minimal AGENTS.md with just instructions. Personality, communication style, core rules. Under 5KB.
Then I prepare the institutional knowledge. I go through old decision docs, past agent memories, project READMEs, and anything that represents learned context.
# Import project decisions
memoclaw migrate ./docs/decisions/ --tags onboarding,decisions --importance 0.7
# Import known issues
memoclaw migrate ./docs/known-issues/ --tags onboarding,known-issues --importance 0.7
# Import coding standards
memoclaw store "$(cat ./CONVENTIONS.md)" --tags onboarding,conventions --importance 0.8
# Import key corrections from the previous agent
memoclaw migrate ./old-agent/corrections/ --tags onboarding,corrections --importance 0.9
Finally, I add a recall step to the agent’s workflow. In AGENTS.md:
Before starting any task, recall relevant memories:
memoclaw recall "<brief description of the task>"
That’s it. The new agent has access to everything the old one learned, without loading it all into context every turn.
What stays in AGENTS.md
Not everything should be a memory. Some things belong in AGENTS.md because they’re needed every session:
- Identity and personality
- Core behavioral rules
- Communication style guidelines
- Safety boundaries
- The instruction to use MemoClaw for recall
If you find yourself referencing something in fewer than half your sessions, it’s probably a memory, not an instruction.
The cost
Pre-loading isn’t free. Each store operation costs $0.005, and migrate costs $0.01 per file. Onboarding a project with 50 decision documents costs about $0.50. That’s less than the token cost of loading a bloated AGENTS.md for a single long session.
The ongoing cost is a recall per task, at $0.005 each. Compare that to 12,000 tokens of irrelevant context loaded on every turn.
The math works out fast.
Your agent doesn’t need to know everything all the time. It needs to know the right things at the right time. Pre-loading memories with proper tags and importance scores gives a new agent institutional knowledge without the context window tax. Keep AGENTS.md lean. Let recall do the heavy lifting.