The extract endpoint: turning messy text into structured memories
Your agent just sat through a 45-minute meeting. Someone pasted the raw notes into chat. There are action items buried in there, a decision about switching to Railway, and a correction about the API port being 5433, not 5432.
You could have your agent parse all that manually. Pull out facts one by one. Assign importance. Pick tags. Call store for each.
Or you could throw the whole mess at MemoClaw’s extract endpoint and let GPT-4o-mini do the sorting.
What extract does
POST /v1/memories/extract takes unstructured text and returns structured memories. It reads the input, pulls out facts worth remembering, scores them by importance, assigns tags, and stores them. One call.
memoclaw extract \
--text "Meeting notes from sprint planning: we're switching deploys from Heroku to Railway. Database stays on Neon. Alice is taking the auth migration, targeting Friday. Budget approved for 2 contractors starting next month." \
--namespace project-planning
Output:
{
  "memory_ids": ["a1b2c3...", "d4e5f6...", "g7h8i9...", "j0k1l2..."],
  "facts_extracted": 4,
  "facts_stored": 4
}
Four discrete facts pulled from one blob of text:
- Switching deploys from Heroku to Railway (decision, importance 0.8)
- Database on Neon (project, importance 0.6)
- Alice owns auth migration, deadline Friday (project, importance 0.8)
- Budget approved for 2 contractors next month (decision, importance 0.7)
Each one is self-contained. Each one has tags. Each one your agent can recall later without needing the original meeting notes.
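If you'd rather hit the API directly than shell out to the CLI, the request is small. Here's a minimal Python sketch — the host, field names, and auth header are assumptions mirroring the CLI flags above, not documented wire format:

```python
import json
from urllib import request

# Hypothetical host — substitute your actual MemoClaw API base URL.
EXTRACT_URL = "https://api.memoclaw.example/v1/memories/extract"

def build_payload(text: str, namespace: str) -> dict:
    """Request body, assuming fields mirror the --text/--namespace flags."""
    return {"text": text, "namespace": namespace}

def extract(text: str, namespace: str, api_key: str) -> dict:
    """POST unstructured text to the extract endpoint and return the
    parsed response ({"memory_ids": [...], "facts_extracted": N,
    "facts_stored": N}, per the example output above)."""
    req = request.Request(
        EXTRACT_URL,
        data=json.dumps(build_payload(text, namespace)).encode(),
        headers={
            "Content-Type": "application/json",
            # Auth scheme is an assumption; check your MemoClaw dashboard.
            "Authorization": f"Bearer {api_key}",
        },
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```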
Extract vs. manual store
Here’s the tradeoff:
| | store | extract |
|---|---|---|
| Cost | $0.005 per memory | $0.01 per call (any number of facts) |
| You decide what to store | Yes | No, the LLM decides |
| Handles messy input | No, you clean it first | Yes |
| Importance scoring | You set it | Auto-assigned |
| Tags | You pick them | Auto-assigned |
If you know exactly what to store and how important it is, store is cheaper and more precise. One fact, $0.005, done.
But most real-world input isn’t one clean fact. It’s a paragraph. A chat log. Error output. Meeting notes. Extracting the facts by hand is work your agent shouldn’t be doing.
The math: if your messy text contains 3+ facts, extract is cheaper per fact than calling store individually. And you don’t have to write the parsing logic.
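The break-even is easy to sanity-check with the prices from the table:

```python
STORE_COST = 0.005   # per memory, from the pricing table
EXTRACT_COST = 0.01  # flat per extract call, any number of facts

def cheaper_to_extract(n_facts: int) -> bool:
    """True when one extract call costs less than n individual stores."""
    return EXTRACT_COST < STORE_COST * n_facts

for n in (1, 2, 3, 5):
    print(n, cheaper_to_extract(n))
# 1 and 2 facts: store wins or ties ($0.01 either way at 2);
# 3 or more: extract is strictly cheaper.
```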
What kinds of input work well
I’ve been running extract against different input types. Here’s what works and where it stumbles.
Meeting notes
This is the sweet spot. Meetings produce scattered, semi-structured text with decisions, action items, and context mixed together.
memoclaw extract \
--text "Standup 3/10: Backend team shipping new rate limiter today. Frontend blocked on design review — Sarah OOO until Wednesday. We agreed to skip the v2.1 release and go straight to v3. Jake raised concerns about the Postgres connection pooling but no action item yet." \
--namespace team-standups
Extract pulls out: rate limiter shipping today, Sarah OOO until Wednesday, skipping v2.1 for v3 (high importance — that’s a decision), and Jake’s pooling concern (lower importance — no action yet).
The LLM is good at distinguishing decisions from observations here.
Chat logs
Dump a conversation between your user and agent. Extract finds the facts worth keeping.
memoclaw extract \
--text "User: hey, btw I changed my email to [email protected]. Also the staging server is 192.168.1.50 now, not .51. Assistant: Updated, thanks. User: oh and can you always use dark mode in code examples? User: thanks" \
--namespace user-prefs
Three facts: email change (correction, high importance), server IP change (correction, high importance), dark mode preference (preference, medium importance). “thanks” gets ignored.
Error dumps
This one’s interesting. You can pipe error output and extract will pull out the actionable information.
memoclaw extract \
--text "Error: ECONNREFUSED 127.0.0.1:5432. Tried reconnecting 3 times. Looks like Postgres isn't running on the default port. Checked docker-compose — turns out the port mapping is 5433:5432. So external connections need port 5433." \
--namespace debugging
Extracts: Postgres external port is 5433 not 5432 (correction, importance 0.9). The reconnection attempts and error message itself aren’t stored — they’re transient details.
Documentation snippets
This works, but be careful: extract from a long doc and you'll get a lot of facts, and the importance scoring gets noisy.
memoclaw extract \
--text "$(head -50 README.md)" \
--namespace project-docs
Better approach: extract from specific sections you care about, not entire documents. For bulk documentation, use memoclaw migrate instead — it’s designed for markdown files.
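One way to scope an extract to just the section you care about: pull a single heading's body out of the markdown before calling the CLI. A rough helper — illustrative, not part of memoclaw:

```python
import re

def section(md: str, heading: str) -> str:
    """Return the body of one '## heading' section from a markdown
    string, so you can feed extract just that part."""
    pattern = rf"^##\s+{re.escape(heading)}\s*$(.*?)(?=^##\s|\Z)"
    m = re.search(pattern, md, flags=re.M | re.S)
    return m.group(1).strip() if m else ""
```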
Wiring extract into your OpenClaw agent
Automatic: use the hooks
If you have memoclaw-hooks installed, extract runs automatically when your session ends (on /new or context compaction). You don’t need to do anything.
openclaw hooks install memoclaw-hooks
openclaw hooks enable memoclaw
openclaw gateway restart
The hook calls extract on the recent conversation, so your agent’s context gets preserved even after a session reset.
Manual: agent-triggered extract
Sometimes your agent recognizes that a specific chunk of text has important information. Add this to your agent’s SOUL.md or AGENTS.md:
When users share meeting notes, error logs, or multi-fact updates,
use `memoclaw extract` to store the key facts automatically.
Don't try to parse them manually.
Then in conversation:
memoclaw extract \
--text "paste of the user's message with multiple facts" \
--namespace relevant-namespace
Cron: periodic extraction
For long-running sessions, extract recent conversation context on a schedule:
# Every 2 hours, extract from recent messages
openclaw cron add "extract-session" \
--schedule "0 */2 * * *" \
--command 'memoclaw extract --text "$(openclaw session export --last 30)" --namespace active-session'
(The outer single quotes matter: with double quotes, the $(...) would expand once when you add the job, not on each run.)
What about ingest?
MemoClaw also has /v1/ingest, which does everything extract does plus deduplication and auto-relations between extracted facts. Same price ($0.01).
Use extract when you want simple fact extraction and storage. Use ingest when you want the full pipeline: extract + dedup against existing memories + link related facts together. For most ongoing workflows, ingest is the better choice. For one-off text dumps, extract is simpler.
Cost reality
$0.01 per extract call. If your agent processes 5 chunks of text per day, that’s $0.05/day, about $1.50/month.
Compare with manually calling store for each fact: if those 5 chunks contain an average of 3 facts each, that’s 15 stores at $0.005 = $0.075/day. More expensive, and you had to write the extraction logic.
The free tier gives you 100 calls, enough to test for a couple weeks of normal use.
Limits and gotchas
- Max input: 8192 characters. For longer text, split it up or use `memoclaw migrate` for full documents.
- The LLM occasionally misses implicit facts or over-extracts obvious ones. Check results with `memoclaw list` after your first few extracts.
- Corrections (facts that override previous knowledge) get high importance automatically. This is usually what you want.
- Extract doesn't deduplicate against existing memories. If you call it twice with overlapping text, you'll get duplicates. Use `ingest` if dedup matters.
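The 8192-character cap means long inputs need chunking before you call extract. A sketch of a splitter that breaks on paragraph boundaries — the limit comes from above; everything else is illustrative:

```python
MAX_CHARS = 8192  # extract's input cap

def chunk_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking
    on blank lines so facts stay intact. A single paragraph longer
    than the limit is hard-split as a last resort."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        while len(para) > limit:          # oversized paragraph: hard split
            chunks.append(para[:limit])
            para = para[limit:]
        if len(current) + len(para) + 2 > limit:
            if current:
                chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then goes through its own `memoclaw extract` call — remember that's $0.01 per chunk, and that facts split across chunk boundaries may be missed.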
When to reach for something else
- One specific fact you already know → `memoclaw store` ($0.005, more precise)
- Full markdown files → `memoclaw migrate` (designed for docs)
- Need dedup + relations → `memoclaw ingest` (same price, does more)
- Quick keyword lookup → `memoclaw search` (free, no embeddings)
Extract is for the middle ground: you have messy text with multiple facts in it and you don’t want to parse it yourself. That covers a surprising amount of what agents deal with day to day.
Resources: