Smart Ingest: Zero-Effort Memory Extraction from Agent Conversations
Most agents store memories the hard way: manually picking out facts, assigning importance scores, choosing tags, and calling store for each one. It works, but it’s tedious and error-prone. You miss things. Your agent misses things.
MemoClaw’s /v1/ingest endpoint does all of that in a single call. Dump a conversation in, get structured memories out — extracted, scored, tagged, deduplicated, and optionally related to each other.
This tutorial walks through how ingest works, when to use it, and how to wire it into your OpenClaw agent’s workflow.
What Ingest Actually Does
Under the hood, ingest is a pipeline:
- Extract — GPT-4o-mini reads the conversation and pulls out discrete, self-contained facts
- Score — Each fact gets an importance score (0–1) and a memory type (`preference`, `correction`, `decision`, `project`, `observation`, or `general`)
- Tag — The LLM assigns 1–5 relevant tags per fact
- Deduplicate — Before storing, each fact is checked against your existing memories. Duplicates are skipped
- Relate — If `auto_relate` is enabled, facts extracted from the same conversation are linked with `related_to` relations
One API call. $0.01 USDC.
Extract vs. Ingest: Which One?
MemoClaw has two “smart” endpoints:
| Endpoint | What it does | Price |
|---|---|---|
| `POST /v1/memories/extract` | Extract facts and store them | $0.01 |
| `POST /v1/ingest` | Extract + deduplicate + auto-relate | $0.01 |
Same price. Ingest just does more. Use extract if you want raw fact extraction without relations. Use ingest for the full pipeline.
For most use cases, ingest is what you want.
Your First Ingest Call
With the CLI
The simplest way to try it:
memoclaw ingest \
--messages '[
{"role": "user", "content": "I just migrated our database from MySQL to PostgreSQL 16. We are using pgvector for embeddings. The team decided to deploy on Railway instead of Heroku."},
{"role": "assistant", "content": "Got it — PostgreSQL 16 with pgvector on Railway. I will update my notes on your infrastructure."}
]' \
--auto-relate \
--namespace project-infra
Output:
{
"memory_ids": [
"a1b2c3d4-...",
"e5f6a7b8-...",
"c9d0e1f2-..."
],
"facts_extracted": 3,
"facts_stored": 3,
"facts_deduplicated": 0,
"relations_created": 3,
"tokens_used": 210
}
Three facts extracted from two messages:
- Database migrated from MySQL to PostgreSQL 16
- Using pgvector for embeddings
- Deploying on Railway instead of Heroku
All three are related to each other (3 pairs from 3 facts), and they live in the `project-infra` namespace.
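If you capture the response in a script, the new memory IDs are easy to pull out with `jq` (assumed to be installed; the JSON here mirrors the sample response above, truncated IDs and all):

```shell
# Pull the new memory IDs out of a captured ingest response.
# $response stands in for the output of a real ingest call.
response='{"memory_ids": ["a1b2c3d4-...", "e5f6a7b8-...", "c9d0e1f2-..."], "facts_stored": 3}'
echo "$response" | jq -r '.memory_ids[]'
```

Handy when you want to feed the IDs straight into a follow-up command.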
With curl
curl -X POST https://api.memoclaw.com/v1/ingest \
-H "Content-Type: application/json" \
-H "x-wallet-auth: $(memoclaw auth-header)" \
-d '{
"messages": [
{"role": "user", "content": "I just migrated our database from MySQL to PostgreSQL 16. We are using pgvector for embeddings."},
{"role": "assistant", "content": "Got it, PostgreSQL 16 with pgvector."},
{"role": "user", "content": "Also, we switched from Heroku to Railway for deploys."}
],
"namespace": "project-infra",
"auto_relate": true
}'
With raw text
Don’t have a conversation structure? Wrap your free-form text in a single user message — the messages format still applies, but the content can be anything:
memoclaw ingest \
--messages '[{"role": "user", "content": "Meeting notes: decided to use TypeScript for the new service, targeting Q2 launch, budget approved for 3 contractors"}]' \
--namespace project-planning
The LLM handles free-form text just fine. It’ll pull out the individual facts regardless of formatting.
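If you’re building that single-message array in a script, a small `jq` one-liner (assuming `jq` is installed) handles the JSON quoting and escaping for you:

```shell
# Wrap free-form notes into the single-message JSON array
# that --messages expects. jq takes care of escaping.
notes="Meeting notes: decided to use TypeScript for the new service"
jq -cn --arg t "$notes" '[{"role": "user", "content": $t}]'
```

The output can be passed directly to `--messages` without worrying about quotes or newlines in the notes.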
How the LLM Extracts Facts
The extraction system prompt tells GPT-4o-mini to:
- Extract only facts worth remembering long-term (not transient details)
- Make each fact self-contained (understandable without the conversation)
- Assign importance: 1.0 = critical corrections, 0.5 = moderate, 0.1 = trivial
- Classify into memory types: `correction`, `preference`, `decision`, `project`, `observation`, `general`
- Add 1–5 tags per fact
- Deduplicate within the extraction (don’t extract the same fact twice)
This means you don’t need to worry about what to remember. The model decides for you. Corrections get high importance. Casual observations get low importance. Preferences land somewhere in between.
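Concretely, a single extracted fact might look something like this. This is an illustrative sketch — the field names (`content`, `memory_type`, `importance`, `tags`) are assumptions based on the attributes described above, not a documented response schema:

```json
{
  "content": "Team decided to deploy on Railway instead of Heroku",
  "memory_type": "decision",
  "importance": 0.8,
  "tags": ["railway", "deployment", "infrastructure"]
}
```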
What gets extracted (and what doesn’t)
Extracted:
- “User prefers dark mode” → `preference`, importance 0.7
- “Database port is 5433, NOT 5432” → `correction`, importance 0.95
- “Team decided to use Railway” → `decision`, importance 0.8
Not extracted:
- “Sure, I can help with that” → transient, no fact
- “Let me think about it” → no actionable information
- “Thanks!” → social noise
Deduplication: Don’t Store What You Already Know
The real power of ingest shows up over time. Say your agent has this conversation on Monday:
User: We use PostgreSQL 16.
And this conversation on Wednesday:
User: Our database is PostgreSQL 16 with pgvector.
If you manually store both, you get duplicate memories. With ingest, MemoClaw checks semantic similarity against existing memories before storing. “We use PostgreSQL 16” already exists, so it’s deduplicated. Only “using pgvector” gets stored as a new fact.
The response tells you exactly what happened:
{
"facts_extracted": 2,
"facts_stored": 1,
"facts_deduplicated": 1
}
One new fact. One duplicate skipped. Your memory stays clean.
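The counts always satisfy a simple invariant: stored plus deduplicated equals extracted. A script can sanity-check an ingest response with `jq` (assumed installed) and alert when dedup behaves unexpectedly:

```shell
# Verify the dedup invariant on an ingest response:
# facts_stored + facts_deduplicated == facts_extracted.
# jq -e exits non-zero if the expression is false.
response='{"facts_extracted": 2, "facts_stored": 1, "facts_deduplicated": 1}'
echo "$response" | jq -e '.facts_stored + .facts_deduplicated == .facts_extracted' > /dev/null \
  && echo "counts add up"
```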
Auto-Relate: Building a Knowledge Graph
When `auto_relate` is true, every pair of facts extracted from the same conversation gets a `related_to` relation. This builds a knowledge graph over time.
Why does this matter? When you later recall one memory, you can traverse its relations to find connected context. “PostgreSQL 16” is related to “pgvector” is related to “Railway deployment” — your agent can follow these links to reconstruct the full picture.
For 3 extracted facts, you get 3 relations (every pair). For 4 facts, 6 relations. The math is n * (n-1) / 2.
Relations use `ON CONFLICT DO NOTHING`, so running ingest multiple times won’t create duplicate relations.
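The pairwise growth is easy to see with shell arithmetic — relation count is n * (n - 1) / 2 for n extracted facts:

```shell
# Relations created by auto_relate for n extracted facts: n*(n-1)/2
for n in 2 3 4 5; do
  echo "$n facts -> $(( n * (n - 1) / 2 )) relations"
done
```

Worth keeping in mind: the relation count grows quadratically, so ingesting a very fact-dense conversation with `auto_relate` creates a lot of links.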
Wiring Ingest into Your OpenClaw Agent
Option 1: Session-End Hook (Recommended)
The cleanest approach is to ingest your entire conversation when the session ends. If you’re using `memoclaw-hooks`, this happens automatically — the hook extracts context on `/new` and on context compaction.
Install the hooks:
openclaw hooks install memoclaw-hooks
openclaw hooks enable memoclaw
export MEMOCLAW_PRIVATE_KEY=0xYourKey
openclaw gateway restart
That’s it. Every session end triggers an ingest call. No code changes.
Option 2: Periodic Ingest via Cron
For long-running sessions, you might want to ingest periodically rather than waiting for the session to end:
# In your OpenClaw cron config — ingest recent context every 2 hours
openclaw cron add "ingest-session" \
--schedule "0 */2 * * *" \
--command "memoclaw ingest --messages '$(openclaw session export --last 50 --json)' --auto-relate"
Option 3: Manual Ingest in Agent Code
Your agent can call ingest directly when it recognizes important context:
# Agent decides this conversation had important info
memoclaw ingest \
--messages '[
{"role": "user", "content": "Actually, the API endpoint changed from /api/v1 to /api/v2. And we need to use Bearer tokens now, not API keys."},
{"role": "assistant", "content": "Noted — /api/v2 with Bearer token auth. I will remember that."}
]' \
--namespace project-api \
--auto-relate
Namespaces: Keep Project Memory Separate
Always use namespaces with ingest. Without them, all memories land in `default`, and recall across different projects gets noisy.
# Infrastructure decisions
memoclaw ingest --messages '...' --namespace infra
# Frontend preferences
memoclaw ingest --messages '...' --namespace frontend
# Personal preferences (timezone, communication style)
memoclaw ingest --messages '...' --namespace personal
When you recall, you can scope to a namespace:
memoclaw recall "database setup" --namespace infra
Or recall across all namespaces by omitting the flag.
Cost Breakdown
Ingest costs $0.01 per call. That covers:
- GPT-4o-mini for fact extraction (~185 tokens typical)
- Embedding generation for each extracted fact
- Deduplication similarity check
- Relation creation (if enabled)
At 10 ingests per day, you’re spending $0.10/day or ~$3/month. The 100 free-tier calls cover your first 10 days of daily use.
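The arithmetic behind that estimate, as a quick `awk` sketch (30-day month assumed):

```shell
# Back-of-envelope ingest cost: calls per day at $0.01 each.
awk 'BEGIN {
  per_call = 0.01; calls_per_day = 10
  printf "daily: $%.2f, monthly (30d): $%.2f\n",
         calls_per_day * per_call, calls_per_day * per_call * 30
}'
```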
When NOT to Use Ingest
Ingest isn’t always the right tool:
- Single known facts — If you already know exactly what to store, use `memoclaw store` directly. It’s cheaper ($0.005) and more precise.
- Large documents — Ingest handles conversations, not documents. For markdown files, use `memoclaw migrate`.
- Real-time streaming — Ingest is a batch operation. Don’t call it on every message; batch conversations and ingest periodically.
- Secrets — Never ingest conversations containing passwords, API keys, or tokens. MemoClaw memories are not encrypted at rest.
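On that last point, a lightweight pre-ingest guard can catch the obvious cases. This is a minimal sketch — the patterns are assumptions, not an exhaustive secret detector, and you should extend them for your own credential formats:

```shell
# Refuse to ingest text that looks like it contains credentials.
# The pattern list is illustrative, not exhaustive.
scan_for_secrets() {
  printf '%s' "$1" | grep -Eqi 'api[_-]?key|password|secret|bearer [a-z0-9._-]+|-----BEGIN'
}

if scan_for_secrets "user: my password is hunter2"; then
  echo "refusing to ingest: possible secret detected"
fi
```

Run the guard over the conversation before it ever reaches the ingest call; a false positive costs you one skipped ingest, while a miss stores a credential in plain text.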
Checking What Got Stored
After an ingest, verify your memories:
# List recent memories in the namespace
memoclaw list --namespace project-infra --limit 5
# Search for a specific fact
memoclaw search "PostgreSQL" --namespace project-infra
# Check relations on a memory
memoclaw recall "database" --namespace project-infra
Use `memoclaw suggested --category fresh` to see recently created memories.
Wrapping Up
Ingest is the lazy (smart) way to build agent memory. Instead of hand-crafting individual store calls, dump your conversations in and let MemoClaw handle extraction, scoring, deduplication, and relations.
The workflow:
- Agent has conversations
- Periodically (or on session end), call ingest with the conversation
- MemoClaw extracts facts, skips duplicates, builds relations
- On next session, recall pulls back the relevant context
Your agent remembers everything worth remembering, and nothing it’s already seen.
Resources: