
Agent memory

Agents remember. Past chat messages, past tool outputs, persistent facts about users — all searchable via a single recall tool the agent calls like any other function.

Why this exists

Long conversations grow past what fits in a model's context window. Even with prompt caching, older turns drop out of the active prompt. Agents that need to refer back to "what did the customer say at the start of the call" or "what was the result of that database query 15 turns ago" can't, unless something brings the past back into context.

That something is the recall tool. The agent decides when to call it; Martha decides what's relevant.

How an agent uses it

Same shape as every other tool call:

json
{
  "name": "recall",
  "arguments": {
    "query": "the metabase query about order count",
    "top_k": 5,
    "source_kinds": ["tool_output"]
  }
}

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | required | Natural-language search query. Both keyword and semantic similarity are matched. |
| top_k | integer | 5 | Number of items to return (max 20). |
| source_kinds | list of strings | all | Filter by kind: chat_message, tool_output, document_chunk, fact. Omit to search all. |
| scope | string | "any" | Where to look. See Scopes below. |
| enable_rerank | boolean | true | Cross-encoder rerank for higher precision. Disable only if you specifically want raw scores. |

Response

json
{
  "items": [
    {
      "id": "uuid",
      "source_kind": "tool_output",
      "source_ref": "tool_call_id_xyz",
      "content": "...",
      "event_time": "2026-05-02T14:23:01Z",
      "score": 0.87
    }
  ],
  "total": 1,
  "degraded": false,
  "rerank_used": true
}

degraded: true means semantic indexing is still running in the background: keyword results are complete, but semantic recall may be incomplete. Recall is still usable.

rerank_used: false means the precision rerank was skipped — items still come back, just with merge-order ranking.
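
For example, a response returned while semantic indexing is still catching up and the reranker is unreachable might look like this (the item values are illustrative, reusing the placeholders from the example above):

json
{
  "items": [
    {
      "id": "uuid",
      "source_kind": "tool_output",
      "source_ref": "tool_call_id_xyz",
      "content": "...",
      "event_time": "2026-05-02T14:23:01Z",
      "score": 0.61
    }
  ],
  "total": 1,
  "degraded": true,
  "rerank_used": false
}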

Scopes

The scope parameter picks which slice of memory to search:

| Scope | What it covers | Required for use |
| --- | --- | --- |
| session | Chat messages and tool outputs from the current chat session | Always available |
| tenant | Document chunks indexed across the whole tenant | Always available |
| user | Facts written specifically as belonging to a user, across all their sessions | Authenticated human user |
| agent | Facts written specifically as belonging to this agent, across all sessions | Agent attached to the chat |
| any | All four, weighted-merged | Default; no extra requirements |

Tenant isolation is unconditional in every scope. A recall in tenant A cannot ever return rows from tenant B.
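
A scoped recall is the same call shape with the scope argument set. For example, limiting the search to facts about the current user (the query text is illustrative):

json
{
  "name": "recall",
  "arguments": {
    "query": "display and unit preferences",
    "scope": "user",
    "source_kinds": ["fact"]
  }
}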

You can tune the merge weights via environment variables if your agent's domain calls for it:

MARTHA_RECALL_WEIGHT_SESSION=1.3
MARTHA_RECALL_WEIGHT_USER=1.1
MARTHA_RECALL_WEIGHT_AGENT=1.0
MARTHA_RECALL_WEIGHT_TENANT=1.0

Setting a weight to 0 suppresses that source class entirely.

Persistent facts

For things you want an agent to remember across sessions, the agent calls remember_fact:

json
{
  "name": "remember_fact",
  "arguments": {
    "content": "User prefers metric units and dark mode.",
    "scope": "user"
  }
}

Returns { "id": "uuid", "was_new": true }. Idempotent — the same content from the same identity in the same scope writes once.

Two scopes available:

  • user — fact belongs to the human user. Recallable in any future session that user has, regardless of which agent.
  • agent — fact belongs to this agent. Recallable in any future session this agent runs, regardless of which user.
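
The agent-scoped variant is the same call with scope set to agent; the fact content here is illustrative:

json
{
  "name": "remember_fact",
  "arguments": {
    "content": "The workspace's nightly export job runs at 02:00 UTC.",
    "scope": "agent"
  }
}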

Subsequent remember_fact calls that contradict an earlier fact retire the older one automatically. The agent doesn't have to track this — when it next calls recall, only the current fact appears. Older versions stay in the audit trail, never in default recall.
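
For instance, if the agent stored the metric-units preference above and the user later switches, the agent simply writes the new fact (content illustrative):

json
{
  "name": "remember_fact",
  "arguments": {
    "content": "User now prefers imperial units.",
    "scope": "user"
  }
}

The earlier metric-units fact is retired automatically; the next recall over user facts returns only this one.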

What's stored

| Source | What gets recorded | When |
| --- | --- | --- |
| chat_message | Every user message and assistant response | At message persist time |
| tool_output | Every tool result (full, or elided preview if oversized) | When the tool returns |
| document_chunk | Every chunk of every uploaded document | When document ingestion finalizes |
| fact | Each remember_fact call | When the agent makes it |

Tying back to large tool outputs

When a tool returns a large payload, Martha stores it under a tool_output_key (e.g. tout_abc...) and shows the agent a preview. Memory holds the same preview. To re-read the full payload, the agent calls read_tool_output(tool_output_key="tout_abc...") — same key, full content.
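
Spelled out as a tool call, re-reading the full payload looks like this, using the same key the agent saw in the preview:

json
{
  "name": "read_tool_output",
  "arguments": {
    "tool_output_key": "tout_abc..."
  }
}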

Compressed mode for very long conversations

When a conversation grows past 80% of the model's context window, Martha automatically switches the prompt to a recent tail of the last 4 messages. The agent reaches older content via recall instead.

This is invisible to end users — long sessions just keep working. The recall tool description tells the agent to reach for it when the conversation has been trimmed for length, and in practice agents do.

You don't need to do anything to enable this. It's automatic.
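
In a trimmed session, the agent recovers older turns the same way it recovers anything else, with a session-scoped recall (the query text is illustrative):

json
{
  "name": "recall",
  "arguments": {
    "query": "what the customer reported at the start of the call",
    "scope": "session",
    "source_kinds": ["chat_message"]
  }
}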

How fresh is recall

New chat messages, tool outputs, and facts are recallable via keyword search immediately. Semantic search becomes available within ~500 ms in the typical path, and within 5 seconds in the worst case (background indexing). The degraded flag in the response surfaces this.

Precision rerank

Recall over-fetches up to 20 candidates and reorders them with a cross-encoder reranker to produce the final ordering. The default model is a 306M-parameter multilingual reranker covering 70+ languages. If the reranker is unavailable, recall falls back to merge-order results; rerank_used: false in the response signals this. Recall always succeeds; rerank is a quality upgrade, not a correctness gate.
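
To get the raw merge-order results deliberately, rather than via the fallback, a call can turn rerank off per request (the query text is illustrative):

json
{
  "name": "recall",
  "arguments": {
    "query": "notes from the pricing discussion",
    "enable_rerank": false
  }
}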

You can swap rerank models per deployment via configuration. Higher-quality multilingual options are available; English-only options run roughly twice as fast.

Tuning

Operational knobs surfaced as environment variables:

| Variable | Default | What it does |
| --- | --- | --- |
| MARTHA_RERANKER_URL | unset | Reranker endpoint. Unset disables rerank entirely. |
| MARTHA_RECALL_WEIGHT_SESSION | 1.3 | Weight for session items in scope=any. |
| MARTHA_RECALL_WEIGHT_USER | 1.1 | Weight for user items in scope=any. |
| MARTHA_RECALL_WEIGHT_AGENT | 1.0 | Weight for agent items in scope=any. |
| MARTHA_RECALL_WEIGHT_TENANT | 1.0 | Weight for tenant items in scope=any. |
| MARTHA_MEMORY_JUDGE_MODEL | small/fast model | Judge model for fact contradiction. Empty string disables. |
| MARTHA_MEMORY_JUDGE_TOPK | 5 | Candidates compared per pending fact. |
| MARTHA_MEMORY_JUDGE_THRESHOLD | 0.7 | Distance cutoff for "close enough to judge". |
| MARTHA_MEMORY_ANONYMIZE_SOFT_TTL_DAYS | 30 | Days before retired facts are anonymized. |
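
As an example, a deployment that leans more heavily on tenant documents, only judges closer fact candidates, and shortens the anonymization window might set (values illustrative):

MARTHA_RECALL_WEIGHT_TENANT=1.2
MARTHA_MEMORY_JUDGE_THRESHOLD=0.6
MARTHA_MEMORY_ANONYMIZE_SOFT_TTL_DAYS=14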

Retention

Retired memory rows are anonymized, not hard-deleted. Content is replaced with [REDACTED], identity and embedding columns are cleared, and an audit-log event records the action. Rows stay in place so any references to them (supersede chains, audit pointers) remain intact. Default recall always excludes anonymized rows.

Per-tenant retention policy is configurable; the system runs in dry-run mode by default for the first observation window in any new deployment.

Tenant isolation

Every recall query filters by tenant at the SQL level. The tenant identifier comes from the request token, never from agent input. An agent cannot recall content from another tenant even if it tries to inject one — the value is overridden server-side.

Related

  • Document tools — searching uploaded document corpora (different store, different lifecycle from session memory)
  • Citations — formal source references from agent responses to document chunks

Martha is built by aiaiai-pt.