Surfaces & capability keys
If you're building a chat surface — a web client, an embed, the CLI, an SMS bridge — this is the contract for telling Martha what your surface can render and answer. Martha is conservative: it only emits a widget or a blocking interactive tool if the surface has declared it can handle it. Everything fails closed.
capability_keys
On POST /api/chat/{session_id}, the request body may include capability_keys — the list of things this surface can render/answer:
    { "content": "show me the air quality stations",
      "capability_keys": ["map_widget", "report_widget", "widget_stream", "ask_user"] }
The server uses them two ways:
- Display widgets (map_widget, report_widget, webxr_widget, generate_widget): declaring a key injects server-owned instructions so the agent may emit that widget. A surface that doesn't declare them never gets them.
- Interactive tools (ask_user): declaring the key lets the agent issue a blocking question; absent it, the agent gets an immediate fallback instead of hanging.
Fail-closed by design. A surface that sends no capability_keys (bare CLI, SMS, the minimal embed) gets plain text only, with no widgets and no blocking tools. This is intentional: a surface that can't render an approval/question must never be left waiting on one.
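As a rough illustration of the request body described above, here is a minimal sketch of a client-side helper. The name build_chat_body and its signature are hypothetical; only the content and capability_keys fields and the key names come from this page. It assumes that omitting the field entirely is an acceptable way for a plain-text surface to declare nothing.

```python
def build_chat_body(content, supported=()):
    """Build the JSON body for POST /api/chat/{session_id}.

    `supported` holds the capability keys this surface really implements.
    (Helper name and signature are illustrative, not part of the API.)
    """
    body = {"content": content}
    keys = sorted(set(supported))  # de-duplicate, stable order
    if keys:
        # Only declare what the surface can actually render or answer.
        body["capability_keys"] = keys
    # With nothing declared the field is left out, so the surface
    # gets plain text only.
    return body
```

A bare CLI would call build_chat_body("hello") and send no keys at all, while a web client might pass {"map_widget", "widget_stream", "ask_user"}.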
Display widgets
Two delivery modes, selected by the widget_stream key:
- Inline (default): the agent emits <report_widget>{…JSON…}</report_widget> inside the assistant text; your client parses the tags out and renders the component.
- Streamed frames (widget_stream): the server lifts a complete widget out of the token stream and sends it as its own SSE frame, so raw JSON never appears as text mid-stream:
    data: {"type":"widget","widget_type":"report","data":{ … }}
  Consume it on a dedicated channel (e.g. an onWidget handler) and render the widget directly. Persisted/reloaded messages still contain the inline tags, so a text parser remains the fallback for history.
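For the streamed mode, a client might route widget frames along these lines. This is a sketch under stated assumptions: each frame arrives as a single "data: <json>" line shaped like the example frame, and parse_widget_frame, dispatch, and the handlers mapping are hypothetical names rather than part of any Martha client library.

```python
import json


def parse_widget_frame(line):
    """Return (widget_type, data) for a widget SSE line, else None.

    Assumes one `data: <json>` line per frame; frames of any other
    type are ignored here.
    """
    if not line.startswith("data:"):
        return None
    try:
        frame = json.loads(line[len("data:"):].strip())
    except json.JSONDecodeError:
        return None  # not JSON; skip it
    if not isinstance(frame, dict) or frame.get("type") != "widget":
        return None
    return frame.get("widget_type"), frame.get("data")


def dispatch(line, handlers):
    """Call the handler registered for the frame's widget type, if any."""
    parsed = parse_widget_frame(line)
    if parsed is None:
        return False
    widget_type, data = parsed
    handler = handlers.get(widget_type)
    if handler is None:
        return False  # no renderer for this type: do not render
    handler(data)
    return True
```

Here handlers plays the role of the dedicated channel: a dict such as {"report": render_report} keeps widget payloads out of the text path entirely.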
Declare only the widget types your client actually renders — the declared keys are the source of truth for what the agent is told it can produce.
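For the inline mode, and for reloaded history, the tag parsing could look roughly like the sketch below. It assumes every widget tag is named after its capability key, as <report_widget> is; the page only shows that one tag, so treat the pattern as a guess for the other widget types. extract_inline_widgets is a hypothetical helper name.

```python
import json
import re

# Matches <name_widget>...</name_widget>; the backreference requires the
# closing tag to match the opening one.
WIDGET_TAG = re.compile(r"<(\w+_widget)>(.*?)</\1>", re.DOTALL)


def extract_inline_widgets(text, declared):
    """Split assistant text into (plain_text, widgets).

    Only tags whose name is in `declared` and whose body parses as JSON
    are lifted out; everything else stays in the text untouched.
    """
    widgets = []

    def lift(match):
        name, payload = match.group(1), match.group(2)
        if name not in declared:
            return match.group(0)
        try:
            widgets.append((name, json.loads(payload)))
        except json.JSONDecodeError:
            return match.group(0)  # leave malformed payloads visible
        return ""

    plain = WIDGET_TAG.sub(lift, text)
    return plain, widgets
```

Passing the same set of keys the surface declares keeps the parser aligned with what the agent was told it can produce.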
Interactive tools: ask_user
ask_user is a blocking clarifying question (AskUserQuestion-shaped: a question + options). When the agent calls it:
- The chat turn suspends (durably, surviving restarts) and an SSE tool_status frame is emitted with status:"awaiting_input" and the question.
- Your surface renders the question and collects the answer.
- You deliver it back:
    POST /api/chat/{session_id}/human-input
    { "tool_call_id": "...", "answer": "Staging" }
  Auth is the session's own principal (session ownership): answering a clarifying question is not a privileged action, so a service-account chat surface can deliver it. (Contrast: approval decisions are privileged and go elsewhere; see Capability approvals.)
- The agent loop wakes and continues with the answer as the tool result.
If nobody answers within the timeout (24h), the agent receives a structured "timed out" result and proceeds — it never hangs forever.
ask_user carries no permissions. It's a clarification pause; the agent keeps acting under its own grants the whole time. The human's answer is just data fed back as a tool result.
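The answer round trip could be sketched as follows. The endpoint path and the tool_call_id and answer fields are taken from the steps above; the frame's type field, the helper names, and the base URL are assumptions for illustration. The request is built but not sent, and the session's own auth headers (not specified here) would still need to be attached.

```python
import json
import urllib.request


def is_awaiting_input(frame):
    """True for a tool_status frame that is waiting on the user.

    Assumes the frame carries type "tool_status"; this page only
    specifies the status value.
    """
    return (frame.get("type") == "tool_status"
            and frame.get("status") == "awaiting_input")


def build_human_input_request(base_url, session_id, tool_call_id, answer):
    """Build (not send) the POST that delivers an ask_user answer."""
    body = json.dumps({"tool_call_id": tool_call_id, "answer": answer})
    return urllib.request.Request(
        url=f"{base_url}/api/chat/{session_id}/human-input",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A surface would check is_awaiting_input on each incoming frame, render the question, then pass the resulting request to urllib.request.urlopen once the user has picked an option.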
Requirements recap
For ask_user to work end-to-end: the agent must be granted ask_user (it's a platform function — see Permissions & access), and the surface must send the ask_user capability key. Miss either and it fails closed to a fallback.
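The two-condition rule can be written as a small truth function. This is only a model of the rule as stated here, not the server's implementation; ask_user_outcome and its return values are hypothetical.

```python
def ask_user_outcome(agent_granted, capability_keys):
    """Return "ask" only when both requirements hold, else "fallback"."""
    surface_declared = "ask_user" in (capability_keys or ())
    return "ask" if (agent_granted and surface_declared) else "fallback"
```

Either missing piece, the grant or the declared key, yields the fallback path.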
See also
- Permissions & access model: grants, the two-surface rule, identities.
- Capability approvals: the privileged, blocking, human-gated sibling of ask_user.