Architecture overview
How Martha is put together, from the outside in. If you want the developer-facing primitives, Concepts is the right starting point — this page is for when you want to know what's actually moving when an agent runs.
The picture
                  ┌──────────────────────────────┐
                  │       Your front doors       │
                  │ • Web (embedded chat)        │
                  │ • SMS / WhatsApp             │
                  │ • Voice                      │
                  │ • CLI / API / agent harness  │
                  └──────────────┬───────────────┘
                                 │
                                 ▼
                  ┌──────────────────────────────┐
                  │      Conversation layer      │
                  │ Multi-turn chat + streaming  │
                  │ Tool calls + structured      │
                  │ outputs + citations          │
                  └──────────────┬───────────────┘
                                 │
          ┌──────────────────────┴──────────────────────┐
          ▼                                             ▼
┌────────────────────┐                       ┌──────────────────────┐
│       Agents       │                       │      Workflows       │
│   Prompt + tools   │◄────────────────────► │    Graph of steps    │
│  Loop with memory  │     call as tools     │  Durable execution   │
└─────────┬──────────┘                       └──────────┬───────────┘
          │                                             │
          ▼                                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                        Capability layer                         │
│                                                                 │
│  Functions  Documents  Memory    Tasks     Triggers  Approvals  │
│  (HTTP /    (parse,    (recall   (queue,   (events,  (pauses)   │
│  platform)  chunk,     across    claim,    webhooks,            │
│             embed,     sessions) complete) schedules)           │
│             search)                                             │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
                  ┌──────────────────────────────┐
                  │         Foundations          │
                  │  Multi-tenant isolation      │
                  │  Per-client allowlists       │
                  │  Connection / secret store   │
                  │  Audit log                   │
                  └──────────────────────────────┘

Every box is something you control via configuration; very little of Martha's internals leaks into your code.
The four layers
Front doors
Wherever your users are, Martha meets them. A browser embed widget, an SMS or WhatsApp number routed through your messaging provider, a voice line, a partner's website via the same widget, or any custom UI talking to the REST API. The same agent powers all of them — picking a channel doesn't fork your prompt, your tools, or your workflows.
Conversation layer
Multi-turn chat with streaming, tool-call loops, structured outputs (JSON-Schema-constrained responses from any provider), and inline citations back to source documents. You write a prompt and grant tools; the conversation engine handles the loop, the streaming, and the formatting.
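The contract behind structured outputs is simple: the model's reply is guaranteed to parse as JSON and conform to a schema you supply. A rough sketch of what "conform" means here; the schema and the `conforms` helper are illustrative, not Martha's API:

```python
# Hypothetical sketch of a JSON-Schema-constrained response check.
# The schema below and the conforms() helper are illustrative only.
import json

INTENT_SCHEMA = {
    "type": "object",
    "properties": {
        "intent": {"type": "string", "enum": ["billing", "support", "sales"]},
        "confidence": {"type": "number"},
    },
    "required": ["intent", "confidence"],
}

def conforms(payload: dict, schema: dict) -> bool:
    """Minimal structural check: required keys present, types and enums match."""
    types = {"string": str, "number": (int, float), "object": dict}
    for key in schema.get("required", []):
        if key not in payload:
            return False
    for key, spec in schema.get("properties", {}).items():
        if key in payload:
            if not isinstance(payload[key], types[spec["type"]]):
                return False
            if "enum" in spec and payload[key] not in spec["enum"]:
                return False
    return True

# What the conversation engine guarantees: the reply parses and conforms.
reply = json.loads('{"intent": "billing", "confidence": 0.92}')
assert conforms(reply, INTENT_SCHEMA)
```

Because the guarantee is enforced engine-side, downstream code can consume the reply without defensive parsing.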
Agents and workflows
Two ways to compose work:
- Agents are autonomous loops. You set a prompt, grant a set of functions, and let the loop call tools, observe results, and decide what to do next. Best for open-ended interactions.
- Workflows are graphs of steps. You wire LLM nodes, function nodes, branches, parallel paths, loops, and human-approval pauses. Best for deterministic processes — the kind of thing where you'd otherwise reach for a job runner.
They compose. An agent can call a workflow as a tool. A workflow can have an agent loop as one of its nodes. Pick the shape that fits the problem.
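A minimal sketch of that composition, with invented `Workflow` and `Agent` classes standing in for the real primitives:

```python
# Hedged sketch: a workflow exposed to an agent as a callable tool.
# Workflow, Agent, and as_tool are illustrative names, not the SDK.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Workflow:
    name: str
    steps: list  # callables, run in order, each transforming the payload

    def run(self, payload: dict) -> dict:
        for step in self.steps:
            payload = step(payload)
        return payload

    def as_tool(self) -> Callable:
        # The agent sees the whole pipeline as a single function call.
        return self.run

@dataclass
class Agent:
    prompt: str
    tools: dict = field(default_factory=dict)

refund = Workflow(
    "refund",
    steps=[
        lambda p: {**p, "approved": p["amount"] < 100},
        lambda p: {**p, "status": "done" if p["approved"] else "escalated"},
    ],
)
agent = Agent(prompt="You handle refunds.",
              tools={"refund_pipeline": refund.as_tool()})

result = agent.tools["refund_pipeline"]({"amount": 40})
```

The agent's prompt stays small; the multi-step logic lives in the workflow, where it gets durable execution for free.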
Capability layer
The shared services agents and workflows draw on:
- Functions — HTTP endpoints (REST or GraphQL) and built-in platform tools, callable as agent tools.
- Documents — uploads parsed, chunked, embedded, and exposed via hybrid keyword + semantic search.
- Memory — recall over past chat messages, tool outputs, and persistent facts across sessions.
- Tasks — async work queue with claim, heartbeat, complete semantics. Bidirectional with Linear, GitHub, GitLab.
- Triggers — start workflows on platform events, inbound webhooks, schedules, or absence-of-event timers.
- Approvals — workflow pauses that wait for human OK without losing state.
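The Tasks lifecycle can be modeled in a few lines. This in-memory toy only illustrates the claim / heartbeat / complete semantics; the real queue is a networked service and none of these method names are its API:

```python
# Illustrative in-memory model of the claim / heartbeat / complete lifecycle.
import time

class TaskQueue:
    def __init__(self, lease_seconds=30):
        self.lease = lease_seconds
        self.tasks = {}  # id -> {"state": ..., "claimed_at": ...}

    def enqueue(self, task_id):
        self.tasks[task_id] = {"state": "queued", "claimed_at": None}

    def claim(self, task_id):
        t = self.tasks[task_id]
        if t["state"] != "queued":
            raise RuntimeError("already claimed")
        t.update(state="claimed", claimed_at=time.monotonic())
        return t

    def heartbeat(self, task_id):
        # Extend the lease so a slow worker keeps ownership.
        self.tasks[task_id]["claimed_at"] = time.monotonic()

    def complete(self, task_id, result):
        self.tasks[task_id].update(state="done", result=result)

q = TaskQueue()
q.enqueue("t1")
q.claim("t1")
q.heartbeat("t1")
q.complete("t1", {"ok": True})
```

The same lifecycle is what lets external harnesses participate: any worker that can claim, heartbeat, and complete is a valid consumer.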
Foundations
Tenant isolation on every record. Per-client function allowlists so a public-facing chat agent can't accidentally touch your admin tools. A credential store for connections to external systems. An immutable audit log of every action with redaction support.
How a request actually flows
An end user asks a question in your embedded chat widget:
- The widget requests a short-lived token from your server (you control auth).
- The widget opens a streaming connection to Martha's chat endpoint.
- The conversation engine resolves which agent should handle the message based on the client config.
- The agent loop runs: model picks a tool, Martha calls the tool (HTTP function, document search, workflow, whatever), the loop continues until the model emits a final response.
- Tokens stream back to the widget as they're generated. Tool-call status frames stream back too — the user sees what the agent is doing, not just the answer.
- When the loop ends, the message is persisted and indexed for future recall.
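The flow above condenses to a simple loop. Everything here (the model stub, the tool registry) is a stand-in for illustration, not Martha's engine:

```python
# Toy agent loop: model picks a tool or emits a final answer; the loop
# executes tools and feeds results back until the model finishes.
def toy_model(history):
    # A real model decides between a tool call and a final answer;
    # this stub calls "search" once, then answers.
    if not any(m["role"] == "tool" for m in history):
        return {"type": "tool_call", "tool": "search", "args": {"q": "pricing"}}
    return {"type": "final", "text": "Our starter plan is free."}

TOOLS = {"search": lambda q: [f"doc matching {q!r}"]}

def run_agent_loop(user_message, max_turns=5):
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        action = toy_model(history)
        if action["type"] == "final":
            return action["text"], history
        result = TOOLS[action["tool"]](**action["args"])
        history.append({"role": "tool", "content": result})
    raise RuntimeError("loop did not converge")

answer, transcript = run_agent_loop("What does it cost?")
```

In production the interesting parts are what this toy omits: streaming each token and tool-status frame back to the widget while the loop runs.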
If a workflow runs alongside (say, the agent calls a multi-step pipeline as a tool), workflow execution is durable: each step's state is checkpointed, retries are independent, and a long pause for human approval doesn't tie up resources.
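Durable execution is easiest to see in miniature: checkpoint each step's output, and a resumed run skips straight past finished steps instead of rerunning them. The storage and step names below are invented:

```python
# Sketch of step-level checkpointing. A crash or a long approval pause
# resumes from the last finished step; completed work never reruns.
checkpoints = {}  # run_id -> {step_name: output}

def run_workflow(run_id, steps, payload):
    done = checkpoints.setdefault(run_id, {})
    for name, fn in steps:
        if name in done:           # already checkpointed: skip on resume
            payload = done[name]
            continue
        payload = fn(payload)
        done[name] = payload       # checkpoint before moving on
    return payload

calls = []
def enrich(p):
    calls.append("enrich")         # track executions to show resume skips it
    return {**p, "enriched": True}

steps = [("enrich", enrich), ("notify", lambda p: {**p, "sent": True})]
first = run_workflow("run-1", steps, {"order": 7})
second = run_workflow("run-1", steps, {"order": 7})  # resume: nothing reruns
```

A paused approval is just a checkpoint with no worker attached, which is why a week-long wait costs nothing.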
Multi-tenancy and isolation
Every record carries a tenant_id, and database queries filter by it. The CLI and API derive it from your token claims: you never pass it explicitly, and you can't spoof it. A leaked agent token can only see its own tenant; a misconfigured workflow can't reach another tenant's data even if it tries.
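A sketch of what claim-derived scoping looks like; the claim names and query shape are assumptions, not Martha's schema:

```python
# Tenant scoping derived from token claims, not caller input.
import base64, json

def tenant_from_token(jwt_payload_b64: str) -> str:
    # In practice the token signature is verified first; here we just
    # decode the JWT payload segment and read the claim.
    padded = jwt_payload_b64 + "=" * (-len(jwt_payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["tenant_id"]

ROWS = [
    {"tenant_id": "acme", "doc": "pricing"},
    {"tenant_id": "globex", "doc": "roadmap"},
]

def scoped_query(token_payload: str):
    tenant = tenant_from_token(token_payload)
    # Every query is filtered server-side; callers never supply tenant_id.
    return [r for r in ROWS if r["tenant_id"] == tenant]

payload = base64.urlsafe_b64encode(
    json.dumps({"tenant_id": "acme"}).encode()).decode().rstrip("=")
docs = scoped_query(payload)
```

The point of the design: the filter is applied by the server from verified claims, so there is no code path where a request body chooses the tenant.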
Within a tenant, Clients are the consumers — a chat web app, an SMS sender, a voice line — each with its own credentials, system prompts, and per-feature allowlists. A public-facing chat client can be granted only the safe agents and tools; an internal admin client can have access to everything. Allowlists are enforced server-side; the client can ask for tools it doesn't have, but Martha refuses.
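Server-side enforcement reduces to a lookup against the grant set. A toy version, with invented client and tool names:

```python
# Toy server-side allowlist check: the client may request any tool,
# but only the grant set decides. All names here are illustrative.
CLIENT_ALLOWLISTS = {
    "public-chat": {"search_docs", "create_ticket"},
    "internal-admin": {"search_docs", "create_ticket", "delete_user"},
}

def invoke_tool(client_id: str, tool: str) -> str:
    allowed = CLIENT_ALLOWLISTS.get(client_id, set())
    if tool not in allowed:
        # Asking is harmless; granting is explicit and enforced here.
        raise PermissionError(f"{client_id!r} may not call {tool!r}")
    return f"{tool} invoked for {client_id}"

ok = invoke_tool("internal-admin", "delete_user")
```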
Compose for production
A few patterns we see often:
- Workflow-as-tool: Build a complex pipeline as a workflow, then expose it to one or more agents as a callable function. Agents stay simple; the durable execution handles the heavy lifting.
- Trigger-driven ingestion: Map a Cloudflare R2 folder structure to document collections; uploads auto-ingest, agents instantly have new knowledge.
- External agent harnesses: Spin up your own agents (CrewAI, ork, custom Python) and have them claim tasks from Martha's queue. Same task lifecycle, your code.
- Structured outputs into workflows: Use JSON-Schema-constrained LLM nodes early in a workflow to extract intent, then route via choice nodes to the right downstream branch.
- Embedded chat, hosted everywhere else: One agent definition, multiple Client rows, one for each surface; your product, partner integrations, and a Slack ops bot all share the underlying behavior.
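The structured-outputs-into-workflows pattern from the list above, in miniature. The extraction stub stands in for a schema-constrained LLM node, and the branch names are made up:

```python
# Early extraction node + choice node routing, as a toy.
def extract_intent(message: str) -> dict:
    # Stand-in for a JSON-Schema-constrained LLM node that always
    # returns {"intent": <one of the known branches>}.
    text = message.lower()
    intent = "billing" if "invoice" in text else "support"
    return {"intent": intent}

BRANCHES = {
    "billing": lambda m: f"routed to billing: {m}",
    "support": lambda m: f"routed to support: {m}",
}

def choice_node(message: str) -> str:
    extracted = extract_intent(message)
    # Because the extraction is schema-constrained, the branch lookup
    # can't miss: every intent value is a known key.
    return BRANCHES[extracted["intent"]](message)

out = choice_node("Where is my invoice?")
```

The schema constraint is what makes the routing safe: the choice node never sees a free-text intent it doesn't have a branch for.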
What you don't have to think about
Token-by-token streaming. Retry policies. Workflow checkpointing. Document parsing. Vector search. Tool-call ID matching across providers. JSON Schema enforcement on Anthropic vs OpenAI vs LiteLLM. CORS on the chat widget. Tenant scoping on every query. Webhook signature verification. The list goes on. All handled — your job is the prompts and the business logic.