🔗 See also: Maestro — ardeshir.io/maestro/
Continuation Architecture
Beyond RAG, Chat, and the Agent Hype Cycle
A continuation can sleep for three weeks waiting for an arXiv RSS event, wake, do five seconds of reasoning, spawn two children, and go back to sleep — without anyone “running” it.
That one sentence contradicts three assumptions built into most current agent stacks. There is no session. There is no “context window management.” There is no orchestrator polling in a loop. The unit of work is not a request-response pair; it is a continuation: a resumable, branchable, budgeted computation that persists across arbitrary time horizons.
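To make the definition concrete, here is a minimal Python sketch of a continuation as a serializable record. The field names, the `wake` helper, the plain dict used as a store, and the even budget split between parent and children are all assumptions for illustration; a real build would hand persistence and timers to a durable execution engine.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class Continuation:
    """A resumable, branchable, budgeted unit of work (hypothetical schema)."""
    id: str
    goal: str
    budget_tokens: int                  # a single number here; section 4 widens this
    waiting_on: Optional[str] = None    # event topic that wakes it; None means runnable
    state: dict = field(default_factory=dict)
    parent: Optional[str] = None

    def dumps(self) -> str:
        # Everything needed to resume lives in this record, so no process
        # has to stay alive while the continuation sleeps.
        return json.dumps(asdict(self))

    @staticmethod
    def loads(raw: str) -> "Continuation":
        return Continuation(**json.loads(raw))


def wake(store: dict, topic: str, payload: dict) -> list:
    """Resume every stored continuation waiting on `topic`; return spawned child ids."""
    spawned = []
    for cid, raw in list(store.items()):
        cont = Continuation.loads(raw)
        if cont.waiting_on != topic:
            continue
        # The short burst of reasoning is stubbed out: record the event and
        # spawn one child per follow-up goal named in the payload.
        cont.state["last_event"] = payload["title"]
        followups = payload.get("followups", [])
        share = cont.budget_tokens // (len(followups) + 1)
        for i, goal in enumerate(followups):
            child = Continuation(id=f"{cid}.{i}", goal=goal,
                                 budget_tokens=share, parent=cid)
            store[child.id] = child.dumps()
            spawned.append(child.id)
        cont.budget_tokens -= share * len(followups)
        cont.waiting_on = topic  # go back to sleep on the same topic
        store[cid] = cont.dumps()
    return spawned


store = {}
root = Continuation(id="c1", goal="watch arXiv for sparse attention",
                    budget_tokens=9000, waiting_on="arxiv.rss")
store[root.id] = root.dumps()
children = wake(store, "arxiv.rss",
                {"title": "New paper", "followups": ["summarize", "replicate"]})
print(children)  # ['c1.0', 'c1.1']
```

Between the two calls nothing is running: the continuation exists only as a stored string, which is why it can wait three weeks as cheaply as three seconds.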
What follows is the architecture that makes this real.
1. The Context Field (Replacing RAG)
Instead of a retrieval call per turn, each continuation has an associated context field. This is a continuously maintained, ranked projection of all available memory sources, which background workers recompute whenever:
- the continuation’s goal frame shifts
- new data lands in any subscribed source
- a sibling continuation publishes a finding
The field is updated incrementally in the background, so a read costs no retrieval round trip. The continuation doesn’t “query” it; it just reads the top of the field, which those workers keep fresh.
Implementation
- A streaming materialized view (Materialize, RisingWave, or a custom Flink job) over your vector + structured + graph stores
- Ranked by a small, fast model (the “context kernel”) that scores relevance to the current goal frame
- With eviction-by-decay rather than fixed windows — old context fades unless reinforced
This replaces the four-layer memory hierarchy with one continuously projected surface. The big model never sees memory; it sees the field.
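Eviction-by-decay can be sketched in a few lines of Python. The class below is illustrative only: `ContextField`, the exponential half-life, and the eviction floor are assumptions, and the `relevance` number stands in for whatever score the context kernel would emit. A production field would live in a streaming view rather than in process memory.

```python
from dataclasses import dataclass


@dataclass
class FieldItem:
    text: str
    relevance: float    # score from the (hypothetical) context kernel, in [0, 1]
    touched_at: float   # time of the last write or reinforcement, in seconds


class ContextField:
    """Ranked projection with eviction-by-decay instead of a fixed window."""

    def __init__(self, half_life_s=3600.0, floor=0.05):
        self.half_life_s = half_life_s
        self.floor = floor      # items whose weight drops below this are evicted
        self.items = {}         # key -> FieldItem

    def weight(self, item, now):
        # Exponential decay: the weight halves every half_life_s seconds.
        age = max(0.0, now - item.touched_at)
        return item.relevance * 0.5 ** (age / self.half_life_s)

    def upsert(self, key, text, relevance, now):
        # Called by background workers when data lands or the goal frame shifts.
        self.items[key] = FieldItem(text, relevance, now)

    def reinforce(self, key, now):
        # Re-touching an item resets its decay clock.
        if key in self.items:
            self.items[key].touched_at = now

    def top(self, k, now):
        # Evict faded items, then return the k heaviest texts.
        self.items = {key: it for key, it in self.items.items()
                      if self.weight(it, now) >= self.floor}
        ranked = sorted(self.items.values(),
                        key=lambda it: self.weight(it, now), reverse=True)
        return [it.text for it in ranked[:k]]
```

The continuation only ever calls `top`; `upsert` and `reinforce` belong to the workers. An item that nobody reinforces loses half its weight every half-life and eventually falls below the floor, which is the "fades unless reinforced" behavior described above.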
2. Reactive Tool Mesh (Replacing MCP-as-RPC)
Tools register two interfaces: a command surface (do this now) and a stream surface (notify me when X). Continuations declare interest in streams; the mesh routes events.
Underneath, this can run on NATS JetStream, Kafka with a schema registry, or a CRDT-based gossip layer for federated work across organizations.
The crucial difference from MCP
Tools push, not just pull. A “literature monitor” tool isn’t called — it’s subscribed to, and it wakes continuations when relevant papers appear.
MCP remains useful at the edges — as the protocol tools speak when they’re called imperatively. But the default mode is reactive, not request-response.
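The two surfaces can be illustrated with an in-process stand-in for the broker. `ToolMesh` and its method names are hypothetical and not taken from any of the systems named above; the sketch shows only the routing idea, without persistence, delivery guarantees, or schemas.

```python
from collections import defaultdict
from typing import Callable


class ToolMesh:
    """In-process stand-in for a broker; illustrative names only."""

    def __init__(self):
        self._commands = {}                     # tool name -> callable (command surface)
        self._subscribers = defaultdict(list)   # stream name -> continuation callbacks

    def register(self, name: str, command: Callable):
        self._commands[name] = command

    def call(self, name: str, **kwargs):
        # Command surface: imperative "do this now", where MCP would still fit.
        return self._commands[name](**kwargs)

    def subscribe(self, stream: str, on_event: Callable):
        # Stream surface: a continuation declares interest once, then sleeps.
        self._subscribers[stream].append(on_event)

    def publish(self, stream: str, event: dict) -> int:
        # Tools push events; the mesh routes each one to every interested
        # continuation and reports how many were woken.
        for on_event in self._subscribers[stream]:
            on_event(event)
        return len(self._subscribers[stream])


mesh = ToolMesh()
mesh.register("search", lambda query: [f"result for {query}"])

woken = []
mesh.subscribe("literature.new_paper", lambda ev: woken.append(ev["title"]))
mesh.publish("literature.new_paper", {"title": "Sparse attention at scale"})
print(woken)  # ['Sparse attention at scale']
```

The literature monitor never appears as a call site in the continuation's code. The continuation subscribes once, and the tool decides when there is something worth waking it for.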
3. The Researcher Pane (Replacing Chat UI)
The interface isn’t a chat window. It’s closer to a research notebook crossed with an air-traffic-control display:
- Left rail — live continuations: what’s currently thinking, what’s sleeping, what’s waiting on you
- Center pane — findings: published artifacts from completed sub-goals
- Right rail — the context field: what the system currently thinks is relevant
- Inline controls — fork, kill, re-budget, or merge continuations
The researcher operates on the swarm, not through a model.
4. The Economics Layer
This is the part nobody is doing well yet.
Every continuation carries a budget vector — not just tokens, but a portfolio:
| Resource | Examples |
|---|---|
| Cheap model calls | Haiku-class, small rankers |
| Expensive model calls | Sonnet-class, Opus-class |
| Tool quotas | API calls, search, code execution |
| Human-attention quotas | Review requests, approvals |
| Real-money spend | Paid APIs, cloud compute |
A policy kernel (small, fast, deterministic) decides at each step:
- Should this continuation use Haiku-class, Sonnet-class, or Opus-class reasoning right now?
- Should it run synchronously or yield and come back?
- Should it spawn a child or do the work inline?
- Is the marginal value of the next token positive given the budget?
This is what makes “small models + big models together” actually work as a discipline rather than a vibe. The economics layer is a first-class control plane, not an afterthought.
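A deterministic policy kernel can be very small. The sketch below covers only the first and last questions in the list above (which model tier, and whether the next step is worth paying for). The `Budget` fields mirror the table, while the per-call prices, the 0.7 difficulty threshold, and the `decide` signature are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    cheap_calls: int
    expensive_calls: int
    tool_calls: int       # carried for completeness; not consulted in this sketch
    human_reviews: int    # carried for completeness; not consulted in this sketch
    dollars: float


# Hypothetical per-call prices, for illustration only.
COST = {"cheap": 0.001, "expensive": 0.05}


def decide(budget: Budget, expected_value: float, difficulty: float) -> str:
    """Return 'stop', 'cheap', or 'expensive' for the next reasoning step.

    expected_value: estimated dollar value of completing the step.
    difficulty: estimate in [0, 1] of how hard the step is.
    """
    # Marginal-value check: stop when even the cheapest call costs more
    # than the step is expected to return, or when the money has run out.
    if expected_value <= COST["cheap"] or budget.dollars < COST["cheap"]:
        return "stop"

    wants_big = difficulty >= 0.7
    can_afford_big = (budget.expensive_calls > 0
                      and budget.dollars >= COST["expensive"]
                      and expected_value > COST["expensive"])
    if wants_big and can_afford_big:
        return "expensive"
    if budget.cheap_calls > 0:
        return "cheap"
    return "stop"


b = Budget(cheap_calls=10, expensive_calls=1, tool_calls=5,
           human_reviews=0, dollars=1.0)
print(decide(b, expected_value=0.5, difficulty=0.9))   # expensive
print(decide(b, expected_value=0.5, difficulty=0.2))   # cheap
print(decide(b, expected_value=0.02, difficulty=0.9))  # cheap
```

Because `decide` is a pure function of the budget and two estimates, every routing decision can be logged and replayed, which is one reason to keep the kernel deterministic rather than asking a model to choose its own tier.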
5. How This Maps to Real Infrastructure
Open Source Build
| Component | Implementation |
|---|---|
| Continuation store | Temporal or Restate |
| Context field | RisingWave + Qdrant + small ranking model via vLLM |
| Reactive mesh | NATS JetStream |
| Policy kernel | Rust service |
| LLM dispatch | LiteLLM |
| Event log | Kafka or Redpanda |
Azure Foundry Build
| Component | Implementation |
|---|---|
| Continuations | Durable Functions or Foundry Agent threads |
| Context field | Azure AI Search + Cosmos DB change feed + Foundry-hosted small ranker |
| Reactive mesh | Event Grid + Service Bus |
| Policy kernel | Container App |
| Observability | App Insights + Foundry tracing |
AWS EKS Build
| Component | Implementation |
|---|---|
| Continuations | Step Functions (Standard workflows) + DynamoDB |
| Context field | OpenSearch + Kinesis Data Streams + Bedrock-hosted small ranker |
| Reactive mesh | EventBridge + MSK |
| Policy kernel | Lambda or Fargate service |
| Tool workers | EKS with Karpenter for elasticity |
Key Insight
On all three stacks, MCP keeps its role at the edges, as the protocol for imperative tool calls. The architecture’s center of gravity, though, has shifted from request-response to event-driven continuations with continuous context projection.
The agent doesn’t chat. It thinks — across days, across models, across budgets — and the human steers the swarm.
Published on Sepahsalar.org — All Research