🔗 Companion project: Maestro — ardeshir.io/maestro/
Agentic Memory Architecture — The State of the Art in 2026
What the First Wave Got Wrong
Most first-generation LLM agent systems (2023–2024):
- had no memory hierarchy
- used stateless HTTP request loops
- relied on monolithic “super agents”
Result:
- token explosions
- timeout storms
- recursive hallucinations
- expensive retries
- no observability
Modern Memory Architecture (2026)
The state-of-the-art architecture now uses layered memory systems.
Layer 1 — Active Working Context
Very small. Usually:
- 8k–32k tokens
- immediate task only
- current execution graph node
This is the only layer the model (Claude, GPT, and so on) sees directly.
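A minimal sketch of keeping the working context inside a fixed budget follows. It assumes a crude one-token-per-word estimate and a hypothetical `Message` type; a real system would use the provider's tokenizer and would usually summarize dropped turns rather than discard them.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def estimate_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def fit_to_budget(messages: list[Message], budget: int) -> list[Message]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept: list[Message] = []
    used = 0
    for msg in reversed(messages):      # walk from newest to oldest
        cost = estimate_tokens(msg.text)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()                      # restore chronological order
    return kept


history = [
    Message("user", "plan the migration in three steps"),
    Message("assistant", "step one is to snapshot the database"),
    Message("user", "run step one now"),
]
window = fit_to_budget(history, budget=12)
```

With a budget of 12 estimated tokens, the oldest message is dropped and the two most recent ones remain in order.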
Layer 2 — Episodic Memory
Compressed summaries of prior actions. Examples:
- previous tool calls
- completed subtasks
- decisions made
- execution checkpoints
Usually stored in:
- PostgreSQL
- Redis
- CosmosDB
- DynamoDB
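As an illustration, an episodic store can be as simple as an append-only table of short summaries keyed by run. The sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL or a similar store; the table layout and the `record_episode` and `recent_summaries` helpers are assumptions made for this example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE episodes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        run_id TEXT NOT NULL,
        kind TEXT NOT NULL,      -- tool_call, subtask, decision, checkpoint
        summary TEXT NOT NULL,   -- compressed description, not the raw transcript
        payload TEXT NOT NULL    -- JSON-encoded details
    )
    """
)


def record_episode(run_id: str, kind: str, summary: str, payload: dict) -> None:
    conn.execute(
        "INSERT INTO episodes (run_id, kind, summary, payload) VALUES (?, ?, ?, ?)",
        (run_id, kind, summary, json.dumps(payload)),
    )
    conn.commit()


def recent_summaries(run_id: str, limit: int = 5) -> list[str]:
    """Return the latest summaries for one run, oldest first."""
    rows = conn.execute(
        "SELECT summary FROM episodes WHERE run_id = ? ORDER BY id DESC LIMIT ?",
        (run_id, limit),
    ).fetchall()
    return [row[0] for row in reversed(rows)]


record_episode("run-1", "tool_call", "searched docs for retry policy", {"tool": "search"})
record_episode("run-1", "decision", "chose exponential backoff", {"max_retries": 3})
record_episode("run-2", "subtask", "unrelated run", {})
```

Only the short summaries are pulled back into the working context; the JSON payload stays in the store until something asks for it.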
Layer 3 — Semantic Memory
Embedding/vector retrieval. Stored in:
- Pinecone
- Weaviate
- Qdrant
- Chroma
- Milvus
Used for:
- long-term knowledge
- documentation
- research corpora
- codebases
- meeting archives
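The retrieval step behind this layer can be sketched as a nearest-neighbour lookup by cosine similarity. The three-dimensional vectors and document ids below are made up for illustration; real embeddings come from an embedding model, and a vector database replaces the linear scan with an approximate index.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy embeddings keyed by document id.
index = {
    "deploy-guide": [0.9, 0.1, 0.0],
    "meeting-notes": [0.1, 0.9, 0.1],
    "auth-module-readme": [0.7, 0.0, 0.6],
}


def top_k(query: list[float], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(index, key=lambda doc_id: cosine(query, index[doc_id]), reverse=True)
    return ranked[:k]


results = top_k([1.0, 0.0, 0.1])
```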
Layer 4 — Object/File Storage
Large raw data. Stored in:
- Amazon S3
- Azure Blob Storage
- Cloudflare R2
- MinIO
The LLM never directly sees this entire layer. Instead:
- retrieval workers
- parsers
- chunkers
- summarizers
…extract only relevant slices.
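A rough sketch of the chunk-then-select step described above: split a large document into overlapping word windows, then pass along only the best-matching slice. The word-overlap scoring is a deliberate simplification standing in for a real retriever, and both function names are hypothetical.

```python
def chunk(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):   # last window reached the end
            break
    return chunks


def relevant_slices(chunks: list[str], query: str, limit: int = 1) -> list[str]:
    """Rank chunks by how many query words they contain."""
    terms = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(terms & set(c.lower().split())),
        reverse=True,
    )
    return ranked[:limit]


raw = "alpha beta gamma delta epsilon zeta eta theta iota kappa"
pieces = chunk(raw, size=4, overlap=1)
best = relevant_slices(pieces, "theta iota")
```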
The Rise of Graph-Based Orchestration
The dominant trend is moving from “chains” to graphs.
This is why frameworks like:
- LangGraph
- AutoGen
- Semantic Kernel
- OpenAI Agents SDK
- Claude Agent SDK
have all moved toward stateful orchestration, with LangGraph the most explicitly graph-based.
The reason: agent workflows are NOT linear. They are:
- branching
- recursive
- interruptible
- parallel
- resumable
Graph runtimes allow:
- checkpointing
- retries
- human approval
- multi-agent delegation
- persistence
- recovery after crashes
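To make checkpointing and recovery concrete, here is a toy graph runtime. It is not the API of LangGraph or any other framework named above; the node functions, the `GRAPH` table, and the JSON checkpoint format are all assumptions made for this sketch. Each node mutates the state and names its successor, and the runtime snapshots the state after every node so a run can resume from any snapshot.

```python
import json
from typing import Callable, Optional

State = dict
Node = Callable[[State], Optional[str]]   # returns the next node name, or None to stop


def plan(state: State) -> Optional[str]:
    state["steps"] = ["fetch", "summarize"]
    return "execute"


def execute(state: State) -> Optional[str]:
    state.setdefault("done", []).append(state["steps"].pop(0))
    return "execute" if state["steps"] else "finish"   # loop until no steps remain


def finish(state: State) -> Optional[str]:
    state["status"] = "complete"
    return None


GRAPH: dict[str, Node] = {"plan": plan, "execute": execute, "finish": finish}


def run(state: State, node: Optional[str], checkpoints: list[str],
        max_steps: int = 20) -> State:
    """Run nodes until one returns None, checkpointing after each node."""
    steps = 0
    while node is not None:
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        node = GRAPH[node](state)
        checkpoints.append(json.dumps({"next": node, "state": state}))
        steps += 1
    return state


log: list[str] = []
final = run({}, "plan", log)

# Simulate a crash after the second node by resuming from its checkpoint.
saved = json.loads(log[1])
resumed = run(saved["state"], saved["next"], [])
```

The resumed run picks up at the second `execute` step and reaches the same final state as the uninterrupted run. In production the checkpoint list would be a durable store rather than an in-memory list.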
The Dominant Production Architecture
A common production layout for enterprise deployments now looks like this:
Frontend UI
↓
API Gateway
↓
Agent Orchestrator (LangGraph / Semantic Kernel)
↓
Task Queue (Kafka / NATS / RabbitMQ)
↓
Execution Workers
↓
LLM Providers
↓
Memory + Retrieval Layer
↓
Tool/MCP Servers
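The decoupling between orchestrator and execution workers can be sketched in-process. This example uses `queue.Queue` and threads as a stand-in for Kafka, NATS, or RabbitMQ, and `call_model` is a hypothetical placeholder for a provider request; it shows the shape of the hand-off, not a deployment.

```python
import queue
import threading


def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a request to an LLM provider.
    return f"echo: {prompt}"


def worker(tasks: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """Pull tasks until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:                 # sentinel: stop this worker
            return
        output = call_model(task["prompt"])
        with lock:
            results[task["id"]] = output


def orchestrate(prompts: list[str], n_workers: int = 2) -> dict[str, str]:
    tasks: queue.Queue = queue.Queue()
    results: dict[str, str] = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for i, prompt in enumerate(prompts):
        tasks.put({"id": f"task-{i}", "prompt": prompt})
    for _ in threads:
        tasks.put(None)                  # one sentinel per worker
    for t in threads:
        t.join()
    return results


out = orchestrate(["summarize report", "draft email"])
```

Because the orchestrator only enqueues work and collects results, workers can be scaled, restarted, or rate-limited independently of it.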
MCP (Model Context Protocol) Became a Major Standard
Anthropic’s MCP became one of the most important developments in agentic systems.
It standardizes how models connect to:
- tools
- memory access
- filesystem access
- browser access
- APIs
- databases
…across different models and frameworks.
In practice, a tool built once can work with:
- Claude
- GPT
- Gemini
- local models
- LangGraph
- CrewAI
- Semantic Kernel
This is becoming the equivalent of:
| Analogy | Applied to |
|---|---|
| USB | AI tools |
| POSIX | agents |
| Kubernetes API | LLM execution |
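MCP messages are JSON-RPC 2.0, with `tools/list` to discover tools and `tools/call` to invoke one. The sketch below imitates that message shape with a small in-process dispatcher. It is not the official MCP SDK and it skips initialization, transports, and argument validation; the `tool` decorator and `handle` function are hypothetical names.

```python
import json
from typing import Any, Callable

TOOLS: dict[str, dict[str, Any]] = {}


def tool(name: str, description: str, input_schema: dict) -> Callable:
    """Register a function as a tool, with a JSON Schema for its arguments."""
    def decorator(fn: Callable) -> Callable:
        TOOLS[name] = {"fn": fn, "description": description, "inputSchema": input_schema}
        return fn
    return decorator


@tool("add", "Add two integers", {
    "type": "object",
    "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
    "required": ["a", "b"],
})
def add(a: int, b: int) -> int:
    return a + b


def handle(request_json: str) -> str:
    """Dispatch a JSON-RPC 2.0 request to tools/list or tools/call."""
    req = json.loads(request_json)
    if req["method"] == "tools/list":
        result = {"tools": [
            {"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
            for n, t in TOOLS.items()
        ]}
    elif req["method"] == "tools/call":
        params = req["params"]
        value = TOOLS[params["name"]]["fn"](**params["arguments"])
        result = {"content": [{"type": "text", "text": str(value)}]}
    else:
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


response = json.loads(handle(json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "add", "arguments": {"a": 2, "b": 3}},
})))
```

Any client that speaks this request shape can call `add` without knowing how it is implemented, which is the portability the analogy points at.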
OpenAI’s New Architecture Direction
OpenAI shifted heavily toward:
- Responses API
- stateful sessions
- tool-native execution
- Agents SDK
- realtime orchestration
The key evolution: the orchestrator now stores state outside the prompt.
The model receives only what the current step needs, not the entire conversation:
- active state
- relevant memory
- current tools
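A small sketch of that idea, independent of any vendor API: the orchestrator holds the full session state and builds each prompt from only the active step, a few relevant memories, and the current tools. The `SessionState` shape and the word-overlap ranking are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """State the orchestrator keeps outside the prompt (hypothetical shape)."""
    goal: str
    current_step: str
    memories: list[str] = field(default_factory=list)
    tools: dict[str, str] = field(default_factory=dict)    # name -> description
    transcript: list[str] = field(default_factory=list)    # never sent wholesale


def build_prompt(state: SessionState, max_memories: int = 2) -> str:
    step_terms = set(state.current_step.lower().split())
    # Rank memories by word overlap with the current step (a stand-in for retrieval).
    ranked = sorted(
        state.memories,
        key=lambda m: len(step_terms & set(m.lower().split())),
        reverse=True,
    )
    lines = [
        f"Goal: {state.goal}",
        f"Current step: {state.current_step}",
        "Relevant memory:",
    ]
    lines += [f"- {m}" for m in ranked[:max_memories]]
    lines.append("Available tools:")
    lines += [f"- {name}: {desc}" for name, desc in state.tools.items()]
    return "\n".join(lines)


state = SessionState(
    goal="harden the service",
    current_step="rotate api keys",
    memories=[
        "api keys live in the vault",
        "the team prefers tabs",
        "keys were last rotated in march",
    ],
    tools={"vault.write": "store a secret"},
    transcript=["old turn"] * 500,
)
prompt = build_prompt(state)
```

The 500-entry transcript stays in the orchestrator's store; the prompt carries a handful of lines regardless of how long the session has run.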
The Responses API and Agents SDK are designed around:
- persistent sessions
- multi-step execution
- tool chaining
- external memory
- retrieval-augmented reasoning
Claude’s Architecture Direction
Anthropic focused on:
- MCP
- large context reasoning
- tool use
- extended thinking
- agent-native APIs
Claude Opus models became particularly strong at:
- recursive planning
- research loops
- code synthesis
- tool orchestration
Anthropic’s key insight was that context should be assembled dynamically at each step rather than embedded statically in the prompt.
Azure AI Foundry Architecture
Azure enterprise stacks increasingly look like:
Frontend
↓
Azure API Management
↓
Azure AI Foundry Agents
↓
Semantic Kernel / AutoGen
↓
Azure OpenAI
↓
CosmosDB + Azure AI Search
↓
Azure Kubernetes Service (AKS)
↓
Event Grid + Service Bus
Key patterns:
- hybrid vector + keyword search
- CosmosDB memory persistence
- AKS agent workers
- RBAC security
- Entra identity integration
- observability with Application Insights
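One common way to merge vector and keyword results in a hybrid search is reciprocal rank fusion (RRF). The sketch below shows the technique itself rather than any Azure API; the document ids are hypothetical, and `k = 60` is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


vector_hits = ["doc-a", "doc-b", "doc-c"]     # hypothetical vector search order
keyword_hits = ["doc-c", "doc-a", "doc-d"]    # hypothetical keyword search order
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that appear in both lists rise to the top, so `doc-a` and `doc-c` outrank `doc-b` and `doc-d`, which each appear in only one list.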
Microsoft is converging AutoGen, Semantic Kernel, and its enterprise orchestration tooling into a unified agent infrastructure.
AWS EKS Agentic Architecture
The AWS-native pattern looks like:
Ingress / API Gateway
↓
EKS Orchestrator Pods
↓
Agent Runtime
↓
Kafka / SQS / EventBridge
↓
Tool Workers
↓
Bedrock / OpenAI / Claude APIs