🔗 Companion project: Maestro — ardeshir.io/maestro/

Agentic Memory Architecture — The State of the Art in 2026

What the First Wave Got Wrong

The first generation of LLM agent systems (2023–2024):

had no memory hierarchy
used stateless HTTP request loops
relied on monolithic “super agents”

Result:

token explosions
timeout storms
recursive hallucinations
expensive retries
no observability

Modern Memory Architecture (2026)

The state-of-the-art architecture now uses layered memory systems.

Layer 1 — Active Working Context

Very small. Usually:

8k–32k tokens
immediate task only
current execution graph node

This is what Claude or GPT sees directly.

Layer 2 — Episodic Memory

Compressed summaries of prior actions. Examples:

previous tool calls
completed subtasks
decisions made
execution checkpoints

Usually stored in:

PostgreSQL
Redis
CosmosDB
DynamoDB

Layer 3 — Semantic Memory

Embedding/vector retrieval. Stored in:

Pinecone
Weaviate
Qdrant
Chroma
Milvus

Used for:

long-term knowledge
documentation
research corpora
codebases
meeting archives

Layer 4 — Object/File Storage

Large raw data. Stored in:

Amazon Web Services S3
Microsoft Azure Blob
Cloudflare R2
MinIO

The LLM never directly sees this entire layer. Instead:

retrieval workers
parsers
chunkers
summarizers

…extract only relevant slices.

The Rise of Graph-Based Orchestration

The dominant trend is moving from “chains” to graphs.

This is why frameworks like:

LangGraph
AutoGen
Semantic Kernel
OpenAI Agents SDK
Claude Agent SDK

all shifted toward stateful graph orchestration.

The reason: agent workflows are NOT linear. They are:

branching
recursive
interruptible
parallel
resumable

Graph runtimes allow:

checkpointing
retries
human approval
multi-agent delegation
persistence
recovery after crashes

The Dominant Production Architecture

This is now the “serious” enterprise architecture:

Frontend UI
    ↓
API Gateway
    ↓
Agent Orchestrator (LangGraph / Semantic Kernel)
    ↓
Task Queue (Kafka / NATS / RabbitMQ)
    ↓
Execution Workers
    ↓
LLM Providers
    ↓
Memory + Retrieval Layer
    ↓
Tool/MCP Servers

MCP (Model Context Protocol) Became a Major Standard

Anthropic’s MCP became one of the most important developments in agentic systems.

It standardizes:

tools
memory access
filesystem access
browser access
APIs
databases

…across different models and frameworks.

Meaning — a tool built once can work with:

Claude
GPT
Gemini
local models
LangGraph
CrewAI
Semantic Kernel

This is becoming the equivalent of:

Analogy	Domain
USB	for AI tools
POSIX	for agents
Kubernetes API	for LLM execution

OpenAI’s New Architecture Direction

OpenAI shifted heavily toward:

Responses API
stateful sessions
tool-native execution
Agents SDK
realtime orchestration

The key evolution: the orchestrator now stores state outside the prompt.

The model receives:

only active state
relevant memory
current tools
not the entire conversation

The Responses API + Agents SDK architecture is specifically optimized for:

persistent sessions
multi-step execution
tool chaining
external memory
retrieval-augmented reasoning

Claude’s Architecture Direction

Anthropic focused on:

MCP
large context reasoning
tool use
extended thinking
agent-native APIs

Claude Opus became extremely strong for:

recursive planning
research loops
code synthesis
tool orchestration

Anthropic’s big contribution was realizing: context should be dynamically assembled — not statically embedded.

Azure AI Foundry Architecture

Azure enterprise stacks increasingly look like:

Frontend
  ↓
Azure API Management
  ↓
Azure AI Foundry Agents
  ↓
Semantic Kernel / AutoGen
  ↓
Azure OpenAI
  ↓
CosmosDB + Azure AI Search
  ↓
Azure Kubernetes Service (AKS)
  ↓
Event Grid + Service Bus

Key patterns:

hybrid vector + keyword search
CosmosDB memory persistence
AKS agent workers
RBAC security
Entra identity integration
observability with Application Insights

Microsoft is heavily converging AutoGen, Semantic Kernel, and enterprise orchestration into unified agent infrastructure.

AWS EKS Agentic Architecture

The AWS-native pattern looks like:

Ingress / API Gateway
      ↓
EKS Orchestrator Pods
      ↓
Agent Runtime
      ↓
Kafka / SQS / EventBridge
      ↓
Tool Workers
      ↓
Bedrock / OpenAI / Claude APIs

🔗 Companion project: Maestro — ardeshir.io/maestro/