Vol. 01 · Addendum A · May 2026 — companion to the AgentOS specification. Status: living document · Scope: 2024 – Q2 2026 · Sources: 14 papers, 6 industry projects.
Abstract
The proposition that an operating system can take the agent as its fundamental abstraction — rather than the process, file, or socket — is no longer speculative. Between March 2024 and May 2026, at least three distinct research programs have converged on architectures that look strikingly like what AgentOS describes: an LLM-as-kernel, agent-as-process, tool-as-syscall stack, with dedicated scheduling, memory hierarchies, and increasingly, dedicated silicon.
This addendum catalogs those efforts and aligns them to the six layers of the AgentOS spec. The intent is not exhaustive review, but positional clarity: where the field already has working systems, where it has formal proposals, and where the territory remains open — most notably, in agent-specialized silicon (the SiliAgentPU hypothesis).
§1 — The stack, with research density per layer
Each AgentOS layer is at a different research maturity. The middle layers (kernel, scheduler) are now densely populated with peer-reviewed systems work. The silicon layer is sparsely populated and largely industrial. The governance / observability layer is dominated by analyst frameworks, not yet rigorous architecture.
| Layer | Name | Scope | Status |
|---|---|---|---|
| L6 | Distributed Agent Cloud | Multi-tenant agent fleets, cross-org coordination, agent marketplaces | Early · Industry |
| L5 | Governance & Observability | Permissions, policy, audit, telemetry as first-class signals | Active · Analyst |
| L4 | Agent Userland | Tools, skills, memory APIs, multi-agent workflows | Mature · Open-source |
| L3 | Kernel & Scheduler | LLM-as-core, syscalls for inference / memory / tool, KV-cache scheduling | Mature · Academic |
| L2 | Runtime & Memory Hierarchy | Persistent memory, prefix caching, episodic / semantic / working memory | Active · Academic |
| L1 | Silicon | Agent-aware accelerators, heterogeneous SoCs, in-memory compute for attention | Early · Industry |
The kernel is solved on paper. The silicon is not yet anyone’s.
§2 — L3 Kernel: the agent as first-class process
The clearest academic ancestor of AgentOS is the AIOS line of work out of Rutgers. AIOS proposes a kernel that wraps each LLM instance as a “core,” analogous to a CPU core, with system calls for inference, memory access, storage, and tool use. Subsequent papers (and the von-Neumann-for-agents framing of 2025) push the analogy further, treating planner / memory / tool-use as architectural primitives rather than library code.
[01] AIOS: LLM Agent Operating System — Mei, Li, Xu et al., Rutgers University · arXiv:2403.16971 · v5 · Aug 2025 The canonical reference. Encapsulates each LLM instance as a “core” with standardized syscalls for inference, memory, storage, and tool use. Decomposes agent queries into categorized syscalls. Closest in spirit to the AgentOS L3 spec.
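To make the L3 shape concrete, here is a minimal Python sketch of a kernel that routes every agent action through typed syscalls, in the spirit of the AIOS decomposition. The names (`Syscall`, `AgentKernel`) and the four-category split are illustrative assumptions for exposition, not the AIOS API.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, Callable


class Syscall(Enum):
    """Illustrative syscall categories, loosely mirroring the
    inference / memory / storage / tool decomposition."""
    INFER = auto()    # run the LLM "core" on a prompt
    MEM_GET = auto()  # read from the agent's memory
    MEM_PUT = auto()  # write to the agent's memory
    TOOL = auto()     # dispatch to an external tool


@dataclass
class AgentKernel:
    """A toy kernel: all agent activity funnels through one choke
    point, so scheduling and auditing can happen in one place."""
    llm: Callable[[str], str]                       # the LLM "core"
    tools: dict[str, Callable[..., Any]]
    memory: dict[str, Any] = field(default_factory=dict)
    trace: list[tuple[Syscall, tuple]] = field(default_factory=list)

    def syscall(self, kind: Syscall, *args: Any) -> Any:
        self.trace.append((kind, args))             # every call is auditable
        if kind is Syscall.INFER:
            return self.llm(args[0])
        if kind is Syscall.MEM_GET:
            return self.memory.get(args[0])
        if kind is Syscall.MEM_PUT:
            self.memory[args[0]] = args[1]
            return None
        if kind is Syscall.TOOL:
            name, *rest = args
            return self.tools[name](*rest)
        raise ValueError(f"unknown syscall: {kind}")
```

The point of the funnel is that a scheduler or auditor sees one uniform stream of categorized calls rather than opaque library invocations.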
[02] Building LLM Agents by Incorporating Insights from Computer Systems — arXiv preprint, April 2025 Claims the first formal analogy between von Neumann architecture and LLM agents — CPU/memory/IO mapped to planner/context/tool-use. Argues for finer-grained memory and parallelization as the next frontier.
[03] Fundamentals of Building Autonomous LLM Agents — TUM, Trends in Autonomous Agents seminar, October 2025 Pedagogically oriented but rigorous: positions the LLM as the central processing unit of multimodal agents, with modality encoders as peripheral I/O. Useful as an architectural skeleton.
§3 — L3 Scheduler: agent workloads break the existing assumptions
The most active sub-area in 2025–2026 is scheduling. Conventional inference engines evict KV cache the moment a request finishes — fine for chat, catastrophic for agents that pause for tool calls and resume. A sequence of papers from Berkeley (Stoica’s group) and elsewhere has reframed scheduling around workflow structure rather than per-request fairness.
[04] Continuum / CacheTTL: Multi-Turn Agent Scheduling with KV Cache Time-to-Live — Li, He, Mang, Zhang, Mao, Chen, Zhou, Cheung, Gonzalez, Stoica, UC Berkeley · arXiv:2511.02230 · v4 · May 2026 Identifies the central scheduling pathology of agent workloads: KV cache eviction during tool calls causes catastrophic recomputation. Introduces TTL-based retention keyed to predicted tool-call duration. First system to treat agent pauses as a scheduling primitive.
[05] SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters — arXiv preprint, 2026 Reports measurements from production agent traces: 100:1 input-to-output token ratios; 38% of total time spent regenerating KV cache lost during tool calls; average GPU memory utilization of 42% due to fragmentation. Proposes workflow-atomic scheduling that surfaces agent step-graphs to the scheduler.
[06] KVFlow: Workflow-Aware KV Cache Eviction for Multi-Agent Workflows — Pan, Patel, Hu et al., 2025 · preprint Eviction policies that read the agent’s step-graph as a hint, prioritizing prefixes that downstream agents will reuse. A precursor to SAGA.
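Step-graph hints can be pictured as a Bélády-style heuristic: rank cached prefixes by how soon a downstream step will reuse them, and evict the farthest-in-the-future first. The sketch below assumes a step-graph of string-named steps and invented function names; it illustrates the idea, not KVFlow's policy.

```python
from collections import deque


def eviction_order(step_graph: dict[str, list[str]], running: str,
                   cached_prefixes: dict[str, str]) -> list[str]:
    """Toy workflow-aware eviction: `cached_prefixes` maps a prefix id
    to the step that will next consume it; prefixes whose consumers
    lie farthest ahead (or are unreachable) are evicted first."""
    # BFS from the currently running step gives each step's distance.
    dist = {running: 0}
    queue = deque([running])
    while queue:
        node = queue.popleft()
        for nxt in step_graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    never = float("inf")  # prefixes for unreachable steps go first
    return sorted(cached_prefixes,
                  key=lambda p: dist.get(cached_prefixes[p], never),
                  reverse=True)
```

The scheduler only needs the step-graph as a hint; it never inspects the KV contents themselves.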
[07] LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference — Liu, Cheng, Yao et al., 2025 · tech report Treats KV cache as a tiered storage system across GPU, CPU, and disk — analogous to the page-cache layer of a conventional OS. Foundational for the L2 memory hierarchy.
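The page-cache analogy can be sketched as a three-tier store: hot entries live in a small GPU tier, overflow demotes down through CPU to disk, and a hit promotes the entry back up. Tier sizes, names, and the LRU policy below are illustrative assumptions, not LMCache's implementation.

```python
class TieredKVStore:
    """Toy three-tier KV-cache store in the spirit of an OS page
    cache: demote oldest entries on overflow, promote on hit."""

    def __init__(self, gpu_slots: int = 2, cpu_slots: int = 4) -> None:
        self.caps = {"gpu": gpu_slots, "cpu": cpu_slots, "disk": float("inf")}
        # dicts preserve insertion order: the first key is the LRU victim
        self.tiers: dict[str, dict[str, object]] = {"gpu": {}, "cpu": {}, "disk": {}}

    def _demote(self, tier: str) -> None:
        order = ["gpu", "cpu", "disk"]
        below = order[order.index(tier) + 1]
        victim = next(iter(self.tiers[tier]))       # oldest entry in this tier
        self.tiers[below][victim] = self.tiers[tier].pop(victim)
        if len(self.tiers[below]) > self.caps[below]:
            self._demote(below)                     # cascade downward

    def put(self, key: str, kv: object) -> None:
        self.tiers["gpu"][key] = kv
        if len(self.tiers["gpu"]) > self.caps["gpu"]:
            self._demote("gpu")

    def get(self, key: str):
        for name, tier in self.tiers.items():
            if key in tier:
                kv = tier.pop(key)
                self.put(key, kv)                   # promote on hit
                return kv, name
        return None, None
```

The disk tier is unbounded here, which mirrors the usual assumption that the bottom of the hierarchy trades latency for capacity.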
§4 — L1 Silicon: the SiliAgentPU is not yet built
This is the open territory. Existing AI silicon (NVIDIA H/B-series, AMD MI400, Cerebras WSE, Groq LPU, D-Matrix Corsair) is optimized for tensor throughput on transformer forward passes. None are designed around agent primitives — long-lived KV cache hierarchies, tool-call dispatch, multi-agent context switching, plan/act/observe loops.
The first academic seed in this direction is Agent.xpu, which schedules agentic workloads across heterogeneous SoC components. It is a scheduling paper, not a microarchitecture paper — but it is the closest published work to a hardware/agent co-design.
[08] Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC — arXiv preprint, June 2025 Splits agent inference across iGPU thread-level execution units and NPU MAC arrays, with explicit treatment of prefill vs. decode and KV-cache pinning. Demonstrates that even on commodity heterogeneous silicon, agent-aware scheduling yields 2–5× efficiency gains.
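The flavor of the split can be sketched as a dispatcher that routes phases by kind but keeps each request pinned to the device holding its KV cache. The device names, the routing rule, and every identifier below are assumptions for illustration, not Agent.xpu's actual policy.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    request_id: str
    kind: str          # "prefill" (compute-bound) or "decode" (memory-bound)
    tokens: int


def dispatch(phases: list[Phase]) -> dict[str, list[str]]:
    """Toy heterogeneous dispatch: a request's first phase picks a
    device by phase kind; later phases follow the pinned KV cache,
    because locality beats load balance for agent workloads."""
    placement: dict[str, list[str]] = {"igpu": [], "npu": []}
    pinned: dict[str, str] = {}                 # request_id -> device of its KV
    for ph in phases:
        if ph.request_id in pinned:
            device = pinned[ph.request_id]      # stay with the cached KV
        else:
            device = "igpu" if ph.kind == "prefill" else "npu"
            pinned[ph.request_id] = device
        placement[device].append(f"{ph.request_id}:{ph.kind}")
    return placement
```

Even this toy version exhibits the key property: tool-call pauses do not scatter a request's KV state across devices.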
[09] ChatArch: Knowledge-driven Graph-of-thought LLM Framework for Processor Architecture Optimization — ACM TODAES, December 2025 Inverts the relationship: uses LLM agents to design processor microarchitecture. Suggestive direction: agents specifying their own silicon. The research-tooling counterpart to a SiliAgentPU program.
[10] D-Matrix Corsair / DIMC chiplet patents (2025–2026) — D-Matrix Corporation · USPTO patents Three closely related 2025–2026 patents on Digital In-Memory Compute chiplets using block floating-point and large on-chip SRAM, specifically for transformer self-attention. Patent-landscape analysts now identify a distinct cluster around “agent-orchestrated multi-chip silicon” — the closest commercial signal toward agent-aware hardware.
An ISA whose primitives are plan, act, observe, recall — rather than matmul — does not yet exist as a published proposal.
This is the SiliAgentPU opportunity. The gap between a thesis proposal and an FPGA prototype is real but small. A research program proposing an agent-native ISA, prototyping it on FPGA, and benchmarking against Agent.xpu would land cleanly in ISCA, MICRO, or ASPLOS within 18 months.
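Purely as a thought experiment (no such ISA has been published, as noted above), here is what plan/act/observe/recall might look like as instructions interpreted by a toy machine rather than as library code. Every opcode and name below is hypothetical.

```python
from enum import Enum, auto


class Op(Enum):
    """Hypothetical agent-native opcodes; no published ISA defines these."""
    PLAN = auto()     # emit the next intended step from goal and context
    ACT = auto()      # dispatch a tool call
    OBSERVE = auto()  # fold the latest result back into context
    RECALL = auto()   # fetch from the memory hierarchy


def run(program, tools, memory):
    """Toy interpreter: the plan/act/observe loop as instruction
    dispatch, with context playing the role of the register file."""
    context: list = []
    for op, arg in program:
        if op is Op.PLAN:
            context.append(("plan", arg))
        elif op is Op.ACT:
            name, args = arg
            context.append(("result", tools[name](*args)))
        elif op is Op.OBSERVE:
            context.append(("obs", arg(context[-1][1])))
        elif op is Op.RECALL:
            context.append(("mem", memory.get(arg)))
    return context
```

A real proposal would have to answer what the hardware analog of `context` is, and whether OBSERVE is a pipeline stage or a trap; the sketch only shows the vocabulary.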
§5 — L4 Userland: the abundance layer
This is the layer that does not need more research, only better foundations underneath. LangChain, AutoGen, CrewAI, and the recent Rust-native cohort (Rig, AutoAgents, OpenFANG) all operate here. The most aggressive industrial framing comes from the OpenFANG project, which explicitly calls itself an “Agent Operating System” — though it is, properly, an L4 framework that assumes an underlying L3.
[11] OpenFANG — Rust-native Agent Operating System — open-sourced March 2026 · 137,000 LOC, 14 crates The most ambitious open-source project in the space. Frames itself as an “Agent OS” rather than a framework; built on Tokio, with structured concurrency for sub-agent lifecycles. Reports a 5× memory reduction and 25–44% latency improvements over Python equivalents.
[12] OpenClaw — Local-First Agent Runtime — Peter Steinberger, Nov 2025 · 140k+ stars A radically minimal counter-proposal: a Node.js gateway on the local machine, accessible via Telegram/Signal/WhatsApp, with shell + filesystem + browser access. Demonstrates that an agent-native runtime can begin from radical simplicity rather than enterprise complexity.
§6 — L5 Governance: observability as architecture
This layer is dominated, somewhat unhealthily, by analyst frameworks rather than rigorous architecture. Futurum Research’s 2026 study introduced “observability-native” as a design principle: intent, reasoning, constraints, and outcomes treated as first-class telemetry rather than infrastructure side-effects. The architectural foundations for this layer remain under-specified.
[13] The Seven Principles of Observability-Native — Futurum Research, March 2026 · industry report Argues that AI agents operating at machine speed have outpaced the governance capacity of traditional observability. Defines seven principles for embedding agent-behavior visibility as a first-class architectural primitive throughout the SDLC.
[14] Mem0 — Agent Memory as Infrastructure — ongoing · industry research Empirical evidence that purpose-built agent memory significantly outperforms model-native memory on complex multi-turn tasks. A signal that L2 memory hierarchy needs to be architecturally distinct from the LLM itself.
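The architectural claim, that agent memory sits outside the model, can be sketched as a store with an episodic log per session and a semantic layer of distilled facts that survives across sessions. The class and the keyword-overlap retrieval below are illustrative assumptions, not Mem0's API.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Toy purpose-built agent memory, architecturally distinct from
    the LLM: raw events append to an episodic log, while distilled
    facts go to a semantic store queried at recall time."""
    episodic: list[dict] = field(default_factory=list)
    semantic: dict[str, str] = field(default_factory=dict)

    def record(self, session: str, event: str) -> None:
        self.episodic.append({"session": session, "event": event})

    def distill(self, key: str, fact: str) -> None:
        self.semantic[key] = fact                  # survives across sessions

    def recall(self, query: str) -> list[str]:
        # Naive keyword overlap; a real system would use embeddings.
        words = set(query.lower().split())
        return [fact for key, fact in self.semantic.items()
                if words & set(key.lower().split())]
```

Because recall hits the store rather than the context window, the memory budget scales with storage, not with model context length, which is the L2 argument in miniature.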
§7 — Where the territory is still open
Reading the bibliography against the AgentOS spec, four areas have no clear academic claimant.
| Open Problem | Status | Adjacent Work | What’s Missing |
|---|---|---|---|
| Agent-native ISA | Unclaimed | Agent.xpu [08] | Microarchitecture proposal + FPGA |
| Cross-agent memory hierarchy | Fragments | LMCache [07], Mem0 [14] | Unified memory model across agents |
| Formal scheduling guarantees | Empirical only | SAGA [05], Continuum [04] | Theoretical bounds (cf. Bélády) |
| Inter-org agent protocols | Industry-led | MCP, A2A drafts | Academic security analysis |
The pattern is consistent: scheduling and userland are crowded; silicon and memory hierarchy are uncrowded; governance is framework-led rather than architectural. The cleanest research bet — for a lab one might assemble — is L1/L2 co-design: an agent-native silicon architecture with a unified memory model, benchmarked against existing heterogeneous-SoC scheduling.
⁂ ⁂ ⁂
§8 — Editorial note
The premise of AgentOS — that the agent is the right primitive around which to organize an operating system — is not a fringe view in 2026. It is the consensus framing of multiple independent research groups (Rutgers, Berkeley, TUM), industrial labs (Mem0, D-Matrix), and open-source projects (OpenFANG, OpenClaw). What remains contested is where the abstraction belongs: at the framework layer, the kernel layer, or the silicon layer.
The AgentOS spec stakes out the most ambitious answer — all six layers, coherently. This document’s purpose is to make explicit which collaborators already exist on each, and which remain to be recruited.
Forthcoming addenda will track new entries quarterly.