Vol. 01 · Addendum A · May 2026 — companion to the AgentOS specification. Status: living document · Scope: 2024 – Q2 2026 · Sources: 14 papers, 6 industry projects.
Abstract
The proposition that an operating system can take the agent as its fundamental abstraction — rather than the process, file, or socket — is no longer speculative. Between March 2024 and May 2026, at least three distinct research programs have converged on architectures that look strikingly like what AgentOS describes: an LLM-as-kernel, agent-as-process, tool-as-syscall stack, with dedicated scheduling, memory hierarchies, and increasingly, dedicated silicon.
This addendum catalogs those efforts and aligns them to the six layers of the AgentOS spec. The intent is not exhaustive review, but positional clarity: where the field already has working systems, where it has formal proposals, and where the territory remains open — most notably, in agent-specialized silicon (the SiliAgentPU hypothesis).
§1 — The stack, with research density per layer
Each AgentOS layer is at a different research maturity. The middle layers (kernel, scheduler) are now densely populated with peer-reviewed systems work. The silicon layer is sparsely populated and largely industrial. The governance / observability layer is dominated by analyst frameworks, not yet rigorous architecture.
| Layer | Name | Scope | Status |
|---|---|---|---|
| L6 | Distributed Agent Cloud | Multi-tenant agent fleets, cross-org coordination, agent marketplaces | Early · Industry |
| L5 | Governance & Observability | Permissions, policy, audit, telemetry as first-class signals | Active · Analyst |
| L4 | Agent Userland | Tools, skills, memory APIs, multi-agent workflows | Mature · Open-source |
| L3 | Kernel & Scheduler | LLM-as-core, syscalls for inference / memory / tool, KV-cache scheduling | Mature · Academic |
| L2 | Runtime & Memory Hierarchy | Persistent memory, prefix caching, episodic / semantic / working memory | Active · Academic |
| L1 | Silicon | Agent-aware accelerators, heterogeneous SoCs, in-memory compute for attention | Early · Industry |
The kernel is solved on paper. The silicon is not yet anyone’s.
§2 — L3 Kernel: the agent as first-class process
The clearest academic ancestor of AgentOS is the AIOS line of work out of Rutgers. AIOS proposes a kernel that wraps each LLM instance as a “core,” analogous to a CPU core, with system calls for inference, memory access, storage, and tool use. Subsequent papers (and the von-Neumann-for-agents framing of 2025) push the analogy further, treating planner / memory / tool-use as architectural primitives rather than library code.
[01] AIOS: LLM Agent Operating System — Mei, Li, Xu et al., Rutgers University · arXiv:2403.16971 · v5 · Aug 2025 The canonical reference. Encapsulates each LLM instance as a “core” with standardized syscalls for inference, memory, storage, and tool use. Decomposes agent queries into categorized syscalls. Closest in spirit to the AgentOS L3 spec.
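To make the L3 shape concrete, here is a minimal Python sketch of a kernel that routes every agent action through typed syscalls, in the spirit of the AIOS decomposition. The names (`Syscall`, `AgentKernel`) and the four-category split are illustrative assumptions for exposition, not the AIOS API.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, Callable


class Syscall(Enum):
    """Illustrative syscall categories, loosely mirroring the
    inference / memory / storage / tool decomposition."""
    INFER = auto()    # run the LLM "core" on a prompt
    MEM_GET = auto()  # read from the agent's memory
    MEM_PUT = auto()  # write to the agent's memory
    TOOL = auto()     # dispatch to an external tool


@dataclass
class AgentKernel:
    """A toy kernel: all agent activity funnels through one choke
    point, so scheduling and auditing can happen in one place."""
    llm: Callable[[str], str]                       # the LLM "core"
    tools: dict[str, Callable[..., Any]]
    memory: dict[str, Any] = field(default_factory=dict)
    trace: list[tuple[Syscall, tuple]] = field(default_factory=list)

    def syscall(self, kind: Syscall, *args: Any) -> Any:
        self.trace.append((kind, args))             # every call is auditable
        if kind is Syscall.INFER:
            return self.llm(args[0])
        if kind is Syscall.MEM_GET:
            return self.memory.get(args[0])
        if kind is Syscall.MEM_PUT:
            self.memory[args[0]] = args[1]
            return None
        if kind is Syscall.TOOL:
            name, *rest = args
            return self.tools[name](*rest)
        raise ValueError(f"unknown syscall: {kind}")
```

The point of the funnel is that a scheduler or auditor sees one uniform stream of categorized calls rather than opaque library invocations.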
[02] Building LLM Agents by Incorporating Insights from Computer Systems — arXiv preprint, April 2025 Claims the first formal analogy between von Neumann architecture and LLM agents — CPU/memory/IO mapped to planner/context/tool-use. Argues for finer-grained memory and parallelization as the next frontier.
[03] Fundamentals of Building Autonomous LLM Agents — TUM, Trends in Autonomous Agents seminar, October 2025 Pedagogically oriented but rigorous: positions the LLM as the central processing unit of multimodal agents, with modality encoders as peripheral I/O. Useful as an architectural skeleton.
§3 — L3 Scheduler: agent workloads break the existing assumptions
The most active sub-area in 2025–2026 is scheduling. Conventional inference engines evict KV cache the moment a request finishes — fine for chat, catastrophic for agents that pause for tool calls and resume. A sequence of papers from Berkeley (Stoica’s group) and elsewhere has reframed scheduling around workflow structure rather than per-request fairness.
[04] Continuum / CacheTTL: Multi-Turn Agent Scheduling with KV Cache Time-to-Live — Li, He, Mang, Zhang, Mao, Chen, Zhou, Cheung, Gonzalez, Stoica, UC Berkeley · arXiv:2511.02230 · v4 · May 2026 Identifies the central scheduling pathology of agent workloads: KV cache eviction during tool calls causes catastrophic recomputation. Introduces TTL-based retention keyed to predicted tool-call duration. First system to treat agent pauses as a scheduling primitive.
[05] SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters — arXiv preprint, 2026 Reports measurements from production agent traces: 100:1 input-to-output token ratios; 38% of total time spent regenerating KV cache lost during tool calls; average GPU memory utilization of 42% due to fragmentation. Proposes workflow-atomic scheduling that surfaces agent step-graphs to the scheduler.
[06] KVFlow: Workflow-Aware KV Cache Eviction for Multi-Agent Workflows — Pan, Patel, Hu et al., 2025 · preprint Eviction policies that read the agent’s step-graph as a hint, prioritizing prefixes that downstream agents will reuse. A precursor to SAGA.
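Step-graph hints can be pictured as a Bélády-style heuristic: rank cached prefixes by how soon a downstream step will reuse them, and evict the farthest-in-the-future first. The sketch below assumes a step-graph of string-named steps and invented function names; it illustrates the idea, not KVFlow's policy.

```python
from collections import deque


def eviction_order(step_graph: dict[str, list[str]], running: str,
                   cached_prefixes: dict[str, str]) -> list[str]:
    """Toy workflow-aware eviction: `cached_prefixes` maps a prefix id
    to the step that will next consume it; prefixes whose consumers
    lie farthest ahead (or are unreachable) are evicted first."""
    # BFS from the currently running step gives each step's distance.
    dist = {running: 0}
    queue = deque([running])
    while queue:
        node = queue.popleft()
        for nxt in step_graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    never = float("inf")  # prefixes for unreachable steps go first
    return sorted(cached_prefixes,
                  key=lambda p: dist.get(cached_prefixes[p], never),
                  reverse=True)
```

The scheduler only needs the step-graph as a hint; it never inspects the KV contents themselves.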
[07] LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference — Liu, Cheng, Yao et al., 2025 · tech report Treats KV cache as a tiered storage system across GPU, CPU, and disk — analogous to the page-cache layer of a conventional OS. Foundational for the L2 memory hierarchy.
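The page-cache analogy can be sketched as a three-tier store: hot entries live in a small GPU tier, overflow demotes down through CPU to disk, and a hit promotes the entry back up. Tier sizes, names, and the LRU policy below are illustrative assumptions, not LMCache's implementation.

```python
class TieredKVStore:
    """Toy three-tier KV-cache store in the spirit of an OS page
    cache: demote oldest entries on overflow, promote on hit."""

    def __init__(self, gpu_slots: int = 2, cpu_slots: int = 4) -> None:
        self.caps = {"gpu": gpu_slots, "cpu": cpu_slots, "disk": float("inf")}
        # dicts preserve insertion order: the first key is the LRU victim
        self.tiers: dict[str, dict[str, object]] = {"gpu": {}, "cpu": {}, "disk": {}}

    def _demote(self, tier: str) -> None:
        order = ["gpu", "cpu", "disk"]
        below = order[order.index(tier) + 1]
        victim = next(iter(self.tiers[tier]))       # oldest entry in this tier
        self.tiers[below][victim] = self.tiers[tier].pop(victim)
        if len(self.tiers[below]) > self.caps[below]:
            self._demote(below)                     # cascade downward

    def put(self, key: str, kv: object) -> None:
        self.tiers["gpu"][key] = kv
        if len(self.tiers["gpu"]) > self.caps["gpu"]:
            self._demote("gpu")

    def get(self, key: str):
        for name, tier in self.tiers.items():
            if key in tier:
                kv = tier.pop(key)
                self.put(key, kv)                   # promote on hit
                return kv, name
        return None, None
```

The disk tier is unbounded here, which mirrors the usual assumption that the bottom of the hierarchy trades latency for capacity.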
§4 — L1 Silicon: the SiliAgentPU is not yet built
This is the open territory. Existing AI silicon (NVIDIA H/B-series, AMD MI400, Cerebras WSE, Groq LPU, D-Matrix Corsair) is optimized for tensor throughput on transformer forward passes. None are designed around agent primitives — long-lived KV cache hierarchies, tool-call dispatch, multi-agent context switching, plan/act/observe loops.
The first academic seed in this direction is Agent.xpu, which schedules agentic workloads across heterogeneous SoC components. It is a scheduling paper, not a microarchitecture paper — but it is the closest published work to a hardware/agent co-design.
[08] Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC — arXiv preprint, June 2025 Splits agent inference across iGPU thread-level execution units and NPU MAC arrays, with explicit treatment of prefill vs. decode and KV-cache pinning. Demonstrates that even on commodity heterogeneous silicon, agent-aware scheduling yields 2–5× efficiency gains.
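The flavor of the split can be sketched as a dispatcher that routes phases by kind but keeps each request pinned to the device holding its KV cache. The device names, the routing rule, and every identifier below are assumptions for illustration, not Agent.xpu's actual policy.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    request_id: str
    kind: str          # "prefill" (compute-bound) or "decode" (memory-bound)
    tokens: int


def dispatch(phases: list[Phase]) -> dict[str, list[str]]:
    """Toy heterogeneous dispatch: a request's first phase picks a
    device by phase kind; later phases follow the pinned KV cache,
    because locality beats load balance for agent workloads."""
    placement: dict[str, list[str]] = {"igpu": [], "npu": []}
    pinned: dict[str, str] = {}                 # request_id -> device of its KV
    for ph in phases:
        if ph.request_id in pinned:
            device = pinned[ph.request_id]      # stay with the cached KV
        else:
            device = "igpu" if ph.kind == "prefill" else "npu"
            pinned[ph.request_id] = device
        placement[device].append(f"{ph.request_id}:{ph.kind}")
    return placement
```

Even this toy version exhibits the key property: tool-call pauses do not scatter a request's KV state across devices.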
[09] ChatArch: Knowledge-driven Graph-of-thought LLM Framework for Processor Architecture Optimization — ACM TODAES, December 2025 Inverts the relationship: uses LLM agents to design processor microarchitecture. Suggestive direction: agents specifying their own silicon. The research-tooling counterpart to a SiliAgentPU program.
[10] D-Matrix Corsair / DIMC chiplet patents (2025–2026) — D-Matrix Corporation · USPTO patents Three closely related 2025–2026 patents on Digital In-Memory Compute chiplets using block floating-point and large on-chip SRAM, specifically for transformer self-attention. Patent-landscape analysts now identify a distinct cluster around “agent-orchestrated multi-chip silicon” — the closest commercial signal toward agent-aware hardware.
An ISA whose primitives are plan, act, observe, recall — rather than matmul — does not yet exist as a published proposal.
This is the SiliAgentPU opportunity. The gap between a thesis proposal and an FPGA prototype is real but small. A research program proposing an agent-native ISA, prototyping it on FPGA, and benchmarking against Agent.xpu would land cleanly in ISCA, MICRO, or ASPLOS within 18 months.
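Purely as a thought experiment (no such ISA has been published, as noted above), here is what plan/act/observe/recall might look like as instructions interpreted by a toy machine rather than as library code. Every opcode and name below is hypothetical.

```python
from enum import Enum, auto


class Op(Enum):
    """Hypothetical agent-native opcodes; no published ISA defines these."""
    PLAN = auto()     # emit the next intended step from goal and context
    ACT = auto()      # dispatch a tool call
    OBSERVE = auto()  # fold the latest result back into context
    RECALL = auto()   # fetch from the memory hierarchy


def run(program, tools, memory):
    """Toy interpreter: the plan/act/observe loop as instruction
    dispatch, with context playing the role of the register file."""
    context: list = []
    for op, arg in program:
        if op is Op.PLAN:
            context.append(("plan", arg))
        elif op is Op.ACT:
            name, args = arg
            context.append(("result", tools[name](*args)))
        elif op is Op.OBSERVE:
            context.append(("obs", arg(context[-1][1])))
        elif op is Op.RECALL:
            context.append(("mem", memory.get(arg)))
    return context
```

A real proposal would have to answer what the hardware analog of `context` is, and whether OBSERVE is a pipeline stage or a trap; the sketch only shows the vocabulary.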
§5 — L4 Userland: the abundance layer
This is the layer that does not need more research, only better foundations underneath. LangChain, AutoGen, CrewAI, and the recent Rust-native cohort (Rig, AutoAgents, OpenFANG) all operate here. The most aggressive industrial framing comes from the OpenFANG project, which explicitly calls itself an “Agent Operating System” — though it is, properly, an L4 framework that assumes an underlying L3.
[11] OpenFANG — Rust-native Agent Operating System — open-sourced March 2026 · 137,000 LOC, 14 crates The most ambitious open-source project in the space. Frames itself as an “Agent OS” rather than a framework; built on Tokio, with structured concurrency for sub-agent lifecycles. Reports a 5× memory reduction and 25–44% latency improvements over Python equivalents.
[12] OpenClaw — Local-First Agent Runtime — Peter Steinberger, Nov 2025 · 140k+ stars A radically minimal counter-proposal: a Node.js gateway on the local machine, accessible via Telegram/Signal/WhatsApp, with shell + filesystem + browser access. Demonstrates that an agent-native runtime can begin from radical simplicity rather than enterprise complexity.
§6 — L5 Governance: observability as architecture
This layer is dominated, somewhat unhealthily, by analyst frameworks rather than rigorous architecture. Futurum Research’s 2026 study introduced “observability-native” as a design principle: intent, reasoning, constraints, and outcomes treated as first-class telemetry rather than infrastructure side-effects. The architectural foundations for this layer remain under-specified.
[13] The Seven Principles of Observability-Native — Futurum Research, March 2026 · industry report Argues that AI agents operating at machine speed have outpaced the governance capacity of traditional observability. Defines seven principles for embedding agent-behavior visibility as a first-class architectural primitive throughout the SDLC.
[14] Mem0 — Agent Memory as Infrastructure — ongoing · industry research Empirical evidence that purpose-built agent memory significantly outperforms model-native memory on complex multi-turn tasks. A signal that L2 memory hierarchy needs to be architecturally distinct from the LLM itself.
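The architectural claim, that agent memory sits outside the model, can be sketched as a store with an episodic log per session and a semantic layer of distilled facts that survives across sessions. The class and the keyword-overlap retrieval below are illustrative assumptions, not Mem0's API.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    """Toy purpose-built agent memory, architecturally distinct from
    the LLM: raw events append to an episodic log, while distilled
    facts go to a semantic store queried at recall time."""
    episodic: list[dict] = field(default_factory=list)
    semantic: dict[str, str] = field(default_factory=dict)

    def record(self, session: str, event: str) -> None:
        self.episodic.append({"session": session, "event": event})

    def distill(self, key: str, fact: str) -> None:
        self.semantic[key] = fact                  # survives across sessions

    def recall(self, query: str) -> list[str]:
        # Naive keyword overlap; a real system would use embeddings.
        words = set(query.lower().split())
        return [fact for key, fact in self.semantic.items()
                if words & set(key.lower().split())]
```

Because recall hits the store rather than the context window, the memory budget scales with storage, not with model context length, which is the L2 argument in miniature.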
§7 — Where the territory is still open
Reading the bibliography against the AgentOS spec, four areas have no clear academic claimant.
| Open Problem | Status | Adjacent Work | What’s Missing |
|---|---|---|---|
| Agent-native ISA | Unclaimed | Agent.xpu [08] | Microarchitecture proposal + FPGA |
| Cross-agent memory hierarchy | Fragments | LMCache [07], Mem0 [14] | Unified memory model across agents |
| Formal scheduling guarantees | Empirical only | SAGA [05], Continuum [04] | Theoretical bounds (cf. Bélády) |
| Inter-org agent protocols | Industry-led | MCP, A2A drafts | Academic security analysis |
The pattern is consistent: scheduling and userland are crowded; silicon and memory hierarchy are uncrowded; governance is framework-led rather than architectural. The cleanest research bet — for a lab one might assemble — is L1/L2 co-design: an agent-native silicon architecture with a unified memory model, benchmarked against existing heterogeneous-SoC scheduling.
⁂ ⁂ ⁂
§8 — Editorial note
The premise of AgentOS — that the agent is the right primitive around which to organize an operating system — is not a fringe view in 2026. It is the consensus framing of multiple independent research groups (Rutgers, Berkeley, TUM), industrial labs (Mem0, D-Matrix), and open-source projects (OpenFANG, OpenClaw). What remains contested is where the abstraction belongs: at the framework layer, the kernel layer, or the silicon layer.
The AgentOS spec stakes out the most ambitious answer — all six layers, coherently. This document’s purpose is to make explicit which collaborators already exist on each, and which remain to be recruited.
Forthcoming addenda will track new entries quarterly.