The Watt and the Wing

Publication: Sepahsalar.org/research
Series: Sepahsalar · Technology
Addendum: Companion essay to AgentOS · Continued Research
Status: Living essay
Date: May 2026
Domains: sepahsalar.org, ardeshir.io/agentos/

A brain runs on twenty watts. A frontier model rack runs on one hundred and forty thousand watts. Somewhere between those two numbers is a question worth asking out loud.

I · A Denny’s, a die, and a debt to envy

In April 1993, three engineers sat in a Denny’s in San Jose and named a company after the Latin word for envy, invidia. Jensen Huang, Chris Malachowsky, and Curtis Priem were not, in any obvious sense, the kind of people who would build the substrate of a planetary intelligence. They were graphics guys. They wanted to make video games look better. They borrowed forty thousand dollars and raised twenty million more, survived a near-bankruptcy with the NV1, and in 1999 shipped the GeForce 256, which NVIDIA marketed as the world’s first GPU.

Then, in 2006, came the choice that mattered: CUDA. A parallel-computing platform layered on top of graphics silicon, exposing the GPU as a general-purpose mathematical engine. There was no market for it. There was barely a use case for it. For roughly six years, CUDA was the most generously funded research curiosity in Silicon Valley: a software stack waiting for a workload that didn’t yet exist.

That workload arrived in 2012, when AlexNet won the ImageNet competition by running on two GeForce GTX 580s. The neural network era began on consumer gaming cards. By 2026, NVIDIA is reportedly the most valuable public company on Earth, with a Data Center business posting a single quarter of $62.3 billion in revenue and a Blackwell backlog of roughly 3.6 million units stretching past mid-2026. A single GB200 superchip can draw 1,200 watts. A rack of B200s pulls fifty to sixty kilowatts. A million-GPU Blackwell cluster — the kind being built right now in Northern Virginia and West Texas — is estimated at 1.0 to 1.4 gigawatts, enough to power a mid-sized city.

NVIDIA did not plan this. NVIDIA bet on this. The bet was: if we keep doubling parallel arithmetic per watt, every important problem in the world will eventually look like a matrix multiplication.

It was a brilliant bet. It was also — and this is the thesis of this essay — a monoculture bet.

II · The von Neumann tax, paid in gigawatts

Every chip in every NVIDIA rack inherits a design decision made by John von Neumann in 1945. The decision was: separate the place where you keep the numbers from the place where you do arithmetic on the numbers, and shuttle the numbers back and forth across a bus. This was a stunning simplification at the time. It made computing programmable. It also made computing thirsty, because the act of moving a number across a wire costs orders of magnitude more energy than the act of doing arithmetic on it once it arrives.

For eighty years, we paid this tax cheerfully, because the numbers were small and the wires were short. Then we started training models with hundreds of billions of parameters, and the tax bill came due in coal plants.

A human brain, the most sophisticated inference system anyone has ever measured, runs the entire act of conscious thought on roughly twenty watts. A GB200 Superchip running a forward pass through a frontier model draws about 1,200 watts, approximately sixty times as much, for every token it produces. That is a ratio of power, and it understates the gap. Counted as energy per unit of useful work, the biological gap is not 10x. It is not 100x. By the measurements neuromorphic researchers report, it is between 1,000x and 100,000x, depending on the workload. IBM’s NorthPole, a brain-inspired digital chip that abolishes the memory-compute separation, has been reported at 72.7x higher energy efficiency than a GPU for large-language-model inference, and 25x more efficient than NVIDIA’s V100 on image recognition. Intel’s Loihi 2 is reported at roughly 1,000x lower energy on specific event-driven tasks than NVIDIA Jetson edge GPUs.
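The sixty-fold comparison can be checked from the essay’s own figures. The sketch below uses only numbers stated in this essay (20 W, 1,200 W, 140,000 W). The `tokens_per_second` argument is a hypothetical placeholder, not a measured value, included only to show how a power figure becomes an energy-per-token figure.

```python
# Figures taken from this essay; none are independently measured here.
BRAIN_W = 20.0        # human brain power budget
GB200_W = 1_200.0     # GB200 Superchip peak draw
RACK_W = 140_000.0    # "frontier model rack" from the opening line


def power_ratio(device_watts: float, baseline_watts: float = BRAIN_W) -> float:
    """How many times more power a device draws than the baseline."""
    return device_watts / baseline_watts


def joules_per_token(watts: float, tokens_per_second: float) -> float:
    """Energy per token = power / throughput (1 W = 1 J/s).

    tokens_per_second is an assumed input for illustration only.
    """
    return watts / tokens_per_second


print(power_ratio(GB200_W))            # superchip vs. brain
print(power_ratio(RACK_W))             # rack vs. brain
print(joules_per_token(GB200_W, 100))  # hypothetical 100 tokens/s
```

A power ratio and an energy-efficiency ratio are different quantities: the first compares watts, the second compares joules spent per unit of useful work, which is why the throughput term matters.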

These chips are not toys. Intel’s Hala Point — built from Loihi 2 silicon — already simulates 1.15 billion neurons in a single system. IBM moved NorthPole to full-scale production in early 2026. The German firm SpiNNcloud is partnering with Sandia National Labs to put neuromorphic systems into national-defense workloads.

And yet none of these chips run GPT-style models at frontier scale today, because the entire software ecosystem — every framework, every tokenizer, every gradient compiler, every fine-tune notebook — assumes a CUDA target underneath. CUDA is the world’s most successful evolutionary niche. Like all successful niches, it has produced a tremendous diversity of life within itself and a desert around itself.

This is the monoculture.

III · Oxide, and the other half of the cage

There is a second cage, less discussed, that sits underneath the GPU question.

In 2019, Bryan Cantrill, Steve Tuck, and Jessie Frazelle founded Oxide Computer Company on a heretical observation: every hyperscaler on Earth — Google, Amazon, Meta, Microsoft — designs its own racks, its own switches, its own power-distribution units, its own firmware, its own root-of-trust. None of them buy from Dell. And yet everyone else — every bank, every hospital, every government, every research lab outside the hyperscale tier — still buys 1U pizza boxes from Dell and HP and Supermicro, then integrates them themselves, in what Cantrill memorably calls a kit-car disaster.

Oxide built a different thing. A single 92-inch rack, 3,000 pounds, with sixteen to thirty-two compute sleds that blind-mate directly to a 54-volt DC busbar: no AC-to-DC conversion at each server, no power cables, no network cables, no BIOS, no UEFI. Their own switch, their own hypervisor, their own operating system (Helios, an OpenSolaris descendant), their own bring-up RTOS (Hubris), all written in Rust, all open-source, all licensed for zero dollars. The rack draws up to 15 kW total under load, and that is per rack, not per sled. It goes from shipping crate to running VMs in roughly an hour.

Oxide’s customers include Lawrence Livermore, CoreWeave, the Idaho National Lab, and a major financial-services firm. In March 2026 the company closed a $100 million Series B led by Thomas Tull’s USIT.

Oxide is doing something important and something incomplete in the same gesture. The important thing: they have proven that commodity rack design is malpractice, that the cabled, BIOS-poisoned, vendor-fingerpointing reality of enterprise infrastructure is a historical accident, not a law of nature. The incomplete thing: Oxide’s rack is still, fundamentally, a von Neumann machine at scale. AMD EPYC cores, DDR4 DIMMs, NVMe drives, Tofino switches. Same memory wall. Same instruction-fetch tax. Same thermodynamics, just laid out with more honesty.

NVIDIA optimizes the compute of the cage. Oxide optimizes the plumbing of the cage. Nobody is yet building a different species of cage.

IV · A first-principles digression: what is a computer for?

Step back. The question is not “how do we make chips faster” or “how do we make racks neater.” The question is: what is the purpose of computation?

For most of human history, the answer was: to extend memory and arithmetic beyond what one skull can hold. The abacus, the clay tablet, the loom card, the Jacquard, the Hollerith census tabulator, ENIAC — every one of these was a survival technology. They were built because a tribe, a temple, a state, or a war needed to count more grain, more bodies, more shells, more stars than any single human nervous system could track. Computation served the organism.

Somewhere in the second half of the twentieth century — and this is not a precise date, it is a slow inversion — the polarity flipped. Computation stopped serving the organism. The organism started serving computation. Or rather: the organism started serving the owners of computation, who were no longer the tribe, the temple, or even the nation, but a small set of capital-aggregating firms whose fiduciary obligation was, by law and by structure, to maximize quarterly return.

Look at the energy line. From 1945 to roughly 2010, the energy-per-useful-bit curve fell. Computing got more efficient with each generation, as a matter of physics and as a matter of social purpose: we wanted more humans to be able to use computers in more places, on more battery-powered devices, in more remote villages. Then, around 2012, the picture inverted. Per-operation efficiency kept improving, but the absolute energy draw of frontier computing began to climb, not because we needed more useful computation per person, but because a different economic logic took hold: the firm that controlled the largest training run controlled the most powerful model, and the most powerful model controlled the most lucrative API, and the most lucrative API controlled the largest fraction of attention and capital.

By 2026, AI energy consumption is projected to hit 134 terawatt-hours annually — roughly the total electricity demand of Sweden. Microsoft and Meta are negotiating directly with Small Modular Reactor vendors. Data centers are sited not where humans live but where grid capacity exists. The metric is no longer compute-per-watt. It is watts available, full stop. Kilowatts have replaced chips as the binding constraint.
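To put the annual figure on the same scale as the cluster figures quoted earlier, the conversion from terawatt-hours per year to average continuous power is simple arithmetic. The sketch below uses only numbers from this essay (134 TWh per year, and 1.0 to 1.4 GW per million-GPU cluster). It assumes a constant draw across all 8,760 hours of a year, which real loads do not have.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def average_power_gw(twh_per_year: float) -> float:
    """Convert annual energy (TWh) to average continuous power (GW)."""
    gwh_per_year = twh_per_year * 1_000  # 1 TWh = 1,000 GWh
    return gwh_per_year / HOURS_PER_YEAR


avg_gw = average_power_gw(134)  # projected AI footprint from this essay
print(round(avg_gw, 1))         # average draw in GW

# Equivalent number of million-GPU clusters at the essay's 1.0-1.4 GW range,
# assuming each runs flat out all year (an idealization).
print(round(avg_gw / 1.4, 1), round(avg_gw / 1.0, 1))
```

On these assumptions, 134 TWh per year corresponds to an average draw of a little over 15 GW, or roughly eleven to fifteen such clusters running continuously.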

This is not a complaint about NVIDIA, or about Jensen Huang, or even about capitalism in any vulgar sense. It is an observation about what happens when a technology’s purpose ceases to be the survival of the organism that invented it and becomes the growth of the capital structure that owns it. The two are not the same thing. They were never the same thing. The conflation was a temporary marketing convenience of the 1990s.

V · The biological universe is not a monoculture

Here is the move I want to make, and it is the move that requires the most care.

For four billion years, life on Earth has solved the problem of intelligent information-processing in an environment of bounded energy. The solutions life has converged on are radically unlike what a Denny’s-founded graphics company would build:

Local computation. A neuron does its work next to the synapse where the memory is stored. There is no bus. There is no fetch cycle. The von Neumann tax does not exist in biology.
Event-driven sparsity. A cortical neuron fires perhaps once or twice per second on average. It is silent otherwise. A GPU multiplies by zero hundreds of trillions of times per second and pays full energy cost for every wasted multiplication.
Heterogeneous specialization. The retina is not built like the cerebellum. The cerebellum is not built like the hippocampus. There is no “general purpose neural tissue.” Every region is a different architecture solving a different physics.
Graceful degradation. A stroke does not crash the brain. A traumatic brain injury does not return a stack trace. Biological computation degrades along a thousand redundant pathways; silicon computation can halt on a single uncorrected bit flip.
Suffering as a feature. This is the hardest one to say in a technology essay, and the most important. Evolution does not produce resilience by avoiding cost. It produces resilience by exposing the organism to costs and selecting the variants that bear them. Pain is information. Death is selection. Diversity is not decoration; it is the inventory from which the next adaptation is drawn.
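The event-driven sparsity point in the list above can be made concrete by counting multiply-accumulate operations. The sketch below is a toy, not a model of any real chip: the layer sizes and the 2% firing rate are hypothetical, and an operation count is only a rough proxy for energy. It computes the same layer output two ways, once touching every weight and once touching only the weights attached to inputs that fired.

```python
import random


def dense_layer(weights, x):
    """Multiply every weight by every input, zeros included."""
    macs = 0
    out = []
    for row in weights:
        acc = 0
        for w, xi in zip(row, x):
            acc += w * xi
            macs += 1  # counted even when xi == 0
        out.append(acc)
    return out, macs


def event_layer(weights, x):
    """Touch only the weight columns whose input actually fired."""
    macs = 0
    out = [0] * len(weights)
    for j, xj in enumerate(x):
        if xj == 0:
            continue  # silent input: no work is done for it
        for i, row in enumerate(weights):
            out[i] += row[j] * xj
            macs += 1
    return out, macs


rng = random.Random(0)
N_IN, N_OUT, ACTIVE = 1000, 100, 20  # hypothetical: 2% of inputs fire
weights = [[rng.randint(-3, 3) for _ in range(N_IN)] for _ in range(N_OUT)]
x = [0] * N_IN
for j in rng.sample(range(N_IN), ACTIVE):
    x[j] = 1  # a spike

dense_out, dense_macs = dense_layer(weights, x)
event_out, event_macs = event_layer(weights, x)
print(dense_out == event_out, dense_macs, event_macs)
```

Both paths return identical outputs. The event-driven path does work in proportion to the number of active inputs rather than the total number of inputs, which is the property the neuromorphic designs discussed earlier try to exploit in hardware.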

A monoculture is efficient in a stationary environment and catastrophic in a non-stationary one. The Irish potato, the Gros Michel banana, the American chestnut, the Norway spruce: every monoculture in agricultural history has produced spectacular short-term yield and then collapsed under a single shock. The TSMC-NVIDIA-CUDA stack is the most beautiful monoculture humanity has ever built. It is also, by every historical analogy, the most fragile.

The Blackwell backlog stretches to mid-2026. A single earthquake on the Hsinchu fault could disable global frontier AI capacity for eighteen months. A single export-control reclassification — and there have been several — could redraw the map. A single power-grid failure in a fusion-bet region could brown out a model that fifty million people use every day. We have placed the entire forward edge of human cognitive infrastructure on one die, one fab, one architecture, one programming model, one company’s roadmap.

This is not a sustainable design. It is a high-yield design, in the agricultural sense, and we know exactly what happens to high-yield monocultures.

VI · What a diverse substrate might look like

Imagine, instead, a computing substrate that took its design lessons from the biological universe rather than from the von Neumann report.

It would not have one kind of chip. It would have many: dense matrix engines for the workloads that genuinely need them (training frontier models, scientific simulation), neuromorphic engines for event-driven perception, in-memory analog chips for retrieval and association, agent-specialized silicon for the long-horizon plan-act-observe loops that increasingly define real workloads (this is the SiliAgentPU thesis from the Layer 1 work in the AgentOS spec — agent-native instruction sets, not tensor cores). It would treat heterogeneity as a load-bearing property, not a transition state.

It would not assume one kind of rack. Oxide’s rethink of the power, cooling, and firmware contract is correct and incomplete; the next step is racks that mix CMOS, neuromorphic, photonic, and analog substrates on the same blind-mated busbar, with a software layer (an AgentOS, perhaps) that schedules each workload to the substrate whose physics suits it.
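A scheduler of the kind described here could, at its very simplest, be a lookup from workload kind to substrate. The sketch below is illustrative only: the substrate names and workload labels are shorthand for the pairings this section describes, and `Substrate`, `SUBSTRATES`, and `schedule` are hypothetical names, not part of any published AgentOS interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substrate:
    name: str
    suits: frozenset  # workload kinds this substrate's physics suits


# Pairings paraphrased from this section; labels are illustrative.
SUBSTRATES = (
    Substrate("dense-matrix", frozenset({"training", "simulation"})),
    Substrate("neuromorphic", frozenset({"event-perception"})),
    Substrate("in-memory-analog", frozenset({"retrieval", "association"})),
    Substrate("agent-silicon", frozenset({"plan-act-observe"})),
)


def schedule(kind: str, substrates=SUBSTRATES) -> str:
    """Return the name of the first substrate that suits the workload kind."""
    for substrate in substrates:
        if kind in substrate.suits:
            return substrate.name
    # No silent fallback to a default engine: heterogeneity is first-class.
    raise LookupError(f"no substrate suits workload kind {kind!r}")


print(schedule("event-perception"))
print(schedule("training"))
```

Raising an error for an unmatched workload, rather than defaulting to the dense-matrix engine, is a deliberate choice in this sketch: it keeps the general-purpose path from quietly becoming the only path.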

It would not assume one geography. Distributed across edge, regional, and central tiers — not because edge computing is fashionable, but because 20-watt inference at the point of perception is a different category of computation than gigawatt training in West Texas, and pretending they are the same workload is what produces the current absurdity of running smart-doorbell ML on a remote H100 cluster.

It would not optimize for one metric. The current frontier optimizes for “tokens per second per dollar,” which is a proxy that has already done damage. A biologically literate substrate would optimize for joules per useful decision — and would refuse workloads where the answer is “infinity joules for zero useful decisions,” which is currently a large fraction of consumer AI traffic.
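The metric proposed here can be written down directly. The sketch below is one possible reading, under the assumption that someone can count "useful decisions" for a workload, which this essay does not define. The function names and the budget threshold are hypothetical.

```python
import math


def joules_per_useful_decision(joules: float, useful_decisions: int) -> float:
    """Energy spent per useful decision; infinite when nothing useful results."""
    if useful_decisions <= 0:
        return math.inf
    return joules / useful_decisions


def admit(joules: float, useful_decisions: int, budget: float) -> bool:
    """Accept a workload only if it fits the joules-per-decision budget.

    budget is an assumed, operator-chosen threshold in joules per decision.
    """
    return joules_per_useful_decision(joules, useful_decisions) <= budget


print(joules_per_useful_decision(100.0, 4))  # 25.0 J per decision
print(admit(100.0, 4, budget=30.0))          # within budget
print(admit(50.0, 0, budget=30.0))           # zero useful decisions: refused
```

Because the zero-decision case returns infinity, any finite budget refuses it, which matches the refusal rule described above without needing a special case in `admit`.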

It would treat suffering as data. Every degraded inference, every retracted hallucination, every grid brownout caused by an inference burst would feed back into the substrate’s selection pressure. The current stack treats every failure as a bug to be hidden; a diverse stack would treat failure as the signal by which the next architecture is bred.

And it would, finally, let things die. The current industry cannot kill any architectural commitment, because the capital stack will not permit a write-down. CUDA is too valuable to deprecate. The 1U server form factor is too entrenched to abandon. The hyperscale rental model is too profitable to disrupt. A biologically literate stack would, like a forest, make room for the next species by allowing the previous one to fall.

None of this is a technical impossibility. All of it is an economic impossibility under the current ownership structure of compute.

VII · Reinvention is not a slogan

This essay began at a Denny’s in 1993 and has arrived at an indictment of the financial logic that turned that Denny’s into a three-trillion-dollar monoculture. Let me try to land it somewhere more useful than a complaint.

The proposal — the AgentOS stack, the SiliAgentPU silicon, the distributed substrate that lives across ardeshir.io/agentos/ and sepahsalar.org — is not a competing monoculture. It is an argument for diversity as architecture. Six layers, from silicon to distributed cloud, designed so that no single layer can hold the others hostage. An open instruction set at Layer 1, so the cost of forking is low. An agent-native kernel at Layer 3, so the abstraction belongs to the user rather than the API provider. A scheduler that treats heterogeneous substrates as first-class citizens, not as fallback paths. A distributed cloud at Layer 6 that assumes failure rather than denying it.

The biological universe is sustainable not because it is comfortable. It is sustainable because diversity and suffering come together as a generative pair — diversity provides the variants, suffering provides the selection, and the system reinvents itself rather than scaling itself to death. The technologies that have served human survival longest — language, agriculture, medicine, writing — are the ones that retained this property. The technologies that have served capital growth most efficiently — the assembly line, the container ship, the data center — have shed this property in exchange for short-term yield.

We are not at the end of the GPU era. NVIDIA will, by every reasonable forecast, keep growing through 2030. The Vera Rubin platform will ship in 2027. The 3.6 million Blackwell units will be installed and absorbed. None of this is wrong. None of this needs to be undone.

But somewhere in the same decade, the next substrate has to begin. And it will not come from inside the monoculture. It will come from the people who notice that a brain runs on twenty watts, and who treat that fact not as a curiosity but as a design specification.

The wing did not evolve to compete with the leg. It evolved because there was an opening in the air, and a creature small enough to fall into it. The watt and the wing are the same kind of bet — a bet that there is a niche, currently empty, that rewards the species that learns to inhabit it.

We can build that. We need the lab. We need the funding. We need the patience of biology — which is the patience of selection running across many generations, on many variants, in many environments, with many quiet failures, until something flies.

Notes & sources

This essay draws on current (May 2026) public reporting on NVIDIA’s financials and roadmap; Oxide Computer Company’s published specifications and Bryan Cantrill’s recorded interviews; peer-reviewed and trade reporting on Intel Loihi 2/Hala Point and IBM NorthPole; and the working materials of the AgentOS specification. Specific figures cited inline:

NVIDIA Q4 FY2026 Data Center revenue: $62.3B (SEC 8-K, Feb 2026).
Blackwell backlog: ~3.6M units, sold out through mid-2026.
GB200 Superchip peak draw: ~1,200W; B200 rack: ~50–60 kW; 1M-GPU Blackwell cluster: 1.0–1.4 GW.
IBM NorthPole vs V100: 25x energy efficiency, 22x throughput on inference; 72.7x efficiency on LLM inference.
Intel Loihi 2 vs NVIDIA Jetson Orin Nano: ~1,000x energy efficiency, 75x lower latency on event-driven tasks.
Intel Hala Point: 1.15B neurons in one system.
Oxide rack: 3,000 lb, 92.7” tall, up to 32 sleds, 15 kW max, 54V DC busbar, $100M Series B (Mar 2026).
Projected 2026 AI energy footprint: ~134 TWh/yr (approximately Sweden’s annual electricity demand).
Human brain power budget: ~20 W.

Two figures are presented as best-available secondary reporting and should be verified against primary sources before reprinting: the 72.7x LLM-inference efficiency claim for NorthPole and the 1,000x efficiency claim for Loihi 2 (both are vendor-stated, and both are workload-specific rather than general).

Companion to: AgentOS · Continued Research
Series: Sepahsalar · Technology
Living document: comments and corrections welcome.

The research lives at Sepahsalar.org. The applied work at ardeshir.io. The learning network at MetaLearn. The creative commons at the IMAGINARIUM. The distributed cloud at Univrs. The code is open at github.com/univrs and github.com/ardeshir.