The Three Theories of the Cloud

Crusoe vs. Oxide vs. the Hyperscalers — and the Physics Behind the $700B AI Capex Cycle

A deep technical and economic survey of next-generation AI infrastructure, the architectural divergence between purpose-built AI factories and traditional cloud, and why power — not capital — has become the binding constraint on machine intelligence.

1. Why this matters now

Between 2022 and 2024, hyperscalers spent roughly $477 billion on infrastructure. Goldman Sachs projects that 2025–2027 spend will reach $1.15 trillion, about 2.4 times as much over a period of the same length. Q1 2026 earnings revised the number upward again, with Amazon, Microsoft, Alphabet, Meta and Oracle now committing to $660–725 billion in 2026 alone, of which roughly 75% (~$450B) is AI-specific.¹²³

This is no longer a software industry. Capital intensity at Microsoft (45% of revenue) and Oracle (57%) now exceeds that of integrated electric utilities. Morgan Stanley and JPMorgan estimate the tech sector will issue ~$1.5 trillion in new debt over the next three years to fund the buildout.²⁴

Three architectural theories are competing to absorb this capital:

	Crusoe	Oxide	Hyperscalers (AWS / Azure / GCP / OCI)
Unit of design	Gigawatt campus	One rack	Region
Customer	AI labs, frontier model trainers	Sovereign / regulated enterprise	Everyone
Pricing	$/GPU-hour, undercuts hyperscalers	One-time purchase + support	$/instance-hour, $/GB egress
Moat	Power + land + speed-to-energization	Hardware-software co-design	Service breadth + lock-in
Model	AI factory (single-purpose)	Cloud computer (owned)	General-purpose rental

2. Crusoe: an energy company that ships GPU-hours

2.1 History

Crusoe was founded in 2018 by Chase Lochmiller and Cully Cavness around a contrarian observation: oil and gas wells were burning ~$30 billion of stranded methane per year as flare gas, and that gas could be redirected into containerized data centers placed on the wellpad itself. The original product was Bitcoin mining; Crusoe later divested that business to NYDIG to focus exclusively on AI.

2.2 Trajectory

2018–2022: Patented Digital Flare Mitigation®. Repurposed over 21 billion cubic feet of stranded gas, avoiding 2.7 million metric tons of CO₂.
Oct 2024 / May 2025: Blue Owl + Primary Digital JV scales from $3.4B to $15B to fund the 1.2 GW Abilene, Texas campus, leased to OpenAI as the first phase of “Stargate.”
Dec 2024: $600M Series D at $2.8B valuation, led by Founders Fund, with NVIDIA participating.
Oct 2025: $1.375B Series E at $10B valuation, led by Valor Equity Partners and Mubadala Capital. Power pipeline crosses 45 gigawatts.
2026: 1.8 GW Wyoming campus with Tallgrass; Crusoe Spark modular AI data centers powered by Redwood Materials’ second-life EV batteries; Crusoe Managed Inference launches with proprietary MemoryAlloy technology.

Total raised across debt and equity: ~$3.9 billion as of October 2025.⁵

2.3 The technical differentiation — what “AI from the ground up” actually means

The phrase is often marketing, but for Crusoe it’s literal. Three architectural choices distinguish a purpose-built AI campus from a retrofitted hyperscaler hall:

(a) Power density: 5–15 kW → 132–240 kW per rack

Traditional enterprise data centers were engineered for racks drawing 5–15 kW. NVIDIA Hopper-generation systems pushed this to ~40 kW. The new Blackwell-generation GB200 NVL72 rack draws roughly 120–132 kW, and the GB300 NVL72 successor moves toward 240 kW. NVIDIA’s own roadmap shows GPU TDP doubling every two years, reaching 1,500W per chip by 2026.⁶

This is not an incremental change. It is a 10–30× increase in power per square meter, and it makes existing facilities physically incompatible with frontier hardware. Microsoft reportedly spent $1 billion retrofitting facilities for liquid cooling after discovering that air-cooled infrastructure couldn’t support GPT training workloads.

(b) Cooling: air → direct liquid → immersion

At 100+ kW per rack, air cooling fails on basic thermodynamics. Water has roughly 25× the thermal conductivity of air.⁷ Crusoe’s Abilene campus uses closed-loop direct-to-chip liquid cooling with coolant entering at 25°C and exiting 20°C warmer. NVIDIA reports the Blackwell platform delivers 25× more energy efficiency and 300× more water efficiency than traditional air-cooled architectures.
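The cooling argument can be made concrete with the heat-balance relation Q = ṁ · c_p · ΔT. The sketch below is a back-of-envelope estimate, not a description of Crusoe's actual plant: it takes the ~120 kW rack and the 20°C coolant temperature rise from the text, and assumes textbook fluid properties and that all rack heat goes into a single fluid stream.

```python
# Back-of-envelope coolant flow from Q = m_dot * c_p * delta_T.
# Rack power and temperature rise come from the text; the fluid properties
# below are textbook approximations near room temperature (assumptions).

WATER_CP = 4186.0      # J/(kg*K), specific heat of liquid water
WATER_DENSITY = 997.0  # kg/m^3
AIR_CP = 1005.0        # J/(kg*K), specific heat of air at constant pressure
AIR_DENSITY = 1.2      # kg/m^3, roughly sea level at 20 C


def mass_flow_kg_per_s(heat_watts: float, cp: float, delta_t_kelvin: float) -> float:
    """Mass flow needed to carry away heat_watts at a given temperature rise."""
    return heat_watts / (cp * delta_t_kelvin)


def volume_flow_m3_per_s(heat_watts: float, cp: float, density: float,
                         delta_t_kelvin: float) -> float:
    """Same quantity expressed as volume flow for a fluid of the given density."""
    return mass_flow_kg_per_s(heat_watts, cp, delta_t_kelvin) / density


RACK_WATTS = 120_000.0  # ~120 kW GB200 NVL72 rack, per the text
DELTA_T = 20.0          # coolant exits 20 C warmer than it enters, per the text

water_m3s = volume_flow_m3_per_s(RACK_WATTS, WATER_CP, WATER_DENSITY, DELTA_T)
air_m3s = volume_flow_m3_per_s(RACK_WATTS, AIR_CP, AIR_DENSITY, DELTA_T)
flow_ratio = air_m3s / water_m3s

print(f"water: {water_m3s * 60_000:.0f} L/min per rack")
print(f"air:   {air_m3s:.1f} m^3/s per rack")
print(f"air needs about {flow_ratio:.0f}x the volume flow of water")
```

Under these assumptions a water loop needs on the order of 86 L/min per rack, while air at the same temperature rise would need about 5 m³/s, roughly 3,500 times the volume. That gap comes from volumetric heat capacity (density times specific heat), which matters more for bulk heat transport than the conductivity ratio.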

(c) Network fabric: TCP/IP → NVLink + RDMA

This is the deepest break from cloud orthodoxy. A general-purpose cloud is built for millions of small, independent, fault-tolerant containers communicating over Ethernet/TCP. A frontier training run is one tightly-coupled job that needs lossless, microsecond-latency, all-to-all bandwidth across tens of thousands of GPUs.

The GB200 NVL72 makes this concrete:

72 Blackwell GPUs + 36 Grace CPUs in a single rack
Connected by fifth-generation NVLink at 130 TB/s of all-to-all bandwidth
The rack acts as a single GPU with 13.4 TB unified memory
Inter-rack connectivity uses Quantum-2 InfiniBand or Spectrum-X Ethernet with RoCE in a rail-optimized fat-tree topology
3.2 km of copper cabling chosen over optics to reduce power draw by 20 kW

You cannot time-share a training cluster the way you time-share EC2. The fabric topology is the product.
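Dividing the rack-level figures above by the GPU count shows what each GPU sees. The sketch below uses only the numbers quoted in the list, plus one labeled assumption: a hypothetical 400 Gb/s scale-out NIC per GPU as a comparison point for a conventional Ethernet fabric.

```python
# Per-GPU share of the rack-level GB200 NVL72 figures quoted in the text.
GPUS_PER_RACK = 72
NVLINK_TOTAL_TB_S = 130.0  # aggregate NVLink bandwidth, per the text
UNIFIED_MEMORY_TB = 13.4   # rack-level memory figure, per the text

# Assumption for comparison only: one 400 Gb/s NIC per GPU on the scale-out side.
ASSUMED_NIC_GBIT_S = 400.0

per_gpu_bw_tb_s = NVLINK_TOTAL_TB_S / GPUS_PER_RACK
per_gpu_mem_gb = UNIFIED_MEMORY_TB * 1000 / GPUS_PER_RACK  # decimal TB -> GB

nic_tb_s = ASSUMED_NIC_GBIT_S / 8 / 1000  # Gb/s -> GB/s -> TB/s
nvlink_vs_nic = per_gpu_bw_tb_s / nic_tb_s

print(f"NVLink per GPU: {per_gpu_bw_tb_s:.2f} TB/s")
print(f"Memory per GPU: {per_gpu_mem_gb:.0f} GB")
print(f"NVLink vs assumed NIC: about {nvlink_vs_nic:.0f}x")
```

The result is about 1.8 TB/s of NVLink bandwidth per GPU, roughly 36 times what the assumed 400 Gb/s NIC would carry. That gap is why jobs are laid out so that the most communication-heavy traffic stays inside the rack.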

2.4 Economic value: why Crusoe undercuts hyperscalers

Crusoe is vertically integrated from the electron upward: it owns or co-owns the power generation, the substation, the building, the cooling plant, the GPUs, and the orchestration software. Contrary Research observes that this vertical integration allows Crusoe to undercut traditional cloud pricing on H100/B200 SKUs.

The unit economics are simple: AWS, Azure, GCP, and OCI must amortize the cost of supporting hundreds of services (S3, Lambda, RDS, IAM, Bedrock, etc.) and global edge presence on top of every GPU-hour. Crusoe sells one SKU. The result is meaningfully lower $/GPU-hour, with the trade-off that you don’t get a cloud — you get raw compute.

3. Oxide: the counter-thesis

3.1 Origin

Oxide was founded in 2019 by Steve Tuck (ex-Joyent / Dell) and Bryan Cantrill (ex-Sun / Joyent / DTrace co-author). The thesis: hyperscalers built proprietary rack-scale servers a decade ago because retail server vendors couldn’t deliver the integration they needed. Everyone else — banks, labs, governments, regulated enterprises — has been stuck buying “kit-car” servers from Dell/HPE/Supermicro and stitching them together with VMware, Kubernetes, and bespoke automation.

3.2 Funding & traction

2023: $44M Series A, shipped first commercial rack
Aug 2025: $100M Series B, led by USIT
Feb 2026: $200M Series C, bringing total raised to ~$378M
Customers: Lawrence Livermore National Lab, Idaho National Lab, CoreWeave, Switch, an unnamed global financial services firm

3.3 What’s actually in the box

The Oxide Cloud Computer is a single 3,000-pound rack that ships as one unit:

Compute: 16, 24, or 32 sleds, each with AMD EPYC 7713P (64 cores), 512 GB or 1 TB RAM, 32 TB NVMe — totaling up to 2,048 cores, 32 TB RAM, 1 PB storage
Networking: Self-designed switch using Intel Tofino 2 ASICs, no traditional cabling — sleds snap directly into a backplane
Power: DC bus bar instead of per-server AC PSUs (35% more efficient)
Firmware: No BIOS or UEFI. The host boots Oxide’s own software directly after the AMD security processor, while Hubris, a custom Rust-based OS, runs on the service processor and root of trust
Hypervisor & control plane: Custom-built, all open source
Provisioning: Rack to first developer VM in under 2 hours
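The rack totals quoted above follow directly from the per-sled figures. A minimal sketch, using only the numbers in the list and taking the larger 1 TB RAM option for each sled:

```python
# Rack-level totals for the three sled counts, from the per-sled specs in the text.
CORES_PER_SLED = 64    # AMD EPYC 7713P
RAM_TB_PER_SLED = 1    # the text lists 512 GB or 1 TB; this uses the larger option
NVME_TB_PER_SLED = 32


def rack_totals(sleds: int) -> dict:
    """Aggregate cores, RAM, and raw NVMe capacity for a given sled count."""
    return {
        "sleds": sleds,
        "cores": sleds * CORES_PER_SLED,
        "ram_tb": sleds * RAM_TB_PER_SLED,
        "nvme_tb": sleds * NVME_TB_PER_SLED,
    }


for sleds in (16, 24, 32):
    print(rack_totals(sleds))
```

The 32-sled configuration gives 2,048 cores, 32 TB of RAM, and 1,024 TB of raw NVMe, which matches the "up to" figures in the text. Usable storage after replication or erasure coding would be lower.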

3.4 Economic value: TCO, not $/hour

Oxide’s pitch is the inverse of Crusoe’s. Where Crusoe says “the cloud is too expensive, rent ours instead,” Oxide says “the cloud is structurally extractive — own your compute, in your own building, with cloud ergonomics.”

Pricing starts around $500K for a base configuration.⁸ Oxide claims ~50% TCO advantage versus equivalent public cloud spend once you account for licensing (no VMware tax, no per-core fees), the absence of egress charges, and 5-7 year hardware refresh cycles versus 3-4 years for air-cooled gear.
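How a TCO comparison of this kind depends on the ownership horizon can be sketched with a simple model. Only the $500K base price comes from the text. The annual operating cost and the equivalent cloud spend below are hypothetical placeholders chosen for illustration, not Oxide or cloud-provider data, and the model ignores financing, discounting, and residual value.

```python
# Toy owned-vs-rented TCO comparison. All inputs except the purchase price
# are hypothetical placeholders, not measured or vendor-supplied figures.

def owned_tco(purchase_usd: float, annual_opex_usd: float, years: int) -> float:
    """Up-front purchase plus flat yearly operating cost (power, space, support)."""
    return purchase_usd + annual_opex_usd * years


def cloud_tco(annual_spend_usd: float, years: int) -> float:
    """Flat yearly cloud bill for the equivalent workload."""
    return annual_spend_usd * years


def tco_advantage(purchase_usd: float, annual_opex_usd: float,
                  cloud_annual_usd: float, years: int) -> float:
    """Fraction saved by owning versus renting over the horizon."""
    owned = owned_tco(purchase_usd, annual_opex_usd, years)
    return 1.0 - owned / cloud_tco(cloud_annual_usd, years)


PURCHASE = 500_000.0          # base configuration price, per the text
HYPOTHETICAL_OPEX = 100_000.0   # assumption: yearly power, space, and support
HYPOTHETICAL_CLOUD = 400_000.0  # assumption: yearly cloud bill for the same work

for years in (3, 5, 7):
    adv = tco_advantage(PURCHASE, HYPOTHETICAL_OPEX, HYPOTHETICAL_CLOUD, years)
    print(f"{years} years: owning saves {adv:.0%}")
```

With these placeholder inputs the advantage grows as the horizon lengthens, because the purchase price is amortized over more years. This is why the refresh-cycle assumption matters as much as the licensing and egress terms.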

The market: any workload where predictable steady-state compute meets data-residency/sovereignty constraints. That includes labs, banks, healthcare, government, and the small but growing “cloud repatriation” cohort.

Limitation: Oxide is currently CPU-only. CTO Bryan Cantrill has said they’re working on GPU integration but wanted to “solve the mainstream compute problem first.” For now, Oxide is not a player in AI training.

4. The hyperscalers: scale, breadth, and the lock-in machine

The economic value of AWS, Azure, GCP, and OCI is not $/GPU-hour — it never was. It’s the service ecosystem and integration: managed databases, IAM, networking, identity federation, compliance certifications, edge presence, and decades of API surface area that customers have built around.

This is also their vulnerability for AI workloads. A hyperscaler region must support a workload mix that includes a five-person startup running a Lambda function and a frontier lab running a 100,000-GPU training job. The architectural compromises required to serve both — particularly air-first cooling, lower rack densities, and Ethernet-first fabric — make hyperscaler regions structurally suboptimal for the hardest AI workloads.

That’s why Microsoft is a technology partner in Stargate (built and operated by Crusoe and Oracle), why Oracle is the operating partner on the first 1.2 GW phase with 450,000+ GB200 GPUs, and why Microsoft has disclosed an $80 billion backlog of Azure orders that cannot be fulfilled due to power constraints.
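The GPU count and the campus power figure can be cross-checked against the rack numbers given earlier. The sketch assumes, for illustration only, that every GPU sits in a 72-GPU NVL72 rack drawing about 120 kW. The actual mix of hardware on the campus is not stated in the text.

```python
# Cross-check: how much of a 1.2 GW campus would 450,000 GB200 GPUs occupy?
GPUS = 450_000        # per the text
GPUS_PER_RACK = 72    # GB200 NVL72, per the text
RACK_KW = 120.0       # assumption: every rack draws ~120 kW
CAMPUS_MW = 1_200.0   # 1.2 GW first phase, per the text

racks = GPUS / GPUS_PER_RACK
rack_load_mw = racks * RACK_KW / 1000.0
share_of_campus = rack_load_mw / CAMPUS_MW

print(f"racks: {racks:.0f}")
print(f"rack load: {rack_load_mw:.0f} MW")
print(f"share of campus power: {share_of_campus:.1%}")
```

Under these assumptions the GPU racks account for about 750 MW, or a little under two thirds of the campus. That would leave the rest for cooling, networking, storage, conversion losses, and headroom.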

The hyperscalers are not losing — they’re partnering. The capex cycle is too large for any single balance sheet, even theirs.

5. Why power, not capital, is the constraint

If you only remember one thing from this piece, remember this: the AI capex cycle has stopped being limited by money and started being limited by megawatts.

5.1 The IEA’s projection

The International Energy Agency’s Energy and AI report projects global data center electricity consumption will rise from 415 TWh in 2024 to 945 TWh by 2030 — slightly more than Japan’s entire electricity consumption today. In the Lift-Off Case, that figure exceeds 1,700 TWh by 2035, around 4.4% of global electricity demand.

In the U.S., Lawrence Berkeley National Lab’s 2024 report projects data center consumption rising from 176 TWh in 2023 to 325–580 TWh by 2028 — 6.7% to 12% of total U.S. electricity.
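These projections are easier to compare as compound annual growth rates. The sketch below uses only the endpoints quoted above. It also backs out the total U.S. consumption implied by the LBNL percentages, as a check that the two ends of the range rest on a similar denominator.

```python
# Compound annual growth rates implied by the projections quoted in the text.

def cagr(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that takes start to end over the given years."""
    return (end / start) ** (1.0 / years) - 1.0


iea_global = cagr(415.0, 945.0, 2030 - 2024)   # IEA: 415 -> 945 TWh
lbnl_low = cagr(176.0, 325.0, 2028 - 2023)     # LBNL low case: 176 -> 325 TWh
lbnl_high = cagr(176.0, 580.0, 2028 - 2023)    # LBNL high case: 176 -> 580 TWh

# Total U.S. consumption implied by "6.7% to 12%" at the two ends of the range.
implied_us_total_low = 325.0 / 0.067
implied_us_total_high = 580.0 / 0.12

print(f"IEA global:  {iea_global:.1%} per year")
print(f"LBNL low:    {lbnl_low:.1%} per year")
print(f"LBNL high:   {lbnl_high:.1%} per year")
print(f"implied U.S. total: {implied_us_total_low:.0f} vs {implied_us_total_high:.0f} TWh")
```

The global figure works out to a little under 15% per year, and the U.S. range to roughly 13% to 27% per year. Both ends of the LBNL range imply a U.S. total of about 4,800 to 4,900 TWh, so the percentages and the absolute figures are mutually consistent.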

By 2030, U.S. data centers are expected to consume more electricity than all energy-intensive manufacturing combined, including aluminum, steel, cement, and chemicals.

5.2 The grid integration problem

Data center load is not like other electrical loads. Training clusters present as steady, multi-hundred-megawatt loads with sharp ramp characteristics during checkpoint events. Inference clusters are diurnal and bursty. Both need 24/7 firm power — solar PV alone can’t underwrite a training run.

The market response:

Microsoft’s Three Mile Island restart deal with Constellation Energy
Amazon’s nuclear PPA expansion at Talen Energy’s Susquehanna plant
Meta’s $20B nuclear PPA framework
Crusoe exploring SMR PPAs
Oracle’s Stargate using on-site natural gas turbines from GE Vernova for fast energization

Behind-the-meter generation — power produced on-site, never touching the public grid — is now a competitive necessity. This is why Crusoe’s flare-gas heritage matters: they had six years of operational experience pairing compute with stranded generation before anyone else needed it.

5.3 Power Stress concentrations

Recent academic work on the AI–energy coupling problem flags Power Stress Index values exceeding 0.25 in Oregon, Virginia, and Ireland, indicating local grid vulnerability. Texas and Japan, with more diversified generation, can absorb new loads more effectively. Concentration of AI infrastructure in a few low-PUE, cheap-power geographies creates regional grid instability that policy is only beginning to confront.

Data Center Watch reports that more than 36 projects representing $162 billion in investment were either blocked or significantly delayed as of June 2025 due to community opposition or interconnect queue limits.

6. The unit economics: does any of this work?

Bain’s analysis, cited by Goldman Sachs, frames the central question bluntly: sustaining the current trajectory requires ~$500B annual spend to generate ~$2T in revenue — a 4× revenue multiple on capital that has not yet been demonstrated at scale.⁴

This is the bull/bear axis:

Bull case: AI workloads are still supply-constrained. Microsoft’s $80B Azure backlog, Alphabet’s three upward capex revisions in 2025, and Oracle’s 57% capital intensity all suggest demand is real and persistent. Inference economics improve fast — Alphabet reduced Gemini serving costs by 78% over 2025 through model optimization.

Bear case: AI capex equates to ~0.8% of U.S. GDP, well below the 1.5%+ peaks of past tech booms — but the late-1990s telecom buildout shows what happens when capex outruns monetization. Hyperscaler equity gains are increasingly leveraged; capex now exceeds internal cash generation.

Goldman’s basket of AI infrastructure stocks returned 44% YTD against just 9% growth in two-year forward EPS estimates — a textbook multiple expansion that requires monetization to catch up.
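The size of that multiple expansion follows from the two figures above, since price equals earnings times the multiple. A minimal sketch, assuming the 44% return is mostly price appreciation and that the multiple is measured on the same two-year forward EPS:

```python
# Decompose a price return into earnings growth and multiple expansion:
# (1 + price_return) = (1 + eps_growth) * (1 + multiple_change)

PRICE_RETURN = 0.44  # basket return YTD, per the text
EPS_GROWTH = 0.09    # growth in two-year forward EPS estimates, per the text

multiple_change = (1.0 + PRICE_RETURN) / (1.0 + EPS_GROWTH) - 1.0

print(f"implied change in forward multiple: {multiple_change:.1%}")
```

On these assumptions the forward multiple rose by roughly 32%. That means about three quarters of the return came from investors paying more per dollar of expected earnings rather than from higher earnings estimates.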

7. Synthesis: three answers to “where does compute belong?”

These companies are not really competing — they are answering different questions.

Crusoe answers: “How do we build the largest possible single-purpose machine for training frontier models, as fast as physically possible?” The answer is gigawatt campuses co-located with stranded power, single-tenant, single-SKU, vertically integrated from electrons to tokens.

Oxide answers: “How do we give regulated, sovereign, or repatriating organizations cloud ergonomics without renting from a landlord?” The answer is hardware-software co-design, rack-scale integration, open-source firmware, and ownership economics.

The hyperscalers answer: “How do we serve the entire economy with general-purpose compute, while partnering with specialists for the hard frontier workloads?” The answer is breadth, integration, ecosystem lock-in, and increasingly, leasing capacity from the specialists rather than building it themselves.

Both divergences — vertical re-integration (Crusoe) and sovereignty (Oxide) — are reactions to the same underlying truth: the cloud unbundled compute from power, real estate, and ownership in a way that worked beautifully for general-purpose workloads but breaks down at the frontier of AI. Whoever owns the megawatt owns the margin.

The next decade of cloud will not be “winner-take-all.” It will be three industries — frontier-AI factories, sovereign cloud computers, and general-purpose hyperscalers — that look more like rail, telecom, and electric utilities than the software business they grew out of.
