The H100's Price Has Almost Nothing to Do With the Chip
An H100 costs an estimated $3,320 to build and sold for as much as $40,000. The reflex is to call that a hardware monopoly. It isn't. The premium is rent on seventeen years of software nobody wants to rewrite.
Comes with a free Pricing Power Diagnostic template — plus a worked example for Nvidia.
Take an H100 apart and the bill of materials is almost disappointing. Roughly $300 for the logic die. About $1,350 for the stacks of high-bandwidth memory. Around $750 to glue it all together with advanced packaging, and another $920 to test and assemble. Add it up and one of the most fought-over objects on earth costs an estimated $3,320 to build.[5] That same card moved through the market at $25,000 to $40,000 depending on form factor and the month you asked.[6] The gap between those two numbers is the whole story, and almost none of it is payment for hardware.
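The arithmetic behind that gap is easy to check. A minimal sketch in Python, using only the component estimates and street prices quoted above; the variable names are illustrative, and the cost figures are analyst estimates rather than disclosed numbers:

```python
# Analyst-model cost estimates quoted above, in US dollars
# (estimates, not figures disclosed by Nvidia).
bom = {
    "logic_die": 300,
    "hbm_memory": 1_350,
    "advanced_packaging": 750,
    "test_and_assembly": 920,
}

build_cost = sum(bom.values())

# Street-price range quoted above.
street_low, street_high = 25_000, 40_000

# How many times the estimated build cost a buyer pays.
markup_low = street_low / build_cost
markup_high = street_high / build_cost

print(f"Estimated build cost: ${build_cost:,}")
print(f"Street price is {markup_low:.1f}x to {markup_high:.1f}x build cost")
```

On these inputs the card sells for roughly 7.5 to 12 times what it is estimated to cost to make.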
The official story is that Nvidia has a hardware monopoly: it makes the fastest AI chip, scarcity does the rest, and the price is just supply meeting demand. That story is half true and badly aimed. The H100 is fast: 80 billion transistors on a custom 4nm-class process, nearly 1,000 teraflops of FP16 compute, memory bandwidth measured in thousands of gigabytes a second.[10] But specs are catchable. The thing that isn't catchable is the seventeen years of software the buyer is quietly renting along with the silicon.
The chip is the cheap part of the chip
Here is the thesis a smart friend can repeat at dinner: Nvidia doesn't sell a GPU at a hardware price. It sells access to CUDA, and prices the GPU as the only door into it. CUDA is the programming layer: the libraries, the compilers, the cuDNN and cuBLAS routines that the dominant machine-learning framework, PyTorch, was built to lean on. Around 4 million developers already write against it, by Nvidia's own count.[9] When a lab buys an H100, the chip is not the product. The product is that everything they have already built runs the day it arrives. That's worth far more than $3,320, and Nvidia knows exactly how much more.
| | The silicon | The software |
|---|---|---|
| Roughly what it costs Nvidia | ~$3,320 to build a card | Built once, copied at near-zero cost |
| What a rival can match | Specs, transistor count, bandwidth | Years of ecosystem, 4M developers |
| Switching cost to the buyer | Buy a different card | 6–12 months of re-engineering, tens of millions |
| Where the price premium really sits | A few thousand dollars | Most of the $25,000–$40,000 |
This is why the popular '90% margin' line is a category error worth correcting. The ~88% figure that circulates is a per-chip manufacturing-cost-to-street-price ratio: an analyst model, not a disclosed number.[5] Nvidia's actual company-wide GAAP gross margin for FY2024 was 72.7%: $44.3 billion of gross profit on $60.9 billion of revenue, filed with the SEC.[1] Its data-center segment runs an estimated 74–75%.[8] Still extraordinary for a hardware company, but it is a software-economics margin wearing a hardware company's clothes. The chip carries the price the code commands.
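The two margins are the same formula applied to very different inputs, which is why they should not be quoted interchangeably. A short sketch, assuming the $28,000 street price and the FY2024 revenue and cost-of-revenue figures given in the sources list; the function name is illustrative:

```python
def gross_margin(revenue: float, cost: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cost) / revenue


# Per-chip analyst model: ~$3,320 estimated build cost against a $28,000 street price.
per_chip = gross_margin(28_000, 3_320)

# Company-wide GAAP figures for FY2024, in billions of dollars:
# revenue of 60.922 and cost of revenue of 16.621.
company_wide = gross_margin(60.922, 16.621)

print(f"Per-chip model margin:   {per_chip:.1%}")
print(f"Company-wide GAAP margin: {company_wide:.1%}")
```

The per-chip number leaves out everything that is not manufacturing cost, which is part of why it lands well above the filed company-wide figure.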
Scarcity set the spike. Software set the floor.
It's tempting to credit the eye-watering 2023 prices to a plain chip shortage, and that's the second thing worth getting right. The bottleneck wasn't raw wafers. It was CoWoS, the advanced packaging step that mounts the high-bandwidth memory stacks next to the logic die on a shared silicon interposer, and that capacity was the explicit gating factor on H100 availability through 2024.[10] Scarcity at the packaging stage is what drove rental rates past $7 to $10 per GPU-hour at the 2023 peak.[7] But scarcity is the part that decays. By late 2025 those same cards rented for $2 to $4 an hour, and secondhand SXM5 boards that fetched $40,000 in late 2023 now move for as little as $6,000.[6][7] If the price were really just a hardware monopoly, that collapse would have been the end. It wasn't.
Because while the spot price fell, the segment margin didn't. Nvidia's data-center revenue went from a record $14.51 billion in the quarter ending October 2023[2] to a record $62.3 billion in the quarter ending January 2026[4], and margins stayed in the mid-70s. Scarcity is a candle; it burns down. The CUDA moat is the floor under the wax. When supply loosened, the premium didn't vanish, because the reason to pay it was never the shortage. It was the cost of leaving.
A rival chip can be cheaper per teraflop and still lose, because the buyer's real expense is re-engineering. Porting an AI stack to AMD's ROCm can take 6–12 months, and hyperscalers peg the migration at tens of millions of dollars.[9] As long as that bill stays larger than what the buyer would save across its whole order by choosing the alternative, Nvidia keeps the premium, and it gets to set the gap.
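That condition can be written as a break-even on fleet size. The sketch below is illustrative only: the $30 million porting cost is one hypothetical point inside "tens of millions", the $10,000 per-card saving is an invented figure, and both function names are made up for the example.

```python
def breakeven_fleet_size(migration_cost: float, saving_per_card: float) -> float:
    """Number of cards at which hardware savings equal the one-off porting cost."""
    if saving_per_card <= 0:
        raise ValueError("saving_per_card must be positive")
    return migration_cost / saving_per_card


def switching_pays_off(cards: int, migration_cost: float, saving_per_card: float) -> bool:
    """True when total hardware savings exceed the cost of porting the stack."""
    return cards * saving_per_card > migration_cost


# Hypothetical inputs, chosen only to show the shape of the trade-off.
MIGRATION_COST = 30_000_000   # assumed porting bill, in dollars
SAVING_PER_CARD = 10_000      # assumed discount on the rival card, in dollars

print(breakeven_fleet_size(MIGRATION_COST, SAVING_PER_CARD))         # 3000.0
print(switching_pays_off(500, MIGRATION_COST, SAVING_PER_CARD))      # False
print(switching_pays_off(100_000, MIGRATION_COST, SAVING_PER_CARD))  # True
```

Under these assumed numbers a buyer of a few hundred cards never recovers the port, while a buyer of a hundred thousand recovers it many times over. Nvidia's lever is the per-card gap: narrowing it pushes the break-even fleet size higher.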
But isn't the moat already cracking?
The honest objection is that this can't last, and the evidence for it is real. AMD's ROCm reached roughly 85% CUDA parity for training workloads by 2025, PyTorch added native ROCm support, and the hyperscalers stopped waiting: Google, Amazon, Microsoft, and Meta have all built custom silicon for large slices of their own internal work, routing around Nvidia where the volume justifies the engineering.[8][9] Anyone who calls the CUDA moat 'insurmountable' is overselling it. It is contested, and it is eroding at the edges.[8]
But notice where the erosion happens: at the margins, among the handful of players large enough to amortize a multi-year, multi-million-dollar port across millions of their own chips. For everyone else — the startup, the lab, the enterprise that needs its model training next quarter — '85% parity' is not 'good enough.' It is the 15% that breaks at 2 a.m. on a deadline, plus the 6–12 months you don't have. The moat doesn't need to be insurmountable. It only needs to be more expensive to cross than the premium Nvidia charges to stay. So far, for almost everyone, it is.
When you can, build the position where the customer's real expense is the cost of leaving — and then price the thing they buy as the only door into it. Nvidia gives the software away to win developers, then charges for the silicon that runs it. The lesson isn't 'make the fastest chip'; competitors can do that. It's that ecosystems compound and specs don't. Two cautions: a switching-cost moat decays whenever a rival makes leaving cheaper (better tooling, automated porting, a framework that abstracts you away), so you defend it by lowering your own friction faster than they lower theirs. And a premium that looks like pure lock-in invites the biggest customers to fund their own way out — which is exactly what the hyperscalers are doing.
So the next time someone marvels that a $3,320 chip sells for $40,000, the marvel is aimed at the wrong object. Nvidia isn't charging twelve times cost for sand and solder. It's charging for the seventeen years of code that turns the sand into something a buyer can use the morning it arrives — and for the inconvenient truth that the cheapest H100 is usually the one you don't have to rewrite your company around. The silicon is what ships. The lock-in is what sells. And the price was never really about the chip.
Pricing Power Diagnostic
A scored diagnostic of pricing power: brand pull, switching costs, substitutes, and how critical the product is to the buyer. Each dimension rated 1-5 so you can see, at a glance, whether a price rise sticks or sends customers running. Blank to grade your own offer; filled as the worked example scoring a story's business on its real ability to charge more.
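For readers who would rather keep the scorecard in code than in a spreadsheet, here is one possible sketch. It is not the template itself: the class name, the equal weighting, and the rule that a higher rating always means more pricing power are all assumptions, and the example ratings are hypothetical rather than taken from the worked Nvidia example.

```python
from dataclasses import dataclass, fields


@dataclass
class PricingPowerScore:
    """Ratings from 1 (weak) to 5 (strong) on the four dimensions named above.

    Assumption: every rating is oriented so that a higher number means more
    pricing power, so a 5 on `substitutes` means few credible substitutes.
    """
    brand_pull: int
    switching_costs: int
    substitutes: int
    criticality: int

    def __post_init__(self) -> None:
        # Reject ratings outside the 1-5 scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= 5:
                raise ValueError(f"{f.name} must be between 1 and 5, got {value}")

    def total(self) -> int:
        """Unweighted sum out of a possible 20 (equal weighting is an assumption)."""
        return sum(getattr(self, f.name) for f in fields(self))

    def weakest(self) -> str:
        """Name of the lowest-rated dimension (the first one listed wins ties)."""
        return min(fields(self), key=lambda f: getattr(self, f.name)).name


# Hypothetical ratings for an imaginary offer.
example = PricingPowerScore(brand_pull=3, switching_costs=5, substitutes=2, criticality=4)
print(example.total(), example.weakest())
```

Reporting the weakest dimension alongside the total is a deliberate choice: a high total can hide the one dimension through which customers would leave after a price rise.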
Sources
Where this comes from — the filings, records, and reporting behind it.
- [1] Nvidia FY2024 (year ended Jan 28, 2024) total revenue was $60.922 billion, cost of revenue $16.621 billion, and gross profit $44.301 billion, implying a GAAP gross margin of approximately 72.7%.
- [2] Nvidia FY2024 Q3 earnings release (quarter ended Oct 29, 2023) reported record Data Center revenue of $14.51 billion, up 279% year-over-year, and guided Q4 non-GAAP gross margin of ~75.5%.
- [3] Nvidia's FY2023 10-K (year ended Jan 29, 2023) confirms the H100 started shipping in fiscal year 2023 and introduced the Hopper architecture with a Transformer Engine designed to accelerate AI transformer model training.
- [4] Nvidia Q4 FY2026 (quarter ended Jan 25, 2026) total GAAP revenue was $68.1 billion, up 73% year-over-year, and Data Center revenue reached a record $62.3 billion, up 75% year-over-year.
- [5] Estimated manufacturing cost of an NVIDIA H100 SXM5 is approximately $3,320: ~$300 for the TSMC 4N logic die (814mm²), ~$1,350 for HBM3 memory, ~$750 for CoWoS-S packaging, and ~$920 for test and assembly, implying a per-unit gross margin near 88% at a $28,000 street price. Note: this is an analyst-model estimate, not a disclosed Nvidia figure.
- [6] H100 80GB GPU purchase prices ranged from ~$25,000 (PCIe) to ~$40,000 (SXM5) as of Q1 2026; secondary market SXM5 cards that sold for $40,000 in late 2023 now move for $6,000–$22,000; Nvidia does not publish a fixed retail price, and street prices move through OEM/channel partners.
- [7] Early H100 rental prices exceeded $7–$10/GPU-hour in 2023, reflecting extreme scarcity and AI demand; by late 2025 the same H100 GPUs were available for $2–$4/hour across non-hyperscale marketplace providers, with hyperscaler on-demand rates in low single digits.
- [8] Nvidia's data center gross margin is approximately 74–75%, described as extraordinary for hardware, and defended primarily by the CUDA software lock-in premium; AMD's ROCm had achieved approximately 85% CUDA parity for training workloads as of 2025, making the moat contested but still material.
- [9] CUDA has approximately 4 million active developers as of 2026 (Nvidia's own estimate); PyTorch, the dominant ML framework, was developed with heavy CUDA optimization and has deep integration with Nvidia's cuDNN and cuBLAS libraries; porting an AI stack to AMD ROCm can require 6–12 months of engineering and hyperscalers estimate the migration cost at tens of millions of dollars.
- [10] The H100 GH100 die contains 80 billion transistors on TSMC's 4N (4nm-class) process, delivering 989 TFLOPS FP16 and 3,350 GB/s HBM3 memory bandwidth in SXM5 form; the CoWoS advanced packaging bottleneck, not raw wafer supply, was the primary constraint on H100 availability through 2024.