Nvidia · Moat Anatomy

Nvidia's Moat Isn't the Chip. It's 20 Years of Software Nobody Wants to Rewrite.

Everyone credits Nvidia's lead to faster silicon. The real moat is a two-decade software stack that makes switching agony — and the only customers big enough to escape it are the same two who already pay 36% of Nvidia's revenue.

A team that wants to train a large model in 2026 doesn't really shop for a chip. It inherits one. Their code is written against CUDA, their training runs lean on cuDNN, their multi-GPU communication routes through NCCL, and their inference is squeezed through TensorRT: a stack Nvidia has been compounding since 2006 [5]. Swapping the hardware underneath all of that isn't a purchasing decision. It's a rewrite. And that, not the silicon, is what actually protects Nvidia.

The official story is that Nvidia wins because its GPUs are faster. Better benchmarks, better moat. The real story is that Nvidia could lose the benchmark crown next quarter and most of its customers still couldn't leave — because the thing they're locked into was never the chip. It was the twenty years of software wrapped around it.

The moat is the rewrite nobody wants to do

Here is the thesis in one line: Nvidia isn't a chip company protected by a software layer; it's a software platform that happens to sell the hardware that platform runs best on. CUDA arrived in 2006 and grew into a full stack (cuDNN, NCCL, TensorRT, profilers) woven into the daily workflow of production AI [5]. Nvidia's investor materials claim more than four million registered CUDA developers and over forty thousand organizations using CUDA-accelerated applications [5]. Treat that registration count with care: it counts everyone who ever signed up, not just active builders. But the point survives the caveat. The asset isn't the headcount. It's the accumulated dependency: every model trained, every pipeline tuned, every engineer fluent in one set of tools deepens the cost of ever switching to another.

That cost is why market share holds even where rivals are close. By the most credible independent estimate, Nvidia held about 92% of the data center GPU market by revenue in 2024 [6]. You'll also see 98% quoted, but that is a narrower 2023 estimate covering merchant GPUs only, and it ignores the custom silicon hyperscalers build for themselves [6]. Even the honest 92% is staggering for a market this large, and it isn't held by being 8% faster. It's held because the second-best chip still has to fight the first chip's software.

| | The benchmark battle | The real battle |
|---|---|---|
| What's compared | Chip speed, FLOPS, memory bandwidth | Years of code written against CUDA |
| Cost to switch | A purchase order | Rewriting and re-validating the whole stack |
| Who can close it | Any decent silicon team | Only software parity, built over years |
| Where Nvidia is vulnerable | Rarely, and it doesn't matter | Inference, where the stack matters less |

Where Nvidia's competitors actually have to win
~92%
of the data center GPU market by revenue in 2024: a position defended not by faster chips but by the software nobody wants to rewrite [6]

Why the moat is narrower than the legend

The popular framing, that CUDA is unbreachable, overstates the durability. AMD's ROCm has reached roughly 85% CUDA parity for PyTorch training workloads as of 2025; OpenAI's Triton lets developers write GPU kernels once and target more than one backend; PyTorch 2.0 natively supports non-CUDA targets [7]. The bridge over the moat is being built in public. Where the lock-in still bites hardest is large-scale training, where system-level integration (cuDNN, NCCL collectives, the whole orchestration layer) creates a stickiness that a single compatible kernel doesn't dissolve [7]. Inference is a different story. Once a model is trained, running it doesn't require the same depth of tooling, and that's exactly where the alternatives are getting good enough.

And the financials quietly suggest the moat is being tested, not widening. Nvidia's GAAP gross margin peaked at 78.4% in a single quarter, Q1 FY2025 [4]. That's the number people repeat. But the full-year FY2025 margin was 75.0% [3], and FY2026 fell to 71.1%, pressured partly by a $4.5 billion H20 export-control charge [1][2]. A truly unassailable position usually shows up as margins that hold or climb. These are drifting down even as revenue explodes.
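The margin trend is easier to read as percentage-point deltas. The sketch below uses only the figures from sources 1 through 4. The H20 add-back rests on an assumption the sources don't state, namely that the entire $4.5 billion charge was booked in cost of revenue, so treat that line as a rough bound rather than a reported number.

```python
# GAAP gross margins (%) as cited in sources 1-4.
ANNUAL_MARGIN = {"FY2024": 72.7, "FY2025": 75.0, "FY2026": 71.1}
PEAK_QUARTER_MARGIN = 78.4  # Q1 FY2025, a single-quarter peak (source 4)

FY2026_REVENUE_BN = 215.9  # source 1
H20_CHARGE_BN = 4.5        # source 2


def pp_change(old: float, new: float) -> float:
    """Change between two margins, in percentage points (one decimal)."""
    return round(new - old, 1)


# Year-over-year drift of the full-year margin.
fy25_vs_fy24 = pp_change(ANNUAL_MARGIN["FY2024"], ANNUAL_MARGIN["FY2025"])
fy26_vs_fy25 = pp_change(ANNUAL_MARGIN["FY2025"], ANNUAL_MARGIN["FY2026"])

# How far the headline peak sits above the latest full-year figure.
peak_vs_fy26 = pp_change(PEAK_QUARTER_MARGIN, ANNUAL_MARGIN["FY2026"])

# ASSUMPTION: the whole H20 charge sits in cost of revenue, so adding it back
# lifts gross margin by (charge / revenue), expressed in percentage points.
addback_pp = 100 * H20_CHARGE_BN / FY2026_REVENUE_BN
fy26_ex_charge = round(ANNUAL_MARGIN["FY2026"] + addback_pp, 1)

if __name__ == "__main__":
    print(f"FY2025 vs FY2024: {fy25_vs_fy24:+.1f} pp")
    print(f"FY2026 vs FY2025: {fy26_vs_fy25:+.1f} pp")
    print(f"FY2026 vs Q1 FY2025 peak: {peak_vs_fy26:+.1f} pp")
    print(f"FY2026 with H20 charge added back: {fy26_ex_charge:.1f}%")
```

Under that assumption, adding the charge back lands FY2026 near 73.2%, still below FY2025's 75.0%, which fits the article's wording that the charge explains only part of the decline.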

The threat isn't AMD. It's the two customers who already pay the bills.

Here's the part the moat narratives skip. The CUDA switching cost is real for roughly 99% of AI teams: the startups and enterprises that will never design their own silicon. It is far weaker for the handful of cloud giants that can [8]. Google has its TPU, Amazon has Trainium, Microsoft has Maia, Meta has MTIA: custom chips aimed largely at their owners' internal workloads, especially inference, and in the case of Maia and MTIA, chips you can't even rent from outside [8]. These buyers don't need a CUDA-compatible alternative to leave, because they're not switching to another vendor's GPU. They're switching to themselves.

Now line that up against Nvidia's revenue. In FY2026, one direct customer accounted for 22% of total revenue and another for 14%: two customers, 36% of the whole company [2]. The filing identifies them only as direct customers, but buyers of that size belong to precisely the class with both the volume to justify custom silicon and the incentive to escape Nvidia's pricing. The moat protects Nvidia from everyone except the customers large enough to matter most. That's the structural irony: the deeper its dependence on a few hyperscalers, the more it funds the engineering that lets those hyperscalers walk.

36%
of Nvidia's FY2026 revenue came from just two direct customers, the same class of buyer building captive chips to bypass the GPU entirely [2]

So is the moat real or not?

The fair objection is that this reads too bearish for a company that just did $215.9 billion in revenue, up 65%, with data center alone at $193.7 billion [1]. Demand is ferocious; supply is the binding constraint. That's true, but near-term scarcity is not a structural moat. Being sold out tells you the market is hot, not that the market can't route around you. The honest counterpoint runs the other way: the hyperscaler threat is concentrated in inference, is mostly captive and internal, and leaves the vast training market, where CUDA's system-level lock-in is deepest, still firmly Nvidia's [8]. Both are right. The moat is genuine and load-bearing for almost everyone. It just happens to be thinnest exactly where Nvidia's revenue is most concentrated.
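The headline growth rates can be cross-checked against the FY2025 filing. The sketch below backs out the prior-year figures implied by the FY2026 numbers in source 1; because those growth rates are rounded to whole percentages, the implied values are approximations, not reported figures.

```python
# FY2026 figures ($ billions) and YoY growth rates as cited in source 1.
FY2026_TOTAL_BN, TOTAL_GROWTH = 215.9, 0.65
FY2026_DC_BN, DC_GROWTH = 193.7, 0.68


def implied_prior(current: float, growth: float) -> float:
    """Prior-year value implied by a current value and its YoY growth rate."""
    return current / (1 + growth)


fy2025_total = implied_prior(FY2026_TOTAL_BN, TOTAL_GROWTH)
fy2025_dc = implied_prior(FY2026_DC_BN, DC_GROWTH)

# Data Center as a share of total revenue, in percent.
dc_share_fy2025 = 100 * fy2025_dc / fy2025_total
dc_share_fy2026 = 100 * FY2026_DC_BN / FY2026_TOTAL_BN

if __name__ == "__main__":
    print(f"Implied FY2025 revenue: ${fy2025_total:.1f}B")
    print(f"Implied FY2025 Data Center: ${fy2025_dc:.1f}B")
    print(f"Data Center share: {dc_share_fy2025:.1f}% -> {dc_share_fy2026:.1f}%")
```

The implied FY2025 Data Center figure (about $115.3 billion) and share (about 88%) line up with the FY2025 10-K in source 3, and the share edges up to roughly 90% in FY2026.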

A moat is only as wide as your weakest exit

Switching costs look monolithic until you ask 'who can actually pay to leave?' Nvidia's CUDA lock-in is overwhelming for the long tail of teams that will never build silicon — and nearly irrelevant for the two or three buyers big enough to build their own. The danger isn't the average customer's stickiness; it's the marginal customer's escape route. When a small number of buyers represent a large share of revenue AND have a credible path out, the moat protects the wrong end of the income statement. Measure the moat at its weakest point of exit, not its strongest point of entry — because that weakest point is usually where your biggest customer is standing.
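One way to make "measure the moat at its weakest point of exit" concrete is to weight each customer segment by its revenue share and count only the segments with a credible way out. This is an illustrative sketch, not a method from the sources: the segment labels and exit flags are hypothetical judgment calls, treating "all other customers" as locked in is a deliberate simplification, and only the 22% and 14% shares come from the FY2026 10-K.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    revenue_share: float  # fraction of total revenue, 0.0 to 1.0
    credible_exit: bool   # hypothetical judgment: can this buyer realistically leave?


def exposed_share(segments: list[Segment]) -> float:
    """Revenue share held by buyers who have a credible exit."""
    return sum(s.revenue_share for s in segments if s.credible_exit)


def largest_exposed(segments: list[Segment]) -> Segment | None:
    """The single biggest buyer with an exit: the moat's weakest point."""
    exposed = [s for s in segments if s.credible_exit]
    return max(exposed, key=lambda s: s.revenue_share, default=None)


# Hypothetical segmentation; the exit flags are illustrative, not disclosed by Nvidia.
segments = [
    Segment("direct customer A", 0.22, credible_exit=True),
    Segment("direct customer B", 0.14, credible_exit=True),
    Segment("all other customers", 0.64, credible_exit=False),
]

if __name__ == "__main__":
    print(f"Revenue with a credible exit: {exposed_share(segments):.0%}")
    weakest = largest_exposed(segments)
    if weakest is not None:
        print(f"Weakest point of exit: {weakest.name} ({weakest.revenue_share:.0%})")
```

The point of the exercise is the asymmetry: a moat that looks overwhelming when you count customers can still leave over a third of revenue with buyers flagged as able to leave.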

What actually protects Nvidia is not the chip and not the benchmark. It's twenty years of software that turns every competitor's better hardware into someone else's rewriting problem. That moat is real, and it is narrowing — slowly at the edges where ROCm and Triton chip away, suddenly at the center where a single hyperscaler decides its own chip is good enough. Nvidia's genius was making the world program against its platform. Its exposure is that it taught the only customers who could afford to leave exactly how much leaving is worth.

Take it further: The Moat Anatomy Canvas

A one-page canvas that dissects a moat instead of asserting it: where the advantage comes from, how much of the market it covers, how long it would take to copy, and what keeps it from eroding. The blank version is for dissecting your own claimed edge; the filled version is a worked example tracing the structure of Nvidia's defensible advantage. Use it to tell a real moat from a head start.

Sources

Where this comes from — the filings, records, and reporting behind it.

  1. Primary · Company record · Documented
     Nvidia FY2026 total revenue was $215.9 billion (up 65% YoY); Data Center segment revenue was $193.7 billion (up 68% YoY); FY2026 GAAP gross margin was 71.1%, down from 75.0% in FY2025.
  2. Primary · SEC filing · Documented
     Nvidia FY2026 10-K: One direct customer represented 22% of total revenue and another represented 14% of total revenue in FY2026, both attributable to the Compute & Networking segment. FY2026 GAAP gross margins decreased to 71.1% from 75.0% in FY2025, partly due to a $4.5 billion H20 export-control charge.
  3. Primary · SEC filing · Documented
     Nvidia FY2025 10-K: Full-year GAAP gross margins were 75.0% in FY2025, up from 72.7% in FY2024. Data Center revenue was $115 billion in FY2025, representing approximately 88% of total revenue.
  4. Primary · SEC filing · Documented
     Nvidia Q1 FY2025 (quarter ending April 2024): GAAP gross margin peaked at 78.4%, up from 64.6% in Q1 FY2024, driven by 427% YoY Data Center revenue growth. This is the source of the widely cited "78% gross margin" figure, which is a single-quarter peak, not the annual rate.
  5. Secondary · Attributed to source
     Nvidia's investor materials report more than 4 million developers have registered for CUDA and over 40,000 organizations use CUDA-accelerated applications. CUDA was introduced in 2006 and has evolved into a platform including cuDNN, NCCL, TensorRT, and profiling tools embedded deeply in production AI workflows.
  6. Secondary · Widely reported
     Nvidia held approximately 92% of the data center GPU market by revenue in 2024, with annual data center GPU revenue just over $115 billion, up 142% YoY. This is the most credible independent market-share figure available; the sometimes-cited 98% figure is a 2023 merchant-GPU-only estimate.
  7. Secondary · Widely reported
     ROCm 7 is within 10–30% of CUDA parity for most workloads; AMD's ROCm has reached approximately 85% CUDA parity for PyTorch training workloads as of 2025. OpenAI Triton enables write-once GPU programming and PyTorch 2.0 natively supports non-CUDA backends, meaning the CUDA software moat is narrowing, but system-level integration (cuDNN, TensorRT-LLM, NCCL) still creates deep stickiness for large-scale training.
  8. Secondary · Widely reported
     The hyperscaler custom silicon threat (Google TPU, AWS Trainium, Microsoft Maia, Meta MTIA) is concentrated in internal inference workloads, not external developer markets. Hyperscaler custom ASICs are captive (Maia and MTIA are not externally rentable); the CUDA switching-cost analysis therefore applies asymmetrically: it holds for ~99% of non-hyperscaler teams but not for the handful of cloud giants that represent ~36% of Nvidia's FY2026 revenue.