Nvidia · Moat Anatomy

Nvidia's Moat Isn't the Code. It's the Eighteen Years You'd Have to Re-Live to Catch Up.

Everyone says you can't port CUDA off Nvidia. You can - AMD's HIP translates most of it. The real wall is that 5.9 million developers and ~250 hand-tuned libraries make Nvidia the path of least resistance even when rival silicon has 1.5x the theoretical compute.

An engineer at a well-funded AI lab wants to run her model on AMD's silicon. The hardware is right there: AMD's MI300X carries 1.5 times the theoretical compute of Nvidia's H100, and her finance team would love to stop paying Nvidia's margin.[7] She runs the workload. It hits maybe half the H100's real throughput - 37 to 66 percent in reported LLM inference results, depending on the workload.[7] The chip isn't slow. The chip is starved. The fast version of the kernel she needs - the attention math, the quantization tricks - was written for CUDA first, and the AMD port arrived months later, or never.[7] She switches back to Nvidia by lunch. Nobody made her. The path was just easier.
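One way to read those two numbers together is to divide the realized ratio by the theoretical one. The sketch below does that arithmetic using only the 1.5x and 37 to 66 percent figures quoted above. The name `relative_efficiency` is made up for illustration, and setting a peak-compute ratio against inference results is a rough framing rather than a measurement.

```python
# Figures quoted in the article; everything derived from them is illustrative.
THEORETICAL_RATIO = 1.5        # MI300X theoretical compute relative to H100
REALIZED_RANGE = (0.37, 0.66)  # MI300X realized LLM-inference performance relative to H100


def relative_efficiency(realized: float, theoretical: float) -> float:
    """How much of its paper compute the challenger converts into throughput,
    relative to how much the incumbent converts. 1.0 would mean parity."""
    return realized / theoretical


low, high = (relative_efficiency(r, THEORETICAL_RATIO) for r in REALIZED_RANGE)
print(f"Relative efficiency: {low:.0%} to {high:.0%}")
```

On these inputs the MI300X turns its theoretical compute into delivered work at roughly a quarter to under half the rate the H100 does, which is the "starved, not slow" point in numbers.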

The official story is that Nvidia's lock-in is the code: that CUDA is some proprietary language you can't escape. Almost none of that is true. AMD's HIP translates most CUDA source with minimal changes.[8] You can leave the language any afternoon. What you cannot leave is the eighteen years of work that happened on the other side of the language - and that is the real wall.
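To make "translates most CUDA source" concrete, here is a toy Python sketch of source-level renaming. It is not AMD's actual HIPIFY tooling, which handles far more than a lookup table. The mapping covers only a handful of runtime names that do have direct HIP counterparts, and `toy_hipify` and `CUDA_SNIPPET` are hypothetical names invented for this illustration.

```python
import re

# A small, hand-picked subset of CUDA runtime names and their HIP counterparts.
# Illustration only: the real porting tools cover far more than this.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpy": "hipMemcpy",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Hypothetical host-side snippet; the kernel `scale` is assumed to exist elsewhere.
CUDA_SNIPPET = """\
#include <cuda_runtime.h>
float *d_x;
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
scale<<<blocks, threads>>>(d_x, n);
cudaDeviceSynchronize();
cudaFree(d_x);
"""


def toy_hipify(source: str) -> str:
    """Rename known CUDA identifiers to their HIP equivalents."""
    # Longest names first so cudaMemcpyHostToDevice is not clipped by cudaMemcpy.
    names = sorted(CUDA_TO_HIP, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    return pattern.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


print(toy_hipify(CUDA_SNIPPET))
```

The kernel launch line passes through untouched, and the rest is a rename. What a rename cannot supply is a tuned implementation of whatever `scale` stands in for, which is where the article locates the wall.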

The moat isn't the language. It's everything written in it.

Nvidia introduced CUDA in November 2006 and shipped the public SDK on February 15, 2007 - a platform built by a team Nvidia had hired starting in 2004, including Ian Buck, who had created the Brook GPU language at Stanford.[1][5] None of that founding history is the asset. The asset is what got piled on top of it for the next eighteen years: roughly 250 widely-used CUDA libraries, hand-tuned by people who knew exactly which memory access pattern the silicon rewards.[3] When a new technique appears - a faster way to do attention, a smarter way to page memory - it ships as a CUDA kernel first, because that is where the developers are. Every one of those kernels is a small switching cost. Stack up eighteen years of them and you have a wall nobody planned and nobody can climb in an afternoon. The thesis is simple: Nvidia isn't defended by what CUDA is. It's defended by everything 5.9 million developers have already written in it.[2]

5.9M developers worldwide on CUDA and Nvidia's tools as of its FY2025 filing - the compounding asset, not the chip.[2]

Look at how that number got there and you see why it's a moat rather than a marketing figure. Nvidia's own developer blog said it took thirteen years to reach a million registered developers - and less than two more years to reach two million.[4] That curve is the tell. Early on, a developer platform grows on its own merits, slowly. Past a certain density it grows because it's already big: tutorials exist, answers exist, the libraries you need are already written, the person next to you already knows it. Going by that timeline, Nvidia crossed the inflection around 2020. Its FY2025 filing now puts the count past 5.9 million.[2] Each new developer makes the next one's choice more obvious, and the choice is rarely the rival chip.
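The shape of that curve is easier to see as average yearly rates. The sketch below uses only the milestones quoted in this article and its source notes: thirteen years to the first million, under two years to the second, 2 million by November 2021, and 5.9 million as of the fiscal year ending January 2025. Treating the blog's "registered developers" and the filing's developer count as one comparable series is an assumption made here for illustration, and all variable names are invented.

```python
# Milestones as quoted in the article; the rates are simple averages, not reported figures.
first_million_years = 13
second_million_years_max = 2  # "less than two more years", so an upper bound on time

rate_first = 1_000_000 / first_million_years            # developers per year, first phase
rate_second_min = 1_000_000 / second_million_years_max  # lower bound on second-phase rate
speedup_min = rate_second_min / rate_first

# Assumption: the Nov 2021 blog count and the Jan 2025 filing count are comparable.
months_between = 38  # November 2021 to January 2025
rate_third = (5_900_000 - 2_000_000) / (months_between / 12)

print(f"First million:  ~{rate_first:,.0f} developers per year")
print(f"Second million: at least {rate_second_min:,.0f} per year ({speedup_min:.1f}x or more)")
print(f"Since then:     ~{rate_third:,.0f} per year, if the two counts are comparable")
```

Even taking the most conservative reading of "less than two more years", the second million arrived at least 6.5 times faster than the first.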

| | The myth | The mechanism |
| The barrier | CUDA can't be ported to AMD | HIP ports most of it; the libraries don't come with it |
| What's scarce | The programming model | Hand-tuned kernels and production-hardened performance |
| Why rivals lose | Inferior hardware | 1.5x the theoretical compute, half the real throughput |
| The compounding asset | A clever language | 5.9M developers and ~250 libraries built over 18 years |
What people think the moat is vs. what it actually is

Where the wall is concrete, and where it's already cracking

Here is the part the bulls and the bears both get wrong, because they treat 'the CUDA moat' as one thing. It is two. In training - building the model from scratch, the most compute-hungry and least forgiving workload - Nvidia's share runs above 90 percent, and the dependence is near-total, because the frontier libraries that make training tractable are CUDA-first by default.[6] If your stack leans on TensorRT-LLM or FlashAttention 3, CUDA is simply the only viable option.[8] But inference - running a finished model in production - is a different game. It's a more standardized, more repeatable workload, and that is exactly where a challenger can get a real foothold. AMD's MI355X posted MLPerf Inference 6.0 results in April 2026 within single-digit percentage points of Nvidia's B200 - tying on Llama 2 70B Offline throughput and reaching 97% of B200 Server performance on the same benchmark.[10][8] The moat is near-absolute at the hard end and leaking at the easy end. Treating it as uniform is how you misprice it in both directions.

Software is the 'moat' keeping users from switching.[3]
Communications of the ACM, on Nvidia's claimed installed base of 500M+ GPUs and ~250 widely-used CUDA libraries, late 2023

Isn't a software lead the easiest kind to lose?

The fair objection is that software moats are the flimsiest of all. There's no patent on knowing CUDA. Open-source ports exist. AMD's silicon is genuinely competitive now, sometimes ahead on raw numbers - so why won't this lead evaporate the way software leads usually do? Two reasons. The first is that the rival doesn't get to copy the destination, only the language - the libraries arrive 'months later, sometimes never,' which means a challenger is permanently running last year's kernels against this year's.[7] The second is the honest one the bulls skip: the moat is not eternal, and it is thinnest exactly where the market is growing fastest. Inference volume dwarfs training over a model's life - the aggregated compute cost of inference often greatly exceeds training because the same model serves an enormous number of requests[9] - and inference is where the gap has narrowed to single digits.[8] Note, too, that AMD was never technically irrelevant - it was strategically absent. Its earlier failure wasn't a worse chip; it was the decision not to fund a comparable developer platform while Nvidia spent eighteen years funding one. That is the asymmetry. You can match the silicon in a product cycle. You cannot match eighteen years of someone else's developers in one.
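The claim that lifetime inference cost overtakes training cost comes down to a break-even count: a one-time training cost divided by the cost of serving each request. The sketch below works that out. Every number in it is a hypothetical placeholder chosen to make the arithmetic clean, not a figure from any source, and `break_even_requests` is an invented name.

```python
def break_even_requests(training_cost: float, cost_per_million_requests: float) -> float:
    """Number of served requests at which cumulative inference cost equals training cost."""
    return training_cost / cost_per_million_requests * 1_000_000


# Hypothetical inputs, for illustration only.
TRAINING_COST = 10_000_000         # one-time training cost, in dollars
COST_PER_MILLION_REQUESTS = 2_000  # serving cost per million requests, in dollars
REQUESTS_PER_DAY = 50_000_000      # steady production traffic

requests_needed = break_even_requests(TRAINING_COST, COST_PER_MILLION_REQUESTS)
days_to_break_even = requests_needed / REQUESTS_PER_DAY

print(f"Break-even after {requests_needed:,.0f} requests")
print(f"Reached in {days_to_break_even:.0f} days at {REQUESTS_PER_DAY:,} requests per day")
```

Under these made-up inputs, inference spending passes the training bill in 100 days and keeps growing for as long as the model stays in service, which is why the inference end of the market carries so much of the argument.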

Build the moat where the copy can't follow

Nvidia's defensible asset was never the thing rivals could clone - the language, the chip, the spec sheet. It was the accumulated work of other people, done on Nvidia's platform, that no competitor can retroactively acquire. The lesson for any platform: don't moat on the artifact you ship, moat on the compounding behavior you induce in others. Two cautions, though. First, segment the moat honestly - it is near-absolute where the work is hard to redo (training) and porous where the work is standardized (inference), and a lead that's uniform on the slide is rarely uniform in reality. Second, the same density that protects you attracts the regulators and challengers who can see the rent - so the only durable defense is to keep being the place the next breakthrough ships first, not merely the place the last one shipped.

Nvidia's competitors keep winning the argument they think matters - the benchmark, the teraflops, the price per chip - and keep losing the one that actually decides the purchase: which platform the next fast kernel will appear on first. That race was settled eighteen years ago and it compounds daily. The chip is the part you can buy. The eighteen years are the part you'd have to re-live. Nvidia didn't build a wall. It built a head start so old that catching up means starting in 2007 - and the developers, all 5.9 million of them, have no reason to wait for you.

Sources

Where this comes from — the filings, records, and reporting behind it.

  1. Primary · Company record · Documented
    In November 2006, NVIDIA introduced CUDA as a general-purpose parallel computing platform and programming model; the initial CUDA SDK was made publicly available on February 15, 2007.
  2. Primary · SEC filing · Documented
    As of Nvidia's FY2025 annual filing (period ending January 26, 2025), there are over 5.9 million developers worldwide using CUDA and Nvidia's other software tools.
  3. Secondary · Widely reported
    As of late 2023, Nvidia claimed an installed base of more than 500 million GPUs with thousands of CUDA-based applications; software is the 'moat' keeping users from switching, with approximately 250 CUDA libraries widely used by GPU programmers.
  4. Primary · Company record · Attributed to source
    By November 2021, Nvidia's developer blog reported more than 1 billion CUDA-capable GPUs in the world and 2 million registered NVIDIA developers; it took 13 years to reach 1 million registered developers and less than two more years to reach 2 million.
  5. Secondary · Widely reported
    CUDA was created by Nvidia starting in 2004 when Nvidia hired Ian Buck (who had built the Brook GPU language at Stanford) and paired him with John Nickolls to develop what became CUDA; it was officially released in 2007.
  6. Secondary · Widely reported
    Nvidia holds approximately 80–90% of the AI accelerator market by revenue as of 2024–2025; in AI training specifically, share exceeds 90%; data center segment generated over $100 billion in FY2025.
  7. Secondary · Widely reported
    AMD's MI300X has 1.5× higher theoretical compute than the H100 but achieves only 37–66% of H100/H200 performance in LLM inference, demonstrating that the gap is software maturity, not hardware; Flash Attention 2, PagedAttention, and quantization kernels are written for CUDA first and ported to ROCm second - sometimes months later, sometimes never.
  8. Secondary · Widely reported
    AMD's MI355X posted record MLPerf Inference 6.0 results in April 2026, within single-digit percentage points of B200 on server inference workloads, while ROCm's HIP translates most CUDA code with minimal changes; however, if a stack relies on TensorRT-LLM or FlashAttention 3, CUDA remains the only viable option.
  9. Primary · Academic · Documented
    The aggregated cost of inference over the lifetime of a model often greatly exceeds the cost of training, because the same model is used to perform a large number of inferences.
  10. Primary · Company record · Documented
    AMD's MI355X posted MLPerf Inference 6.0 results in April 2026; on Llama 2 70B against B200, the MI355X tied in Offline and delivered 97% of Server performance (a 3% gap); on GPT-OSS-120B it delivered 111% of B200 Offline and 115% of B200 Server single-node performance.