Can CUDA code run on AMD GPUs?

Largely yes. AMD's HIP translates most CUDA C++ source with minimal changes. The claim that CUDA 'cannot be ported' is a myth. The real barrier isn't language portability - it's the absence of equivalent hand-tuned libraries and production-hardened performance. AMD's MI300X has 1.5x the theoretical compute of an H100 but reaches only 37-66% of the H100's real LLM-inference performance.

Why is Nvidia's CUDA considered a moat?

Because the value isn't the programming model - it's the compounding around it. Nvidia introduced CUDA in November 2006 and shipped the SDK in February 2007, and over 18 years built roughly 250 widely-used libraries and an ecosystem of more than 5.9 million developers. Most frontier AI kernels are written for CUDA first and ported elsewhere later, if ever, which makes Nvidia the path of least resistance.

Is the CUDA moat eroding?

At the margin, yes. AMD's MI355X posted MLPerf inference results within single-digit percentage points of Nvidia's B200 in April 2026, and inference is more replaceable than training. But Nvidia still holds roughly 80-90% of the AI accelerator market and over 90% of training, where the dependence on CUDA-first libraries is near-absolute.

Nvidia's Moat Isn't the Code. It's the Eighteen Years You'd Have to Re-Live to Catch Up.

Everyone says you can't port CUDA off Nvidia. You can - AMD's HIP translates most of it. The real wall is that 5.9 million developers and ~250 hand-tuned libraries make Nvidia the path of least resistance even when rival silicon has 1.5x the raw compute.

An engineer at a well-funded AI lab wants to run her model on AMD's silicon. The hardware is right there: AMD's MI300X carries 1.5 times the theoretical compute of Nvidia's H100, and her finance team would love to stop paying Nvidia's margin.⁷ She runs the workload. It hits maybe half the H100's real throughput - 37 to 66 percent, depending on the day.⁷ The chip isn't slow. The chip is starved. The fast version of the kernel she needs - the attention math, the quantization tricks - was written for CUDA first, and the AMD port arrived months later, or never.⁷ She switches back to Nvidia by lunch. Nobody made her. The path was just easier.
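
The arithmetic behind "starved, not slow" can be made explicit. A minimal sketch, assuming only the two figures cited above (1.5x the theoretical compute, 37 to 66 percent of the real throughput) and treating the H100 as the baseline; the function name is illustrative:

```python
def relative_utilization(peak_ratio: float, realized_ratio: float) -> float:
    """How much useful work a chip gets per unit of theoretical compute,
    relative to the baseline chip.

    peak_ratio: challenger's theoretical compute / baseline's (1.5 in the text).
    realized_ratio: challenger's real throughput / baseline's (0.37 to 0.66).
    """
    return realized_ratio / peak_ratio


low = relative_utilization(1.5, 0.37)
high = relative_utilization(1.5, 0.66)
print(f"{low:.0%} to {high:.0%}")
```

On these figures the MI300X converts each unit of theoretical compute into roughly a quarter to just under half of the useful work the H100 extracts from the same unit. That shortfall is the software gap, expressed as a number.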

The official story is that Nvidia's lock-in is the code: that CUDA is some proprietary language you can't escape. Almost none of that is true. AMD's HIP translates most CUDA source with minimal changes.⁸ You can leave the language any afternoon. What you cannot leave is the eighteen years of work that happened on the other side of the language - and that is the real wall.
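
To see why leaving the language is the easy part, consider how mechanical the renaming is. The sketch below is a toy, not AMD's HIPIFY tooling: it maps a handful of CUDA runtime names to their HIP counterparts, and the sample snippet (including the `saxpy` kernel name and its variables) is hypothetical.

```python
import re

# A handful of CUDA-to-HIP runtime name pairs; the real mapping is far larger.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpy": "hipMemcpy",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}


def toy_hipify(source: str) -> str:
    """Rename known CUDA identifiers to their HIP equivalents (toy illustration)."""
    # Longest names first so cudaMemcpyHostToDevice is not clipped by cudaMemcpy.
    names = sorted(CUDA_TO_HIP, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in names))
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


# Hypothetical host-side CUDA code, held as text for the demonstration.
cuda_snippet = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
saxpy<<<blocks, threads>>>(n, 2.0f, d_x, d_y);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

print(toy_hipify(cuda_snippet))
```

The kernel launch line passes through untouched and every runtime call gets a new prefix. What no rename can supply is the tuned library the code calls next, which is the point of the paragraph above.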

The moat isn't the language. It's everything written in it.

Nvidia introduced CUDA in November 2006 and shipped the public SDK on February 15, 2007 - a platform built by a team Nvidia had hired starting in 2004, including Ian Buck, who had created the Brook GPU language at Stanford.¹,⁵ None of that founding history is the asset. The asset is what got piled on top of it for the next eighteen years: roughly 250 widely-used CUDA libraries, hand-tuned by people who knew exactly which memory access pattern the silicon rewards.³ When a new technique appears - a faster way to do attention, a smarter way to page memory - it ships as a CUDA kernel first, because that is where the developers are. Every one of those kernels is a small switching cost. Stack up nearly two decades of them and you have a wall nobody planned and nobody can climb in an afternoon. The thesis is simple: Nvidia isn't defended by what CUDA is. It's defended by everything 5.9 million developers have already written in it.²

5.9M

developers worldwide on CUDA and Nvidia's tools as of its FY2025 filing - the compounding asset, not the chip²

Look at how that number got there and you see why it's a moat rather than a marketing figure. Nvidia's own developer blog said it took thirteen years to reach a million registered developers - and less than two more years to reach two million.⁴ That curve is the tell. Early on, a developer platform grows on its own merits, slowly. Past a certain density it grows because it's already big: tutorials exist, answers exist, the libraries you need are already written, the person next to you already knows it. Nvidia crossed that inflection a decade ago. Its FY2025 filing now puts the count past 5.9 million.² Each new developer makes the next one's choice more obvious, and the choice is rarely the rival chip.
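
The shape of that curve can be put in rough numbers. A back-of-envelope sketch using only the counts and dates cited above; it assumes the 2 million "registered developers" figure and the 5.9 million figure in the filing measure comparable things, which the sources do not guarantee:

```python
from datetime import date


def annual_growth(start_count: float, end_count: float, years: float) -> float:
    """Compound annual growth rate implied by two counts `years` apart."""
    return (end_count / start_count) ** (1.0 / years) - 1.0


# 1M -> 2M took "less than two more years", so two years gives a lower bound.
doubling_rate = annual_growth(1e6, 2e6, 2.0)

# 2M (developer blog, 2021-11-18) -> 5.9M (fiscal year end, 2025-01-26).
# Assumption: both counts describe the same population of developers.
years_between = (date(2025, 1, 26) - date(2021, 11, 18)).days / 365.25
recent_rate = annual_growth(2e6, 5.9e6, years_between)

print(f"{doubling_rate:.0%} per year, then {recent_rate:.0%} per year")
```

Both windows come out at roughly 40 percent a year. The first million took thirteen years; at the later pace, the base adds that many in a matter of months.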

	The myth	The mechanism
The barrier	CUDA can't be ported to AMD	HIP ports most of it; the libraries don't come with it
What's scarce	The programming model	Hand-tuned kernels and production-hardened performance
Why rivals lose	Inferior hardware	1.5x the theoretical compute, half the real throughput
The compounding asset	A clever language	5.9M developers and ~250 libraries built over 18 years

What people think the moat is vs. what it actually is

Where the wall is concrete, and where it's already cracking

Here is the part the bulls and the bears both get wrong, because they treat 'the CUDA moat' as one thing. It is two. In training - building the model from scratch, the most compute-hungry and least forgiving workload - Nvidia's share runs above 90 percent, and the dependence is near-total, because the frontier libraries that make training tractable are CUDA-first by default.⁶ If your stack leans on TensorRT-LLM or FlashAttention 3, CUDA is simply the only viable option.⁸ But inference - running a finished model in production - is a different game. It's a more standardized, more repeatable workload, and that is exactly where a challenger can get a real foothold. AMD's MI355X posted MLPerf Inference 6.0 results in April 2026 within single-digit percentage points of Nvidia's B200 - tying on Llama 2 70B Offline throughput and reaching 97% of B200 Server performance on the same benchmark.¹⁰,⁸ The moat is near-absolute at the hard end and leaking at the easy end. Treating it as uniform is how you misprice it in both directions.

“Software is the 'moat' keeping users from switching.”³

Communications of the ACM · On Nvidia's claimed installed base of 500M+ GPUs and ~250 widely-used CUDA libraries, late 2023

Isn't a software lead the easiest kind to lose?

The fair objection is that software moats are the flimsiest of all. There's no patent on knowing CUDA. Open-source ports exist. AMD's silicon is genuinely competitive now, sometimes ahead on raw numbers - so why won't this lead evaporate the way software leads usually do? Two reasons. The first is that the rival doesn't get to copy the destination, only the language - the libraries arrive 'months later, sometimes never,' which means a challenger is permanently running last year's kernels against this year's.⁷ The second is the honest one the bulls skip: the moat is not eternal, and it is thinnest exactly where the market is growing fastest. Inference volume dwarfs training over a model's life - the aggregated compute cost of inference often greatly exceeds training because the same model serves an enormous number of requests⁹ - and inference is where the gap has narrowed to single digits.⁸ Note, too, that AMD was never technically irrelevant - it was strategically absent. Its earlier failure wasn't a worse chip; it was the decision not to fund a comparable developer platform while Nvidia spent eighteen years funding one. That is the asymmetry. You can match the silicon in a product cycle. You cannot match eighteen years of someone else's developers in one.
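
The claim that lifetime inference compute overtakes training can be checked on the back of an envelope. A rough sketch, assuming the common rule of thumb for dense transformer models (about 6 FLOPs per parameter per training token, about 2 per parameter per token served). Neither that rule nor the example model size and token count below comes from the sources cited here; they are illustrative assumptions.

```python
def training_flops(params: float, train_tokens: float) -> float:
    # Assumed rule of thumb: ~6 FLOPs per parameter per training token.
    return 6.0 * params * train_tokens


def inference_flops(params: float, served_tokens: float) -> float:
    # Assumed rule of thumb: ~2 FLOPs per parameter per token served.
    return 2.0 * params * served_tokens


def breakeven_served_tokens(train_tokens: float) -> float:
    # Setting 2*N*T = 6*N*D gives T = 3*D, independent of model size N.
    return 3.0 * train_tokens


# Hypothetical model: 70 billion parameters trained on 2 trillion tokens.
params, train_tokens = 70e9, 2e12
breakeven = breakeven_served_tokens(train_tokens)
print(f"inference passes training after {breakeven:.1e} served tokens")
```

Under these assumptions, a model that serves more than three times its training token count has cost more compute to run than to build. That is why the segment where the moat is thinnest is also the one that accumulates the most compute over time.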

Build the moat where the copy can't follow

Nvidia's defensible asset was never the thing rivals could clone - the language, the chip, the spec sheet. It was the accumulated work of other people, done on Nvidia's platform, that no competitor can retroactively acquire. The lesson for any platform: don't moat on the artifact you ship, moat on the compounding behavior you induce in others. Two cautions, though. First, segment the moat honestly - it is near-absolute where the work is hard to redo (training) and porous where the work is standardized (inference), and a lead that's uniform on the slide is rarely uniform in reality. Second, the same density that protects you attracts the regulators and challengers who can see the rent - so the only durable defense is to keep being the place the next breakthrough ships first, not merely the place the last one shipped.

Nvidia's competitors keep winning the argument they think matters - the benchmark, the teraflops, the price per chip - and keep losing the one that actually decides the purchase: which platform the next fast kernel will appear on first. That race was settled eighteen years ago and it compounds daily. The chip is the part you can buy. The eighteen years are the part you'd have to re-live. Nvidia didn't build a wall. It built a head start so old that catching up means starting in 2007 - and the developers, all 5.9 million of them, have no reason to wait for you.

Sources

Where this comes from — the filings, records, and reporting behind it.

1. Primary · Company record · Documented
In November 2006, NVIDIA introduced CUDA as a general-purpose parallel computing platform and programming model; the initial CUDA SDK was made publicly available on February 15, 2007.
NVIDIA Corporation, CUDA C++ Programming Guide (Legacy) · 2024

2. Primary · SEC filing · Documented
As of Nvidia's FY2025 annual filing (period ending January 26, 2025), there are over 5.9 million developers worldwide using CUDA and Nvidia's other software tools.
U.S. Securities and Exchange Commission / NVIDIA Corporation, NVIDIA Corporation Form 10-K, FY2025 (nvda-20250126) · 2025-01-26

3. Secondary · Widely reported
As of late 2023, Nvidia claimed an installed base of more than 500 million GPUs with thousands of CUDA-based applications; software is the 'moat' keeping users from switching, with approximately 250 CUDA libraries widely used by GPU programmers.
Communications of the ACM, Nvidia at the Center of the Generative AI Ecosystem—For Now · 2024-01-11

4. Primary · Company record · Attributed to source
By November 2021, Nvidia's developer blog reported more than 1 billion CUDA-capable GPUs in the world and 2 million registered NVIDIA developers; it took 13 years to reach 1 million registered developers and less than two more years to reach 2 million.
NVIDIA Corporation, 2 Million Registered Developers, Countless Breakthroughs · 2021-11-18

5. Secondary · Widely reported
CUDA was created by Nvidia starting in 2004 when Nvidia hired Ian Buck (who had built the Brook GPU language at Stanford) and paired him with John Nickolls to develop what became CUDA; it was officially released in 2007.
Wikipedia, CUDA · 2025

6. Secondary · Widely reported
Nvidia holds approximately 80–90% of the AI accelerator market by revenue as of 2024–2025; in AI training specifically, share exceeds 90%; data center segment generated over $100 billion in FY2025.
Silicon Analysts, NVIDIA AI GPU Market Share 2026: ~80% of AI Accelerators · 2026-02-21

7. Secondary · Widely reported
AMD's MI300X has 1.5× higher theoretical compute than the H100 but achieves only 37–66% of H100/H200 performance in LLM inference, demonstrating that the gap is software maturity, not hardware; Flash Attention 2, PagedAttention, and quantization kernels are written for CUDA first and ported to ROCm second—sometimes months later, sometimes never.
Thunder Compute, ROCm vs CUDA: GPU Computing Comparison (June 2026) · 2026-06

8. Secondary · Widely reported
AMD's MI355X posted record MLPerf Inference 6.0 results in April 2026, within single-digit percentage points of B200 on server inference workloads, while ROCm's HIP translates most CUDA code with minimal changes; however, if a stack relies on TensorRT-LLM or FlashAttention 3, CUDA remains the only viable option.
Spheron Blog, ROCm vs CUDA: AMD vs NVIDIA AI Stack Compared (2026) · 2026-04-08

9. Primary · Academic · Documented
The aggregated cost of inference over the lifetime of a model often greatly exceeds the cost of training, because the same model is used to perform a large number of inferences.
Epoch AI, Trading off compute in training and inference · 2023-07-28

10. Primary · Company record · Documented
AMD's MI355X posted MLPerf Inference 6.0 results in April 2026; on Llama 2 70B against B200, the MI355X tied in Offline and delivered 97% of Server performance (a 3% gap); on GPT-OSS-120B it delivered 111% of B200 Offline and 115% of B200 Server single-node performance.
AMD Corporation, AMD Delivers Breakthrough MLPerf Inference 6.0 Results · 2026-04-01
