In 2017, when AMD showed up to the server market with a processor stitched together from four separate dies, Intel didn't analyze it. Intel laughed at it - dismissing the design as desktop chips 'glued together.'7 It was a good line. It was also the most expensive joke in modern semiconductor history. Because while Intel kept etching ever-larger single slabs of silicon, AMD was quietly proving that the future of the high-end CPU was not one big chip at all - it was several small ones, talking to each other fast enough that nobody could tell the difference.
The story everyone tells is that AMD invented the chiplet and beat Intel with a clever new layout. Almost every word of that is wrong. AMD did not invent multi-chip packaging, and multi-die was not even new for AMD by the time it mattered: its own first-gen EPYC Naples in 2017 already put four dies on one substrate.4 And the layout was not the moat. The moat was the four years it took anyone to copy the version that actually mattered.
The refinement that turned a package trick into a weapon
Naples was a multi-chip module: four identical 14 nm dies, each a complete little computer with its own cores, I/O, and memory channels, lashed together on a single package.4 It worked, but it was not the disruptive idea. Every die paid the full tax of carrying I/O it might not need, and four copies of the same I/O blocks meant four times the area spent on the least scalable part of a chip. The real breakthrough came two years later, with EPYC Rome in 2019. AMD did something almost embarrassingly simple in hindsight: it stopped duplicating the I/O. Instead of four self-contained dies, Rome split the design into up to eight tiny 7 nm compute dies from TSMC - eight Zen 2 cores each - feeding into one centralized 14 nm I/O die from GlobalFoundries that owned the memory channels, the PCIe lanes, and the interconnect for the whole socket.2
This is the part that reads like a footnote and was actually the whole game. The compute logic - the part that benefits most from the newest, most expensive process node - went on cutting-edge 7 nm. The I/O - the memory controllers and PCIe, the part that barely shrinks and gets riskier on bleeding-edge nodes - stayed on cheap, mature, well-yielded 14 nm.2 AMD was no longer buying one node for the whole chip. It was buying exactly the node each function deserved. As AMD puts it, decoupling core development from I/O development let it shrink the compute die and tune variants for performance or efficiency independently.5
| | EPYC Naples (2017) | EPYC Rome (2019) |
|---|---|---|
| Die topology | 4 identical full-SoC dies | Up to 8 compute dies + 1 central I/O die |
| Process node | All 14 nm | 7 nm compute, 14 nm I/O |
| I/O placement | Duplicated on every die | Centralized on one die |
| Max cores per socket | 32 | 64 |
The result was a single processor made of nine separate pieces of silicon - eight compute dies and one I/O die - carrying nearly 40 billion transistors across just over a thousand square millimeters of total die area.8 Try to build that as one monolithic chip on 7 nm and the yield economics collapse: a single defect anywhere kills the entire enormous die. Build it as nine small dies and a defect kills one 74-square-millimeter compute tile, not the whole thing.8 Same transistors. Radically different cost of being wrong.
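The yield argument above can be made concrete with a rough sketch. The snippet uses the textbook Poisson yield model, in which the chance a die has no defects falls exponentially with its area. The die areas come from the article; the defect density `ASSUMED_D0` is a purely illustrative assumption, not a published figure for any foundry, and the function name is hypothetical. Treating the full 1,008 mm² as one 7 nm die is also a simplification, since some of the I/O area would shrink on the newer node.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson model."""
    area_cm2 = area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


# Areas from the article; the defect density is an assumption for illustration.
COMPUTE_DIE_MM2 = 74
TOTAL_SILICON_MM2 = 1008
ASSUMED_D0 = 0.1  # defects per cm^2 (hypothetical)

tile_yield = poisson_yield(COMPUTE_DIE_MM2, ASSUMED_D0)
monolithic_yield = poisson_yield(TOTAL_SILICON_MM2, ASSUMED_D0)

print(f"74 mm^2 compute tile yield:  {tile_yield:.1%}")
print(f"1,008 mm^2 monolithic yield: {monolithic_yield:.1%}")
```

Under that assumed defect density, about 93% of the small tiles come out clean against roughly 36% of the hypothetical giant die. The exact numbers move with the assumption; the shape of the gap does not.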
Why the wire was the asset, not the dies
Splitting a chip into pieces is the easy part. Making those pieces behave like one chip - so a core on die three can reach memory hanging off die seven without the latency tanking performance - is the hard part, and it is where most multi-die designs fail. AMD's answer was Infinity Fabric, announced in April 2017 by its CTO and built as two coordinated planes, one for data and one for control.1 This is the connective tissue. Without a fast, coherent fabric, a chiplet design is just a pile of dies with a latency problem. With one, it is a single logical processor that happens to be assembled rather than carved.
And the fabric is where AMD's advantage compounded rather than sat still. Rome did not just inherit Infinity Fabric - it ran a second generation that doubled read bandwidth per fabric clock, from 16 bytes to 32.8 In Rome's topology, the central I/O die carried eight Infinity Fabric links - one to bind each compute die into the whole - alongside 128 PCIe Gen 4 lanes and eight memory channels.2 The fabric was not a feature bolted onto the chiplets. It was the thing that made chiplets a strength instead of a liability, and AMD had been investing in it since before the laughter started.
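To put the doubling in throughput terms, here is a back-of-the-envelope sketch. The bytes-per-clock figures are the ones cited above; the fabric clock `ASSUMED_FCLK_GHZ` is a placeholder assumption rather than a shipping specification, and the function name is hypothetical.

```python
def read_bandwidth_gb_s(bytes_per_clock: int, fabric_clock_ghz: float) -> float:
    """Read bandwidth in GB/s: bytes moved per clock times clocks per nanosecond."""
    return bytes_per_clock * fabric_clock_ghz


ASSUMED_FCLK_GHZ = 1.0  # hypothetical fabric clock, for illustration only

naples_bw = read_bandwidth_gb_s(16, ASSUMED_FCLK_GHZ)  # first-gen fabric
rome_bw = read_bandwidth_gb_s(32, ASSUMED_FCLK_GHZ)    # second-gen fabric

print(f"Naples: {naples_bw:.0f} GB/s, Rome: {rome_bw:.0f} GB/s "
      f"({rome_bw / naples_bw:.1f}x at the same clock)")
```

Whatever the real clock, the ratio is what matters: at an equal fabric clock, the wider read path delivers twice the data.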
Intel paid the premium-node price for the entire chip, I/O included, and ate the full yield penalty of one giant die. AMD paid 7 nm only for the compute tiles, kept the 416 mm² I/O die on cheap 14 nm2,8, and turned a defect from a chip-killer into a tile-killer. The savings weren't a discount - they were structural, and they let AMD undercut Intel on price while out-coring it.
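A small sketch shows how much of Rome's silicon actually sat on the premium node. The die counts and areas are from the sources below; the cost ratio `ASSUMED_COST_RATIO` is a hypothetical illustration, not a real wafer price, and the helper name is made up for this example. The comparison also ignores yield, packaging cost, and any I/O shrink on 7 nm, so it is a rough bound rather than a bill of materials.

```python
COMPUTE_DIES = 8
COMPUTE_DIE_MM2 = 74
IO_DIE_MM2 = 416

premium_mm2 = COMPUTE_DIES * COMPUTE_DIE_MM2  # silicon on 7 nm
total_mm2 = premium_mm2 + IO_DIE_MM2          # all nine dies
premium_share = premium_mm2 / total_mm2

# Hypothetical: one mm^2 of 7 nm costs twice as much as one mm^2 of 14 nm.
ASSUMED_COST_RATIO = 2.0


def relative_silicon_cost(premium: float, mature: float, ratio: float) -> float:
    """Silicon cost in units of 'one mm^2 of mature-node area'."""
    return premium * ratio + mature


split_cost = relative_silicon_cost(premium_mm2, IO_DIE_MM2, ASSUMED_COST_RATIO)
all_premium_cost = relative_silicon_cost(total_mm2, 0, ASSUMED_COST_RATIO)

print(f"Total area: {total_mm2} mm^2, on 7 nm: {premium_share:.1%}")
print(f"Split design costs {split_cost / all_premium_cost:.1%} "
      f"of an all-7 nm design under the assumed ratio")
```

The areas reproduce the 1,008 mm² total, with a little under 59% of it on 7 nm. Under the assumed 2x ratio the split design's raw silicon costs about 79% of an all-premium one, before the yield advantage is counted at all.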
The moat was the calendar, not the chiplet
Here is the strategic core. A chiplet design is not a secret - the peer-reviewed account of Rome's architecture was published openly at a top circuits conference.3 Anyone could read exactly how AMD did it. So why didn't Intel just copy it? Because the moat was never the diagram. It was the years of co-optimization the diagram represented: a fabric refined since 2017, a two-node manufacturing discipline tuned across Naples and then Rome, and the organizational decision to decouple core and I/O roadmaps so each could move at its own pace.5 That is not a layout you photocopy. It is a capability you build, and building it takes years even when you know the answer. Intel knew the answer by 2019. It still didn't ship Sapphire Rapids, its first chiplet server CPU, until 2023.7
Sit with that gap. From Rome's 2019 launch to Intel's 2023 response is roughly four years - four years in which AMD owned the only cost structure that could put 64 cores in a socket at a price Intel couldn't profitably match.2 In a market where datacenter buyers refresh on multi-year cycles and switching costs are high, four years is not a head start. It is a window long enough to reprice the entire server market and seed a new generation of installed base. AMD didn't win because its chip was prettier. It won because, by the time the laughter stopped, the race was already several laps old.
“Instead of building larger monolithic dies, AMD invested in a strategy to use processor building blocks called chiplets. Decoupling our core and I/O development processes enabled us to shrink the CPU die and optimize variants for performance or energy efficiency.”5
Isn't this just AMD getting lucky on yields?
The fair objection is that AMD's chiplet story is a tidy narrative built on hindsight - that the cost advantage was really a bet on TSMC's 7 nm node landing well, and that AMD oversells the whole thing. There's truth in the skepticism. AMD's own much-repeated claim that chiplets saved roughly 50,000 metric tons of CO2 in 2023 turns out to be an internal hypothetical, not a measured fact - and the counterfactual is artificial, since AMD hasn't even built a monolithic datacenter chip since it discontinued Opteron in early 2017.6 So yes: when AMD markets the architecture, it sometimes reaches for numbers it can't fully stand behind.
But the marketing exaggeration doesn't dissolve the moat - it sits on top of a real one. The structural fact survives every caveat: AMD paid premium-node prices only for the silicon that needed them, contained yield risk to small tiles, and bound it all with an interconnect it had been refining for years. Those are not lucky outcomes; they are the predictable payoff of a deliberate two-node, fabric-first architecture. The proof is the simplest evidence there is. If this were luck, Intel - with more capital, more fabs, and a full public description of the design - would have closed the gap in a year. It took four.3,7
The deepest moats often hide in a boring decision your competitor refuses to make. AMD's was to stop treating its processor as one indivisible thing and start treating it as a portfolio of functions, each deserving its own cost basis - cutting-edge silicon for compute, cheap mature silicon for I/O, one fast fabric to hide the seams. The lesson generalizes past chips: when a rival insists on doing everything on the premium path because that's how it's always been done, the opening is to decouple the parts that need the premium from the parts that don't. Two cautions, though. Decoupling only works if the connective layer is genuinely fast - a chiplet without a great interconnect is just a latency problem with extra steps. And don't oversell the win with numbers you can't measure; the structural advantage is real enough that you never need the hype.
Intel called it a chip glued together, and in a literal sense it was right. What it missed was that the glue - Infinity Fabric, the two-node discipline, the years of co-optimization underneath the diagram - was the actual invention, and Intel itself would need roughly four years to ship an answer. AMD didn't out-design Intel on a single chip. It out-decoupled it, and then it ran out the clock. The architecture changed the game. But the calendar is what kept the lead.
Moat Anatomy Canvas
A one-page canvas that dissects a moat instead of asserting it: where the advantage comes from, how much of the market it covers, how long it would take to copy, and what keeps it from eroding. It comes in two versions: blank, for dissecting your own claimed edge, and filled in, as a worked example tracing the structure of this story's defensible advantage. Use it to tell a real moat from a head start.
Sources
Where this comes from — the filings, records, and reporting behind it.
- 1. Infinity Fabric was first announced and detailed on April 6, 2017 by Mark Papermaster, AMD's SVP and CTO; it consists of two planes: Infinity Scalable Data Fabric (SDF) and Infinity Scalable Control Fabric (SCF). WikiChip, Infinity Fabric (IF) – AMD – WikiChip · 2020-08-18
- 2. EPYC Rome (Zen 2, released August 7, 2019) comprises up to nine dies: one centralized 14 nm I/O die (GlobalFoundries) and eight 7 nm compute dies (TSMC); the I/O die incorporates eight Infinity Fabric links, 128 PCIe Gen 4 lanes, and eight DDR4 memory channels; each compute die carries eight Zen 2 cores for a maximum of 64 cores per socket.
- 3. The peer-reviewed IEEE ISSCC 2020 paper 'AMD Chiplet Architecture for High-Performance Server and Desktop Products' (Naffziger, Lepak, Paraschou, Subramony; doi:10.1109/ISSCC19947.2020.9063103) is the primary academic source establishing the technical design of the Zen 2 chiplet split. IEEE International Solid-State Circuits Conference (ISSCC), 2.2 AMD Chiplet Architecture for High-Performance Server and Desktop Products · 2020-02-17
- 4. First-gen EPYC Naples (Zen 1, launched June 2017) used four 14 nm Zeppelin dies in an MCM (each die a full SoC with its own cores, I/O, and two DDR4 memory channels), not the separated CCD/IOD topology; Naples offered up to 32 cores and 128 PCIe 3.0 lanes.
- 5. AMD's own technology page confirms: 'Instead of building larger monolithic dies, AMD invested in a strategy to use processor building blocks called chiplets. Decoupling our core and I/O development processes enabled us to shrink the CPU die and optimize variants for performance or energy efficiency.'
- 6. AMD's 4th Gen EPYC (Genoa, Zen 4) chiplet CO2 savings figure of ~50,000 metric tons in 2023 is AMD's own hypothetical estimate (not directly measured), based on avoided wafers vs. a monolithic counterfactual; AMD has not produced a monolithic datacenter processor since Opteron was discontinued in early 2017.
- 7. Intel mocked first-gen EPYC Naples as desktop chips 'glued together' at its 2017 launch; Intel's own first chiplet-based server CPU (Sapphire Rapids Xeon) did not reach market until 2023.
- 8. EPYC Rome (Zen 2) Infinity Fabric doubled read bandwidth per fabric clock to 32 bytes (vs. 16 bytes in Naples); the Rome chip contains 39.54 billion transistors across nine dies totalling 1,008 mm², with each 74 mm² compute die housing 3.9 billion transistors and the 416 mm² I/O die housing 8.34 billion transistors.