When Lisa Su, CEO of AMD, presented a keynote talk at CES 2021 last week, she reminded me of a crucial aspect of High-Performance Computing (HPC) that often goes unnoticed: HPC is at the center of many computing innovations. Since SC04, I’ve attended the US Supercomputing conference in November fairly regularly, roughly every other year. SC is where everyone in technology hauls out their best and brightest ideas and technologies, and I’ve been thrilled over the years to be a part of it with NEC, Myricom, and Solarflare. Some of the most intelligent people I’ve ever met or had the pleasure of working with, I first met at an SC event or dinner. SC, though, is continuously changing; just today, I posted a reference to the Cerebras CS-1, which uses a single chip measuring 8.5″ on each side to deliver performance 200X faster than system #466 on the Top500.org list of supercomputers. HPC is now entering its fourth wave of innovation.

The first wave was defined by Seymour Cray in the mid-1970s when he brought out the vector supercomputer. The second wave was the clustering of Linux computers, which started to become a dominant force in HPC in the late 1990s. When this began, Intel systems were all single-core, with some supporting multiple CPU sockets. The “free” Linux operating system and low-cost Gigabit Ethernet (GbE) were the catalysts that enabled universities to quickly and easily cobble together remarkably robust systems. Simultaneously, the open development of the Message Passing Interface (MPI) gave developers a standard way to port existing first-wave HPC applications over to clustered Linux systems without having to program directly against TCP/IP sockets. This second wave brought about advancements in HPC networking and storage that further defined HPC as a unique market. Today we’re at the tail end of the third wave of innovation, driven by the Graphics Processing Unit (GPU). Some would say the dominant HPC brand today is NVIDIA, because they’ve pushed the GPU envelope further and faster than anyone else, and they own Mellanox, the InfiniBand networking guys. Today our focus is the expanding role of accelerators beyond GPUs in HPC, as they will define this new fourth wave of innovation.
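To make that MPI porting point concrete, here is a minimal sketch of my own (not from AMD, Cerebras, or any vendor mentioned above), assuming an MPI implementation such as MPICH or Open MPI is installed. It passes a token around a ring of processes using only the standard MPI calls; the same source runs over GbE, Myrinet, or InfiniBand because the MPI library hides the transport, which is exactly why second-wave clusters could absorb first-wave codes so quickly.

```c
/* Minimal sketch of second-wave cluster programming with MPI.
 * Build/run (typical): mpicc ring.c -o ring && mpirun -np 4 ./ring */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */

    if (size < 2) {                        /* ring needs two or more ranks */
        MPI_Finalize();
        return 0;
    }

    /* Pass a token around a ring: each rank receives from its left
     * neighbor and forwards to its right neighbor. No sockets in sight. */
    int token;
    if (rank == 0) {
        token = 42;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 0 got the token back: %d\n", token);
    } else {
        MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```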
Last week I thought this fourth wave would be defined by a whole new model in which all HPC computations are pushed off to special-purpose accelerators. These accelerators would then leverage the latest advances in the PCI Express (PCIe) bus, new protocols layered on that bus, and the addition of Non-Volatile Memory Express (NVMe) for storage. The fourth and soon fifth generations of PCIe have provided dramatic speed improvements along with support for two new bus protocols, CXL and CCIX. Then along came the Cerebras CS-1, utilizing an 8.5″ square package that holds a single gargantuan chip with over a trillion transistors. While I think Cerebras may stand alone for some time with this single-chip approach, it won’t be long before AMD considers pouring hundreds of Zen3 chiplets into a package with an Infinity Fabric that is MUCH larger than anything previously utilized. Imagine a single package rivaling Cerebras at 8.5″ square: hundreds of Zen3 chiplets (each is eight x86 cores sharing a common L3 cache), a large number of High Bandwidth Memory (HBM) chiplets, some FPGA chiplets contributed by Xilinx, Machine Learning (ML) chiplets from Xilinx’s latest Versal family, and chiplets for encryption and 100GbE or faster networking. Talk about a system on a chip; this would be an HPC super in a single package, rivaling many multi-rack systems on the Top500.org list.
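As a practical aside on those PCIe speed claims, here is a small illustrative sketch (again my own, assuming a Linux host with sysfs mounted) that walks /sys/bus/pci/devices and prints each device’s negotiated link speed and width. It is a quick way to check whether an accelerator is actually running at Gen4 rates or has silently trained down to a slower link.

```c
/* Illustrative sketch: report negotiated PCIe link speed/width per device.
 * Assumes Linux sysfs; current_link_speed and current_link_width are
 * standard PCI sysfs attributes. Build: cc pcie_links.c -o pcie_links */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static void read_attr(const char *dev, const char *attr, char *out, size_t n)
{
    char path[512];
    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/%s", dev, attr);
    out[0] = '\0';
    FILE *f = fopen(path, "r");
    if (f) {
        if (fgets(out, (int)n, f))
            out[strcspn(out, "\n")] = '\0';  /* strip trailing newline */
        fclose(f);
    }
}

int main(void)
{
    DIR *d = opendir("/sys/bus/pci/devices");
    if (!d) {
        perror("opendir");
        return 1;
    }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;
        char speed[64], width[64];
        read_attr(e->d_name, "current_link_speed", speed, sizeof(speed));
        read_attr(e->d_name, "current_link_width", width, sizeof(width));
        if (speed[0])  /* attribute absent on non-PCIe devices; skip them */
            printf("%s  speed=%s  width=x%s\n", e->d_name, speed, width);
    }
    closedir(d);
    return 0;
}
```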
More to come in Part II, where I’ll explain in more detail what I’d been thinking about regarding accelerators.