The Mummy in the Datacenter

This article was originally published in November of 2008 at 10GbE.net.

While Brendan Fraser travels China in his latest quest to terminate yet another mummy, IT leaders are starting to wonder if they’ve got a mummy of their own haunting their raised floor. This mummy is easy to find: he’s wrapped in thick black copper cables, and his long fingers may be attached to many of your servers. It is Infiniband!
 
Once praised as the next-generation networking technology, having conquered High-Performance Computing, it continued its battle for world networking domination by attacking storage and now the data center. It promised you 20Gbps, hinted that it would soon offer 40Gbps, and shared with you its plans for 160Gbps! It claimed full bisection, the ability to use all the network capacity available, and low latency (the time it takes to actually move a packet of data around). It’s democratic: the software stack was developed by an “open” committee of great technological leaders, so it MUST be good for us. Everyone from HP to SGI has sung its praises whenever they’ve come by to peddle the latest in server technology. A corpse wrapped in rags, a centuries-old immortal Dragon Emperor, or a black cable bandit: they all can be eradicated.
 
We will tear this black cable bandit down to size one claim at a time. First, they assert that it’s 20Gbps; how about 12Gbps on its best day, with all the electrons flowing in the same direction? Infiniband employs what is known as 8b/10b encoding to put the bits on the wire: for every 10 signal bits, there are 8 useful data bits. Ethernet uses the same method; the difference is that Ethernet for the past 30 years has advertised the actual data rate, the 8, while Infiniband promotes the 25% larger and useless signal rate, the 10. Using Infiniband math, Ethernet would be 12.5Gbps instead of the 10Gbps it actually is. So, using Ethernet math, Infiniband’s Double Data Rate (DDR) is actually only 16Gbps and not the 20Gbps they claim. But wait, there’s more! I said earlier that you will only get 12Gbps under ideal conditions, so where did the other 4Gbps go? Today most servers use PCIe 1.1 8-lane I/O slots. Ideally, these are 16Gbps slots, but once you add in PCIe overhead you only get about 12Gbps on the best of systems. So, with a straight face, they sell you 20Gbps knowing in their heart you’ll never get more than 12Gbps.
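If you want to check the arithmetic yourself, here is a quick back-of-the-envelope sketch in Python. It simply restates the numbers above: the 8b/10b conversion from signal rate to data rate, and the roughly 12Gbps ceiling of a PCIe 1.1 x8 slot quoted in this paragraph. It is an illustration of the math, not a measurement.

```python
# Back-of-the-envelope check of the bandwidth claims above. The encoding
# overhead and the ~12Gbps PCIe 1.1 x8 ceiling are the figures quoted in
# the text, not measured values.

def usable_rate(signal_rate_gbps, data_bits=8, signal_bits=10):
    """Convert an 8b/10b signal rate into the usable data rate."""
    return signal_rate_gbps * data_bits / signal_bits

ib_ddr_signal = 20.0                      # the marketed "20Gbps" DDR signal rate
ib_ddr_data = usable_rate(ib_ddr_signal)  # 16Gbps of actual data
pcie_x8_gen1_ceiling = 12.0               # ~12Gbps after PCIe 1.1 x8 overhead

print(f"Marketed rate:              {ib_ddr_signal:.1f} Gbps")
print(f"After 8b/10b encoding:      {ib_ddr_data:.1f} Gbps")
print(f"Through a PCIe 1.1 x8 slot: {min(ib_ddr_data, pcie_x8_gen1_ceiling):.1f} Gbps")
```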
 
Full bisection: the ability for a network of servers to use all the network fabric available. Infiniband claims that, using their architecture and switches, you can leverage the ENTIRE network fabric under the right circumstances. On slides this might be true, but in the real world it’s impossible. Infiniband is statically routed, meaning that packets from server A to server X have only one fixed, predetermined path they can travel. One of the nation’s largest labs proved that on a 1,152-server Infiniband network, static routing was only 21% efficient and delivered on average 263MB/sec (2.1Gbps of the theoretical 10Gbps possible). So when they tell you full bisection, ask them why LLNL only saw 21%. In an IEEE paper presented last week, it was proven that a statically routed system cannot achieve greater than 38% efficiency. Now, some of the really savvy mummy supporters will say that the latest incarnation of Infiniband has adaptive routing. They do this by playing yet another shell game: they redefine the term adaptive routing to mean more than one static route. Real adaptive routing and using a pair of static routes are vastly different things. Real adaptive routing can deliver 77% efficiency on 512 nodes and nearly 100% efficiency on clusters smaller than 512 nodes. If you want full bisection for more than a 16-node cluster, talk with Myricom or Quadrics; they do real adaptive routing.
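To make the static-routing point concrete, here is a toy "balls into bins" model in Python. It is not a reproduction of the LLNL measurement or the IEEE analysis cited above; it just illustrates how pinning every flow to one fixed link lets a few hot links throttle the whole fabric while other links sit idle.

```python
# Toy "balls into bins" model of static routing. Each flow is pinned to one
# uplink chosen by a fixed, traffic-oblivious hash; the most congested link
# then sets the pace for the whole job, so efficiency drops well below 100%.
# This is an illustration only, not a reproduction of the LLNL or IEEE numbers.
import random

def static_routing_efficiency(num_flows=1024, num_links=64, trials=200):
    random.seed(0)
    total = 0.0
    for _ in range(trials):
        load = [0] * num_links
        for _flow in range(num_flows):
            load[random.randrange(num_links)] += 1  # fixed per-flow link choice
        fair_share = num_flows / num_links          # perfectly balanced load
        total += fair_share / max(load)             # the hot link limits throughput
    return total / trials

print(f"static routing efficiency ≈ {static_routing_efficiency():.0%}")
# An adaptive router could move flows off the hot links onto idle ones,
# pushing the same traffic pattern toward 100% of the fabric.
```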
 
Latency is the time it takes to move a packet from an application on one server to another application on a different server on the same network. Infiniband has always positioned itself as being low latency. Typically, Infiniband advertises a latency of roughly three microseconds between two NICs, using zero-byte packets. Well, in the past year 10GbE NICs and switches have come onto the market that can achieve similar performance. Arista’s switches measure latency in a few hundred nanoseconds, while Cisco’s latest 10GbE switches are sub-four microseconds, compared to prior generations that were measured in the tens of microseconds or more. Now, when the Infiniband crowd crows about low-latency switching, ask them about Arista’s or BLADE Network Technologies’ 10GbE switches.
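For context, NIC-to-NIC latency figures like the ones above are usually quoted as half the round-trip time of a ping-pong test with tiny messages. The sketch below shows that measurement pattern over ordinary TCP sockets in Python; the peer hostname and port are placeholders, and real NIC and switch benchmarks use kernel-bypass tools that report far lower numbers, but the method is the same.

```python
# Minimal ping-pong latency sketch over TCP sockets. The peer address and
# port are placeholders. Kernel-bypass benchmark tools avoid the OS overhead
# included here; the measurement pattern is the same: time many round trips
# with a tiny message and report half the average RTT.
import socket
import time

PEER_HOST, PORT = "peer.example.com", 9000  # placeholder peer
ITERATIONS = 10_000

def echo_server():
    """Run on the peer: echo each byte back immediately."""
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            while data := conn.recv(1):
                conn.sendall(data)

def ping_pong_client():
    """Run on this host: report one-way latency as half the mean RTT."""
    with socket.create_connection((PEER_HOST, PORT)) as s:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(ITERATIONS):
            s.sendall(b"x")   # one-byte message, stand-in for a zero-byte probe
            s.recv(1)         # wait for the echo
        elapsed = time.perf_counter() - start
        print(f"one-way latency ≈ {elapsed / ITERATIONS / 2 * 1e6:.1f} µs")

if __name__ == "__main__":
    ping_pong_client()
```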
 
Infiniband claims 20Gbps and delivers less than 12Gbps. Infiniband claims full bisection, yet beyond a small network it can’t exceed 38% efficiency. Infiniband claims low latency, and now 10GbE can match it. So where is its value proposition in the data center?
