Site icon The Technology Evangelist

Four Failings of RoCE

RoCERecently someone suggested that I watch this rather informative video of how Microsoft Research had attempted to make RDMA over Converged Ethernet (RoCE) lossless. Unbelievably this video exposes and documents several serious flaws in the design of RoCE. Also, it appears they’ve replaced the word “Converged” with “Commodity,” to down message that RoCE doesn’t require anything special to run on regular old Ethernet. Here are the four points I got out of the video, please let me know your take:

As the speaker closes out the discussion, he says, “This experiment shows that even with RDMA low latency and high throughput cannot be achieved at the same time as network congestion can cause queues to build up in the network.” Anyone who has done this for a while knows that low-latency and high bandwidth are mutually exclusive. That’s why High-Performance Computing (HPC) tests often start the tests with zero byte packets then scale up to demonstrate how latency increase proportionately to packet size.

All the above aside, this important question remains, why would anyone map a protocol like RDMA, which was designed for use on a lossless local bus, to a switched network and think that this would work? A local lossless bus is very deterministic, and it has requirements bound to its lossless nature and predictable performance. Conversely, Ethernet was designed from the beginning to expect, and accommodate loss, and performance has always been secondary to packet delivery. That’s why Ethernet performance is non-deterministic. The resilience of Ethernet, not performance, was the primary design criteria DARPA had mandated to ensure our military’s network would remain functional at all cost.

Soon Solarflare will begin shipping ScaleOut Onload free with all their 8000 series NICs, some of which sell for under $300 USD. With ScaleOut Onload TCP now has all the kernel bypass tricks RDMA offers, but with all the benefits and compatibility of sockets based TCP, no code changes. Furthermore, it delivers the performance of RDMA, but with much better reliability and availability than RoCE.

P.S. Mellanox just informed me that the NIC specific issues mentioned above were corrected some time ago in their ConnectX-4 series cards.

Exit mobile version