R.I.P. TCP Offload Engine NICs (TOEs)

Solarflare Delivers Smart NICs for the Masses: Software Definable,  Ultra-Scalable, Full Network Telemetry with Built-in Firewall for True Application Segmentation, Standard Ethernet TCP/UDP Compliant

As this blog post by Michael C. Bazarewsky states, Microsoft quietly pulled support for TCP Chimney in its Windows 10 operating system. Chimney was an architecture for offloading the state and responsibility of a TCP connection to a NIC that supported it. The piece cited numerous technical issues and lack of adoption, and Michael’s analysis hits the nail on the head. Goodbye TOE NICs.

During the early years of this millennium, Silicon Valley venture capitalists dumped hundreds of millions of dollars into start-ups that would deliver the next generation of network interface cards at 10Gb/sec using TCP offload engines. Many of these companies failed under their weight of trying to develop expensive, complicated silicon that just did not work. Others received a big surprise in 2005 when Microsoft settled with Alacritech over patents they held describing Microsoft’s Chimney architecture. In a cross-license arrangement with Microsoft and Broadcom, Alacritech received many tens of millions of dollars in licensing fees. Alacritech would later get tens of millions of more fees from nearly every other NIC vendor implementing a TOE in their design. At the time, Broadcom was desperate to pave the way for their acquisition of Israeli based Siloquent. Due to server OEM pressure, the settlement was a small price to pay for the certain business Broadcom would garner from sales of the Siloquent device. At 1Gb/sec, Broadcom owned an astounding 100% of the server LAN-on-Motherboard (LOM) market, and yet their position was threatened by the onslaught of new, well-funded 10Gb start-ups.

In fact, the feature list for new “Ethernet” enhancements got so full of great ideas that most vendor’s designs relied on a complex “sea of cores” promising extreme flexibility that ultimately proved to be very difficult to qualify at the server OEMs. Any minor change to one code set would cause the entire design to fail in ways that were extremely difficult to debug, not to mention being miserably poor in performance. Most notably, Netxen, another 10Gb TOE NIC vendor, quickly failed after winning major design-ins at the three big OEMs, ultimately ending in a fire sale to Qlogic. Emulex saw the same pot of gold in its acquisition of ServerEngines.

That new impetus was a move by Cisco to introduce Fibre Channel Over Ethernet (FCoE) as a standard to converge networking and storage traffic. Cisco let Qlogic and Emulex (Q & E) inside the tent before their Unified Computing System (UCS) server introduction. But the setup took some time. It required a new set of Ethernet standards, now more commonly known as Data Center Bridging (DCB). DCB was a set of physical layer requirements that attempted to emulate the reliability of TCP by injecting wire protocols that would allow “lossless” transmission of packets. What a break for Q & E! Given the duopoly’s control over the Fibre Channel market, this would surely put both companies in the pole position to take over the Ethernet NIC market. Even Broadcom spent untold millions to develop a Fiber Channel driver that would run on their NIC.

Q & E quickly released what many called the “Frankenstein NIC,” a kluge of Applied-Specified Integrated Circuits (ASIC) designed to get a product to market even while struggling to develop a single ASIC, a skill at which neither company excelled. Barely achieving its targeted functionality, no design saw much traction. Through all of our customer interactions (over 1,650), we could find only one that had implemented FCoE. This large bank has since retracted its support for FCoE and in fact, showed a presentation slide several years ago stating they were “moving from FCoE to Ethernet,” an acknowledgment that FCoE was indeed NOT Ethernet.

In conjunction with TOEs, the industry pundits believed that RDMA (Remote Direct Memory Access) was another required feature to reduce latency, and not just for High-Frequency Trading (HFT), another acknowledgment that lowering latency was critical to the hyper-scale cloud, big data, and storage architectures. However, once again, while intellectually stimulating, using RDMA in any environment proved to be complex and simply not compatible with customers’ applications or existing infrastructures.

The latest RDMA push is to position it as the underlying fabric for Non-Volatile Memory Express (NVMeF). Why? Flash has already reduced the latency of storage access by an order of magnitude, and the next generation of flash devices will reduce latency and increase capacity even further. Whenever there’s a step function in the performance of a particular block of computer architecture, developers come up with new ways to use that capability to drive efficiencies and introduce new, and more interesting applications. Much like Moore’s Law, rotating magnetic memory is on its last legs. Several of our most significant customers have already stopped buying rotating memory in favor of Flash SSDs.

Well… here we go again. RDMA is NOT Ethernet. Despite the “fake news” about running RDMA, RoCE and iWARP on Ethernet, the largest cloud companies, and our large financial services customers have declared that they cannot and will not implement NVMeF using RDMA. It just doesn’t fit in their infrastructures or applications. They want low-latency standard Ethernet.

Since our company’s beginning, we’ve never implemented TOEs, RDMA or FCoE or any of the other great and technically sound ideas for changing Ethernet. Sticking to our guns, we decided to go directly to the market and create the pull for our products. The first market to embrace our approach was High-Frequency Trading (HFT). Over 99% of the world’s volume of Electronic trading, in all instruments, runs on our company’s NICs. Why? Customers could test and run our NICs without any application modifications or changes to their infrastructure and realize enormous benefits in latency, Jitter, message rate and robustness… it’s standard Ethernet, and our kernel bypass software has become the industry’s default standard.

It’s not that there isn’t room for innovation in server networking, it’s that you have to consider the customer’s ability to adapt and manage that change in a way that’s not inconsistent or disruptive to their infrastructure, while at the same time, delivering highly valued capabilities.

  • If companies are looking for innovation in server networking, they need to look for a company that can provide the following: Best-in-class PTP synchronization
  • Ultra-high resolution time stamps for every packet at every line rate
  • A method for lossless, unobtrusive, packet capture and analysis
  • Significant performance improvement in NGINX and LXC Containers
  • A firewall NIC and Application Micro-Segmentation that can control every app, VM, or container with unique security profiles
  • Real, extensive Software Definable Networking (SDN) without agents

In summary, while it’s taken a long time for the industry to realize its inertia, logic eventually prevailed.  Today, companies can now benefit from innovations in silicon and software architecture that are in deployment and have been validated by the market.   Innovative approaches such as neural-scale networking, which is designed to respond to the high-bandwidth, ultra-low-latency, hardware-based security, telemetry, and massive connectivity needs of ultra-scale computing, is likely the only strategy to achieve a next generation cloud and datacenter architecture that can scale, be easily managed, and maybe most importantly secured.

— Russell Stern, CEO Solarflare

One thought on “R.I.P. TCP Offload Engine NICs (TOEs)

Leave a Reply