Apple, Ferrari, & Dyson have all proven the immense power that artful design has over our purchasing decisions. But does this apply to products we don’t see? Is design just as critical there? Yes, and just last week a master mechanic who’d rebuilt my friend’s transmission explained why. He handed me the failed Ford part, which was made of pressed steel, and explained that it typically lasts 200K miles. He then showed me the equivalent Chevy part, which is cast, considerably heavier, and rarely fails. So why doesn’t Ford use a cast metal part? Simple: one of Ford’s key design criteria is weight. Lower weight helps Ford meet the Corporate Average Fuel Economy (CAFE) regulations, which set the average fuel economy a company’s fleet must achieve and so effectively limit the mix of high-MPG and low-MPG vehicles it can sell. So to save two pounds on this part, perhaps 1/100 of an MPG, Ford is intentionally designing its truck transmission to fail at 200K miles.
Well, what about 10Gb NIC design? The NIC is perhaps the third most important subsystem in your server, behind the CPU & memory, so how does design matter here? For this discussion, we’ll compare Solarflare’s SFN7002F to Intel’s X520-DA2 (which uses Intel’s 82599 controller). Solarflare’s key design criterion is low latency, a fancy way of saying reducing the time it takes to get your data into & out of main memory. Contrast this with Intel, whose focus is on producing a general-purpose commodity product designed to meet the widest range of requirements. We’re going to look at four areas that highlight significant performance differences between these two approaches: transmit & receive queues, MSI-X interrupt vectors, Receive Side Scaling (RSS) queues, and physical/virtual functions (a foundational approach to supporting virtualized computing).
First we need to look at what a network interface card (NIC) really does. A NIC receives information from an external network and places it into your computer’s memory. It also takes information provided to it and places that on the network. Solarflare’s approach to networking has been honed over the past five years by servicing the financial markets of the world, the folks who dollarize every nanosecond of the trading day. As such, Solarflare has 1,024 transmit & receive queues, or Virtual NICs (vNICs), connected to each 10GbE port. On the Ethernet controller chip, Solarflare has also placed a layer-2 network switch in front of those 1,024 vNICs. This network switch can use the packet’s VLAN tag to intelligently steer packets to the proper vNIC assigned to a given VLAN. Intel, on the other hand, has only 128 receive/transmit queues attached to each port, or 1/8th of what Solarflare provides.
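To make the VLAN steering idea concrete, here is a minimal sketch of a lookup table that maps a VLAN tag to a vNIC. It is only an illustration of the concept, not Solarflare’s implementation: the class name `VlanSteeringTable`, its methods, and the choice of vNIC 0 as the default for untagged or unassigned traffic are all assumptions made for this example. Only the 1,024 figure comes from the text above.

```python
from typing import Dict, Optional

VNICS_PER_PORT = 1024  # per-port vNIC count quoted in the article


class VlanSteeringTable:
    """Hypothetical model of a layer-2 switch that steers by VLAN tag."""

    def __init__(self, num_vnics: int = VNICS_PER_PORT) -> None:
        self.num_vnics = num_vnics
        self._vlan_to_vnic: Dict[int, int] = {}

    def assign(self, vlan_id: int, vnic: int) -> None:
        """Bind a VLAN ID (1..4094) to one vNIC index."""
        if not 1 <= vlan_id <= 4094:
            raise ValueError(f"VLAN ID out of range: {vlan_id}")
        if not 0 <= vnic < self.num_vnics:
            raise ValueError(f"vNIC index out of range: {vnic}")
        self._vlan_to_vnic[vlan_id] = vnic

    def steer(self, vlan_id: Optional[int]) -> int:
        """Return the vNIC for a packet's VLAN tag.

        Untagged packets (None) and unassigned VLANs fall back to
        vNIC 0, which is an assumption of this sketch.
        """
        return self._vlan_to_vnic.get(vlan_id, 0)


table = VlanSteeringTable()
table.assign(vlan_id=100, vnic=7)
print(table.steer(100))   # VLAN 100 was assigned to vNIC 7
print(table.steer(200))   # unassigned VLAN falls back to vNIC 0
```

With 1,024 vNICs per port, a table like this could give each of several hundred VLANs its own queue pair; with 128 queues, VLANs would have to share sooner.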
Message Signaled Interrupts (MSI-X) are a very common way to inform the processor that data is waiting at an I/O device to be picked up. With PCI Express, we no longer have dedicated hardware interrupt request lines, so I/O devices have to use a shared messaging interface to inform the processor that data is waiting. Solarflare supports 1,024 MSI-X interrupts compared to Intel’s 128. Again, Intel offers 1/8th of the underlying infrastructure necessary for passing high performance data to the host CPU complex. All of these numbers are per port.
As computers moved from one processor chip to two, and from single-core chips to today’s 18 cores per chip (Intel), the challenge has always been mapping pathways from I/O devices directly to these processor cores. One of the most efficient mechanisms for linking cores to Ethernet receive queues is a process known as Receive Side Scaling (RSS). On Intel servers, PCI Express slots have an affinity for a specific CPU socket. So for optimal performance you align your 10G Ethernet NICs to specific CPUs by the PCI Express slot you install them into. Suppose, for example, you have a state-of-the-art Haswell dual-socket server, and each socket has an 18-core processor. For optimal performance you might install two dual-port 10GbE adapters, one in a slot that maps to CPU socket 0 and a second in a slot that maps to CPU socket 1. With this approach you can then achieve peak network performance on your server. There’s a problem though: Intel’s 82599 controller supports only 16 RSS queues per port, so two cores on each of your sockets will receive less than optimal performance, as their traffic has to be routed through other cores. Solarflare’s controller, on the other hand, has 64 RSS queues per port and can easily spread traffic over multiple paths to every core in your server.
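The core-versus-queue arithmetic above can be sketched in a few lines. The function names here are made up for illustration, and the queue picker swaps in a plain CRC32 modulo the queue count in place of the Toeplitz hash and indirection table that RSS hardware typically uses, so treat it as a rough model of the idea rather than how either controller behaves.

```python
import zlib


def cores_without_own_queue(cores_per_socket: int, rss_queues_per_port: int) -> int:
    """Cores left over once every RSS queue has been given to one core."""
    return max(0, cores_per_socket - rss_queues_per_port)


def pick_queue(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
               num_queues: int) -> int:
    """Map a flow to a receive queue.

    Simplification: CRC32 of the flow tuple, modulo the queue count.
    Packets of the same flow always land on the same queue.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues


# Figures from the article: 18 cores per socket, 16 vs. 64 RSS queues per port.
print(cores_without_own_queue(18, 16))  # 2 cores share queues with other cores
print(cores_without_own_queue(18, 64))  # 0, every core can have its own queue
```

The second function is only there to show why queue count matters: flows are spread across however many queues exist, and each queue is serviced by one core.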
Today many computers rely on virtualization to fully utilize all the resources of the server. To do this, network adapters support what are called Physical & Virtual Functions. Physical Functions (PFs) are a method for exposing to the virtual machine’s hypervisor what is essentially a complete physical instance of the network adapter. Solarflare supports 16 Physical Functions while Intel supports only two. Virtual Functions (VFs) are a method for creating fully virtualized NICs; Solarflare supports 240 while Intel has only 128. In testing with 32 VMs running, Solarflare has demonstrated that it delivers 18% greater overall performance. Note that these PF & VF numbers are per adapter.
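If you want to see how many VFs an adapter in your own Linux server offers, a small sketch like the following may help. It assumes the adapter exposes its VFs through SR-IOV (the article does not say so explicitly) and reads the `sriov_totalvfs` and `sriov_numvfs` files that the Linux kernel publishes in sysfs for SR-IOV capable PCI devices. The function name and the interface name `eth0` are placeholders for this example.

```python
from pathlib import Path
from typing import Tuple


def read_vf_counts(iface: str, sysfs_root: str = "/sys/class/net") -> Tuple[int, int]:
    """Return (total VFs supported, VFs currently enabled) for an interface.

    Raises FileNotFoundError if the interface does not exist or its
    device does not expose the SR-IOV sysfs files.
    """
    device = Path(sysfs_root) / iface / "device"
    total = int((device / "sriov_totalvfs").read_text().strip())
    enabled = int((device / "sriov_numvfs").read_text().strip())
    return total, enabled


if __name__ == "__main__":
    try:
        total, enabled = read_vf_counts("eth0")  # placeholder interface name
        print(f"eth0: {enabled} of {total} VFs enabled")
    except (FileNotFoundError, ValueError):
        print("eth0 does not expose SR-IOV VF counts on this machine")
```

The `sysfs_root` parameter exists only so the function can be pointed at a test directory; on a real system the default is what you want.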
Earlier we mentioned that Solarflare’s NICs were designed around latency, yet we’ve not covered latency. Solarflare’s generic kernel device driver, running on their commodity SFN7002F adapter, delivers sub-4-microsecond latency for a half round trip. Note that a half round trip is a single send & receive combined. Intel, with their generic driver, takes more than double this. Furthermore, Solarflare also sells an optional driver called Open Onload that further reduces latency to under 1.7 microseconds! When it comes to overall server performance, network adapter latency really does matter.
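For readers unfamiliar with the half round trip metric, here is a rough sketch of how it is computed: bounce a small message back and forth many times, then divide the average round trip by two. This example uses a local socket pair inside one process, so it measures only operating system overhead and never touches a NIC; its numbers are not comparable with the adapter figures above. The function names and the 32-byte payload are choices made for this illustration.

```python
import socket
import time


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes from a stream socket."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def measure_half_round_trip_us(iterations: int = 1000,
                               payload: bytes = b"x" * 32) -> float:
    """Average half round trip in microseconds over a local socket pair."""
    a, b = socket.socketpair()
    try:
        start = time.perf_counter_ns()
        for _ in range(iterations):
            a.sendall(payload)             # "ping" leaves side A
            _recv_exact(b, len(payload))   # side B receives it
            b.sendall(payload)             # "pong" leaves side B
            _recv_exact(a, len(payload))   # side A receives it
        elapsed_ns = time.perf_counter_ns() - start
    finally:
        a.close()
        b.close()
    round_trip_ns = elapsed_ns / iterations
    return round_trip_ns / 2 / 1000        # half round trip, ns -> us


if __name__ == "__main__":
    print(f"half round trip: {measure_half_round_trip_us():.2f} us")
```

A real NIC benchmark would run the two ends on separate machines connected through the adapters under test, but the divide-by-two bookkeeping is the same.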
So next time you’re selecting components for a server deployment please consider what you’ve learned above about 10GbE NICs, and “Choose Wisely.”