Smart NIC

While reading these words, it’s not just your brain doing the processing required to make this feat possible. We’ve all seen over and under exposed photos and can appreciate the decision making necessary to achieve a perfect light balanced photo. In the laboratory, we observed that the optic nerve connecting the eye to the brain is responsible for measuring the intensity of the light hitting the back of your eye. In response to this data, each optic nerve dynamically adjusts the aperture of the iris in your eye connected to this nerve to optimize these levels. For those with some photography experience, you might recall that there is a direct relationship between aperture (f-stop) and focal length. It also turns out that your optic nerve, after years of training as a child, has come to realize you’re reading text up close, so it is now also responsible for modifying the muscles around that eye to sharpen your focus on this text. All this data processing is completed before your brain has even registered the first word in the title. Imagine if your brain was responsible for processing all the data and actions that are required for your body to function properly?

Much like your optic nerve, the difference between a standard Network Interface Card (NIC) and a smart NIC is how much processing the Smart NIC offloads from the host CPU. Until recently Smart NICs were designed around Field Programmable Gate Array (FPGA) platforms costing thousands of dollars. As their name implies, FPGAs are designed to accept localized programming that can be easily updated once installed. Now a new breed of Smart NIC is emerging that while it isn’t nearly as flexible as an FPGA, they contain several sophisticated capabilities not previously found in NICs costing only a few hundred dollars. These new affordable Smart NICs can include a firewall for security, a layer 2/3 switch for traffic steering, several performance acceleration techniques, and network visibility with possibly remote management.

The firewall mentioned above filters all network packets against a table built specifically for each local Internet Protocol (IP) address under control. An application processing network traffic is required to register a numerical network port. This port then becomes the internal address to send and receive network traffic. Filtering at the application level then becomes a simple process of only permitting traffic for specific numeric network ports. The industry has labeled this “application network segmentation,” and in this instance, it is done entirely in the NIC. So How does this assist the host x86 CPU? It turns out that by the point at which operating system software filtering kicks in the host CPU has often expended over 10K CPU cycles to process a packet. If the packet is dropped the cost of that drop is 10K lost host CPU cycles. If that filtering was done in the NIC, and the packet was then dropped there would be NO host CPU impact.

Smart NICs also often have an internal switch which is used to steer packets within the server rapidly. This steering enables the NIC to move packets to and from interfaces and virtual NIC buffers which can be mapped to applications, virtual machines or containers. Efficiently steering packets is another offload method that can dramatically reduce host CPU overhead.

Improving overall server performance, often through kernel bypass, has been the providence of High-Performance Computing (HPC) for decades. Now it’s available for generic Ethernet and can be applied to existing and off the shelf applications. As an example, Solarflare has labeled its family of Kernel Bypass accelerations techniques Universal Kernel Bypass (UKB). There are two classes of traffic to accelerate: network packet and application sockets based. To speed up network packets UKB includes an implementation of the Data Plane Development Kit (DPDK) and EtherFabric VirtualInterface (EF_VI), both are designed to deliver high volumes of packets, well into the 10s of millions per second, to applications familiar with these Application Programming Interfaces (APIs). For more standard off-the-shelf applications there are several sockets based acceleration libraries included with UKB: ScaleOut Onload, Onload, and TCPDirect. While ScaleOut Onload (SOO) is free and comes with all Solarflare 8000 series NICs, Onload (OOL) and TCPDirect require an additional license as they provide micro-second and sub-microsecond 1/2 round trip network latencies. By comparison, SOO delivers 2-3 microsecond latency, but the real value proposition of SOO is the dramatic reduction in host CPU resources required to move network data. SOO is classified as “zero-copy” because network data is copied once directly into or out of your application’s buffer. SOO saves the host CPU thousands of instructions, multiple memory copies, and one or more CPU context switches, all dramatically improve application performance, often 2-3X, depending on how network intense an application is.

Finally, Smart NICs can also securely report NIC network traffic flows, and packet counts off the NIC to a centralized controller. This controller can then graphically display for network administrators everything that is going on within every server under its management. This is real enterprise visibility, and since only flow metadata and packet counts are being shipped off NIC over a secure TLS link the impact on the enterprise network is negligible. Imagine all the NICs in all your servers reporting in their traffic flows, and allowing you to manage and secure those streams in real time, with ZERO host CPU impact. That’s one Smart NIC!

The Technology Evangelist

Since my first computer in 1983, a TRS80 Model III, I've been educating others on the promises of computing.

What’s a Smart NIC?