Extreme Packet Capture, Star Trek Style (Part 1)

At one point or another, we’ve all watched Trek, from Scotty yelling “I’m giving it all she’s got, Cap’n” to Picard’s “Make it so.” We love the mix of humanity & technology. Like Scotty, most techies get excited by the thought of pushing their own skills & systems to their very limits. Is there a place in IT, though, for pushing the tools we use to their limits, and can we overload our systems? I believe there is. When it’s appropriate, we should push our systems, and ourselves for that matter, to their limits so we know for certain what we can truly expect. Computers these days have many safety systems to prevent us from frying them. Let’s face it, any techie worth their salt has the smell of burnt silicon stored in their olfactory neurons, so let’s try to avoid that. Yes, some of these safety systems can be bypassed, but that’s not what I’m suggesting today.

So how do we perform network analytics or problem determination on 10GbE links operating at near Warp speed? The answer is simple: get the right hardware and software. You wouldn’t send a crew venturing off into deep space without at least a Warp 2 capable ship, so you shouldn’t expect to capture packets at high data rates on a 10GbE link with built-in hardware and libpcap. Heck, even my little $69 Raspberry Pi can be easily configured with libpcap & Wireshark to analyze traffic. No, to do real 10GbE performance monitoring and management you’ll need the right network interface card and software. Let’s assume you’re on a tight budget, as most folks are. My recommendation is an Emulex One Connect Network Xceleration (NX) adapter (P/N OCe12101DM-SNF2), which offers a single wire-rate 10GbE port WITH Myricom’s FastStack Sniffer10G included, all for under $690 list price. Now you might ask yourself, why not use something else? True, there are other adapters and other solutions, but I think you’d be hard pressed to find another lossless wire-rate 10GbE solution for under $700.

So what’s so special about this Emulex adapter and FastStack Sniffer10G? Several things: a transparent access model, lossless wire-rate 10GbE packet capture, multiple shared parallel memory buffers, two buffer creation strategies, adjustable hashing algorithm, user space or kernel mode, wire-rate injection, and a number of useful tools and sample code. Let’s take a moment and look into each of these.

By a transparent access model, we simply mean a libpcap replacement library on Linux, no coding required! This replacement library supports multiple in-memory buffers & strategies that are totally transparent to your libpcap-compliant application. Sniffer10G uses environment variables to configure its in-memory queues. This allows Sniffer10G to plug easily into programs like Snort, Suricata, BRO-IDS, Wireshark, TCPDump, etc.
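To make the environment-variable idea a little more concrete, here is a minimal Python sketch that builds an environment for an unmodified libpcap application and launches it. SNF_RSS_FLAGS is the one variable this article names; the value "0x1", the interface name "eth2", and the helper names are placeholders made up for illustration, so take real settings from the Sniffer10G documentation. The sketch also assumes the application is already picking up the replacement libpcap library.

```python
import os
import subprocess


def build_capture_env(rss_flags, base_env=None):
    """Return a copy of the environment with the Sniffer10G setting added."""
    env = dict(os.environ if base_env is None else base_env)
    # SNF_RSS_FLAGS is named in the article; the value passed in is a placeholder.
    env["SNF_RSS_FLAGS"] = str(rss_flags)
    return env


def launch_capture(command, rss_flags):
    """Start an unmodified libpcap application with the extra environment."""
    return subprocess.Popen(command, env=build_capture_env(rss_flags))


# Example (not run here): tcpdump on a hypothetical interface name.
# launch_capture(["tcpdump", "-i", "eth2", "-w", "capture.pcap"], rss_flags="0x1")
```

The point of the design is that the capture application itself never changes; only the environment it starts in does.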

Lossless 10GbE wire-rate packet capture requires three things: hardware designed for gargantuan packet rates, software architected to bypass the OS and quickly move packets into user space, and a tight marriage between the two. This Emulex adapter uses a processor made by Myricom capable of sustaining a packet handling rate of approximately 16 million packets per second (Mpps), combined send & receive. A 10GbE link fully loaded with the smallest possible packets, 64 bytes, carries 14.88Mpps at wire-rate. Folks interested in packet capture typically don’t retransmit packets on the same link they’re capturing on, so a ceiling of 16Mpps is a perfect match for catching 14.88Mpps of worst-case inbound traffic.
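If you're wondering where 14.88Mpps comes from, the arithmetic can be sketched in a few lines of Python. Every Ethernet frame also occupies 8 bytes of preamble and start-of-frame delimiter plus a 12-byte minimum inter-frame gap on the wire, so a 64-byte frame really costs 84 byte times. The function and constant names below are my own, purely for illustration.

```python
# Per-frame overhead on the wire that never shows up in a capture file.
PREAMBLE_AND_SFD_BYTES = 8   # 7-byte preamble + 1-byte start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12    # minimum idle gap between frames, in byte times


def wire_rate_pps(link_bps, frame_bytes):
    """Maximum frames per second for a given frame size (FCS included)."""
    bits_on_wire = (frame_bytes + PREAMBLE_AND_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / bits_on_wire


for size in (64, 512, 1518):
    print(f"{size:>5}-byte frames: {wire_rate_pps(10e9, size) / 1e6:.2f} Mpps")
```

For 64-byte frames this works out to 10,000,000,000 / 672, or roughly 14.88 million packets per second, and the rate falls off quickly as frames get larger.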

Multiple shared memory buffers are a way to spread the work. FastStack Sniffer10G supports two buffer models: cloned & flow hashed. With cloning, Sniffer10G creates multiple in-memory copies of the same inbound traffic. This lets you run several different applications, each with its own independent in-memory copy of the packets being captured. At wire-rate, Sniffer10G should be able to sustain up to three clones, for a total of four in-memory queues of identical captured traffic. Sniffer10G can provide more clones, but beyond three at wire-rate it may begin to drop packets. Cloning brings up a significant problem, though: each queue could potentially receive a new packet every 67 nanoseconds, so the code working these queues has to be extremely tight. Flow hashing solves this problem. With flow hashing, Sniffer10G spreads traffic across up to 32 different queues (each queue should be bound to its own core). In the best case, where packets are distributed evenly across all 32 queues, this gives the code working each queue about two full microseconds to process each inbound packet, even under worst-case wire-rate traffic.
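The 67 nanosecond and two microsecond figures fall straight out of the 14.88Mpps worst case. Here is a small sketch of that arithmetic, assuming a perfectly even spread across queues; the names are mine, not anything from Sniffer10G.

```python
WIRE_RATE_PPS = 14.88e6  # worst case from the article: 64-byte frames on 10GbE


def budget_ns(queues, pps=WIRE_RATE_PPS):
    """Average time available per packet on each queue, in nanoseconds.

    Assumes packets are spread evenly across the queues.
    """
    return queues * 1e9 / pps


# Cloning: every clone sees the full stream, so each queue's budget is the
# single-queue figure no matter how many clones exist.
print(f"cloned queue:     {budget_ns(1):7.1f} ns per packet")

# Flow hashing: the stream is divided, so the budget grows with the queue count.
for queues in (4, 16, 32):
    print(f"{queues:>2} hashed queues: {budget_ns(queues):7.1f} ns per packet")
```

Keep in mind that real traffic rarely hashes perfectly evenly, so a busy flow can leave one queue with far less headroom than the average suggests.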

So what about flow hashing? Well, flow hashing is not a simple round-robin method that deals packets out to queues in an easily predictable order because, frankly, most folks interested in packet capture really don’t want that type of packet distribution between queues. Flow hashing maintains network flow affinity, meaning that the same queue will always receive packets from the same source going to the same destination, port, etc., so that when you do deep packet inspection for things that span multiple packets, all the packets that make up that traffic flow are in the same queue. Pretty cool, huh? By default, the hash is computed from the IP source and destination addresses and the TCP/UDP source and destination ports. This can be altered by setting the Receive Side Scaling (RSS) flags through an environment variable (SNF_RSS_FLAGS) prior to launching your code.
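To see why hashing gives you flow affinity where round robin doesn't, here is a toy Python sketch. It is not the hash Sniffer10G actually uses (I'm substituting CRC32 from the standard library purely for illustration), and the addresses, ports, and function name are made up. The idea is simply that the queue number is computed from the flow's identifying fields, so identical fields always produce the identical queue.

```python
import zlib

NUM_QUEUES = 32


def queue_for(src_ip, dst_ip, src_port, dst_port, proto, num_queues=NUM_QUEUES):
    """Map a flow's identifying fields to a queue index (illustrative hash only)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % num_queues


packets = [
    ("10.0.0.1", "10.0.0.9", 40000, 80, "tcp"),   # flow A
    ("10.0.0.2", "10.0.0.9", 40001, 443, "tcp"),  # flow B
    ("10.0.0.1", "10.0.0.9", 40000, 80, "tcp"),   # flow A again
]

for pkt in packets:
    print(pkt, "-> queue", queue_for(*pkt))
```

Both flow A packets land in the same queue every time, which is exactly what multi-packet inspection needs. One caveat of this toy version: it treats the two directions of a conversation as different flows, since swapping source and destination changes the key.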

The remaining cool features, such as an optional kernel-space Sniffer10G driver, wire-rate injection & samples, will be addressed in Part 2 of this series. Stay tuned.