VMA – Voltaire Messaging Abandoned

This morning Mellanox announced that they are releasing the Voltaire Messaging Accelerator (VMA) as open source. Tom Thirer, the director of product management at Mellanox, said: “By opening VMA source code we enable our customers with the freedom to implement the acceleration product and more easily tailor it to their specific application needs.” He then followed this up with “We encourage our customers to use the free and open VMA source package and to contribute back to the community.” Now to be fair, I work for a company that has been selling 10GbE NICs, and delivering and supporting a competing open source kernel bypass stack for its customers, for over five years.

So what does moving VMA to open source mean to Mellanox’s customers who run their business on systems that use VMA in production? Well, any problems or issues you have now, or will ever have in the future, with VMA are now your problems, and you get the privilege of fixing them.

Open source is a great method for rapidly advancing a code base with broad appeal. We all know and love Linux, the perceived shining star of the open source community. It runs on everything from a $60 Raspberry Pi to IBM’s System z mainframes. Open source works very well when there is significant interest in, and demand for, what the code offers. Mellanox’s VMA isn’t Linux. It’s a very specific network driver that runs on only one company’s network chip in a very niche set of markets. One of the main reasons Mellanox acquired Voltaire in 2011 for $208M was to gain control of VMA, one of the few unique features of Voltaire’s product line. Ever since then Mellanox has been trying to stabilize the code base, reduce the jitter (unpredictable delays that can paralyze low latency systems), and exterminate some very pesky bugs. Those bugs, and the support issues attached to them, are the driving reason why Mellanox is now giving the source code away to the open source community.

Some might argue that they’re doing the financial services, HPC, and Web 2.0 markets a huge favor by “donating” this code to the community. Mellanox is a business. They spent many millions to acquire VMA in 2011, and likely much more over the past two years to further develop and maintain it. You don’t just jettison an expensive piece of code because you want to give your customers “the freedom to implement the acceleration product and more easily tailor it to their specific application needs.”

It’s been known in the industry for at least six weeks that Mellanox was going in this direction. In fact, the source code has been in Google Code since August 12, so who’s contributed changes? Well, Mellanox has, over 30 times in fact, in order to get ready for this announcement. This is big news, so how many people are following the code? Three, and two are the Mellanox employees who have submitted code fixes, all but one of which came from the same employee. How about the discussion list? Perhaps users are commenting there. Nope, it’s empty.

Finally, if Mellanox were serious about VMA moving forward there would be one or more courses on this product in the Mellanox Academy. Today there are zero! Check out the course catalog for yourself. If the catalog isn’t enough to convince you that Mellanox’s focus is on InfiniBand, then let’s follow the numbers and look at their most recent financials. Toward the end of their last quarterly SEC 10-Q filing, you’ll see that Ethernet made up only 14% of their revenue. FDR, QDR, and DDR InfiniBand combined make up over 80% of their revenue. Mellanox is InfiniBand, and more importantly, InfiniBand is Mellanox.

Now Mellanox has said that they will still provide a binary version of VMA that they will support, but they’ve not publicly stated what that support contract will cost.

Building a Better Security Appliance

In the past, this blog has discussed how one might set up a rule-based cyber security application like Snort or Suricata on 10Gb Ethernet using Myricom’s FastStack Sniffer10G packet capture solution. I recently learned of a different approach to managing cyber security. This technique leverages detailed traffic logs and an advanced scripting engine tuned for managing Internet domain sourced content. The application is called Bro, and it’s fast becoming the hot new tool for managing cyber security. A partner of ours, Reservoir Labs, recently released a 1U cyber security appliance that at its core uses Bro with FastStack Sniffer10G to provide a stand-alone or managed cluster solution.

While Snort and Suricata rely on rules to analyze traffic, Bro uses a scripting language designed to manipulate Internet domain sourced packet flows. Here is how packets actually flow through this solution. Raw traffic is captured via a network tap which is wired into an Emulex card running FastStack Sniffer10G. Sniffer10G then hashes each packet on its four-tuple (source and destination address, source and destination port) to spread inbound traffic between ring buffers, one attached to each core on the server. Bro then connects to these libpcap-structured ring buffers and combs through that data using a sophisticated schema designed to identify and log real-time traffic as flows. With Bro running on each core it can then leverage the full system to search for threats. The scripting language is similar to Python, but it was designed to analyze traffic flows looking for dynamic cyber-attacks.
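
The flow-hashing step can be sketched in a few lines of Python. This is only an illustration of the idea, not Sniffer10G's actual hash, which isn't described here: the function name `ring_for_flow`, the use of CRC32, and the choice to sort the two endpoints so both directions of a conversation land on the same ring are all assumptions made for the example.

```python
import ipaddress
import zlib


def ring_for_flow(src_ip, dst_ip, src_port, dst_port, num_rings):
    """Map a four-tuple to a ring index in the range [0, num_rings)."""
    end_a = (int(ipaddress.ip_address(src_ip)), src_port)
    end_b = (int(ipaddress.ip_address(dst_ip)), dst_port)
    # Assumption: order the endpoints so A->B and B->A hash alike.
    # A real NIC hash may well be direction-sensitive.
    low, high = sorted((end_a, end_b))
    key = b"%d:%d|%d:%d" % (low[0], low[1], high[0], high[1])
    # CRC32 stands in for whatever hash the capture stack really uses.
    return zlib.crc32(key) % num_rings


if __name__ == "__main__":
    forward = ring_for_flow("10.0.0.1", "192.168.1.9", 51515, 443, 8)
    reverse = ring_for_flow("192.168.1.9", "10.0.0.1", 443, 51515, 8)
    print(forward, reverse)  # the two directions share one ring
```

Because every packet of a given flow maps to the same ring, the analysis process on that core sees the whole conversation and never has to coordinate with the other cores.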

Furthermore, Bro can run standalone or via a unified cluster-based management framework. While all this sounds new, it isn’t. Bro has a long history coming out of Lawrence Berkeley National Laboratory, where it’s been running in production since 1996. So if you’re building a state-of-the-art cyber security infrastructure for your enterprise, you should seriously consider utilizing Bro or tapping into the folks at Reservoir Labs.

Extreme Packet Capture, Star Trek Style (Part 2)

Our approach to technology defines who we are, as individuals and groups. The groups could be companies, countries, or a species. Regardless, the technology we employ demonstrates our origins and our roots. The Klingons are a fictional race of warriors and hunters who pride themselves on warships with camouflage cloaking, strong defensive shields, and superior maneuverability. In contrast, the fictional Federation is a collection of races whose focus is on exploration. Their star charts, scientific scanners, and fast-charging photon-based phasers offer a unique contrast to the Klingons’ much slower and less efficient particle beam disruptors.
The same holds true for packet capture solutions. Our company, Myricom, designed our product in collaboration with a government agency interested in network security. One of the key design criteria was the replacement of an already existing method with one that bypassed the operating system so lossless packet capture at wire-rate could be achieved. The process of capturing network packets is transparent to the end-user application, and the packets are stored in memory via a user space or kernel space driver. This technique enables our product to support over a dozen existing applications right out of the box. Another vendor designed their capture product for the financial market, where saving the market data to disk for later analysis is critically important. Both of these approaches fit perfectly for the problems they solve, but one is more versatile.
In part one I promised to wrap up this series by talking about injection and sample code, so let’s begin. Injection is simply taking packets in memory and putting them onto the Ethernet. Where FastStack Sniffer10G differentiates itself from other approaches is that it has total control over the network interface, with nothing between it and the wire. Therefore when you capture packets, you can modify the contents if you like, and then inject them back onto the wire without anyone being the wiser. Most security appliances do just this. They act as a man in the middle, a guard who looks at everything and only lets in, or out if you’re really careful, things that are acceptable. Since Sniffer10G is an in-memory solution this can be sustained at wire-rate, provided your man-in-the-middle code is pretty tight and you leverage multiple queues for processing traffic in parallel. There is no transparent way to offer injection, so you need to use the Application Programming Interface (API), but several useful sample programs are included, with the source.
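
To make the man-in-the-middle idea concrete, here is a minimal sketch of the capture, inspect, modify, and inject loop. It does not use the Sniffer10G API at all: a `deque` stands in for a capture ring, and `bridge_packets`, `allow`, `rewrite`, and `inject` are hypothetical names invented for this example.

```python
from collections import deque


def bridge_packets(capture_ring, inject, allow, rewrite=None):
    """Drain capture_ring, injecting only the packets that pass allow().

    Returns a (forwarded, dropped) pair of counts.
    """
    forwarded = dropped = 0
    while capture_ring:
        packet = capture_ring.popleft()
        if not allow(packet):
            dropped += 1          # the guard refuses this packet
            continue
        if rewrite is not None:
            packet = rewrite(packet)  # optional in-flight modification
        inject(packet)            # stand-in for putting it on the wire
        forwarded += 1
    return forwarded, dropped


if __name__ == "__main__":
    ring = deque([b"GET /index", b"EVIL payload", b"GET /about"])
    wire = []  # pretend this list is the outbound port
    counts = bridge_packets(
        ring,
        wire.append,
        allow=lambda p: not p.startswith(b"EVIL"),
        rewrite=lambda p: p.replace(b"GET", b"get"),
    )
    print(counts, wire)
```

In a real deployment you would run one such loop per ring so the rings are serviced in parallel, and the time spent inside `allow` and `rewrite` is what decides whether you keep up with wire-rate.
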
The sample programs provided that do injection are snf_pktgen, snf_replay, and snf_bridge. As its name implies, snf_pktgen is a simple packet generator. You can tell snf_pktgen what packet size to use, how many packets to send (or infinite), and the number of concurrent threads to use to send them. It will make a best effort to pack the wire full of packets of the size you provided. Similarly, snf_replay plays back a sequence of already constructed packets to the Ethernet. Here you pass snf_replay the name of the file that contains the packets. Its options are a packet rate (undocumented for various reasons), reading the whole packet file into memory prior to writing it to the Ethernet, insertion of a VLAN tag for those packets without tags, the number of times to replay the file, and the number of threads you’d like transmitting concurrently. Finally, we have snf_bridge, which I’ve not personally used. With snf_bridge you define the ports to use for capture and injection (they can be different), the number of in-memory rings to use, and the CPU binding mask. Then you can specify the number of packets to forward before exiting, the number of times to retry forwarding a packet before dropping it, and the amount of time to wait between capture and injection, in milliseconds. Finally, there is the option to reflect non-UDP/TCP packets to the network device. All of these sample programs, and several more, are available in the /opt/snf/bin/tests directory in binary form and in the /opt/snf/share/examples directory as source code.
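
The VLAN option is easy to picture in code. The sketch below is not taken from snf_replay; it only shows the standard IEEE 802.1Q layout, where a four-byte tag sits right after the destination and source MAC addresses. The function name `add_vlan_tag` is hypothetical, and the example assumes frames are held in memory without the trailing frame check sequence.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def add_vlan_tag(frame, vlan_id, priority=0):
    """Return frame with an 802.1Q tag added, unless it already has one."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority <= 7:
        raise ValueError("priority must fit in 3 bits")
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == TPID_8021Q:
        return frame  # already tagged, leave it untouched
    # Tag control field: 3-bit priority, 1-bit DEI (left 0), 12-bit VLAN ID.
    tci = (priority << 13) | vlan_id
    tag = struct.pack("!HH", TPID_8021Q, tci)
    # The tag goes after the two 6-byte MAC addresses.
    return frame[:12] + tag + frame[12:]


if __name__ == "__main__":
    untagged = bytes(12) + b"\x08\x00" + b"payload"
    tagged = add_vlan_tag(untagged, vlan_id=100)
    print(tagged[12:18].hex())
```

Running the tagger a second time on its own output returns the frame unchanged, which matches the "only for packets without tags" behavior described above.
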
So be it Federation, Romulan, or Myricom, a clear problem statement and the toolbox we bring to the table define the products we build and the solutions we offer. Unless, of course, you’re Captain Kirk, who prefers when possible to redefine the problem into something that he can solve with the resources at hand. If you ever want to chat packet capture please don’t hesitate to flip open your communicator and ring me up, 919-389-5064.

10GbE is Now a Commodity?

Last month Tehuti Networks brought out what may be the first truly commodity 10GbE NIC chip outside of Intel or Broadcom. Their TN4010 is small (11x11mm), draws a nominal amount of power (1W), and was designed for LAN on Motherboard (LOM) applications. From their data sheet, it boasts impressive specifications (e.g. <4us latency). Oh, and did I mention that in large volumes you can acquire the chip for only $10! Now if this were Tehuti’s first chip I honestly wouldn’t be wasting my time writing this piece, but they’ve been in and out of 10GbE more than once.
Let’s take a moment and actually look at the specifications of the chip beyond the highlight reel. First, on the Ethernet side we have a chip that claims to support SGMII, XAUI, and CX4, so in English it can be used to build a backwards compatible card with an RJ45 plug supporting Cat5 for 10/100/1000Mbps and 10Gbps operation, or a low latency SFP+ card that demands optics and fancy cables. On the server side we have a PCIe Generation 2 interface spanning four lanes, which is the most intelligent bus choice given that it’s a single port chip. So what’s in between? Tehuti’s OptiStrata processor. They claim it is actually a network traffic accelerator capable of supporting a number of stateless TCP/IP offloads, thereby freeing up the host CPU. It’s not OS bypass, but it’s the next best thing. Exactly which stateless offloads are supported is unclear, but this can make a huge difference, especially given that it’s a $10 chip. Finally, in an interesting twist, they also support something known as IEEE 802.3az. I had to look this one up. It’s the Energy-Efficient Ethernet standard, often called Green Ethernet. It appears that in times of low activity this chip will throttle back its power usage by over 50%, while still remaining backward compatible on the network.
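
Why four lanes is the sensible bus width for a single 10GbE port comes down to simple arithmetic. PCIe 2.0 signals at 5 GT/s per lane and uses 8b/10b encoding, so each lane carries 4 Gb/s of data in each direction. The sketch below works that out; the function name is made up for the example, and it ignores PCIe packet header overhead, so real usable throughput is somewhat lower.

```python
GEN2_GT_PER_LANE = 5   # PCIe 2.0 signaling rate, gigatransfers/s per lane
PAYLOAD_BITS = 8       # 8b/10b coding: 8 data bits ...
LINE_BITS = 10         # ... are sent as 10 bits on the wire
PORT_GBPS = 10         # one 10GbE port


def pcie_gen2_gbps(lanes):
    """Data rate in Gb/s, per direction, before protocol overhead."""
    return lanes * GEN2_GT_PER_LANE * PAYLOAD_BITS / LINE_BITS


if __name__ == "__main__":
    for lanes in (1, 2, 4, 8):
        rate = pcie_gen2_gbps(lanes)
        verdict = "enough" if rate >= PORT_GBPS else "too slow"
        print(f"x{lanes}: {rate} Gb/s ({verdict} for one 10GbE port)")
```

Two lanes top out at 8 Gb/s and would starve the port, while eight lanes would waste board space and pins, so x4 at 16 Gb/s is the narrowest standard width that leaves headroom.
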
So is 10GbE now a commodity? It is if this chip works and people start soldering it down on motherboards, or building NICs with it. Right now the jury is out because Tehuti only provides a reference design. They learned from their last foray into the 10GbE market not to get into the messy business of actually selling cards, because it takes substantial capital, time, and most important of all, a skilled sales staff. Frankly though, not bringing a contract manufacturer online to turn out their reference design as a single instance of their product, and not offering a simple web-direct credit card sales model, is a huge mistake. This raises the bar for anyone even remotely interested in adopting their chip.