99.99999% Available + 2.7us = 1 Awesome Computer

What do you get when you put together a pair of dual-socket servers running in hardware lock-step with a pair of leading-edge, ultra-low latency OS Bypass network adapters, all running Red Hat Enterprise Linux? One awesome 24-core system that boasts 99.99999% uptime, zero jitter, 2.7 microseconds of 1/2 round trip UDP latency, and 2.9 microseconds for TCP.

How is this possible? First, we’ll cover what Stratus Technologies has done with Lock-Step, and how it makes the ftServer dramatically different than all others. Then we’ll explain what jitter is, and why removing it is so critical for deterministic systems like financial trading. Finally, we’ll cover these impressive Solarflare ultra-low latency numbers, and what they really mean.

We’ve all bought something with a credit card, flown through Chicago O’Hare, used public utilities, and possibly even called 9-1-1. What you may not know is that very often at the heart of each of these systems is a Stratus server. Stratus should adopt the old Timex slogan “It takes a licking and keeps on ticking,” because that’s what it means to provide 99.99999% uptime: you’re allowed about three seconds a year for unplanned outages. Three seconds is how long it takes me to say “99.99999% uptime.”

How is this possible? Imagine running a three-legged race with a friend. If you compared your actions continuously with every step, you could run the race at the pace of the slower of the two of you. This is the key concept behind Lock-Step: comparing, then knowing what to do as one partner starts to stumble, so that the team continues moving forward no matter what happens. Stratus leverages the latest 12-core Intel Haswell E5-2670v3 server processors with support for up to 512GB of DDR4. If any hardware component in the server fails, the system as a whole continues moving forward and alerts an admin, who replaces the failed component; that subsystem is then brought back online. I challenge you to find another computer in your life that has ever offered that level of availability over the typical 5-7 year lifecycle that Stratus servers often see.
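
The three-second figure follows directly from the availability percentage. As a quick sanity check, here is a minimal sketch that converts an availability figure into a yearly downtime budget. The function name and the 365.25-day year are my own choices for illustration, not anything from Stratus.

```python
# Assumption: a year is taken as 365.25 days (31,557,600 seconds).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability percentage."""
    return SECONDS_PER_YEAR * (1.0 - availability_percent / 100.0)


# Compare the more common "five nines" with the "seven nines" quoted above.
for label, pct in [("five nines", 99.999), ("seven nines", 99.99999)]:
    print(f"{label}: {downtime_seconds_per_year(pct):.2f} seconds/year")
```

Seven nines works out to a little over three seconds a year, while five nines allows a bit more than five minutes.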

So what is Jitter? When a computer core becomes distracted from its primary task to go off and do some routine housekeeping (operating system or hardware driven), the impact of that temporary distraction is known as Jitter. With normal computing tasks, Jitter is hardly noticeable; it’s the computer equivalent of background noise. With certain VERY time-critical computing tasks, though, like financial trading, even one Jitter event could be devastating.

Suppose your server’s primary function is financial trading. It receives a signal from market A that someone wants to buy IBM at $100, and on market B it sees a second signal that another entity wishes to sell IBM at $99. So the trading algorithm on your server buys the stock on B for $99, but the instant it has confirmation of your purchase a thermal sensor in your server generates an interrupt. The CPU that is running your trading algorithm then goes off to service that interrupt, which results in it running some code to determine which fan to turn on. Eventually, say a millisecond or so later, control is returned to your trading algorithm, but by then the buyer on market A is gone, and the price of IBM has fallen to $99.

That’s the impact of Jitter: brief, often totally random moments in the trading day stolen to do basic housekeeping. These stolen moments can quickly add up for traders, and for exchanges they can be devastating. Imagine an order delayed by Jitter and missing its opportunity! Stratus Technologies has crawled through their server architecture and eliminated all potential sources of Jitter. Traders & exchanges using other platforms have to do all this by hand, and this is still as much art as it is science. That’s one reason why over 1,400 different customers regularly depend on Solarflare.
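
To get a feel for Jitter on an ordinary machine, here is a rough user-space sketch. It spins on the clock and records the gap between consecutive readings; a large gap means the core was doing something else in between. This is purely illustrative: the function name and sample count are my own, and Python’s interpreter adds noise of its own, so treat the numbers as a demonstration of the idea and not as a benchmark of any server.

```python
import time


def measure_jitter(samples: int = 200_000) -> dict:
    """Spin on the clock and summarize the gaps between consecutive readings."""
    gaps = []
    prev = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        gaps.append(now - prev)  # time "stolen" between two back-to-back reads
        prev = now
    gaps.sort()
    return {
        "median_ns": gaps[len(gaps) // 2],
        "p99_ns": gaps[int(len(gaps) * 0.99)],
        "max_ns": gaps[-1],
    }


print(measure_jitter())
```

On a general-purpose system the maximum gap is typically far larger than the median, and that outlier is the kind of event the trading example above is worried about.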

Finally, there’s ultra-low latency networking via generic TCP/IP and UDP networking. In the diagram below network latency is in blue. Market data arrives via UDP and orders are placed through the more reliable TCP/IP protocol. Here is a quick anatomy of part of the trading process showing one UDP receive and one TCP send. There are other components, but this is a distilled example.

First, the packet is received from the wire, the light blue block. It passes through the physical interface, where electrical networking signals are converted to layer-2 logical bits. From there the packet is passed to the on-chip layer-2 switch, which steers it to one of 2,048 virtualized NIC (vNIC) instances, also on the chip. The vNIC then uses DMA to transfer the packet into system memory. All of this takes 500 nanoseconds. The packet has now left the network adapter and is on its way to a communications stack somewhere in system memory, the dark blue box.

Here is where Solarflare shines. In the top timeline, the dark blue box represents their host kernel device driver and the Linux communications stack. Solarflare’s kernel device driver is arguably one of the fastest in the industry, but most of this dark blue box is time spent working with the kernel. There are CPU task switches and several memory copies of the packet as it moves through the system, and thousands of CPU instructions are executed; all told this can take nearly 3,000 nanoseconds. In the bottom timeline, the packet is DMA’d directly into user space, where Solarflare’s very tight user-space stack sits. There the packet is quickly processed and handed off to the end-user application via the traditional sockets interface, all without additional data copies or CPU task switches, and completed in just under 1,000 nanoseconds. That is a savings of about 2,000 nanoseconds, or roughly 4,600 CPU instructions for this processor at this speed. All this, and we’ve just received a packet into our application, represented by the green blocks.
So in the two bars above, the first represents market data coming in via Solarflare’s generic kernel device driver, then going through the normal Linux stack until the packet is handed off to the application. The response packet, in this case a trade via TCP, is sent back through the stack to the network adapter and eventually put on the wire, all told just over 9,000 nanoseconds. With Stratus & Solarflare, the second bar shows the latency of the same transaction traveling through Solarflare’s OS Bypass stack in both directions. The difference is that the transaction hits the exchange over 4,000 nanoseconds sooner. This means you can trade at nearly twice the speed, a true competitive advantage. Now four millionths of a second isn’t something humans can easily grasp, so let’s jump to light speed: this is how long it takes a photon of light to cover about three quarters of a mile.
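
Two of the numbers in this walkthrough can be checked with a little arithmetic. The sketch below is my own illustration: it assumes light in a vacuum, and it assumes a 2.3GHz clock for the E5-2670v3, which is the rate at which 2,000 nanoseconds corresponds to 4,600 clock cycles. The function names are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # in a vacuum
METERS_PER_MILE = 1609.344


def light_distance_miles(nanoseconds: float) -> float:
    """Distance light covers in the given number of nanoseconds."""
    return SPEED_OF_LIGHT_M_PER_S * nanoseconds * 1e-9 / METERS_PER_MILE


def cpu_cycles(nanoseconds: float, clock_ghz: float) -> float:
    """Clock cycles that elapse in the given time (1 GHz = 1 cycle per ns)."""
    return nanoseconds * clock_ghz


# 4,000 ns saved per transaction, expressed as a distance at light speed.
print(f"{light_distance_miles(4_000):.2f} miles")

# 2,000 ns saved on receive, expressed in cycles at an assumed 2.3 GHz.
print(f"{cpu_cycles(2_000, 2.3):.0f} cycles")
```

Four microseconds comes to a bit under a mile, about 0.75, and the receive-side savings line up with the 4,600 figure when counted in clock cycles.
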
So if you’re looking to build a financial trading system with ultra-high availability, zero jitter & extreme network performance, you have only one choice: Stratus’s new ftServer.

A 10GbE Capture Platform: Snort, Bro, Suricata & Wireshark

Perhaps you’re responsible for your company’s network security, or maybe you’re designing an appliance for your business. If so, then you’ve likely already become familiar with Snort, Bro, Suricata, and Wireshark. As you may have recently discovered, for real performance with these applications at 10GbE speeds you need the proper adapter and capture driver, or you risk dropping vast numbers of packets. It is now possible to capture a copy of not only the packets received on the server but also those transmitted, all while running your programs on other cores within the same server. And if you’re interested in capturing all the received, transmitted, and virtual machine (VM) to VM traffic within your server, you can designate one VM to capture a copy of all the network traffic for analysis. To further sweeten things, this can also be done from within a Docker container built to handle capture.

Some might ask why you would want to capture transmitted packets and run them through Snort, Bro or Suricata. Simple: to look for outbound traffic patterns that might indicate a breach. Perhaps a VM on one of your servers has been compromised, and it is sending out your company’s precious AutoCAD files in the middle of the night to a country in Asia you don’t do business with. If you’re not looking at transmitted packets you may never detect, or stop, a breach of this nature. Setting up rules to look for file transfers of specific types, during specific times, or conforming to other criteria specific to outbound traffic is a fairly new trend. Also, this capture doesn’t have to be of packets on your own server. You can take a more traditional approach and dedicate a server for capture in every rack, then use an optical tap or the SPAN port off a switch. In fact, you can install multiple adapters and aggregate the ports together until you hit the performance limits of your system.
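
To make the idea of an outbound rule concrete, here is a minimal sketch in plain Python. It is not Snort, Bro or Suricata rule syntax, and every name in it (the `Transfer` record, the watched extensions, the quiet hours, the approved country list) is a hypothetical placeholder. It simply flags an outbound transfer of a watched file type, during off hours, to a destination you don’t normally deal with.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Transfer:
    direction: str      # "outbound" or "inbound"
    filename: str
    local_time: time    # time of day the transfer was observed
    dest_country: str   # two-letter code from a GeoIP lookup (assumed available)


# All three settings below are illustrative assumptions.
WATCHED_EXTENSIONS = (".dwg", ".dxf")        # common AutoCAD file formats
QUIET_HOURS = (time(22, 0), time(6, 0))      # 10pm to 6am
APPROVED_COUNTRIES = {"US", "CA"}


def in_quiet_hours(t: time) -> bool:
    start, end = QUIET_HOURS
    return t >= start or t < end             # the window wraps past midnight


def is_suspicious(x: Transfer) -> bool:
    return (
        x.direction == "outbound"
        and x.filename.lower().endswith(WATCHED_EXTENSIONS)
        and in_quiet_hours(x.local_time)
        and x.dest_country not in APPROVED_COUNTRIES
    )


print(is_suspicious(Transfer("outbound", "plant_layout.DWG", time(2, 30), "XX")))
```

A real deployment would express the same criteria in the rule language of whichever engine you run, but the logic is the same: direction, file type, time window and destination.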

Most accelerated 10G capture platforms require both a performance adapter and a special-purpose capture driver. Furthermore, to capture both received and transmitted packets in parallel you have only one choice: an adapter and software from Solarflare. You can start with Solarflare’s Flareon Ultra SFN7122F adapter and a SolarCapture Live license, and as your needs grow, scale to their dual 40G adapter, the SFN7142Q.

Solarflare provides this high-performance capture platform designed specifically for engineers looking to build leading-edge security solutions. Let’s take a closer look at the adapter and software. The network server adapter, the Solarflare SFN7122F, is a board that contains one Solarflare single-core Ethernet controller chip. The Ethernet controller core on this chip has multiple packet engines, each dedicated to processing either received or transmitted packets. This enables the SFN7122F adapter to support wire-rate lossless packet capture, even with huge bursts of the smallest sized packets (64 bytes each) on a single port. This dedication of resources also enables transmitting wire-rate 64-byte packets at the same time, on the same interface, without impacting capture performance. Furthermore, the SFN7142Q utilizes the same Ethernet controller, but with two of these cores on the same chip, so it can support capture on two 40G ports or four 10G ports, or wire-rate lossless capture on two 10G ports.
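
It helps to see what “wire-rate with 64-byte packets” actually demands. The sketch below uses standard Ethernet framing overhead (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap per packet) to compute the maximum packet rate on a link. The function name is my own.

```python
PREAMBLE_AND_SFD_BYTES = 8   # sent before every Ethernet frame
INTERFRAME_GAP_BYTES = 12    # minimum idle time between frames


def max_packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second a link can carry at the given frame size."""
    bits_on_wire = (frame_bytes + PREAMBLE_AND_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / bits_on_wire


pps = max_packets_per_second(10_000_000_000, 64)
print(f"{pps:,.0f} packets/second, {1e9 / pps:.1f} ns per packet")
```

At 10GbE that is roughly 14.88 million packets a second, which leaves about 67 nanoseconds to deal with each one. That is why a general-purpose driver tends to drop packets under bursts of small frames.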

The next component in this platform is SolarCapture Live (SCL), which provides a complete Libpcap replacement library, and a Snort DAQ interface. This allows for two fairly seamless methods for easily connecting to Snort. If SCL is initialized in cluster mode it can spawn multiple capture instances, up to one per core, and deliver all network packets in Libpcap format spread across these cores. SCL then uses advanced receive flow steering to flow-hash the packets across all of these capture nodes within the capture cluster. Flow-hashing is the process of looking at several key fields in the packet header then always routing all the traffic from a given flow consistently to the same cluster node (core) so security applications like Snort, Suricata and Bro can always see all the given data for that specific network flow.
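
Flow-hashing is easy to sketch. The example below is only an illustration of the concept and is not Solarflare’s implementation: it uses a CRC32 over the usual five header fields, and it sorts the two endpoints first so that both directions of a conversation land on the same capture node. That symmetric choice is an assumption I made because it suits security tools. The `Flow` tuple and function name are hypothetical.

```python
import zlib
from collections import namedtuple

Flow = namedtuple("Flow", "src_ip dst_ip src_port dst_port protocol")


def capture_node_for(flow: Flow, num_nodes: int) -> int:
    """Map a flow to a capture node (core) so its packets always go together."""
    # Sort the endpoints so A->B and B->A produce the same key.
    endpoints = sorted([(flow.src_ip, flow.src_port), (flow.dst_ip, flow.dst_port)])
    key = repr((endpoints, flow.protocol)).encode()
    return zlib.crc32(key) % num_nodes


request = Flow("10.0.0.1", "10.0.0.2", 40123, 80, "tcp")
reply = Flow("10.0.0.2", "10.0.0.1", 80, 40123, "tcp")

# Both directions of the same conversation land on the same node.
print(capture_node_for(request, 8) == capture_node_for(reply, 8))
```

Because the mapping depends only on the header fields, every packet of a flow is delivered to the same core, and the analysis application on that core sees the complete conversation.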

This Solarflare capture platform also supports an optional Solarflare Precision Time Protocol (PTP) software license. With an additional optional bracket kit, which provides the necessary mini-BNC connectors, the adapter can accept an external hardware Pulse Per Second (PPS) signal from an external master clock. Unlike similar adapters, this optional PCIe faceplate has a second mini-BNC connector to support daisy chaining the clock signal out of the adapter into another adapter. These Solarflare adapters include a highly precise Stratum 3 clock chip, which ensures that time stamping is accurate to within 100 nanoseconds of the PTP master. Precision time stamping is typically only available on much more expensive FPGA-based adapters. Furthermore, the PTP license enables time stamping for the capture of both received and transmitted packets, so you can use it to measure application performance. Additionally, Solarflare’s 100 nanosecond precision is 15X more precise than a competing adapter at a similar price point that only captures and time stamps inbound packets.

So if you’re looking to get into packet capture for security monitoring or performance analysis, please consider contacting Solarflare, and ask about their SFN7122F with SolarCapture Live. You’ll be pleasantly surprised at how well it performs when compared to the much more expensive FPGA based solutions which sell for 5X or more the price of this unique bundle.

Turkey Time, a Watch, and Accuracy

This article was originally posted in November of 2012 on 10GbE.net

While waiting on the turkey yesterday I was flipping through the latest issue of Wired and stumbled across the new Seiko Astron watch, and my inner nerd started to swoon. Now for the few of you out there who don’t get Wired, especially the December issue, think of it as the geek version of the old Sears Wishbook. This time of year every tenth ad in the magazine is a high-end watch; it’s geek meets chic. Among all the fancy watch ads, here was both an article and an ad for a Seiko, the brand my dad wore. Dad worked outdoors every day of his life, never used a computer & swore by his Seiko. He considered it the working man’s watch. So it was kinda funny seeing Seiko among ads for all the other high-end brands from Rolex on down.

To be sure we’re all on the same page, let’s first take a moment and define accuracy. Simply put, accuracy, when talking about time, is the average deviation from the reference time. Today for high-precision instruments accuracy is often measured in nanoseconds (1x10^-9, or billionths of a second) lost or gained each second or day. With 86,400 seconds in a day, sometimes it’s easier to use a day when dealing with really small numbers. National and international reference clocks use microwaves to excite Cesium atoms, then measure the frequency of the photons emitted as the electrons jump energy states. This process is so repeatable that it was made the international standard for time keeping over 50 years ago.
 
So what about this Seiko aroused the geek inside me? Active GPS synchronization to the local time zone, and an understanding of all 39 time zones worldwide. This watch figures out where you are on the planet, then selects the appropriate time zone and resets itself to local time, all for only $2,300. Smartphones have been doing the same thing since they first arrived, but that’s a different story. Seiko also claims the Astron has an accuracy of 1 second every 100,000 years, or 27 nanoseconds/day. By watch standards, this is very accurate.
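
Converting between “1 second every N years” and nanoseconds per day is simple arithmetic, and it is a handy way to compare the clocks in this post on one scale. Here is a small sketch. The helper names and the 365.25-day year are my own assumptions.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25  # assumption: average year length


def ns_per_day(seconds_lost: float, over_years: float) -> float:
    """Convert 'X seconds every N years' into nanoseconds per day."""
    return seconds_lost * 1e9 / (over_years * DAYS_PER_YEAR)


def ms_per_day(ns_per_second: float) -> float:
    """Convert a per-second error in nanoseconds into milliseconds per day."""
    return ns_per_second * SECONDS_PER_DAY / 1e6


print(f"1 s per 100,000 years   = {ns_per_day(1, 100_000):.1f} ns/day")
print(f"1 s per 1,400,000 years = {ns_per_day(1, 1_400_000):.1f} ns/day")
print(f"25 ns per second        = {ms_per_day(25):.2f} ms/day")
```

The Astron’s claimed spec comes to a little over 27 nanoseconds a day, which matches the figure quoted above.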
 
So how does the Astron stack up to some real-world high-precision clocks? Clock systems used in electronic financial markets typically use highly accurate clocks (1 picosecond/day internally) that are even more precise than Cesium clocks. To be in step with the rest of the world, though, they must rely on our less accurate GPS system (10 nanoseconds/second), and often on a pulse per second distribution mechanism, which degrades this further to 25 nanoseconds/second (2 milliseconds/day). Commonly used Cesium clocks, like those mentioned above, are accurate to 1 second every 1,400,000 years, or 2 nanoseconds/day. A new proposed standard clock would excite neutrons instead of electrons and thus be even more accurate: 1/20th of a second every 14,000,000,000 years.

Well, there’s the oven timer. It’s accurate to a minute every six months when I reset it, and the turkey’s done. Happy belated Thanksgiving everyone…

GPS Jamming and Spoofing

This article was originally published in August of 2012 on 10GbE.net.

Can someone use GPS Jamming or Spoofing to game the markets of the world in such a way that their HFT shop would have a competitive advantage? We weren’t sure, so we asked an expert: John Fischer, the CTO of Spectracom and a leader in the field of time distribution. John said “Jamming is easy. Spoofing is hard. It can be done, but you have to be smarter than the average bear. It’s like walking on a tightrope across Niagara Falls. It can be done, but not by just anyone. And we protect against jamming with our holdover oscillator.” What brought jamming into the news recently was testimony by Dr. Todd Humphreys on July 18, 2012, before the House Subcommittee on Homeland Security on how insecure the civilian GPS system is, and that it shouldn’t be blindly trusted. Last week we were asked by a reporter about this topic. In preparing an answer we learned a number of things we think should be shared. First, let’s clarify what we mean by GPS jamming and spoofing.

There are two satellite GPS signals that are commonly available: the insecure consumer signal and the highly encrypted military signal. This whole entry will ONLY cover the consumer signal. Jamming actually isn’t that difficult. You just need to know the frequency range of the GPS signal and produce a signal in that range that is substantially stronger than the one received from the satellites. Since the rings of GPS satellites circling the earth are in orbits 12,600 miles overhead, a local jammer hidden in someone’s pocket could easily overpower something so distant. A Google search today revealed that for as little as $32 US you can buy a GPS jammer, and the more you spend the better the jammer.

The second concept is Spoofing. Here one transmits a counterfeit signal where the time contained within the data has been artificially altered. Dr. Humphreys’ team at UT Austin, with a budget of under $1,000, spoofed a GPS signal well enough to PWN (take control of) a UAV helicopter drone. Dr. Humphreys used this demo to point out that the civilian band of the GPS signal is transmitted in the clear and should not be blindly trusted, and in fact, if you’re intelligent enough you could replace the signal with your own. His team altered the signal enough to drive the drone into the dirt. Now note this was a drone using the consumer GPS signal (not the military one).

In HFT most shops use a GPS signal provided by the exchange. They then bring this in and connect it to their own clock. The signal from this clock is then distributed to all their trading systems. The clock here is the key. What Dr. Humphreys didn’t address in his testimony (section 4.3 Banking and Finance) is that these clocks add a layer of hardware which has built-in checks and balances. In the presence of a lost or jammed GPS signal, these clocks by design go into a free-run mode where their own internal oscillator (often Rubidium based) takes over to provide accurate time. These internal oscillators typically drift less than one microsecond a week.

By design, these clocks have two defenses against spoofing. Both defenses are built on the clock’s own internal reference oscillator. Dr. Humphreys implied that these clocks typically drift 1/10 of a microsecond per second, which I’m told is true for a software-only clocking system; this means potentially 60,000 microseconds a week. Contrast this with the internal hardware oscillator in these clocks, which drifts only 1 microsecond a week, and the problem Dr. Humphreys outlines disappears. First, if the GPS signals don’t align, within predefined tolerances, to the internal reference oscillator, they are ignored and the clock goes into a free-run mode. This is a defense against an attempt to dramatically shift time forwards or backwards. Second, the clock periodically compares the accumulated GPS time against the internal oscillator. If the GPS signals were altered very subtly over time, I’m told that it is possible that this change might not be detected at first. The change would have to be made extremely slowly over a long period of time, so it is possible although unlikely. Suppose someone was slowing down GPS time: as these changes compound they could eventually exceed a threshold when the periodic check is made, and once again the clock would go into a free-run mode. If the change were small enough, though, it could continue to slip through.
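
The two defenses can be sketched in a few lines. This is a simplified illustration of the logic as described to me, not any vendor’s actual algorithm. The class name, both tolerance values, and the idea of feeding in a single “GPS minus oscillator” offset per check are all hypothetical. The first line of the example also confirms the weekly drift arithmetic for a software-only clock.

```python
from dataclasses import dataclass

# 0.1 microseconds of drift per second, over one week of seconds.
SOFTWARE_DRIFT_US_PER_WEEK = 0.1 * 7 * 86_400
print(f"{SOFTWARE_DRIFT_US_PER_WEEK:,.0f} microseconds/week")


@dataclass
class HoldoverClock:
    step_limit_us: float = 1.0    # hypothetical: largest jump accepted per check
    drift_limit_us: float = 5.0   # hypothetical: largest accumulated offset accepted
    free_running: bool = False
    _last_offset_us: float = 0.0

    def check(self, offset_us: float) -> bool:
        """offset_us is GPS time minus internal oscillator time.

        Returns True while GPS is still trusted, False once in free-run mode.
        """
        if self.free_running:
            return False
        jump = abs(offset_us - self._last_offset_us)
        # Defense 1: a sudden shift. Defense 2: a slow creep that has piled up.
        if jump > self.step_limit_us or abs(offset_us) > self.drift_limit_us:
            self.free_running = True
            return False
        self._last_offset_us = offset_us
        return True


# A slow spoof that creeps 0.5 microseconds per check is caught once it compounds.
clock = HoldoverClock()
print([clock.check(0.5 * i) for i in range(1, 13)])
```

The sketch also shows the gap mentioned above: a creep that never accumulates past the drift limit, as measured against the oscillator, would keep passing both checks.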

So the presence of an accurate oscillator in one’s clock, combined with a rigorous internal process for comparing that oscillator to both the inbound GPS signals and periodically double checking over time removes the issue of blindly trusting the insecure consumer GPS system from our market trading systems.