Analog Time in a Digital World

1880s Self-Winding Clock

Imagine buying a product today that your family would still cherish and use in 2160. I’m not talking about a piece of quality furniture or artwork, but an analog clock with parts in continuous motion. This past weekend I once again got my grandfather clock, pictured to the right, running. This pendulum clock was originally made for and installed at the NY Stock Exchange, then moved into a law office, and later a private residence. I inherited it some forty years ago, before becoming a teen, and it has run a few times since it was placed in my care. This timepiece was manufactured in the 1880s, and it is a self-winding, battery-powered unit. Batteries were still a young technology in 1880. This clock shipped with two wet cells that the new owner then had to set up. The instructions called for pouring powdered sulfuric acid from paper envelopes into each of two glass bottles, adding water, then stirring. The lids of the glass bottles contained the anode and cathode of each cell. The two cells were then wired in series to produce a three-volt battery.

When it was installed at the Exchange, around the time of Thomas Edison, it was modified with the addition of a red button on the side designed to synchronize the clock with the others on the Exchange. The button on this clock, and on all others like it at the Exchange, would be pressed on the hour, prior to the opening of the market. It has since been rewired, so the button now triggers an out-of-sync winding. During my childhood, this clock ran for a few years until it fell silent as a result of dead batteries. Over my adult life, it has run continuously several times, often only for a year or two at a stretch until the batteries were depleted. The issue, more often than not, was simply access to replacement batteries. In the 1950s, the batteries this clock required, a pair of No. 6 dry cells, were available in most hardware stores. Many devices were designed to use the No. 6 cell, including some of the earliest automobiles; early in the 1900s it was a very popular power source.

We moved recently, and one of my wife’s conditions for hanging my grandfather clock in the living room was that it function. In 2005, after an earlier move from California to North Carolina, we hired a local clock repairman to restore this family timepiece. He cleaned out the old lubrication, replaced a few worn parts, hung the clock, and sold us his last pair of original No. 6 dry-cell batteries. For those not familiar with the No. 6, it was a 1.5-volt battery the size of a large can of beer, but its standout attribute was that it could deliver a high instantaneous current for a brief period of time. In the 1990s, No. 6 cells were banned in the US because they used mercury. The replacements offered had the same size and fit, but couldn’t produce the required instantaneous current.

Last week, after some research and a little math, I realized that four dual-D-cell battery boxes, connected via terminal strips to limit current loss, could produce about 30% more instantaneous current at three volts than the original pair of No. 6 cells. So I glued the boxes together to form a maintainable brick, added two five-terminal strips, one positive and one negative, then tested all the wiring and batteries. After rehanging the clock, leveling the case, installing the new battery box, and pressing the wind button, I raised the pendulum and let it go. The escapement rocked back and forth, enabling the second-hand gear to creep ahead one tooth at a time, but after ten minutes the clock fell silent once again.
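The battery math comes down to one rule: cells in series add voltage, and strings wired in parallel add current. The sketch below makes two assumptions. Each box holds two D cells in series, and the four boxes are wired in parallel. The per-cell pulse current is a hypothetical placeholder, not a measured D-cell or No. 6 figure.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    volts: float
    pulse_amps: float  # hypothetical peak current; real values depend on chemistry and charge


def battery_bank(cell: Cell, in_series: int, in_parallel: int) -> tuple[float, float]:
    """Return (volts, peak amps) for identical series strings wired in parallel.

    Series cells add voltage; parallel strings add current. Wiring and terminal
    resistance, which the terminal strips are meant to minimize, is ignored.
    """
    return cell.volts * in_series, cell.pulse_amps * in_parallel


# Four boxes, each with two 1.5 V D cells in series, boxes wired in parallel.
d_cell = Cell(volts=1.5, pulse_amps=1.0)  # 1.0 A is a placeholder, not a spec
volts, amps = battery_bank(d_cell, in_series=2, in_parallel=4)
print(f"{volts:.1f} V, {amps:.1f} A")
```

Two No. 6 cells in series also give 3 volts, but they deliver only one cell’s worth of current. With real datasheet numbers in place of the placeholder, the comparison is therefore four times the D-cell pulse current against the pulse current of a single No. 6.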

After several more attempts of roughly ten minutes each, the internal switch eventually kicked in, and the batteries did their job. The clock wound itself automatically for the first time in well over a decade; I was elated. Alas, a few minutes later, the clock came to rest once more. Some additional research into how the clock was losing energy turned up a few suggestions. The hands were placed correctly, secured, and didn’t touch anything. I shoved my iPhone camera into the side of the case to get a view of the pendulum hanging on the escapement and found that it was hanging a bit askew. I then sprayed a small amount of synthetic lubricant on the escapement, crossed my fingers, and gave the pendulum another nudge. Fifteen minutes later, it coasted to a halt. After a few more hours of tinkering with the pendulum, and a bit more lubricant, the clock was finally sustaining its motion. That was Sunday; it’s now Tuesday night, and the clock is going strong and hasn’t stopped since. As of this morning, the clock was losing about three minutes every twenty-four hours.

Now the chase is on to improve the accuracy. This is done by changing the pendulum’s length with a nut below the pendulum’s bob. Loosening the nut lowers the bob, making the pendulum longer, so the clock runs slower. Tightening the nut raises the bob, making the pendulum shorter, so the clock runs faster. Perhaps a series of small turns over the next few days will get this 140-year-old device down to a few seconds a day!
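How far to turn the nut follows from the ideal simple-pendulum formula, in which the period T = 2π√(L/g) grows with the square root of the length. A clock that loses s seconds a day needs its period shortened by the factor (86,400 - s)/86,400, so its length must shrink by that factor squared. This sketch treats the clock as an ideal simple pendulum; a real rod-and-bob pendulum only approximates one. The 1-meter length is a hypothetical round number, not a measurement of this clock.

```python
import math

SECONDS_PER_DAY = 86_400
G = 9.80665  # standard gravity, m/s^2


def period(length_m: float) -> float:
    """Small-swing period of an ideal simple pendulum: T = 2*pi*sqrt(L/g)."""
    return 2 * math.pi * math.sqrt(length_m / G)


def corrected_length(length_m: float, seconds_lost_per_day: float) -> float:
    """Length that would cancel a daily loss (use a negative value for a gain).

    A clock losing s seconds/day shows only (86400 - s) seconds per real day,
    so its period must shrink by (86400 - s) / 86400. Since T grows with
    sqrt(L), the length shrinks by that ratio squared.
    """
    ratio = (SECONDS_PER_DAY - seconds_lost_per_day) / SECONDS_PER_DAY
    return length_m * ratio**2


L = 1.0  # hypothetical pendulum length in meters
for loss in (180, 61, 30):
    shorten_mm = (L - corrected_length(L, loss)) * 1000
    print(f"losing {loss:>3} s/day -> shorten by about {shorten_mm:.2f} mm")
```

For a 1-meter pendulum, the three-minute daily loss corresponds to shortening it by only about 4.2 mm. That is why the adjustments have to be so gentle.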

Update, Friday, June 12: After some very gentle tightening of the nut, thereby shortening the pendulum and speeding up its swing, we’re down from losing three minutes a day to losing 61 seconds a day. There may be a few threads of adjustment left, so hopefully we can get this loss down to less than 30 seconds a day.

x86 Has Hit the Wall, and Now Come the Accelerators

“… when you have access to the vastness of space, you realize there’s only one resource worth fighting over… even killing for: More time. Time is the single most precious commodity in the universe.”

— Kalique Abrasax, Jupiter Ascending (2015)

Computing is humanity’s purest quest to convert time into work. In 2000, IBM demonstrated slicing one second into 10 billion units (10GHz) and then squeezing computational work out of each unit. At the time, IBM had defined a new 130-nanometer process they called “CMOS 9S,” planned for future generations of PowerPC chips. In parallel, IBM was ramping up production of the POWER4 at 1.9GHz. Now you may be asking yourself, “But wait a minute, I’ve never seen any production 10GHz CPUs, especially not 20 years ago,” and you’re correct. IBM’s POWER6 was as close as we’ve gotten, with one version of that chip advertised at 5GHz, and in the lab they achieved 6GHz. I’ve also heard IBM reps brag about 7GHz with POWER8 if you turn half the cores off. So why has computing hit the wall at 4-5GHz, never reaching 10GHz over the last twenty years?

Intel explained this five years ago in the blog post “Why has CPU frequency ceased to grow?” The problem is called the “conveyor level.” Imagine a CPU as a conveyor-belt-driven assembly line with four workstations labeled A through D. Since an assembly line is a serial process, the worker at station B can’t start until the worker at station A finishes. Ideally, each station is designed to take the same amount of time to finish its work, so the following station isn’t held up. The slowest worker then sets the speed of the conveyor on any given day. So if the most time-consuming stage in the CPU pipeline takes 250 picoseconds, the clock frequency can be at most 4GHz. There is also the issue of heat.
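The conveyor analogy reduces to one line of arithmetic. The clock period can be no shorter than the slowest stage, and frequency is the reciprocal of that period. The stage latencies below are hypothetical values, chosen so that the slowest stage matches the 250-picosecond example.

```python
def max_clock_ghz(stage_latencies_ps: list[float]) -> float:
    """Highest clock frequency (GHz) the slowest pipeline stage allows.

    f = 1 / t; with t in picoseconds, 1 / (t * 1e-12 s) Hz = 1000 / t GHz.
    """
    slowest_ps = max(stage_latencies_ps)
    return 1000.0 / slowest_ps


# Hypothetical stations A through D; station B is the bottleneck.
stages_ps = [200.0, 250.0, 180.0, 220.0]
print(max_clock_ghz(stages_ps))  # 4.0

# Reaching 10GHz would require every stage to finish in 100 ps or less.
print(max_clock_ghz([100.0, 90.0, 95.0, 80.0]))  # 10.0
```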

As an electron races through a computer circuit, it experiences a form of friction known as resistance. Just as rubbing your hands together on a cold day produces heat, so does an electron zipping through a computer circuit. When designing any chip, heat is the enemy. The smaller the chip geometry (today it’s seven nanometers), the more devices you can pack into a given space on a chip, and more devices mean more heat. That same square centimeter of space at 7nm still has the same thermal limitations it did at 130nm 20 years ago. Sure, we can use fancy liquid systems to rapidly wick heat away from the chip, instead of relying on airflow over an area-limited heat sink, but at the end of the day, every watt of power the chip consumes becomes heat. There are also individual circuits throughout the chip specifically designed to detect and respond to overheating. The last thing anyone wants is a smoldering piece of silicon where their CPU once was. In the 7GHz example above, the IBM representative said the following. If you view the POWER8 chip as a big chessboard and turn off all the CPU cores on the white squares, then all the cores on the black squares could be clocked at nearly twice the speed, or 7GHz. Why is this interesting?

For some computational problems, it’s much better to complete two consecutive, dependent computations in a given unit of time than two unrelated ones in parallel. Electronic trading, particularly high-frequency trading (HFT), is the premier market-driven problem that benefits most from increasing clock frequency. Traders often ascribe a dollar value to a millionth of a second, and that value varies from market to market based on the rules and volumes of each market. In the end, though, it always boils down to the trader’s speed in responding to a market signal. If I’m faster than you at making the right decision, then I win the business and book the profit. Sticking with HFT, where do accelerators fit in?

Traders lease connections to exchanges. The closer they are, and the faster they can respond to signals from those connections, the more competitive they will be. Suppose my trading platform requires signals from the market to travel through my server, then through a switch on my private network, then through a second server, and finally back out to the market. The networking alone, even with kernel bypass, through two servers and a switch could easily take several microseconds. Add a few more microseconds for trading logic in both servers, and you could be looking at almost ten microseconds to submit a trade in response to a signal. Two years ago, Solarflare and LDA Technologies demonstrated 98 nanoseconds tick-to-trade. This used accelerator technology and, compared to the trading platform described above, is roughly two orders of magnitude (about 100 times) faster. That’s like the difference between a four-day drive from NYC to LA and a one-hour flight at Mach 5. Time matters, and acceleration is not just for HFTs anymore. Why do you think Google bought Myricom, Amazon picked up Annapurna Labs, Nvidia purchased Mellanox, or Xilinx acquired Solarflare?
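A quick latency budget makes the comparison concrete. The per-hop figures below are hypothetical values, chosen to add up to the roughly ten microseconds described above. Only the 98-nanosecond tick-to-trade figure comes from the demonstration itself.

```python
import math

# Hypothetical budget for the two-server, one-switch path, in nanoseconds.
software_path_ns = {
    "server 1 network stack (kernel bypass)": 2_000,
    "private network switch": 1_000,
    "server 2 network stack (kernel bypass)": 2_000,
    "trading logic, server 1": 2_500,
    "trading logic, server 2": 2_500,
}
ACCELERATED_TICK_TO_TRADE_NS = 98

total_ns = sum(software_path_ns.values())
speedup = total_ns / ACCELERATED_TICK_TO_TRADE_NS
print(f"software path: {total_ns:,} ns")
print(f"speedup: {speedup:.0f}x, about 10^{math.log10(speedup):.1f}")
```

A ratio of about 100 is two orders of magnitude. Three orders would require the accelerated path to come in near 10 nanoseconds.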

Please stay tuned; more is coming in part two. In the meantime, feel free to check out previous articles on this topic:

99.99999% Available + 2.7us = 1 Awesome Computer

What do you get when you put together a pair of dual-socket servers running in hardware lock-step with a pair of leading-edge, ultra-low-latency OS bypass network adapters, all running Red Hat Enterprise Linux? One awesome 24-core system that boasts 99.99999% uptime, zero jitter, 2.7 microseconds of half-round-trip UDP latency, and 2.9 microseconds for TCP.

How is this possible? First, we’ll cover what Stratus Technologies has done with Lock-Step, and how it makes the ftServer dramatically different from all others. Then we’ll explain what jitter is, and why removing it is so critical for deterministic systems like financial trading. Finally, we’ll cover these impressive Solarflare ultra-low latency numbers, and what they really mean.

We’ve all bought something with a credit card, flown through Chicago O’Hare, used public utilities, and possibly even called 9-1-1. What you may not know is that very often at the heart of each of these systems is a Stratus server. Stratus should adopt the old Timex slogan “It takes a licking and keeps on ticking,” because that’s what it means to provide 99.99999% uptime: you’re allowed about three seconds a year of unplanned outages. Three seconds is how long it takes me to say “99.99999% uptime.” How is this possible? Imagine running a three-legged race with a friend. Ideally, if you each compared your actions continuously with every step, you could run the race at the pace of the slower of the two of you. This is the key concept behind Lock-Step: compare continuously, then know what to do when one side starts to stumble, so the team keeps moving forward no matter what happens. Stratus leverages the latest 12-core Intel Haswell E5-2670v3 server processors with support for up to 512GB of DDR4. If any hardware component in the server fails, the system as a whole keeps moving forward and alerts an admin. The admin replaces the failed component, and that subsystem is brought back online. I challenge you to find another computer in your life that has ever offered that level of availability over the typical 5-7 year lifecycle that Stratus servers often see.
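The “three seconds a year” figure is easy to check. Allowed downtime is the unavailable fraction multiplied by the number of seconds in a year. This sketch uses a 365-day year and ignores leap years.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (365-day year)


def allowed_downtime_s(availability_pct: float) -> float:
    """Unplanned downtime per year permitted by an availability percentage."""
    return (1 - availability_pct / 100) * SECONDS_PER_YEAR


for nines in ("99.9", "99.99", "99.999", "99.99999"):
    seconds = allowed_downtime_s(float(nines))
    print(f"{nines:>9}% -> {seconds:,.2f} s/year ({seconds / 60:,.2f} min)")
```

Seven nines works out to about 3.15 seconds a year. The more familiar “five nines” allows a little over five minutes.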

So what is Jitter? When a computer core is distracted from its primary task to go off and do some routine housekeeping (operating system or hardware driven), the impact of that temporary distraction is known as Jitter. With normal computing tasks, Jitter is hardly noticeable; it’s the computer equivalent of background noise. With certain VERY time-critical computing tasks, though, like financial trading, even one Jitter event could be devastating. Suppose your server’s primary function is financial trading. It receives a signal from market A that someone wants to buy IBM at $100, and a second signal from market B that another entity wishes to sell IBM at $99. So the trading algorithm on your server buys the stock on B for $99, but the instant it has confirmation of the purchase, a thermal sensor in your server generates an interrupt. The CPU core running your trading algorithm then goes off to service that interrupt, running some code to determine which fan to turn on. Eventually, say a millisecond or so later, control is returned to your trading algorithm, but by then the buyer on market A is gone, and the price of IBM has fallen to $99. That’s the impact of Jitter: brief, often totally random moments in the trading day stolen to do basic housekeeping. These stolen moments can quickly add up for traders, and for exchanges they can be devastating. Imagine an opportunity missed because Jitter delayed an order! Stratus Technologies has crawled through their server architecture and eliminated all potential sources of Jitter. Traders & exchanges using other platforms have to do all this by hand, and this is still as much art as it is science. That’s one reason why over 1,400 different customers regularly depend on Solarflare.
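A common way to observe jitter is a “hiccup” loop. You spin on one core reading a high-resolution clock and record the gap between consecutive reads. On an undisturbed core every gap is tiny, while an interrupt, context switch, or other housekeeping shows up as an outlier. This is a minimal sketch of that general technique, not a Stratus or Solarflare tool. The Python interpreter itself adds noise, so treat the numbers as illustrative.

```python
import statistics
import time


def measure_gaps_ns(iterations: int) -> list[int]:
    """Spin reading a monotonic clock and record each gap between reads.

    Any time the core is taken away (interrupt, context switch, housekeeping)
    appears as an unusually large gap.
    """
    gaps = []
    prev = time.perf_counter_ns()
    for _ in range(iterations):
        now = time.perf_counter_ns()
        gaps.append(now - prev)
        prev = now
    return gaps


def summarize(gaps: list[int]) -> dict[str, float]:
    """Median, approximate 99th percentile (lower index), and worst gap."""
    ordered = sorted(gaps)
    p99 = ordered[int(0.99 * (len(ordered) - 1))]
    return {"median": statistics.median(ordered), "p99": p99, "max": ordered[-1]}


stats = summarize(measure_gaps_ns(1_000_000))
print(stats)  # a max far above the median marks the core being pulled away
```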

Finally, there’s ultra-low latency networking via standard TCP/IP and UDP. In the diagram below, network latency is shown in blue. Market data arrives via UDP, and orders are placed through the more reliable TCP/IP protocol. Here is a quick anatomy of part of the trading process, showing one UDP receive and one TCP send. There are other components, but this is a distilled example.

Initially, the packet is received from the wire (the light blue block) and passes through the physical interface, where electrical networking signals are converted to layer-2 logical bits. From there, the packet is passed to the on-chip layer-2 switch, which steers it to one of 2,048 virtual NIC (vNIC) instances, also on the chip. The vNIC then uses DMA to transfer the packet into system memory; all of this takes 500 nanoseconds. The packet has now left the network adapter and is on its way to a communications stack somewhere in system memory, the dark blue box. Here is where Solarflare shines. In the top timeline, the dark blue box represents their host kernel device driver and the Linux communications stack. Solarflare’s kernel device driver is arguably one of the fastest in the industry, but most of this dark blue box is time spent working with the kernel. There are CPU task switches and several memory copies of the packet as it moves through the system, and thousands of CPU instructions are executed; all told, this can take nearly 3,000 nanoseconds. In the bottom timeline, the packet is DMA’d directly into user space, where Solarflare’s very tight user-space stack sits. There the packet is quickly processed and handed off to the end-user application via the traditional sockets interface. This happens without additional data copies or CPU task switches and completes in just under 1,000 nanoseconds. That’s a savings of about 2,000 nanoseconds, or roughly 4,600 CPU instructions for this processor at this speed. All this, and we’ve just received a packet into our application, represented by the green blocks.
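Converting the nanoseconds saved into CPU work is a single multiplication, because a core clocked at f GHz completes f cycles per nanosecond. This sketch assumes the 2.3 GHz base clock of the E5-2670v3 mentioned earlier and roughly one instruction per cycle, which reproduces the 4,600 figure. Real instructions-per-cycle varies widely with the workload.

```python
def cycles_in(nanoseconds: float, clock_ghz: float) -> float:
    """Clock cycles elapsed in a time span: 1 GHz = 1 cycle per nanosecond."""
    return nanoseconds * clock_ghz


KERNEL_PATH_NS = 3_000   # kernel driver + Linux stack (approximate, from the text)
BYPASS_PATH_NS = 1_000   # user-space OS bypass stack (approximate, from the text)
CLOCK_GHZ = 2.3          # assumed base clock of the E5-2670v3

saved_ns = KERNEL_PATH_NS - BYPASS_PATH_NS
print(f"{saved_ns} ns saved ~ {cycles_in(saved_ns, CLOCK_GHZ):,.0f} cycles")
```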
So in the two bars above, the first represents market data coming in via Solarflare’s generic kernel device driver, then going through the normal Linux stack until the packet is handed off to the application. The response packet, in this case a trade via TCP, is sent back through the stack to the network adapter and eventually put on the wire; all told, this takes just over 9,000 nanoseconds. With Stratus & Solarflare, the second bar shows the latency of the same transaction traveling through Solarflare’s OS bypass stack in both directions, and the transaction hits the exchange over 4,000 nanoseconds sooner. This means you can trade at nearly twice the speed, a true competitive advantage. Four millionths of a second isn’t something humans can easily grasp, so let’s jump to light speed: it’s roughly the time it takes light to travel three-quarters of a mile in a vacuum.
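The light-speed comparison is distance = c × t. This sketch uses the exact vacuum speed of light. In optical fiber, light travels roughly a third slower; the refractive index of about 1.47 used below is an assumed typical value. The same 4 microseconds therefore covers only about half a mile of fiber.

```python
C_M_PER_S = 299_792_458      # speed of light in vacuum (exact, by definition)
METERS_PER_MILE = 1_609.344
FIBER_INDEX = 1.47           # assumed typical refractive index of silica fiber


def light_distance_miles(seconds: float, refractive_index: float = 1.0) -> float:
    """Miles light covers in `seconds` through a medium of the given index."""
    return C_M_PER_S / refractive_index * seconds / METERS_PER_MILE


saved_s = 4_000e-9  # the ~4,000 ns head start from the comparison above
print(f"vacuum: {light_distance_miles(saved_s):.2f} miles")
print(f"fiber:  {light_distance_miles(saved_s, FIBER_INDEX):.2f} miles")
```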
So if you’re looking to build a financial trading system with ultra-high availability, zero jitter & extreme network performance, you have only one choice: Stratus’s new ftServer.

A 10GbE Capture Platform: Snort, Bro, Suricata & Wireshark

Perhaps you’re responsible for your company’s network security, or maybe you’re designing an appliance for your business? If so, then you’ve likely already become familiar with Snort, Bro, Suricata, and Wireshark. As you may have recently discovered, getting real performance from these applications at 10GbE speeds requires the proper adapter and capture driver, or you risk dropping vast numbers of packets. It is now also possible to capture a copy of not only the packets the server receives but also those it transmits. All of this happens while your programs run on other cores within the same server. You may also want to capture all the received, transmitted, and virtual machine (VM) to VM traffic within your server. In that case, you can designate one VM to capture a copy of all the network traffic for analysis. To further sweeten things, this can also be done from within a Docker container built to handle capture.

Some might ask why you would want to capture transmitted packets and run them through Snort, Bro, or Suricata. Simple: to look for outbound traffic patterns that might indicate a breach. Perhaps a VM on one of your servers has been compromised, and it is sending your company’s precious AutoCAD files in the middle of the night to a country in Asia you don’t do business with. If you’re not looking at transmitted packets, you may never detect or stop a breach of this nature. Setting up rules to look for file transfers of specific types, during specific times, or conforming to other criteria specific to outbound traffic is a fairly new trend. Also, capture doesn’t have to be limited to packets on your own server. You can take a more traditional approach and dedicate a capture server in every rack, fed by an optical tap or a switch’s SPAN (mirror) port. In fact, you can install multiple adapters and aggregate the ports together until you hit the performance limits of your system.
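The kind of outbound rule described above can be sketched as a predicate over transfer records. Everything here is hypothetical: the `OutboundTransfer` record, the country allow-list, and the off-hours window stand in for whatever your capture pipeline actually extracts. In Snort, Bro, or Suricata this logic would be written in each tool’s own rule or script language.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class OutboundTransfer:
    """Hypothetical record extracted from transmitted packets."""
    dst_country: str
    file_extension: str
    started: datetime


WATCHED_EXTENSIONS = {".dwg", ".dxf"}       # AutoCAD drawing formats
ALLOWED_COUNTRIES = {"US", "CA", "MX"}      # hypothetical business footprint
OFF_HOURS_START, OFF_HOURS_END = time(22, 0), time(6, 0)  # hypothetical window


def in_off_hours(t: time) -> bool:
    # The window wraps past midnight, so it is "after start OR before end".
    return t >= OFF_HOURS_START or t < OFF_HOURS_END


def is_suspicious(x: OutboundTransfer) -> bool:
    """Flag watched file types leaving after hours for unexpected countries."""
    return (
        x.file_extension.lower() in WATCHED_EXTENSIONS
        and in_off_hours(x.started.time())
        and x.dst_country not in ALLOWED_COUNTRIES
    )


night = OutboundTransfer("XX", ".DWG", datetime(2016, 3, 1, 2, 30))
print(is_suspicious(night))  # True
```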

Most accelerated 10G capture platforms require both a performance adapter and a special-purpose capture driver. Furthermore, to capture both received and transmitted packets in parallel, you have only one choice, and that is an adapter and software from Solarflare. You can start with Solarflare’s Flareon Ultra SFN7122F adapter and a SolarCapture Live license and, as your needs grow, scale to their dual 40G adapter, the SFN7142Q.

Solarflare designed this high-performance capture platform specifically for engineers building leading-edge security solutions. Let’s take a closer look at the adapter and software. The network server adapter, the Solarflare SFN7122F, is a board that contains one Solarflare single-core Ethernet controller chip. That controller core has multiple packet engines, each dedicated to processing either received or transmitted packets. This enables the SFN7122F adapter to support wire-rate lossless packet capture on a single port, even with huge bursts of the smallest packets (64 bytes each). This dedication of resources also enables transmitting wire-rate 64-byte packets at the same time, on the same interface, in parallel, without impacting capture performance. The SFN7142Q uses the same Ethernet controller, but with two of these cores on the same chip. It can therefore support capture on two 40G ports or four 10G ports, or wire-rate lossless capture on two 10G ports.
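“Wire-rate with 64-byte packets” is a demanding target because every Ethernet frame also occupies 20 bytes of overhead on the wire. That overhead is an 8-byte preamble and start-of-frame delimiter plus a 12-byte minimum inter-frame gap. The arithmetic below shows the packet rate and per-packet time budget this implies at 10G and 40G.

```python
WIRE_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte minimum inter-frame gap


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second for back-to-back frames of one size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


for name, bps in (("10G", 10e9), ("40G", 40e9)):
    pps = packets_per_second(bps, 64)
    print(f"{name}: {pps / 1e6:.2f} Mpps, {1e9 / pps:.1f} ns per packet")
```

At 10G, that is nearly 14.9 million packets per second, leaving about 67 nanoseconds per packet for the capture path to keep up.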

The next component in this platform is SolarCapture Live (SCL), which provides a complete libpcap replacement library and a Snort DAQ interface. These offer two fairly seamless ways to connect to Snort. If SCL is initialized in cluster mode, it can spawn multiple capture instances, up to one per core, and deliver all network packets in libpcap format spread across these cores. SCL then uses advanced receive flow steering to flow-hash the packets across all of the capture nodes within the capture cluster. Flow-hashing looks at several key fields in the packet header, then always routes all the traffic from a given flow to the same cluster node (core). That way, security applications like Snort, Suricata, and Bro always see all the data for that specific network flow.
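Flow-hashing can be sketched in a few lines. This illustrates the general technique and is not SolarCapture’s actual hash function. NICs commonly use a Toeplitz (RSS) hash in hardware, whereas this sketch uses CRC-32 from Python’s standard library. It also sorts the two endpoints first, which makes the hash symmetric: both directions of a conversation, received and transmitted, land on the same core.

```python
import zlib


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: str) -> bytes:
    """Canonical 5-tuple key; sorting the endpoints makes A->B equal B->A."""
    lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{proto}|{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}".encode()


def capture_core(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                 proto: str, num_cores: int) -> int:
    """Pick the capture-cluster node (core) for a packet's flow."""
    return zlib.crc32(flow_key(src_ip, src_port, dst_ip, dst_port, proto)) % num_cores


request = capture_core("10.0.0.5", 51_000, "192.0.2.10", 443, "tcp", num_cores=8)
reply = capture_core("192.0.2.10", 443, "10.0.0.5", 51_000, "tcp", num_cores=8)
print(request, reply, request == reply)  # same core for both directions
```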

This Solarflare capture platform also supports an optional Solarflare Precision Time Protocol (PTP) software license. With an additional optional bracket kit, the adapter can accept an external hardware Pulse Per Second (PPS) signal; the kit provides the mini-BNC connectors used to attach the adapter to an external master clock. Unlike similar adapters, this optional PCIe faceplate has a second mini-BNC connector to support daisy-chaining the clock signal out of the adapter into another adapter. These Solarflare adapters include a highly precise Stratum 3 clock, which keeps time stamping accurate to within 100 nanoseconds of the PTP master. Precision time stamping like this is typically only available on much more expensive FPGA-based adapters. Furthermore, the PTP license enables time stamping of both received and transmitted packets, so you can use it to measure application performance. Additionally, Solarflare’s 100-nanosecond precision is 15X better than that of a competing adapter at a similar price point, which only captures and time stamps inbound packets.
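Behind PTP’s accuracy is a simple calculation from four timestamps per exchange:

- the master sends a Sync message at t1;
- the slave receives it at t2;
- the slave sends a Delay_Req at t3;
- the master receives it at t4.

Under IEEE 1588’s standard assumption that the path delay is the same in both directions, the slave’s offset and the one-way delay fall out as shown below. The numbers are made up for illustration.

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Slave clock offset and mean one-way path delay from one PTP exchange.

    t1: master sends Sync (master clock)    t2: slave receives it (slave clock)
    t3: slave sends Delay_Req (slave clock) t4: master receives it (master clock)
    Assumes a symmetric path: the same delay in each direction.
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay


# Made-up example (nanoseconds): slave runs 150 ns ahead, one-way delay 500 ns.
offset_ns, delay_ns = ptp_offset_and_delay(t1=0, t2=650, t3=1_000, t4=1_350)
print(offset_ns, delay_ns)  # 150.0 500.0
```

Any asymmetry between the two directions shows up directly as offset error. This is one reason hardware time stamping at the adapter, rather than in software, matters for hitting 100-nanosecond accuracy.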

So if you’re looking to get into packet capture for security monitoring or performance analysis, please consider contacting Solarflare and asking about their SFN7122F with SolarCapture Live. You’ll be pleasantly surprised at how well it performs compared to much more expensive FPGA-based solutions, which sell for 5X or more the price of this unique bundle.

Turkey Time, a Watch, and Accuracy

This article was originally posted in November of 2012 on

While waiting on the turkey yesterday, I was flipping through the latest issue of Wired and stumbled across the new Seiko Astron watch, and my inner nerd started to swoon. Now, for the few of you out there who don’t get Wired, especially the December issue, think of it as the geek version of the old Sears Wishbook. This time of year, every tenth ad in the magazine is a high-end watch; it’s geek meets chic. Among all the fancy watch ads, here was both an article and an ad for a Seiko, the brand my dad wore. Dad worked outdoors every day of his life, never used a computer, and swore by his Seiko; he considered it the working man’s watch. So it was kinda funny seeing Seiko among ads for all the other high-end brands, from Rolex on down.

To make sure we’re all on the same page, let’s first take a moment to define accuracy. Simply put, accuracy, when talking about time, is the average deviation from the reference time. Today, for high-precision instruments, accuracy is often measured in nanoseconds (10^-9, or billionths of a second) lost or gained each second or each day. With 86,400 seconds in a day, it’s sometimes easier to use a day when dealing with really small numbers. National and international reference clocks excite cesium atoms with microwaves and lock onto the frequency (9,192,631,770 Hz) at which the atoms jump between two energy states. This process is so repeatable that it was made the international standard for timekeeping in 1967.
So what about this Seiko aroused the geek inside me? Active GPS synchronization to the local time zone, and an understanding of all 39 time zones worldwide. This watch figures out where you are on the planet, selects the appropriate time zone, and resets itself to local time, all for only $2,300. My smartphone has been doing the same thing since smartphones first arrived, but that’s a different story. Seiko also claims the Astron has an accuracy of 1 second every 100,000 years, or 27 nanoseconds/day. By watch standards, this is very accurate.
So how does the Astron stack up against some real-world high-precision clocks? Clock systems used in electronic financial markets typically rely on highly accurate clocks (1 picosecond/day internally) that are even more precise than cesium clocks. To stay in step with the rest of the world, though, they must rely on our less accurate GPS system (10 nanoseconds/second). They also often use a pulse-per-second distribution mechanism, which degrades this further to 25 nanoseconds/second (about 2 milliseconds/day). Commonly used cesium clocks are accurate to 1 second every 1,400,000 years, or 2 nanoseconds/day. A newly proposed standard clock would excite neutrons instead of electrons and thus be even more accurate: 1/20th of a second every 14,000,000,000 years. Well, there’s the oven timer; it’s accurate to a minute every six months when I reset it, and the turkey’s done. Happy belated Thanksgiving, everyone…
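The accuracy figures above mix “one second every N years” with nanoseconds per second and nanoseconds per day. A small conversion helper puts them on the same scale; it assumes a 365.25-day year.

```python
DAYS_PER_YEAR = 365.25
SECONDS_PER_DAY = 86_400


def ns_per_day_from_years(error_s: float, years: float) -> float:
    """Convert 'error_s seconds every `years` years' into nanoseconds per day."""
    return error_s / (years * DAYS_PER_YEAR) * 1e9


def ns_per_day_from_ns_per_s(ns_per_s: float) -> float:
    """Convert a rate in nanoseconds per second into nanoseconds per day."""
    return ns_per_s * SECONDS_PER_DAY


print(f"Astron:  {ns_per_day_from_years(1, 100_000):.1f} ns/day")
print(f"Cesium:  {ns_per_day_from_years(1, 1_400_000):.2f} ns/day")
print(f"Nuclear: {ns_per_day_from_years(0.05, 14e9):.2e} ns/day")
print(f"PPS:     {ns_per_day_from_ns_per_s(25) / 1e6:.2f} ms/day")
```

The conversions confirm the Astron’s roughly 27 ns/day, the cesium clock’s roughly 2 ns/day, and the roughly 2 ms/day for a 25 ns/second pulse-per-second error.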

GPS Jamming and Spoofing

This article was originally published in August of 2012 on

Can someone use GPS jamming or spoofing to game the markets of the world in such a way that their HFT shop would gain a competitive advantage? We weren’t sure, so we asked an expert, John Fischer, the CTO of Spectracom, a leader in the field of time distribution. John said, “Jamming is easy. Spoofing is hard. It can be done, but you have to be smarter than the average bear. It’s like walking on a tightrope across Niagara Falls. It can be done, but not by just anyone. And we protect against jamming with our holdover oscillator.” What brought jamming into the news recently was testimony by Dr. Todd Humphreys on July 18, 2012, before the House Subcommittee on Homeland Security. He testified on how insecure the civilian GPS system is and why it shouldn’t be blindly trusted. Last week a reporter asked us about this topic. In preparing an answer, we learned a number of things we think should be shared. First, let’s clarify what we mean by GPS jamming and spoofing.

There are two satellite GPS signals that are commonly available: the insecure consumer signal and the highly encrypted military signal. This whole entry will ONLY cover the consumer signal. Jamming actually isn’t that difficult. You just need to know the frequency range of the GPS signal and produce a signal in that range that is substantially stronger than the one received from the satellites. GPS satellites orbit roughly 12,600 miles overhead, so a local jammer hidden in someone’s pocket could easily overpower something so distant. A Google search today revealed that for as little as $32 US you can buy a GPS jammer, and the more you spend, the better the jammer.
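Why a pocket jammer wins is mostly geometry, because in free space received power falls with the square of distance. The sketch compares two transmitters of equal power. One sits at GPS orbital altitude, about 20,200 km or roughly the 12,600 miles above, and the other is a hypothetical jammer 10 meters away. Real satellites and jammers differ in transmit power and antennas, so this shows only the distance term.

```python
import math

GPS_ALTITUDE_M = 20_200e3   # approximate GPS orbital altitude
JAMMER_DISTANCE_M = 10.0    # hypothetical distance to a pocket jammer


def distance_advantage_db(near_m: float, far_m: float) -> float:
    """Power advantage (dB) of the nearer of two equal transmitters.

    Free-space power falls as 1/d^2, so the ratio is (far/near)^2,
    which is 20*log10(far/near) in decibels.
    """
    return 20 * math.log10(far_m / near_m)


print(f"{distance_advantage_db(JAMMER_DISTANCE_M, GPS_ALTITUDE_M):.0f} dB")
```

About 126 dB is a power ratio of roughly 4 × 10^12. That is why even a very low-power jammer can drown out the satellites for receivers nearby.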

The second concept is spoofing. Here one transmits a counterfeit signal in which the time contained within the data has been artificially altered. Dr. Humphreys’ team at UT Austin, with a budget of under $1,000, spoofed a GPS signal well enough to PWN (take control of) a UAV helicopter drone. Dr. Humphreys used this demo to point out that the civilian band of the GPS signal is transmitted in the clear and should not be blindly trusted. In fact, if you’re intelligent enough, you could replace the signal with your own. His team altered the signal enough to drive the drone into the dirt. Note that this was a drone using the consumer GPS signal (not the military one).

In HFT, most shops use a GPS signal provided by the exchange. They bring this in and connect it to their own clock, and the signal from this clock is then distributed to all their trading systems. The clock here is the key. What Dr. Humphreys didn’t address in his testimony (section 4.3, Banking and Finance) is that these clocks add a layer of hardware with built-in checks and balances. In the presence of a lost or jammed GPS signal, these clocks by design go into a free-run mode, where their own internal oscillator (often rubidium-based) takes over to provide accurate time. These internal oscillators typically drift less than one microsecond a week.

By design, these clocks have two defenses against spoofing, both built on the clock’s own internal reference oscillator. Dr. Humphreys implied that these clocks typically drift 1/10 of a microsecond per second. I’m told that is true for a software-only clocking system, and it works out to potentially 60,000 microseconds a week. Contrast this with the internal hardware oscillator in these clocks, which drifts only 1 microsecond a week, and the problem Dr. Humphreys outlines disappears. First, if the GPS signals don’t align with the internal reference oscillator, within predefined tolerances, they are ignored and the clock goes into free-run mode. This defends against an attempt to dramatically shift time forward or backward. Second, I’m told that if the GPS signals were altered very subtly over time, the change might not be detected. The change would have to be made extremely slowly over a long period of time, so while it is possible, it is unlikely. Suppose someone were slowing down GPS time. As these changes compound, they could eventually exceed a threshold during a periodic check against the internal oscillator, and once again the clock would go into free-run mode. If the change were small enough, though, it could continue to slip through.
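Both the drift arithmetic and the first defense can be sketched briefly. The following are hypothetical simplifications of what a real grandmaster clock does:

- the tolerance and ramp rates;
- the simple “compare, then fall back to free-run” check;
- the internal oscillator, treated as perfect over the simulated window.

```python
from typing import Optional

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800


def drift_per_week_us(drift_us_per_s: float) -> float:
    """Accumulated drift over one week for a constant drift rate."""
    return drift_us_per_s * SECONDS_PER_WEEK


def accept_gps(gps_ns: float, oscillator_ns: float, tolerance_ns: float) -> bool:
    """Keep using GPS only while it agrees with the internal oscillator."""
    return abs(gps_ns - oscillator_ns) <= tolerance_ns


def seconds_until_free_run(ramp_ns_per_s: float, tolerance_ns: float,
                           max_seconds: int) -> Optional[int]:
    """Simulate a spoofer shifting GPS time by a constant amount each second.

    Returns the first second at which the clock rejects GPS and enters
    free-run mode, or None if the ramp stays inside the tolerance.
    """
    for second in range(1, max_seconds + 1):
        spoofed_offset_ns = ramp_ns_per_s * second
        if not accept_gps(spoofed_offset_ns, 0.0, tolerance_ns):
            return second
    return None


print(f"{drift_per_week_us(0.1):,.0f} us/week")        # software-only clock
print(seconds_until_free_run(10.0, 1_000.0, 86_400))   # steady ramp: caught
print(seconds_until_free_run(0.001, 1_000.0, 86_400))  # tiny ramp: undetected within one day
```

Over a long enough window, even the tiny ramp eventually crosses the tolerance, which is the periodic-check defense at work. A ramp slow enough to hide inside the oscillator’s own drift is the case the article calls possible but unlikely.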

So the presence of an accurate oscillator in one’s clock removes the need for our market trading systems to blindly trust the insecure consumer GPS system. That oscillator works together with a rigorous internal process that compares it to the inbound GPS signals and periodically double-checks over time.