Henry Ford is quoted as having once said, “If I had asked people what they wanted, they would have said faster horses.” Not all innovations are as groundbreaking as the automobile, but when one approaches an old problem with both a new strategy and improved technology, great things can happen. In June, Solarflare released an update to OpenOnload (OOL) that introduced TCP Delegated Send, and in late September they will begin shipping the latest version of their Application Onload Engine (AOE) with an advanced FPGA. Together, these two make it possible to turn around a TCP send in 300ns, compared to roughly 1,700ns today. In latency-focused applications, a savings of 1,400ns, an 82% improvement, is game changing. To understand how Solarflare pulls this off, let’s look at a much simpler example.
My son uses an online exchange to trade Magic cards, and traders are rated on how fast they fill orders. That is not much different from NASDAQ processing an order for Apple stock. When my son started, he would receive an order on his computer and search through a heap of cards to find the one needed to fill that order. He would then go down several flights of stairs to my office to fetch an envelope and a stamp, then go back up to his computer. Next, he would address the envelope, apply the stamp, run back down the stairs, and walk the completed trade to the mailbox. Today he keeps a cache of pre-stamped envelopes, with the return address already written, sitting beside his computer. All his cards are in a binder with an up-to-date index. Filling a trade is now trivial. He simply checks the index, pulls the required card from the binder, updates the index, stuffs the card in an envelope, writes the destination address on the front, and runs it out to the mailbox. Essentially, this is a Delegated Send. Everything that can be prepared ahead of the actual trade is prefetched and prepackaged.
When it comes to TCP and Delegated Send, at the start of the trading day the trading application, through OOL, establishes a TCP connection with the exchange. The trading application then calls a routine in OOL to take over control of the socket’s send path and to obtain the Ethernet, IP, and TCP headers for the connection. The application adds a message template to these headers and passes the resulting packet template to the FPGA, where it remains cached, much like my son’s stack of pre-stamped envelopes. When incoming packets arriving at the FPGA cause the RTL trading code to trigger a trade, the trade is inserted into the pre-formatted packet, the checksum is computed, and the packet is sent to the exchange. The whole process takes approximately 300ns. When the ACK arrives from the exchange, it is passed transparently back to the trading application through OOL. Now some will point out that other FPGA solutions exist today that may enable you to trade at these speeds, but do any of them make it this simple? With some minor modifications to your existing trading application, you can quickly take advantage of Delegated Send with the AOE. No other FPGA solution even comes close!
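To make the "pre-stamped envelope" idea concrete, here is a minimal software sketch of a cached packet template. It is not Solarflare's implementation and does not use the OOL API. The `PacketTemplate` class, its `fill` method, and the sample addresses, ports, and order text are all hypothetical names chosen for illustration. It models only the TCP header and payload (no Ethernet or IP header), and it leaves out the sequence-number bookkeeping a real send path must do. What it does show is the split the text describes: the header is built once up front, and at trade time only the payload is dropped in and the standard Internet checksum (RFC 1071) is computed.

```python
import struct

TCP_CHECKSUM_OFFSET = 16  # byte offset of the checksum field in a TCP header
TCP_PROTOCOL = 6          # IP protocol number for TCP


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement of the one's-complement 16-bit sum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


class PacketTemplate:
    """Hypothetical cached template: header prepared once, payload added later."""

    def __init__(self, src_ip: bytes, dst_ip: bytes, tcp_header: bytes):
        self.src_ip = src_ip          # 4-byte IPv4 source address
        self.dst_ip = dst_ip          # 4-byte IPv4 destination address
        self.tcp_header = tcp_header  # 20-byte header, checksum field zeroed

    def pseudo_header(self, tcp_length: int) -> bytes:
        """IPv4 pseudo-header that the TCP checksum also covers."""
        return self.src_ip + self.dst_ip + struct.pack(
            "!BBH", 0, TCP_PROTOCOL, tcp_length)

    def fill(self, payload: bytes) -> bytes:
        """The only per-trade work: insert the payload, compute the checksum."""
        tcp_length = len(self.tcp_header) + len(payload)
        csum = internet_checksum(
            self.pseudo_header(tcp_length) + self.tcp_header + payload)
        header = (self.tcp_header[:TCP_CHECKSUM_OFFSET]
                  + struct.pack("!H", csum)
                  + self.tcp_header[TCP_CHECKSUM_OFFSET + 2:])
        return header + payload


def make_tcp_header(sport: int, dport: int, seq: int, ack: int) -> bytes:
    """Build a 20-byte TCP header (PSH|ACK, no options, checksum zeroed)."""
    offset_and_flags = (5 << 12) | 0x018  # data offset of 5 words, PSH + ACK
    return struct.pack("!HHIIHHHH", sport, dport, seq, ack,
                       offset_and_flags, 65535, 0, 0)


# Start of day: prepare the template once (all values are made up).
template = PacketTemplate(
    src_ip=bytes([10, 0, 0, 1]),
    dst_ip=bytes([10, 0, 0, 2]),
    tcp_header=make_tcp_header(sport=40000, dport=9000, seq=1000, ack=5000),
)

# Trade time: drop the order into the cached template.
segment = template.fill(b"BUY AAPL 100")
```

A receiver validates the segment by running the same checksum over the pseudo-header plus the whole segment and checking that the result is zero, so `internet_checksum(template.pseudo_header(len(segment)) + segment)` should evaluate to `0` here. In the AOE case the per-trade step runs in FPGA logic rather than software, which is where the latency savings described above come from.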
So if latency in trading is important to you, and you’d like your orders moving along more than 1,000ns faster, then perhaps it’s time to take a serious look at Delegated Send on Solarflare’s AOE. To learn more, please consider checking out this whitepaper.
For those already familiar with Solarflare’s AOE, this new version of the product has several very substantial improvements:
- It leverages the latest Solarflare high-performance ASIC with a PCIe Gen3 interface.
- Flexible, open choice of FPGA PCS/MAC.
- All FDK modules and examples are delivered with full source code.
- Sample implementations of 10GbE & 40GbE pass-through (requires PCS/MAC).
- Sample implementations of all four 1600MHz DDR3 AOE memory channels.