2600, Attacking Enterprise Networks

January 2, 2018March 28, 2018 scottcschweitzer networking, Security

Since 1984 the magazine “2600” has been the undisputed publication created by and for hackers and phone phreakers worldwide. The January 2017 cover to the right is misleading, the magazine isn’t named after the Atari system, but rather the 2600 Hz tone ATT used, that phreakers leveraged, to control long distance phone lines. Most article bylines are hacker handles, rather than proper names, as the articles themselves don’t always paint inside the numbers. In January 2017 a hacker with the handle “Daelphinux” published the first of a five-part series of articles titled “Successful Network Attacks – Phase One” and every quarter since then he’s published the next phase, with the final Phase Five hitting the streets today January 2, 2018. This collection of five articles is perhaps the most concise executive-level summary of how an attacker breaches an enterprise that I’ve read thus far. Hopefully, I won’t offend Daelphinux by attempting to summarizing the key points of his five articles in this single blog post. I strongly suggest that everyone reading this review the original text in these five issues of 2600.

Daelphinux classified a Successful Network Attack into five phases:

Reconnaissance – “Gathering as much useful information about the targets as possible.”
Scanning – “Gathering useful information about the target’s networks and any possible exploits.”
Gaining Access – “Getting into the network to be able to accomplish the attack’s goal.”
Maintaining Access – “Ensuring access to the network persists long enough to accomplish the attack’s goal.”
Covering Tracks – “Obfuscating the attacker’s presence on the network such that they cannot be traced.”

Each phase builds on the first, with Daelphinux envisioning this as a pyramid with phase one on the bottom, and each successive phase building on the prior one. Attackers will need tools and skills at each level in order to conduct a successful attack. Defenders, the enterprise admins, will also need tools and skills for several phases to detect and defend against an attack.

Phase 1 – Reconnaissance. As defined by Daelphinux the attacker seeks to gather all the raw data they can on their target before actively engaging them. The key here is in gathering only the “useful” data, as an attacker will rapidly accumulate an enormous pile of information. This information should come from a wide variety of sources including, but not limited to: deep web searches, web crawling those results, calling all the targets publicly available phone numbers to learn everything they can about the target, draining “whois” databases for all known corporate DNS assets, launching phishing and email scams at targeted employes and generic email address, real-life social engineering by making new friends, and finally dumpster diving.

All these efforts will produce a heap of information, but most of it will be useless. Here is where intelligent sifting comes in. Important information is often in handwritten notes, or cast of printouts such as printer test configuration pages, IP table listings, usernames, equipment manufacturer shipping boxes, operating system manuals, internal organizational charts & structures, and corporate policies (especially password).

Recognizing these reconnaissance efforts is the initial step toward thwarting an attack. For example, reviewing security footage looking for dumpster divers might sound trivial, but differentiating between an attacker and someone looking something to sell for their next meal can be challenging. Other activities like social engineering become far more complex to detect via video surveillance as these activities appear as background noise. While this phase might be the toughest to detect, it is the easiest to defend against. If you can cut off the flow of information outside enterprise you can seriously hamper their reconnaissance efforts. To do this you can hide the whois records, destroy printed copies of purchase orders, destroy shipping boxes used to pack new servers or appliances, destroy discarded manuals, remove and clean printer hard drives, make sure all in-house shredders use cross-cut shredding and finally burn any really sensitive info that has already been shredded. Much of this can be addressed through procedures and training.

Phase 2 – Scanning, In April Daelphinux covered the details of this phase. He said that when an attacker moves on to phase two they are committed, and unlikely to walk away. This phase is the first real step where the attacker can’t evade detection as they have to deploy active tools to electronically gather as much insight as they can. Tools like: Nmap, Nessus, Metasploit, ZAP, Xenotix or Grabber. Here they are looking to generate IP Maps, enumerate subnets, determine network speeds, the resilience of networks, open ports on clients, appliances & servers used, along with applications and the versions in production. All this scanning will provide the attacker with another huge heap of data to sift through with the eventual goal being to define the attack vectors that define the real “meat” of the attack.

This phase is the last chance for a defender to stop an attack before valuable assets are stolen. During this phase, administrators might notice network slowdowns, so on detecting these and investigating see if one IP address or a small range of addresses is touching a wide range of resources if so you are likely in the process of being scanned. Attackers will often launch the scanning phase remotely. So using network address translation (NAT) internally, next-generation firewalls, and current IPS & IDS appliances can in some cases detect this. It is always strongly recommended to be current in patching all your appliances and to follow proper admin process.

Phase 3 – Gaining Access, was published in 2600 in July of 2017. Without gaining access an attack isn’t an attack. Some of the key tools used are Metasploit and ZAP, but they will also leverage trojans, zombies, and backdoors to gain multiple toeholds into the enterprise’s network. Attackers often use remote shells instead of graphical tools, as shells often prove to be faster, more flexible and powerful.

Typically attackers operate from an endpoint that is not their intended target, for example leveraging another user’s machine to attack a server. By using another unsuspecting endpoint if it becomes compromised they can then move onto another workstation within the enterprise network. Attackers are typically interested in copying, moving, deleting and or altering files. In this article, we’re not interested in those seeking to launch a Denial of Service (DoS) attack, just those looking to exfiltrate value from the enterprise. Detecting attacks during this phase requires active monitoring, and the question is when, not if, you’ll detect an attacker. Doing active file audits, examining files that are changed outside of regularly expected intervals, watching for irregular traffic patterns, and reviewing access and error logs are all known methods of detecting an attack.

Phase 4 – Maintaining Access, was published in the October 2017 issue. This is the last phase where defending against an attack can successfully prevent data loss. Daelphinux stated that there are three things which should be kept in mind:

The Attacker has already breached the network.
The Attacker is actively attempting to achieve their goal.
Less experienced attackers tend to get overly comfortable with their success at this point.

This is the stage at which attackers are at their most vulnerable. They are moving around in plain sight within your network and can be detected if you’re looking for them. Network monitoring is the key to detection in this phase, but also, unfortunately, detection at this phase is also based on hope. You as the defender are hoping you discover them, and they are hoping their connection won’t get them detected and severed.

Skilled attackers will setup precautions to prevent this. One of the attacker’s strategies will be to take over network monitoring tools in an effort to circumvent detection. Enterprises need a heavy level of paranoia at this point to ensure that they are checking everything. Looking for the use of uncommon ports or protocols is another method for detecting attackers. Typically Intrusion Prevention (IPS) and Intrusion Detection Systems (IDS) and Security Information and Event Management Systems (SIEMS) are useful tools in ferreting out hackers. SIEMS themselves often are now targets for attackers. Multiple instances of these devices or appliances should be leveraged to thwart a takedown. Consider having these monitors watch each other, so in the event, one goes down you can use that as a potential indicator of an attack. If both or all monitors go down simultaneously then it’s very possible you’re under attack. Killing connections that meet certain criteria are vital to cutting off an attackers access, for example terminating connections that last longer than 25 minutes. Think that scene in the original “Transformers” when the general shouts “Cut the hard lines!”

Phase 5 – Covering Tracks, today January 2, 2018, we saw the latest issue of 2600 with Daelphinux final installment in the series on “Successful Network Attacks – Phase Five – Covering Tracks.” Once an attacker has reached phase five they’ve taken what they need, and now the coverup begins. Attempting to defend against this phase “is a form of damage control,” at best you’re preserving forensic evidence to hopefully reconstruct what they took after the fact. An attacker has to leave as cleanly as they entered otherwise they could dilute some of the value of what was taken. As Daelphinux points out, what good is a list of users and passwords if the passwords have all been changed. The same holds true of Malware and backdoors, you can’t sell a backdoor into an enterprise if it has been removed.

The best defense against this is redundant, and perhaps even hidden copies of logs. Attackers will often sanitize the logs they find, but they rarely go looking for additional copies of logs, especially if some effort has been made to hide and even secure them. It’s possible that if you have automated your logging such that multiple copies are generated, and accesses are tracked the attacker will notice this in phase three and just avoid it. Attackers will normally obfuscate both their IP and MAC addresses to further frustrate tracking them, often using addresses already on the network. Again, as mentioned in phase three setting connection limits, timeouts, and alerts when these are reached is often a good way to thwart or even detect an attacker. It should also be noted that attackers will often escalate their privilege on a system so they can disable logging. As Daelphinux noted disabling logging will often generate its own log event, but then after that, you won’t know what was done. Some attackers may even just erase or corrupt log files on the way out the door, a sort of “salt the earth” strategy to make determining what was taken that much more difficult. Regardless, a company will need to make some determinations on what was stolen or affected and alert their customers. A recent Ponemon report states that just cleaning up and dealing with a data breach in the US often costs companies $244/record.

Hats off to Daelphinux for authoring “Successful Network Attacks,” and “2600, The Hacker Quarterly” for publishing it.

I hope everyone has a Happy and Secure New Year.

Idea + technology + {…} = product

December 26, 2017March 22, 2018 scottcschweitzer Uncategorized

If you’ve been in this industry a while you may remember the IBM Simon or the Apple Newton, both great ideas for products, but unfortunately, the technology just wasn’t capable of fulfilling the promise that these product designers had in mind. The holidays provide a unique opportunity to reflect. They also simultaneously create an environment for an impulse buy proceeded by a pause every year to play with my kids (now 21 and 24). 2017 was no different, and so this year for the first time ever I picked up not one but three quadcopter drones. After dinner Christmas day, all three were simultaneous buzzing around our empty two car garage attempting to take down several foam rubber cubes balanced on the garage door opener return beam. Perhaps I should bound this a bit more, a week earlier I’d spent $25 on each of the kid’s drones, not knowing if they would even interested, and $50 on my own. We had a blast, and if you’ve not flown one you should really splurge and spend $50 for something like the Kidcia unit, it’s practically indestructible. On the downside, the rechargeable lithium batteries only last about eight minutes, so I strongly suggest purchasing several extra batteries and the optional controller.

During the past week since these purchases, but before flying, I’ve wondered several times why we haven’t seen life-sized quad-copter drones deployed in practical real-world applications? It turns out this problem has waited 110 years for the technology. Yes, the quadcopter or rotary wing aircraft was first conceived, designed and demonstrated, in tethered flight mode back in 1907. The moment you fly one of today’s quadcopters you quickly realize why they flew in tethered flight mode back in 1907, crashing is often synonymous with a landing. These small drones, mine has folding arms and dual hinged propellers, take an enormous beating and still continue to fly as if nothing happened. We put at least a dozen flights on each of the three drones on Christmas day, and we’ve yet to break a single propeller. Some of the newer, more costly units, now include collision avoidance, which may actually take some of the fun away. So back to the problem at hand, why has it taken the quadcopter over 110 years to gain any traction beyond concept? Five reasons stand out, all technological, that have made this invention finally possible:

Considerably computing power & sophisticated programming in a single ultra-low power chip
Six-axis solid-state motion sensors (3-axis gyroscope, 3-axis accelerometer) also on a single ultra-low power chip
Very high precision, efficient, compact lightweight electric motors
Compact highly efficient energy storage in the form of lithium batteries
Extremely low mass, highly durable, yet flexible propellers

That first tethered quadcopter back in 1907 achieved only two feet of altitude while flown by a single pilot and powered by a single motor with four pairs of propellers. Two of the pairs of propellers were counter-rotating to eliminate the effects of torque, and four men, aside from the pilot, were required to keep the craft steady. Clearly far too many dynamically changing variables for a single person to process. Today’s quadcopter drones have an onboard computer that continuously adjusts all four motors independently while measuring the motion of the craft in six axes and detecting changes in altitude (via another sensor). The result is that when a drone is properly setup it can be flown indoors and raised to any arbitrary altitude where it will remain hovering in place until the battery is exhausted. Once the pilot requests the drone move left to right, all four rotors speeds are independently adjusted via the onboard computer to keep the drone from rotating or losing altitude. Controlled flight of a rotary wing craft, whether a drone or a flying car, requires considerable sensor input, and enormous computational power.

Petroleum-powered quadcopters are available, but to overcome issues in the variations of engine speeds and latency, the time from sensor input to action, they often utilize variable pitch propellers with electronic actuators. These actuators allow for rapid, and subtle changes in propeller pitch adjusting for variable inputs from the sensors and the pilot. While gas-powered drones often provide greater thrust, for most applications modern drones are assembled using electronic motors. These electric motors are extremely efficient, respond rapidly to subtle changes in voltage by delivering predictable rotational speeds, all while being very lightweight. Coupled with highly efficient lithium batteries, these make for an ideal platform for building drones.

The final component making these drones possible are advanced plastics and carbon fiber that now provide for very light-weight propellers that can take considerable abuse without fracturing or failing. When I grew up in the late 1960s and early 70s it didn’t take much to break that rubber band powered a red plastic propeller that came with balsa wood planes of that era. Today I can crash my drone into the garage door at nearly full speed and all eight propeller blades remain scratch free.

So next time you interact with a product and wonder why it doesn’t perform to your expectations, perhaps the technology has still not caught up to the intent of the product designers. Happy Holidays.

Ultra-low Latency Networking for Windows, Is There a Need?

December 22, 2017March 22, 2018 scottcschweitzer HFT, networking

At a Flagg Management HPC on Wall Street event back in 2011, then CEO of Myricom Nan Boden, was pitching the concept of Ultra-low Latency for Windows. As a setup for this concept, she asked the audience how many used Linux for their low latency trading platforms. I was sitting in the back, but could easily see many, but not all hands were in the air. She then asked how many used Windows, and from the back, I saw a few lonely hands. Following the session, I commented on how few people were interested, and she said that was correct from my viewpoint in the rear. From her position on the stage nearly half had hands up, just in front of their bodies so few others could see. It was as if low latency trading on Windows were some dirty little secret.

Later that year in August 2011 Myricom went on to release DBL for Windows. While I remained there for another two years following this event, and handling sales for the Eastern region, all those hidden raised hands lead to very few sales, but why? Price was not the issue, DBL for Windows was extremely aggressively positioned against Linux. It wasn’t performance, while it was a measurable amount slower than Linux, it was still considerably faster than default Windows. We never were able to ferret out what the actual issue was. If you’re a Windows user craving ultra-low latency, please consider reaching out to me and sharing your story.

Thank you all for your time this year, and Happy Holidays.

Security: DARPA, HFT & Financial Markets

December 18, 2017March 22, 2018 scottcschweitzer HFT, Microsegmentation, Security

Today nearly half of all Americans are invested in the financial markets. This past October the Dow Jones posted the “Pentagon Turns to High-Speed Traders to Fortify Markets Against Cyberattack.” The reporter had talked with a number of High-Frequency Trading (HFT) shops which had consulted directly with the Defense Advanced Research Projects Administration (DARPA). The objectives of these discussions were to determine how we could fortify the US financial markets against Cyber attacks.

The reporter learned that the following possible scenarios were discussed as part of the “Financial Markets Vulnerability Project:”

Inject false information into stock data feeds
Flood the stock market with fake orders and trigger a market crash
Cripple a widely used payroll system
Credit Card Processors
Report fake news into systems used to algorithmically drive trading

While protecting the US financial markets is something we expect of our government, the markets themselves are actually already insulated from outside attackers. The first two threats in the above list are essentially the same, placing fake orders into the exchange with no intent to honor them. To connect to an exchange’s servers a trader must be a member in good standing on that exchange and pay significant connection fees for their server to participate in that exchange. Traders place a very high value on their access to each exchange, and while HFT shops may only hold a security for a few millionths of a second, they understand the long-term value of losing access to an exchange. Most HFT shops have leased many 10GbE connections on multiple exchange servers, across multiple exchanges, and big bank’s dark pool, and very often Solarflare NIC cards are on both sides of these connections. So while it is technically possible for an HFT shop to inject enormous volumes of orders into one or more exchanges, a type of Denial of Service attack, using one or more physical ports on one or more exchange servers it could quickly result in financial suicide for that the trading firm. The exchanges and the Securities and Exchange Commission (SEC) don’t take kindly to trading partners seeking to game the system. Quickly the exchanges, and soon after the SEC, would step in and shut down inappropriate activity. *It should be noted that the above image was taken on December 6, 2017, in New York City’s Times Square.

To further improve security for its trading customers later this month Solarflare will begin rolling out a beta of ServerLock™ which is a firmware update for these very same NICs powering the exchanges and HFT shops worldwide. With ServerLock™ the HFT shops and the exchanges themselves could rapidly pump the breaks on any given logical connection directly within the NIC hardware. This is the point at which DARPA and others should be interested. If the logic within the exchange were to detect and validate a threat they could then within a few millionths of a second install a filter into the NIC hardware to drop all subsequent packets from that threat. At that point, the threat would be eliminated, and it would no longer consume exchange CPU cycles. For HFT shops if they were to detect an algorithm had gone rogue they could employ ServerLock™ to physical cut a trading platform from the exchange without having to actually touch the platforms precious code. Much like throwing a cover over Schrodinger’s box, by applying the filter in the NIC hardware the trading platform itself remains intact for later investigation.

Number three on the list above is crippling a widely used payroll processor like ADP who processes payroll checks for one out of six Americans? First ADP uses at least two different networks. One permits inbound payroll data from their client companies, over the public internet via SSL secured connections, and a second which is a private Automated Clearing House (ACH) network. The ACH network is a member network connecting banks to clearinghouses like the Federal Reserve. Much like the exchanges above, being a paid member of an ACH network then attacking that same network would not be a wise move for a business. As for the public Internet-facing connections that ADP maintains, they likely are practicing the latest defense in depth technologies coupled with least privilege in an effort to avoid the issues faced earlier this year by Equifax.

Next, we have the Credit Card Processors also know as Payment Card Industry (PCI) players from Amex to Square who are fighting a never-ending battle to secure their systems against outsider threats. Much like the ACH network the PCI industry has its own collection of private networks for processing credit card transactions, ex. the Mastercard network, or Visa network, etc… These networks, like the ACH networks, are member networks, and attacking them would also be counterproductive. The world economy would likely not be in Jeopardy if at any point say the Amex or Discover networks were to stop processing credit cards for a few hours. We have seen the Internet websites of these providers, ex. Mastercard, have been targets of some of the most substantial Distributed DoS (DDoS) attacks the world has ever seen, and they’ve all faired it pretty well. Most have learned from these assaults how to further harden their networks.

Who would have thought two years ago that “Fake News” could possibly have turned the tide of a US Presidential election, or be used as a tool to dramatically shift a financial market? While at DEFCON 2015 I watched as Charlie Miller and Chris Valasek presented their now infamous hack of a Jeep Grand Cherokee. At the start of their talk, Charlie joked that had they thought the wired article would have moved Chrysler stock more than a point or two he would have partnered up with a VC to fund shorting their stock. He said that had he done that he’d now be sitting on the beach of his private island now sipping his favorite frozen drink through a straw, rather than lecturing us. Charlie explained that he expected their announcement would be similar to Google or Microsoft announcing a bug, but he was very wrong. It led to a recall of 1.4 million vehicles and the stock dropped double-digit percentage points following the story and the recall. While this was real news, it was a controlled news release from someone outside the company. They could have easily made hundreds of millions of US dollars shorting the stock. Now what most people aren’t aware of is that there are electronic news systems that some HFT algorithmic platforms are subscribed to. Some of these systems even “read” tweets from key people (ex. our president) to determine if their comments might move a particular security or market in one direction or another. Knowing this, these systems can then be gamed by issuing false stories expecting that the HFT algorithms will then “read” these stories and stock prices will move appropriately. When retractions are issued later it might also be expected that they will place orders that would also benefit from these retractions. So how do we suppress the impact of “fake news” on our financial markets?

These news services know that HFT systems trade on their output. Given that, they should be investing heavily in machine learning based systems to rapidly fact-check and score the potential truthfulness of a given story. For those stories that score beyond belief, they should then be kicked to humans for validation or potentially be delayed until they are backed up by additional sources or even held until after the US markets close to further limit their impact.

Kernel Bypass = Security Bypass

December 5, 2017March 22, 2018 scottcschweitzer Microsegmentation, networking, Security

As we move our performance focused applications to kernel bypass techniques like DPDK and Solarflare’s Onload this does not come without a price, and one component of that price is often security. When one bypasses the Linux kernel, they are also bypassing its security mechanisms (ex. XDP and NFTables, formerly IPTables). These security mechanisms have evolved over the past decade to ensure that your server doesn’t get compromised. Are they perfect no, software rarely is, but they are an excellent starting point to secure your Linux server. So as we move to kernel bypass platforms what options are available to us? We need to define lower level network security checkpoints that can be used as gatekeepers to keep the good stuff in and the bad stuff out. With one exception these are often hardware products that are managed using several different networking segmentation metaphors: micro, macro, and application which is also known as workload.

Micro-segmentation is the marketing term that has been co-opted by VMWare to represent its NSX security offering. When you’re a hypervisor company all the worlds a virtual machine (VM) so moving security into the hypervisor is a natural fit. VMWare then plays a clever trick and abstracts the physical network from the VM by installing a virtual network to which it then connects the VM. The hypervisor then works as the switch between the physical and virtual networks. To support coordinating workloads and security across multiple hypervisors running on different physical servers VMWare goes one step further and encapsulates traffic. This enables it to take traffic running on one virtual network and bridge it over the physical network to a virtual network on another host. So if your kernel bypass application can run from within a VM without having to rely on hypervisor bypass, then this model might work for you. Illumio has also attached itself to micro-segmentation, but rebranding it “smart micro-segmentation.” Our understanding is that they essentially run an agent that then programs NFTables in real time, so for kernel bypass applications this would offer no security.

Macro-segmentation, as you might guess, means creating segmented networks that span multiple external physical network devices. This is the term that Arista Networks has chosen (originally they used micro-segmentation, perhaps until VMWare stepped in). Macro-segmentation is the foundation for Arista’s CloudVision line of products. While this too does an awesome job of securing your network it doesn’t come without cost, which is complexity. CloudVision connects into VMWare NSX, OpenStack and other OVS DB based controllers to enable you to seamlessly configure various vendors hardware through a single interface. Furthermore, it comes with configuration modules called configlets for a wide variety of hardware that enables you to quickly and easily duplicate data center functions across one or more data centers. It also includes a configlet builder tool to quickly empower an administrator to craft a configlet for a device for which one does not exist.

The last solution is application or workload segmentation. In techie terms, this is five-tuple filtering and enforcement of network traffic. Which to the layperson means opening the network packet up, inspecting the protocol it uses, along with the source and destination addresses and ports. Then taking these five values and comparing them to some collection of filter tables to determine the appropriate action to take on the packet. Today this can be done by Solarflare ServerLock NICs or applications like XDP or NFTables. ServerLock NICs do this comparison in 50 to 250 nanoseconds within the firmware of the NIC itself, entirely transparent to the server the NIC is installed in. In doing it this way the process of filtering consumes no host CPU cycles, is agnostic to the OS or applications running, and it scales with every NIC card added to the server. Packets are filtered at wire-rate, 10Gbps/port, and there can be one filter table for every locally hosted IP address with a total capacity exceeding over 5,000 filters/NIC. As mentioned, all of this filtering is done in the NIC hardware without any awareness of it by the DPDK or Onload applications running above it.

So if you’re using DPDK or Onload, and the security of your application, or the data it shares, is of concern to you, then perhaps you should consider engaging with one of the vendors mentioned above.

If you’d like to learn more about ServerLock, please drop me an email.

Gone in 98 Nanoseconds

October 18, 2017March 23, 2018 scottcschweitzer HFT, networking, TCP Offload Engine

Imagine a daily race with hundreds of top fuel dragsters all lined up rumbling along in parallel waiting for the same green Christmas tree light before launching off the line. In some electronic markets, with specific products, every weekday morning this is exactly what happens. It’s a race where being the fastest is the primary attribute used to determine if you’re going to be doing business. On any given day only the top finishers are rewarded with trades. Those who transmit their first orders of the day the fastest receive a favorable position at the head of the queue and are likely to do some business that day. In this market, EVERY nanosecond (a billionth of a second) of delay matters, and can be monetized. Last week the new benchmark was set at 98 nanoseconds, plus your trading algorithm, in some cases 150 nanoseconds total tick to trade.

“Latency” is the industry term for the unavoidable network delays, and “Tick to Trade Latency” aggregates together the network travel time for a UDP market data signal to arrive at a trading system, and for that trading system to transmit a TCP order into the exchange. Last year Solarflare introduced Application Nanosecond TCP Send (ANTS) and lowered the “Tick to Trade Latency” bar to 350 Nanoseconds. ANTS executes in collaboration with Solarflare’s Application Onload Engine (AOE) based on an Altera Stratix FPGA. Solarflare further advanced this high-speed trading platform to achieve 250 Nanoseconds. Then in the spring of 2017 Solarflare collaborated with LDA Technologies. LDA brought their Lightspeed TCP cores to the table and replaced the AOE with a Xilinx FPGA board once again lowering the “Tick to Trade Latency” to 120 Nanoseconds. Now through further advances, and moving to the latest Penguin Computing Skylake computing platform, all three partners just announced a STAC-T0 qualified benchmark of 98 nanoseconds “Tick to Trade Latency!”

There was even a unique case in this STAC-T0 testing where the latency was measured at negative 68 nanoseconds, meaning that a trade could be injected into the exchange before the market data from the exchange had even been completely received. Compared to traditional trading systems which require that the whole market data network packet to be received before ANY processing can be done, these advanced FPGA systems receive the market data in the packet in four-byte chunks and can begin processing that data while it is arriving. Imagine showing up in the kitchen before you wife even finishes calling your name for dinner. There could be both good and bad side effects of such rapid action, you have a moment or two to taste a few things before the table is set, or you may get some last minute chores. The same holds true for such aggressive trading.

Last week, in a Podcast with the same name we had a discussion with Vahan Sardaryan, CEO of LDA Technologies, where we went into this in more detail.

Penguin Computing is also productizing the complete platform, including Solarflare’s ANTS technology and NIC, LDA Technologies Lightspeed TCP, along with a high-performance Xilinx FPGA to provide the Ultimate Trading Machine.

The Ultimate Trading Machine

Security Entirely Chimerical, SEC

September 24, 2017March 23, 2018 scottcschweitzer HFT, Security

On September 20th SEC Chairman Jay Claton released a “Statement on Cybersecurity.” It is an extremely dry read, but if one suffers through it they’ll find several interesting points.

“I recognize that even the most diligent cybersecurity efforts will not address all cyber risks that enterprises face. That stark reality makes adequate disclosure no less important.”

How does the SEC define “adequate disclosure?” The federal government has requirements that in some extreme breach cases require a report within one hour to DHS’s CERT. When faced with this class of breach recently it was found that the SEC waited 14 days, is this adequate disclosure? Much further down in the SEC Statement they disclosed the following.

“In August 2017, the Commission learned that an incident previously detected in 2016 may have provided the basis for illicit gain through trading. Specifically, a software vulnerability in the test filing component of our EDGAR system, which was patched promptly after discovery, was exploited and resulted in access to nonpublic information.”

So in the best case, the SEC waited only eight months to inform the public of this breach, but it could have been as much as 20 months. Unlike the publicly traded companies, the SEC regulates it isn’t legally required to tell investors or the public if it is ever breached. It is ONLY required to inform a law enforcement agency. EDGAR was also breached in 2014, but that saw little attention.

Now it’s one thing to breach an entity and remove data, but how about intentionally leaving false data behind for the purpose of capitalizing on that deposit? In at least two cases over the past few years, false business acquisition reports for Avon and the Rocky Mountain Chocolate Factory have been inserted into EDGAR. In the Avon case, the stock ran up 10 points. Does the SEC own up to these, well kinda of…

“As another example, our Division of Enforcement has investigated and filed cases against individuals who we allege placed fake SEC filings on our EDGAR system in an effort to profit from the resulting market movements.”

Ok, so EDGAR is a 30-year-old piece of swiss cheese riddled with potential attack surfaces some by design, others by just not keeping current on penetration testing of their systems. What about their physical assets?

“For example, a 2014 internal review by the SEC’s Office of Inspector General (“OIG”), an independent office within the agency, found that certain SEC laptops that may have contained nonpublic information could not be located.”

All the above quotes were from the Wednesday SEC Statement, but in a 2016 GAO report on the SEC, it stated that the SEC:

“…wasn’t always using encryption, supported software, well-tuned firewalls, and other key security tools while going about its business.”

Banking, in fact, our financial market structure as a whole is based on a singular concept, TRUST. The SEC was created in the wake of the Great Depression in 1934 as a way to restore trust in the markets. Technology savvy individuals will always attempt to exploit this trust for their own gain, it’s a part of how the game is played. In our financial system, the SEC plays the role of the gambling commission to ensure that the players, dealers, pit bosses, and the house are all working from the same set of published public rules. To his credit Chairman Clayton is working within the system in an attempt to shine daylight on an agency in trouble and out of touch with the technology driving the markets its charged with regulating. Today it is now possible to trade a stock based on a tick (a signal that something moved) within 150 billionths of a second, but it takes the SEC 1.2 million seconds (14 days) to report a serious breach of their security to law enforcement. Clearly, work remains to be done.

Equifax & Micro Segmentation

September 8, 2017March 23, 2018 scottcschweitzer Containers, Microsegmentation, networking, Security, Website

Earlier this week it was reported that an Equifax web service was hacked creating a breach that existed for about 10 weeks. During that time the attackers used that breach to drain 143 million people’s private information. The precise technical details of the breach, which Equifax claims was detected and closed on July 29, has yet to be revealed. While it says it’s seen no other criminal activity on its main services since July 29th that’s of little concern as Elvis has left the building. At 143 million that means a majority of the adults in the US have been compromised. Outside of Equifax specific code vulnerabilities or further database hardening what could Equifax have done to thwart these attackers?

Most detection and preventative countermeasures that could have minimized Equifax’s exposure employ some variation of behavior detection at one network layer. They then shunt suspect traffic to a sideband queue for further detailed human analysis. Today the marketing trend to attract Venture Capital investment is to call these behavior detection algorithms Artificial Intelligence or Machine Learning. How intelligent they are, and to what degree they learn is something for a future blog post. While at the NGINX Conference this week we saw several companies selling NGINX layer-7 (application layer) plugins which analyzed traffic prior to passing it to NGINX’s HTML code evaluation engine. These plugins receive the entire HTML request after the OS stack has assembled it from multiple network packets. They then do a rapid analysis the request to determine if it poses a threat. If not then the request is passed back to NGINX for the web application to respond to. Along the way, the plugin abstracts metadata from the request and in parallel, it shoots that up to their cloud service for further evaluation. This metadata is then compared against prior history and other real-time customer data from with similar services to extract new potential threat vectors. As they are detected rules are then pushed back down into the plugin that can be applied to future packets.

Everything discussed above is layer-7, the application layer, traffic analysis, and mitigation. What does layer-7 have to do with network micro-segmentation? Nothing, what’s mentioned above is the current prevailing wisdom instantiated in several solutions that are all the rage today. There are several problems with a layer-7 solution. First, it competes with your web application for host CPU cycles. Second, if the traffic is determined to be malicious you’ve already invested tens of thousands of CPU instructions, perhaps even in excess of one hundred thousand instructions to make this determination, all that computer time is lost once the message is dropped. Third, the attack is now deep inside your web server and whose to say the attacker hasn’t learned what he needed to move to a lower layer attack vector to evade detection. Layer-7 while convenient, easy to use, and even easier to understand is very inefficient.

So what is network micro-segmentation, and how does it fit in? Network segmentation is the act of altering the flow of traffic such that only what you want is permitted to pass. Imagine the factory that makes M&Ms. These days they use high-speed cameras and other analytics that look for deformed M&Ms and when they see one they steer it away from the packaging system. They are in fact segmenting the flow of M&Ms to ensure that only perfect candy-coated pieces ever make it into our mouths. The same is true for network traffic, segmentation is the process of only allowing network packets to flow into or out of a given device via a specific policy or set of policies. Micro-segmentation is doing that down to the application level. At Layer-3, the network layer, that means separating traffic by the source and destination network address and port, while also taking into account the protocol (this is known as “the five-tuple”, a set of five elements). When we focus on filtering traffic by network port we can say that we are doing application level filtering because ports are used to map network traffic to applications. When we also take into account the local IP address for filtering then we can also say we filter by the local container (ex. Docker) or Virtual Machine (VM) as these can often get their own local IP address. Both of these items together can really define a very specific network micro-segmentation strategy.

So now imagine a firewall inside a smart network interface card (NIC) that can filter both inbound and outbound packets using this network micro-segmentation. This is at layer-3, the Network, micro-segmentation within the smart NIC. When detection is moved into the NIC no x86 CPU cycles are consumed when evaluating the traffic, and no host resources are lost if the packet is deemed malicious and is dropped. Furthermore, if it is a malicious packet and it’s stopped by a firewall in the NIC then the threat has never entered the host CPU complex, and as such, the system’s integrity is preserved. Consider how this can improve an enterprise’s security as it scales out both with new servers, as well as adding containers and VMs. So how can this be done?

Solarflare has been shipping its 8000 line of smart NICs since June of 2016, and later this fall they will release a new firmware called ServerLock(TM). ServerLock is a first generation firewall in the smart NIC that is centrally managed. Every second it sends a summary of network flows through the NIC, in both directions, to a central ServerLock Manager system. This system then allows administrators to view these network flows graphically and easily turn them into security roles and policies that can then be deployed. Policies can then be deployed to a specific local IP address, a collection of addresses (think Docker containers or VMs) called an “IP Set”, a host or host groups. When deployed policies can be placed in Monitor or Enforce mode. Monitor Mode will allow all traffic to flow, but it will generate alerts for all traffic outside of all the defined policies for a local IP address. In Enforce mode, ONLY traffic conforming to the defined policies will be permitted. Traffic outside of those policies will generate an alert and be dropped. Once a network device begins to drop traffic on purpose we say that that device is segmenting the network. So in Enforce mode, ServerLock smart NICs will actively segment that server’s network by only passing traffic for supported applications, only those for which a policy exists. This applies to traffic in both directions, so for example, if an administrator walks into the data center, grabs a keyboard and elects to Secure Copy (SCP) a file from a database server to his workstation things will get interesting. If the ServerLock smart NIC in that database server doesn’t have a policy supporting SCP (port 22) his outbound request from that database server to his workstation will be dropped in the NIC. Likely unknown to him an alert will be generated on the central ServerLock Manager console calling out the application and both the database server and his workstation, and he’ll have some explaining to do.

ServerLock begins shipping this fall so while it’s too late for Equifax it’s not too late for the next Equifax. So how would this help moving forward? Simple, if every server, including web servers and database servers, has a ServerLock smart NIC then every second these servers would report their flow data to the central Solarflare ServerLock Manager for further analysis. Solarflare is working with Cloudwick to do real-time analysis of this layer-3 traffic so that Cloudwick can then proactively suggest in real time back to ServerLock administrators new roles and policies to proactively protect servers against all sorts of threats. More to come as this product is released.

9/11/17 Update – It was released over the weekend that Equifax is now pointing the blame at an Apache Struts module. The exact module has yet to be disclosed, but it could be any one of the following that has been previously addressed. On Saturday The Apache group replied pointing to other sources that believe it might have been caused by exploiting a remote code execution bug in their REST plugin as outlined in CVE-2017-9805. More to come.

9/12/17 Update – Alert Logic has the best analysis thus far.

Get Three Times More From NGINX

September 4, 2017March 23, 2018 scottcschweitzer networking

Recently Solarflare re-ran some tests with Nginx that measured the amount of traffic it could respond to before it started dropping requests. We then scaled up the number of cores provided to Nginx to see how additional compute resources impacted the servicing of web page requests, and this was the resulting graph:

click for larger image

As you can see from the above graph most NIC implementations require about six cores to achieve 80% wire-rate. The major difference highlighted in this graph though is that with a Solarflare adapter, and their OpenOnload OS Bypass driver (also known as UKB, Universal Kernel Bypass) they can achieve 90% wire-rate performance utilizing ONLY two cores versus six. Note that this is with Intel’s most current 10G NIC the x710.

What’s interesting here though is that OpenOnload internally can bond together up to six 10G links, before a configuration file change is required to support more. This could mean that a single 12 core server, running a single Nginx instance should be able to adequately service 90% wire-rate across all six 10G links, or theoretically 54Gbps of web page traffic. Now, of course, this is assuming everything is in memory, and the rest of the system is properly tuned. Viewed another way this is 4.5Gbps/core of web traffic serviced by Nginx running with OpenOnload on a Solarflare adapter compared to 1.4Gbps/core of web traffic with an Intel 10G NIC. This is a 3X gain in performance for Solarflare over Intel, how is the possible?

Simple, OpenOnload is a userspace stack that communicates directly with the network adapter in the most efficient manner possible to service UDP & TCP requests. The latest version of OpenOnload has also been tuned to address the C10K problem. What’s important to note, is that by bypassing the Linux OS to service these communication requests Solarflare reduces the number of kernel context switches/core, memory copies, and can more effectively utilize the processor cache. All of this translates to more available cycles for Nginx on each and every core.

To further drive this point home we did an additional test just showing the performance gains OOL delivered to Nginx on 40GbE. Here you can see that the OS limits Nginx on a 10-core system to servicing about 15Gbps. With the addition of just OpenOnload to Nginx, that number jumps to 45Gbps. Again another 3X gain in performance.

click for larger image

If you have web servers today running Nginx, and you want to give them a gargantuan boost in performance please consider Solarflare and their OpenOnload technology. Imagine taking your existing web server today which has been running on a single Intel x520 dual port 10G card, replacing that with a Solarflare SFN7122F card, installing their OpenOnload drivers and seeing a 3X boost in performance. This is a fantastic way to breathe new life into existing installed web servers. Please consider contacting Solarflare today do a 10G OpenOnload proof of concept so you can see these performance gains for yourself first hand.

R.I.P. TCP Offload Engine NICs (TOEs)

August 23, 2017March 23, 2018 scottcschweitzer HFT, networking, RDMA

Solarflare Delivers Smart NICs for the Masses: Software Definable, Ultra-Scalable, Full Network Telemetry with Built-in Firewall for True Application Segmentation, Standard Ethernet TCP/UDP Compliant

As this blog post by Michael C. Bazarewsky states, Microsoft quietly pulled support for TCP Chimney in its Windows 10 operating system. Chimney was an architecture for offloading the state and responsibility of a TCP connection to a NIC that supported it. The piece cited numerous technical issues and lack of adoption, and Michael’s analysis hits the nail on the head. Goodbye TOE NICs.

During the early years of this millennium, Silicon Valley venture capitalists dumped hundreds of millions of dollars into start-ups that would deliver the next generation of network interface cards at 10Gb/sec using TCP offload engines. Many of these companies failed under their weight of trying to develop expensive, complicated silicon that just did not work. Others received a big surprise in 2005 when Microsoft settled with Alacritech over patents they held describing Microsoft’s Chimney architecture. In a cross-license arrangement with Microsoft and Broadcom, Alacritech received many tens of millions of dollars in licensing fees. Alacritech would later get tens of millions of more fees from nearly every other NIC vendor implementing a TOE in their design. At the time, Broadcom was desperate to pave the way for their acquisition of Israeli based Siloquent. Due to server OEM pressure, the settlement was a small price to pay for the certain business Broadcom would garner from sales of the Siloquent device. At 1Gb/sec, Broadcom owned an astounding 100% of the server LAN-on-Motherboard (LOM) market, and yet their position was threatened by the onslaught of new, well-funded 10Gb start-ups.

In fact, the feature list for new “Ethernet” enhancements got so full of great ideas that most vendor’s designs relied on a complex “sea of cores” promising extreme flexibility that ultimately proved to be very difficult to qualify at the server OEMs. Any minor change to one code set would cause the entire design to fail in ways that were extremely difficult to debug, not to mention being miserably poor in performance. Most notably, Netxen, another 10Gb TOE NIC vendor, quickly failed after winning major design-ins at the three big OEMs, ultimately ending in a fire sale to Qlogic. Emulex saw the same pot of gold in its acquisition of ServerEngines.

That new impetus was a move by Cisco to introduce Fibre Channel Over Ethernet (FCoE) as a standard to converge networking and storage traffic. Cisco let Qlogic and Emulex (Q & E) inside the tent before their Unified Computing System (UCS) server introduction. But the setup took some time. It required a new set of Ethernet standards, now more commonly known as Data Center Bridging (DCB). DCB was a set of physical layer requirements that attempted to emulate the reliability of TCP by injecting wire protocols that would allow “lossless” transmission of packets. What a break for Q & E! Given the duopoly’s control over the Fibre Channel market, this would surely put both companies in the pole position to take over the Ethernet NIC market. Even Broadcom spent untold millions to develop a Fiber Channel driver that would run on their NIC.

Q & E quickly released what many called the “Frankenstein NIC,” a kluge of Applied-Specified Integrated Circuits (ASIC) designed to get a product to market even while struggling to develop a single ASIC, a skill at which neither company excelled. Barely achieving its targeted functionality, no design saw much traction. Through all of our customer interactions (over 1,650), we could find only one that had implemented FCoE. This large bank has since retracted its support for FCoE and in fact, showed a presentation slide several years ago stating they were “moving from FCoE to Ethernet,” an acknowledgment that FCoE was indeed NOT Ethernet.

In conjunction with TOEs, the industry pundits believed that RDMA (Remote Direct Memory Access) was another required feature to reduce latency, and not just for High-Frequency Trading (HFT), another acknowledgment that lowering latency was critical to the hyper-scale cloud, big data, and storage architectures. However, once again, while intellectually stimulating, using RDMA in any environment proved to be complex and simply not compatible with customers’ applications or existing infrastructures.

The latest RDMA push is to position it as the underlying fabric for Non-Volatile Memory Express (NVMeF). Why? Flash has already reduced the latency of storage access by an order of magnitude, and the next generation of flash devices will reduce latency and increase capacity even further. Whenever there’s a step function in the performance of a particular block of computer architecture, developers come up with new ways to use that capability to drive efficiencies and introduce new, and more interesting applications. Much like Moore’s Law, rotating magnetic memory is on its last legs. Several of our most significant customers have already stopped buying rotating memory in favor of Flash SSDs.

Well… here we go again. RDMA is NOT Ethernet. Despite the “fake news” about running RDMA, RoCE and iWARP on Ethernet, the largest cloud companies, and our large financial services customers have declared that they cannot and will not implement NVMeF using RDMA. It just doesn’t fit in their infrastructures or applications. They want low-latency standard Ethernet.

Since our company’s beginning, we’ve never implemented TOEs, RDMA or FCoE or any of the other great and technically sound ideas for changing Ethernet. Sticking to our guns, we decided to go directly to the market and create the pull for our products. The first market to embrace our approach was High-Frequency Trading (HFT). Over 99% of the world’s volume of Electronic trading, in all instruments, runs on our company’s NICs. Why? Customers could test and run our NICs without any application modifications or changes to their infrastructure and realize enormous benefits in latency, Jitter, message rate and robustness… it’s standard Ethernet, and our kernel bypass software has become the industry’s default standard.

It’s not that there isn’t room for innovation in server networking, it’s that you have to consider the customer’s ability to adapt and manage that change in a way that’s not inconsistent or disruptive to their infrastructure, while at the same time, delivering highly valued capabilities.

If companies are looking for innovation in server networking, they need to look for a company that can provide the following: Best-in-class PTP synchronization
Ultra-high resolution time stamps for every packet at every line rate
A method for lossless, unobtrusive, packet capture and analysis
Significant performance improvement in NGINX and LXC Containers
A firewall NIC and Application Micro-Segmentation that can control every app, VM, or container with unique security profiles
Real, extensive Software Definable Networking (SDN) without agents

In summary, while it’s taken a long time for the industry to realize its inertia, logic eventually prevailed. Today, companies can now benefit from innovations in silicon and software architecture that are in deployment and have been validated by the market. Innovative approaches such as neural-scale networking, which is designed to respond to the high-bandwidth, ultra-low-latency, hardware-based security, telemetry, and massive connectivity needs of ultra-scale computing, is likely the only strategy to achieve a next-generation cloud and data center architecture that can scale, be easily managed, and maybe most importantly secured.

— Russell Stern, CEO Solarflare