Is Serverless Also Trafficless?

Recently I read the article “Why Now Is The Time To Go Serverless” by Romi Stein, the CEO of OpenLegacy, a composable platform company. I agree with several of the points Romi makes about the importance of APIs, microservice architectures, and cloud computing, and I agree that serverless doesn’t truly mean computing without a server, but rather computing on servers owned and provisioned by the major cloud providers. My main point of contention is that large businesses executing mission-critical functions in public clouds may eventually come to regret this move to a “serverless” architecture, because it may also be “trafficless.” Recently we’ve seen a rash of colossal security vulnerabilities from companies like SolarWinds and Microsoft (Exchange Server). Events like these should make us all pause and rethink how we handle security. In a composable enterprise highly dependent on public cloud infrastructure, threat detection, and the forensic work that follows a breach, may be impossible because the key data either doesn’t exist or isn’t available.


In a traditional on-premises environment, it is generally understood that the volume of network traffic within the enterprise is often 10X that of the traffic entering and leaving it. One of the more essential strategies for detecting a potential breach is to examine, ideally in near-real-time, both the internal and external network flows, looking for irregular traffic patterns. If you are notified of a breach, an analysis of these traffic patterns is often used to confirm that the breach actually occurred. To serve both of these tasks, copies are made of network traffic in flight; this is called traffic capture. The data may then be reduced and eventually shipped off to Splunk or run through a similar tool, hopefully locally. Honestly, I was never a big fan of shipping copies of a company’s network traffic to a third party for analysis; many of a company’s trade secrets reside in these digital breadcrumbs.

Is a serverless environment also trafficless? Of course not, that’s ridiculous, but are the public cloud providers willing to, or even capable of, sharing copies of all the network traffic your serverless architecture generates? If they were, what would you do with all that data? Wait, here’s another opportunity for the public cloud guys: they could sell everyone yet another service that captures and analyzes all your serverless network traffic to tell you when you’ve been breached! Seriously, though, this is something worthy of consideration.

In Security, Hardware Trumps Software


Since the dawn of time humanity has needed to protect both people and things. Initial security methods were all “software based” in the sense that they relied on putting trust in a process, in people, and in social conventions. At first, it was cavemen hiding what they most valued, leveraging security through obscurity, or posting a trusted associate to watch the entrance. Eventually, we expanded our security methods to include some form of “Keep Out” sign through writings and carvings. Then, around 600 BC, along came Theodorus of Samos, who invented the key. Warded locks had existed for about three hundred years before Theodorus, but their “key” was designed merely to bypass obstructions to its rotation, making it only slightly more challenging to reach the hidden trip lever inside. For a warded lock, the “key” often looked like what we call a skeleton key today.

It could be argued that the lock represented our first “hardware based” security system, as the user placed their trust in a physical token, the key. Systems secured in hardware require that the user present their token in person; it is then validated, and if it passes, the security measures are removed. It should be noted that we trust this approach because of both the presence of the token and the accountability of a person in the vicinity who knows how to execute the exact process with the token to ensure success.

Of course, every system man invents can also be defeated. One of the first skills most hackers teach themselves is how to pick a lock, which lets them dynamically replicate the function of the key using two very simple and compact tools: a torsion bar and a pick. Whenever we pick a lock we risk exposure, something we avoid at all costs, because the process of picking a lock looks visually different from that of using a key. Picking a lock using the tools mentioned above requires two hands. One provides a steady rotational force using the torsion bar, while the other manipulates the pick to raise the pins until each aligns with the cylinder and hangs up. Both hands require a very fine sense of touch: too heavy-handed with the torsion bar and you can snap the last pin or two while freeing the lock, which breaks it for future key users and potentially exposes your attempted tampering; too light or too heavy with the pick and you won’t feel the pins hanging up. It’s more art than science. The point is that while using a key takes seconds, picking a lock takes much longer, anywhere from a few seconds to well over a minute, or never, depending on the complexity of the cylinder and the skill of the picker. The difference between defeating a software system and a hardware one is typically this aspect of presence. While it’s not always the case, defeating a hardware-based system often requires that the attacker be physically present, because defeating hardware commonly requires hardware. Hackers often operate from countries far outside the reach of law enforcement, so physical presence is not an option. Attackers are driven by a risk-reward model, and showing up in person is considered very high risk, so the reward needs to be exponentially greater.

Today companies hide their most valuable assets in servers located in large, secure data centers. There are plenty of excellent real-world hardware and software systems in place to ensure proper physical access to these systems. These security measures are so good that hackers rarely try to evade them, because the risk of detection and capture is too high. Yet we need only look at the past month, April 2019, to see that companies like Microsoft, Starwood, Toyota, GA Tech and Questcare have all reported breaches. In Microsoft’s case, 6% of all MSN, Hotmail, and Outlook accounts were breached, though the company has not disclosed the details or the actual number of accounts affected. Breaches like these are possible because attackers need only break into a single system within the enterprise to reach the data center and establish a beachhead from which they can then land and expand. Attackers usually obtain that initial foothold through a phishing email or clickbait.

It takes only one undereducated employee to open a phishing email in Outlook, launch a malicious attachment, or click on a rogue webpage link, and it’s game over. Lockheed did extensive research in this area and produced its now famous Cyber Kill Chain model, which, at a high level, highlights the process by which attackers seize control of an enterprise. Any one of these attack vectors can result in the installation of a remote access trojan (RAT) or a zero-day exploit that will give the attacker near unlimited access to the employee’s system. From there the attacker will seek out a poorly secured server in the office or data center to establish a beachhead from which to launch their attack. The compromised employee system may not always be available, but it makes a great point to retreat to in the event that the primary beachhead server is discovered and sanitized.

Once an attacker has a foothold in the data center, it’s game over. Very often they can easily move laterally, east-west, through the data center to other systems. The MITRE ATT&CK (Adversarial Tactics, Techniques & Common Knowledge) framework, while similar to Lockheed’s approach, drills down much further. On lateral movement strategies specifically, MITRE uncovered 17 different methods for compromising internal servers. This highlights the point that very few defenses exist in the traditional data center, and those that do are often very well understood by attackers. These defenses are typically OS-based firewalls that all seasoned hackers know how to disable: they will disable logging, then tear down the firewall. They can also sometimes leverage an island-hopping attack into a vendor’s or customer’s systems through private networks or gateways. Or, in the case of the Starwood breach at Marriott, the attackers got lucky: when the IT systems were merged, so were the exploited systems. This is known as a data lemon, an acquisition that comes with infected and unsecured systems. It should also be noted that malicious insiders, employees who are aware of a pending termination or are just seeking to augment their income, make up over 30% of reported breaches. In that scenario, a malicious insider simply leverages their access and knowledge to drain all the value from their employer’s systems. So what hardware countermeasures can be put in place to limit east-west, or lateral, attacks within the data center? Today you have three hardware options to secure your data center servers against east-west attacks: switch access control lists (ACLs), top-of-rack firewalls, or something uniquely innovative, Solarflare’s ServerLock enabled NICs.

Often enterprises leverage ACLs in their top-of-rack 10/25/100G switches to protect east-west traffic within the data center. The problem with this approach is one of scale: IT teams can easily exhaust these resources when they attempt comprehensive application-level segmentation at the server. These top-of-rack switches provide between 100 and 1,000 ACLs per port. By contrast, Solarflare’s ServerLock provides 5,000 ACLs per NIC, along with some foundational subnet-level filtering.
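A rough back-of-the-envelope shows how quickly a per-port ACL budget disappears once you segment at the workload level. The container density and per-workload rule counts below are assumptions for illustration; only the ACL budgets come from the figures above.

```python
# Illustrative only: assumed container density and rules per workload, not vendor specs.
containers_per_server = 50        # assumption: a moderately dense virtualized host
rules_per_container   = 20        # assumption: allowed flows per workload

rules_needed_at_port = containers_per_server * rules_per_container   # 1,000 rules on one uplink
switch_port_budget   = 1000       # upper end of the 100-1,000 per-port range cited above
nic_budget           = 5000       # per-NIC figure cited above

print(f"Rules needed on one server's switch port: {rules_needed_at_port}")
print(f"Already at the switch port ceiling? {rules_needed_at_port >= switch_port_budget}")
print(f"Headroom left in the NIC:           {nic_budget - rules_needed_at_port} rules")
```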

In extreme cases, companies might leverage hardware firewalls internally to further zone off systems they are looking to secure. Here the problem is one of volume. Since these firewalls are used within the data center, they will be tasked with filtering enormous amounts of network data; typically the traffic inside a data center is 10X the traffic volume entering it. Mission-critical clusters or server groups demand high bandwidth, so these firewalls become very expensive and directly impact application performance. Some of the fastest appliance-based firewalls designed to handle these kinds of high volumes are both expensive and add another 2.5 to 3.5 microseconds of latency in each direction. This means that if an intranet server were to fetch information from a database behind an internal firewall, the transaction would see an additional delay of 5 to 7 microseconds. While this honestly doesn’t sound like much, think of it like compound interest. If the transaction is simple and there’s only one request, then a few microseconds will go unnoticed, but what happens when that employee’s request decomposes into hundreds or even thousands of sequential database calls? The added delay becomes something the application, and possibly the user, can measure. By comparison, Solarflare’s ServerLock NIC-based ACL approach adds only 0.25 to 0.75 microseconds of latency in each direction.
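Here is a quick, illustrative calculation using the per-direction figures quoted above to show how those microseconds stack up as calls are chained sequentially:

```python
# How per-hop filtering latency compounds across chained, sequential calls.
# Per-direction latencies are the midpoints of the ranges quoted above.
FIREWALL_US = 3.0      # internal appliance firewall, microseconds added per direction
NIC_ACL_US  = 0.5      # NIC-based ACLs, microseconds added per direction

def added_delay_ms(per_direction_us: float, sequential_calls: int) -> float:
    """Each call crosses the filter twice (request and response)."""
    return 2 * per_direction_us * sequential_calls / 1000.0

for calls in (1, 100, 1000):
    print(f"{calls:>5} calls: appliance adds {added_delay_ms(FIREWALL_US, calls):6.3f} ms, "
          f"NIC ACLs add {added_delay_ms(NIC_ACL_US, calls):6.3f} ms")
```

At a thousand chained calls the appliance adds roughly 6 ms versus about 1 ms for the NIC ACLs; small in absolute terms, but a 6X difference on a latency-sensitive request path.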

Finally, we have Solarflare’s ServerLock solution, which executes entirely within the hardware of the server’s own Network Interface Card (NIC). There are NO server-side services or agents, so there is no attackable software surface area of any kind. Think about that for a moment: a server-side security solution with ZERO ATTACKABLE SURFACE AREA. Once ServerLock is engaged through the binding process with a centralized ServerLock DirectorOne controller, the local control plane on the NIC that manages security is torn down. This means that even if a hacker or malicious insider were to elevate their privileges to root, they would NOT be able to see or affect the security settings on the NIC. ServerLock can test up to 5,000 ACLs against a network packet within the NIC in just over 250 nanoseconds. If your security policies leverage subnet wildcards, the worst-case latency is under 750 nanoseconds. Both inbound and outbound network traffic is checked in hardware. All of the Solarflare NICs within a data center can be managed by ServerLock DirectorOne controllers, and today a single DirectorOne can manage up to 1,000 NICs.

ServerLock DirectorOne is delivered as an ISO image and can be installed onto a bare-metal server, into a VM, or into a container. It is designed to manage all the ServerLock NICs within an infrastructure domain. To engage ServerLock on a system, you run a simple binding process that facilitates an exchange of secrets between the DirectorOne controller and the ServerLock NIC. Once engaged, the ServerLock NIC begins sharing new network flows with the DirectorOne controller, which then provides visibility into all the network flows across all the ServerLock enabled systems within your infrastructure domain. At that point, you can begin defining security policies and placing them in compliance or enforcement mode. In compliance mode, no traffic through the NIC is filtered, but any traffic that is not in compliance with the defined security policies for that NIC generates alerts. Once a policy is moved into enforcement mode, all out-of-policy packets have the default action applied to them.
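To make the two modes concrete, here is a toy sketch of the decision logic described above. It is only an illustration, not ServerLock’s or DirectorOne’s actual interface, and the flow fields, addresses, and ports are invented for the example:

```python
# Toy model of compliance vs. enforcement behavior; not a real product API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    proto: str          # e.g. "tcp" or "udp"
    remote_ip: str
    local_port: int

def alert(flow: Flow) -> None:
    print(f"ALERT: out-of-policy flow {flow}")

def handle_packet(flow: Flow, allowed: set, mode: str, default_action: str = "drop") -> str:
    if flow in allowed:
        return "forward"               # in-policy traffic always flows
    alert(flow)                        # out-of-policy traffic always raises an alert
    if mode == "compliance":
        return "forward"               # compliance mode: alert only, nothing is filtered
    return default_action              # enforcement mode: apply the default action

policy = {Flow("tcp", "10.0.1.20", 443)}
print(handle_packet(Flow("tcp", "10.0.9.9", 22), policy, mode="compliance"))   # forward (alerted)
print(handle_packet(Flow("tcp", "10.0.9.9", 22), policy, mode="enforcement"))  # drop (alerted)
```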

If you’re looking for the most secure solution to protect your company’s servers, you should consider Solarflare’s ServerLock. It is the most affordable and secure way to protect your valuable corporate assets.

Visibility + Control = Orchestration

In Taekwondo, to win you watch your opponent’s center of gravity (CoG), for the eyes lie. For example, if the CoG moves toward the back foot you can expect a front kick, and if it begins a slight twist without moving forward or backward, then a punch from the arm on the side of the twist is coming. These are mandatory anticipatory movements, precursors to a pending threat. If my opponent throws a punch or launches a kick without these movements, it will be ineffectual; a punch without a twist is a tap. Of course, the above is no secret, and skilled attackers lead with a feint to disguise their real intent, but that’s for another time. Cybersecurity is no different: you need to detect a threat, see it, classify it, then act on it. Detecting and seeing the threat is commonly referred to as visibility. Classifying and then acting on the threat is called orchestration.

Imagine if you could watch the CoG of every server in your data center. In cyber terms, that CoG might be every data flow into and out of the server. Placing boundaries and alerts on those flows is the primary role of orchestration, and placing these boundaries is now called micro-segmentation. Recently we suggested that the New Network Edge is the server itself. Imagine if you could watch every data flow from every server, set up zero-trust policies in advance to govern which flows are permitted, and have the system generate alerts to security operations when other flows are attempted. With solid governance comes the capability to quarantine applications or systems that have gone rogue. All the while, everything is done within the server’s own NICs, without any host agents and without consuming any local x86 CPU cycles. That’s Solarflare ServerLock.

Below is a screenshot of ServerLock displaying seven groups of hosts, in the dark grey bubbles, with all the flows between those hosts in red. The Database servers group is highlighted, and all the network flows for this group are shown. Note that this is a demonstration network.

The New Network Edge – The Server

Today cleverly crafted spear-phishing emails and drive-by downloads make it almost trivial for a determined attacker to infect a corporate workstation or laptop. Wombat’s “State of the Phish 2018” report shows that 76% of InfoSec professionals experienced phishing attacks in 2017. Malware Remote Access Toolkits (RATs) like Remcos for Windows can easily be rebuilt with a new name and bound to legitimate applications, documents, or presentations. Apple Mac users, myself included, are typically a smug group when it comes to malware, but for them there’s MacSpy, which is nearly as feature rich. A good RAT assumes total control over the workstation or server on which it is installed, then leverages a secure HTTPS connection back to its command and control server. Furthermore, these toolkits employ their own proprietary encryption techniques to secure their traffic before HTTPS is applied, which prevents commercial outbound web proxies designed to inspect HTTPS traffic from gaining any useful insight into the toolkit’s nefarious activities. With the existence of sophisticated RATs, we must reconsider our view of the enterprise network. Once a laptop or workstation on the corporate network is compromised in the above fashion, all the classic network defenses, firewalls, IDS, and IPS, are rendered useless. These toolkits force us to accept that the New Network Edge is the server itself, and that requires a new layer in our Defense in Depth model.

The data on our enterprise servers are the jewels that attackers are paid a hefty sum to acquire. Whether it’s a lone hacker for hire by a competitor, a hacktivist group, or a rogue nation state, there are bad actors looking to obtain your company’s secrets. Normally the ONLY defenses on the corporate network between workstations and servers are the network switches and the software firewalls that exist on both ends. The network switches enforce sub-networks (subnets) and virtual local area networks (VLANs) that impose a logical structure on the physical network, and Access Control Lists (ACLs) then define how traffic is routed across these logical boundaries. These ACLs are driven by the needs of the business and are meant to reflect how information should flow between different parts of the enterprise. By contrast, the software firewalls on both the workstations and the servers define what is permitted to enter and leave those systems. As defenses, both methods fall woefully short, but today they’re the last line of defense. We need something far more rigorous, and centrally managed, to defend the New Network Edge, our servers.

As a representation of the business’s processes, switch ACLs are often fairly loose when permitting systems on one network to access those on another. For example, someone on the inside sales team sitting at the workstation in their cubicle has access to the Customer Relationship Management (CRM) system, which resides on a server that is physically somewhere else. The workstation and server are very likely on different subnets or VLANs within the same enterprise, but ACLs exist that give the salesperson’s workstation access to the customer data provided by the CRM system. Furthermore, that CRM system is actually pulling the customer data from a third system, a database server. The CRM server and the database server may be on the same physical server, or perhaps in the same server rack, and very possibly on the same logical network. The question is: is there a logical path from the inside salesperson’s workstation to the database server? The answer should be no, but guess what? It doesn’t matter. Once the inside salesperson is successfully spear-phished, it’s only a matter of time before the attacker has access to the database server through the CRM server.

The attacker will first enable the keylogger, then watch the salesperson’s screen to see what they are doing, harvest all their user IDs and passwords, perhaps turn on the microphone and listen to their conversations, and inspect all the outgoing network connections. Next, the attacker will use what they’ve harvested and learned to begin their assault, first on the CRM server. Their goal at this point is to establish a secondary beachhead with the greatest potential reach from which to launch their primary assault, while keeping the inside salesperson’s workstation as their fallback position. From the CRM server, they should be able to easily access many of the generic service machines: DNS, DHCP, NTP, print, file, and database systems. The point here is that where external attackers often have to actively probe a network to see how it responds, internal RAT-based attacks can passively watch and enumerate all the ports and addresses typically used. In doing so they avoid any internal dark-space honeypots, tripwires, or sweep detectors. So how do we protect the New Network Edge, the server itself?

A new layer needs to be added to our Defense in Depth model called micro-segmentation, or application segmentation. It enforces a strict set of policies on the boundary layer between the server and the network. Cisco, Arista, and other switch providers, with their switch-based view of the world, would have you believe that doing it in the switch is the best idea. VMware, with its hypervisor view of the world, would have you believe that its new NSX product is the solution. Others, like Illumio and Tufin, would have you believe that a server-based agent is the silver bullet for micro-segmentation. Then there’s Solarflare, a NIC company, with its NIC-based view of the world and its new entrant in the market called ServerLock.

Cisco sells a product called Tetration designed to orchestrate all the switches within your enterprise and provide finely grained micro-segmentation of your network traffic. It requires that additional Cisco servers be installed to receive traffic flow data from all the switches, process that data, and then provide network admins with both visibility and orchestration of the security policies across all the switches. There are several downsides to this approach: it is complex, it is expensive, and it can very possibly be limited by the ACL storage capabilities of the top-of-rack switches. As we scale to hundreds of VMs per system, or thousands of containers, these ACLs will likely be stretched beyond their limits.

VMware NSX includes both an advanced virtual switch and a firewall, and both require host CPU cycles to operate. Again, as we scale to hundreds of VMs per system, the CPU demands placed on the system by the virtual switch and the NSX firewall will become significant and measurable. It should also be noted that, as an entirely software-based solution, NSX has a large attackable surface area that could eventually be compromised, especially given the Meltdown and Spectre vulnerabilities recently reported in Intel processors. Finally, VMware NSX is a commercial product with a premium price tag.

This brings us to the agent-based solutions like Illumio and Tufin. We’ll focus on Illumio, which comes with two components: the Policy Compute Engine (PCE) and the Virtual Enforcement Node (VEN). The marketing literature states that the VEN is attached to a workload, but it is an agent installed on every server under Illumio’s control; it reports network traffic flow data to the PCE while also controlling the local OS software firewall. The PCE then provides visualization and a platform for orchestrating security policies. The Achilles heel of the VEN is that it is a software agent, which means it both consumes x86 CPU cycles and presents a large attackable surface area, large in the sense that both the agent and the OS-based firewall on which it depends can easily be circumvented. An attacker need only escalate their privileges to root/admin to hamstring the OS firewall or to disable or blind the VEN. Like VMware NSX, Illumio and Tufin are premium products.

Finally, we have Solarflare’s NIC-based solution called ServerLock. Unlike NSX and Illumio, which rely on Intel CPU cycles to handle firewall filtering, Solarflare executes its packet filtering engine entirely within the chip on the NIC. This means that when an inbound network packet is denied access and dropped, it takes zero host CPU cycles, compared to the 15K-plus x86 cycles required by software firewalls like NSX or IPTables. ServerLock NICs also establish a TLS-based domain of trust with a central ServerLock Manager, similar to Illumio’s PCE. The ServerLock Manager receives flow data from all the ServerLock NICs under management and provides visibility, alerting, and policy management. Unlike Illumio, though, the flow data coming from the ServerLock NICs requires no host CPU cycles to gather and transmit; these tasks are done entirely within the NIC. Furthermore, once the Solarflare NIC is bound to a ServerLock Manager, the local control plane for viewing and managing the NIC’s hardware filter table is torn down, so even if an application were to obtain root privilege there is no local path to view or manage the filter table. At that point, it can only be changed from the specific ServerLock Manager to which it is bound. All of the above comes standard with new Solarflare X2 based NICs, which are priced at or below competitive Intel NIC price points. ServerLock itself is enabled as an annual service sold as a site license.

So when you think of micro-segmentation would you rather it be done in hardware or software?

P.S. Someone asked why there is a link to a specific RAT, or why I’ve included a link to an article about them. Simple: it validates that these toolkits are in fact real and readily accessible. For some people, threats aren’t real until they can actually see them. Another person asked what happens if we’re using Salesforce.com. That’s fine; as an attacker, instead of hitting the CRM server I’ll try the file servers, intranet websites, print servers, or whatever else that inside salesperson has access to. Eventually, if I’m determined and the bounty is high enough, I’ll have access to everything.

Security: DARPA, HFT & Financial Markets

Today nearly half of all Americans are invested in the financial markets. This past October, Dow Jones published “Pentagon Turns to High-Speed Traders to Fortify Markets Against Cyberattack.” The reporter had talked with a number of High-Frequency Trading (HFT) shops that had consulted directly with the Defense Advanced Research Projects Agency (DARPA). The objective of these discussions was to determine how the US financial markets could be fortified against cyber attacks.

The reporter learned that the following possible scenarios were discussed as part of the “Financial Markets Vulnerability Project:”

  1. Inject false information into stock data feeds
  2. Flood the stock market with fake orders and trigger a market crash
  3. Cripple a widely used payroll system
  4. Disrupt the credit card processors
  5. Feed fake news into systems used to algorithmically drive trading

While protecting the US financial markets is something we expect of our government, the markets themselves are actually already insulated from outside attackers. The first two threats in the above list are essentially the same: placing fake orders into the exchange with no intent to honor them. To connect to an exchange’s servers, a trader must be a member in good standing of that exchange and pay significant connection fees for their servers to participate in that exchange. Traders place a very high value on their access to each exchange, and while HFT shops may only hold a security for a few millionths of a second, they understand the long-term cost of losing access to an exchange. Most HFT shops have leased many 10GbE connections to multiple exchange servers, across multiple exchanges and the big banks’ dark pools, and very often Solarflare NICs are on both sides of these connections. So while it is technically possible for an HFT shop to inject enormous volumes of orders into one or more exchanges, a type of denial of service attack, using one or more physical ports on one or more exchange servers, doing so would quickly amount to financial suicide for the trading firm. The exchanges and the Securities and Exchange Commission (SEC) don’t take kindly to trading partners seeking to game the system; the exchanges, and soon after the SEC, would quickly step in and shut down the inappropriate activity.

To further improve security for its trading customers, later this month Solarflare will begin rolling out a beta of ServerLock™, a firmware update for these very same NICs powering the exchanges and HFT shops worldwide. With ServerLock™, the HFT shops and the exchanges themselves can rapidly pump the brakes on any given logical connection directly within the NIC hardware. This is the point at which DARPA and others should be interested. If the logic within the exchange were to detect and validate a threat, it could, within a few millionths of a second, install a filter into the NIC hardware to drop all subsequent packets from that threat. At that point the threat would be eliminated, and it would no longer consume exchange CPU cycles. For HFT shops, if they were to detect that an algorithm had gone rogue, they could employ ServerLock™ to physically cut a trading platform off from the exchange without having to touch the platform’s precious code. Much like throwing a cover over Schrodinger’s box, applying the filter in the NIC hardware leaves the trading platform itself intact for later investigation.

Number three on the list above is crippling a widely used payroll processor like ADP, which processes payroll checks for one out of six Americans. First, ADP uses at least two different networks: one permits inbound payroll data from its client companies over the public internet via SSL-secured connections, and the second is a private Automated Clearing House (ACH) network. The ACH network is a member network connecting banks to clearinghouses like the Federal Reserve. Much like the exchanges above, being a paid member of an ACH network and then attacking that same network would not be a wise move for a business. As for the public internet-facing connections that ADP maintains, it is likely practicing the latest defense-in-depth technologies coupled with least privilege, in an effort to avoid the issues faced earlier this year by Equifax.

Next, we have the credit card processors, also known as the Payment Card Industry (PCI) players, from Amex to Square, who are fighting a never-ending battle to secure their systems against outside threats. Much like the ACH network, the PCI industry has its own collection of private networks for processing credit card transactions, e.g. the Mastercard network, the Visa network, and so on. These networks, like the ACH networks, are member networks, and attacking them would also be counterproductive. The world economy would likely not be in jeopardy if, at some point, the Amex or Discover networks were to stop processing credit cards for a few hours. We have seen the internet-facing websites of these providers, Mastercard for example, targeted by some of the most substantial Distributed DoS (DDoS) attacks the world has ever seen, and they’ve all fared pretty well; most have learned from these assaults how to further harden their networks.

Who would have thought two years ago that “fake news” could turn the tide of a US presidential election, or be used as a tool to dramatically shift a financial market? At DEFCON 2015 I watched Charlie Miller and Chris Valasek present their now infamous hack of a Jeep Grand Cherokee. At the start of their talk, Charlie joked that had he thought the Wired article would move Chrysler stock more than a point or two, he would have partnered with a VC to fund shorting the stock; had he done that, he said, he’d now be sitting on the beach of his private island sipping his favorite frozen drink through a straw rather than lecturing us. Charlie explained that he expected their announcement to land like Google or Microsoft announcing a bug, but he was very wrong. It led to a recall of 1.4 million vehicles, and the stock dropped double-digit percentage points following the story and the recall. While this was real news, it was a controlled news release from someone outside the company, and they could easily have made hundreds of millions of US dollars shorting the stock. What most people aren’t aware of is that there are electronic news systems that some HFT algorithmic platforms subscribe to. Some of these systems even “read” tweets from key people (e.g. our president) to determine whether their comments might move a particular security or market in one direction or another. Knowing this, these systems can be gamed by issuing false stories in the expectation that the HFT algorithms will “read” them and move stock prices accordingly. When retractions are issued later, the same actors can place orders that benefit from the retractions as well. So how do we suppress the impact of “fake news” on our financial markets?

These news services know that HFT systems trade on their output. Given that, they should be investing heavily in machine-learning-based systems that rapidly fact-check and score the potential truthfulness of a given story. Stories that score beyond belief should be kicked to humans for validation, delayed until they are backed up by additional sources, or even held until after the US markets close to further limit their impact.
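As a simple illustration of that gating idea, here is a tiny sketch. The thresholds, labels, and scoring scale are invented for the example; a real newswire pipeline would be far more sophisticated:

```python
# Illustrative gating policy for machine-scored stories; thresholds are made up.
def route_story(truth_score: float, us_markets_open: bool) -> str:
    """truth_score: 0.0 (almost certainly false) .. 1.0 (well corroborated)."""
    if truth_score >= 0.8:
        return "publish"                    # credible enough for the automated feed
    if truth_score >= 0.5:
        return "hold_for_human_review"      # plausible, but wait for a second source
    # scores "beyond belief": keep them away from trading algorithms while markets trade
    return "delay_until_markets_close" if us_markets_open else "hold_for_human_review"

print(route_story(0.95, us_markets_open=True))   # publish
print(route_story(0.30, us_markets_open=True))   # delay_until_markets_close
```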

Kernel Bypass = Security Bypass

As we move our performance-focused applications to kernel bypass techniques like DPDK and Solarflare’s Onload, this does not come without a price, and one component of that price is often security. When you bypass the Linux kernel, you also bypass its security mechanisms (e.g. XDP and NFTables, the successor to IPTables). These security mechanisms have evolved over the past decade to ensure that your server doesn’t get compromised. Are they perfect? No, software rarely is, but they are an excellent starting point for securing a Linux server. So as we move to kernel bypass platforms, what options are available to us? We need lower-level network security checkpoints that can act as gatekeepers, keeping the good stuff in and the bad stuff out. With one exception, these are hardware products that are managed using several different network segmentation metaphors: micro, macro, and application, the last of which is also known as workload segmentation.

Micro-segmentation is the marketing term that has been co-opted by VMware to represent its NSX security offering. When you’re a hypervisor company, all the world’s a virtual machine (VM), so moving security into the hypervisor is a natural fit. VMware then plays a clever trick and abstracts the physical network from the VM by installing a virtual network to which it connects the VM; the hypervisor then works as the switch between the physical and virtual networks. To coordinate workloads and security across multiple hypervisors running on different physical servers, VMware goes one step further and encapsulates traffic, which lets it take traffic running on one virtual network and bridge it over the physical network to a virtual network on another host. So if your kernel bypass application can run from within a VM without having to rely on hypervisor bypass, then this model might work for you. Illumio has also attached itself to micro-segmentation, rebranding it “smart micro-segmentation.” Our understanding is that it essentially runs an agent that programs NFTables in real time, so for kernel bypass applications this offers no security.

Macro-segmentation, as you might guess, means creating segmented networks that span multiple external physical network devices. This is the term Arista Networks has chosen (originally they used micro-segmentation, perhaps until VMware stepped in), and it is the foundation of Arista’s CloudVision line of products. While this too does an awesome job of securing your network, it doesn’t come without a cost, which is complexity. CloudVision connects into VMware NSX, OpenStack, and other OVSDB-based controllers to let you seamlessly configure various vendors’ hardware through a single interface. Furthermore, it comes with configuration modules, called configlets, for a wide variety of hardware, enabling you to quickly and easily duplicate data center functions across one or more data centers, and it includes a configlet builder tool so an administrator can quickly craft a configlet for a device for which one does not yet exist.

The last solution is application, or workload, segmentation. In techie terms, this is five-tuple filtering and enforcement of network traffic, which to the layperson means opening the network packet up and inspecting the protocol it uses, along with the source and destination addresses and ports. These five values are then compared against some collection of filter tables to determine the appropriate action to take on the packet. Today this can be done by Solarflare ServerLock NICs or in software by XDP or NFTables. ServerLock NICs do this comparison in 50 to 250 nanoseconds within the firmware of the NIC itself, entirely transparently to the server the NIC is installed in. Done this way, the filtering consumes no host CPU cycles, is agnostic to the OS and applications running above it, and scales with every NIC added to the server. Packets are filtered at wire rate, 10Gbps per port, and there can be one filter table for every locally hosted IP address, with a total capacity exceeding 5,000 filters per NIC. As mentioned, all of this filtering is done in the NIC hardware without any awareness of it by the DPDK or Onload applications running above it.
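For readers who want to see the logic spelled out, here is a minimal five-tuple matching sketch. It only illustrates the idea described above; the rule format, first-match-wins behavior, addresses, and ports are assumptions for the example and are not ServerLock’s firmware or the XDP/NFTables interfaces:

```python
# Minimal five-tuple filter sketch; illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    proto: Optional[str] = None       # None acts as a wildcard
    src_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_ip: Optional[str] = None
    dst_port: Optional[int] = None
    action: str = "drop"

def matches(rule: Rule, pkt: dict) -> bool:
    return all(
        getattr(rule, field) in (None, pkt[field])
        for field in ("proto", "src_ip", "src_port", "dst_ip", "dst_port")
    )

def filter_packet(pkt: dict, table: list, default_action: str = "drop") -> str:
    for rule in table:                # first matching rule wins
        if matches(rule, pkt):
            return rule.action
    return default_action             # no rule matched: apply the default action

table = [
    Rule(proto="tcp", dst_ip="10.0.2.15", dst_port=5432, action="accept"),  # allow Postgres
    Rule(proto="tcp", dst_ip="10.0.2.15", dst_port=22, action="drop"),      # no SSH to this host
]
pkt = {"proto": "tcp", "src_ip": "10.0.1.7", "src_port": 40123,
       "dst_ip": "10.0.2.15", "dst_port": 5432}
print(filter_packet(pkt, table))      # -> "accept"
```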

So if you’re using DPDK or Onload, and the security of your application, or the data it shares, is of concern to you, then perhaps you should consider engaging with one of the vendors mentioned above.

If you’d like to learn more about ServerLock, please drop me an email.

Equifax & Micro Segmentation

Earlier this week it was reported that an Equifax web service was hacked, creating a breach that existed for about 10 weeks. During that time the attackers used the breach to drain 143 million people’s private information. The precise technical details of the breach, which Equifax claims was detected and closed on July 29, have yet to be revealed. While the company says it has seen no other criminal activity on its main services since July 29, that’s of little comfort, as Elvis has left the building. At 143 million, a majority of the adults in the US have been compromised. Outside of fixing Equifax-specific code vulnerabilities or further hardening its databases, what could Equifax have done to thwart these attackers?

Most detection and preventative countermeasures that could have minimized Equifax’s exposure employ some variation of behavior detection at one network layer or another; they then shunt suspect traffic to a sideband queue for more detailed human analysis. Today the marketing trend for attracting venture capital investment is to call these behavior detection algorithms Artificial Intelligence or Machine Learning. How intelligent they are, and to what degree they learn, is something for a future blog post. While at the NGINX conference this week, we saw several companies selling NGINX layer-7 (application layer) plugins that analyze traffic prior to passing it to NGINX’s request evaluation engine. These plugins receive the entire HTTP request after the OS stack has assembled it from multiple network packets. They then do a rapid analysis of the request to determine whether it poses a threat; if not, the request is passed back to NGINX for the web application to respond to. Along the way, the plugin extracts metadata from the request and, in parallel, ships it up to the vendor’s cloud service for further evaluation. That metadata is then compared against prior history and real-time data from other customers running similar services to extract new potential threat vectors. As threats are detected, rules are pushed back down into the plugin so they can be applied to future requests.
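Here is a rough sketch of that screening pattern expressed as WSGI-style middleware. Every name, the trivial “analysis,” and the metadata handling are invented for illustration; this is not any vendor’s actual plugin API:

```python
# Hypothetical layer-7 request screening, sketched as WSGI middleware.
import threading

class RequestScreener:
    def __init__(self, app, rules=None):
        self.app = app                 # the downstream web application
        self.rules = rules or []       # patterns pushed down from a cloud service

    def looks_malicious(self, environ) -> bool:
        path = environ.get("PATH_INFO", "")
        return any(pattern in path for pattern in self.rules)   # stand-in for real analysis

    def ship_metadata(self, environ) -> None:
        # The real pattern ships this to a cloud service; here we just log it off-thread.
        meta = {k: environ.get(k) for k in ("REMOTE_ADDR", "REQUEST_METHOD", "PATH_INFO")}
        threading.Thread(target=print, args=(f"metadata -> cloud: {meta}",), daemon=True).start()

    def __call__(self, environ, start_response):
        self.ship_metadata(environ)                   # metadata analysis runs in parallel
        if self.looks_malicious(environ):             # host CPU cycles are already spent by now
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"blocked"]
        return self.app(environ, start_response)      # hand the request on to the application
```

Note that by the time code like this runs, the OS has already assembled the request from multiple packets, which is exactly the inefficiency the next paragraph calls out.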

Everything discussed above is layer-7, application layer, traffic analysis and mitigation. What does layer-7 have to do with network micro-segmentation? Nothing; what’s described above is the current prevailing wisdom, instantiated in several solutions that are all the rage today. There are several problems with a layer-7 solution. First, it competes with your web application for host CPU cycles. Second, if the traffic is determined to be malicious, you’ve already invested tens of thousands of CPU instructions, perhaps even in excess of one hundred thousand, to make that determination, and all of that compute time is lost once the message is dropped. Third, the attack is now deep inside your web server, and who’s to say the attacker hasn’t learned what they needed to move to a lower-layer attack vector to evade detection? Layer-7, while convenient, easy to use, and even easier to understand, is very inefficient.

So what is network micro-segmentation, and how does it fit in? Network segmentation is the act of altering the flow of traffic such that only what you want is permitted to pass. Imagine the factory that makes M&Ms. These days they use high-speed cameras and other analytics to look for deformed M&Ms, and when they spot one they steer it away from the packaging system; they are, in fact, segmenting the flow of M&Ms to ensure that only perfect candy-coated pieces ever make it into our mouths. The same is true for network traffic: segmentation is the process of only allowing network packets to flow into or out of a given device according to a specific policy or set of policies. Micro-segmentation is doing that down to the application level. At layer-3, the network layer, that means separating traffic by source and destination network address and port, while also taking into account the protocol (this set of five elements is known as “the five-tuple”). When we focus on filtering traffic by network port, we can say that we are doing application-level filtering, because ports are used to map network traffic to applications. When we also take the local IP address into account, we can filter by the local container (e.g. Docker) or virtual machine (VM), as these can often get their own local IP addresses. Together, these two elements can define a very specific network micro-segmentation strategy.
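To make the container and VM point concrete, here is a tiny sketch that keeps a separate allow list per local IP address, so each workload sharing a NIC gets its own policy. The addresses and ports are made up for the example:

```python
# Illustrative only: one allow list per local IP, so each container or VM is segmented independently.
policies = {
    "10.0.3.11": {("tcp", 443)},      # web container: HTTPS only
    "10.0.3.12": {("tcp", 5432)},     # database VM: Postgres only
}

def allowed(local_ip: str, proto: str, local_port: int) -> bool:
    # A local IP with no policy gets an empty set, so nothing is allowed by default.
    return (proto, local_port) in policies.get(local_ip, set())

print(allowed("10.0.3.11", "tcp", 443))   # True  - in policy
print(allowed("10.0.3.11", "tcp", 22))    # False - SSH to the web container is blocked
print(allowed("10.0.3.99", "tcp", 443))   # False - unknown address, no policy, no traffic
```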

So now imagine a firewall inside a smart network interface card (NIC) that can filter both inbound and outbound packets using this network micro-segmentation: layer-3, network layer, micro-segmentation within the smart NIC. When filtering is moved into the NIC, no x86 CPU cycles are consumed evaluating the traffic, and no host resources are lost if the packet is deemed malicious and dropped. Furthermore, if a malicious packet is stopped by a firewall in the NIC, then the threat never enters the host CPU complex, and as such the system’s integrity is preserved. Consider how this can improve an enterprise’s security as it scales out, both with new servers and with additional containers and VMs. So how can this be done?

Solarflare has been shipping its 8000 line of smart NICs since June of 2016, and later this fall it will release a new firmware called ServerLock(TM). ServerLock is a first-generation firewall in the smart NIC that is centrally managed. Every second it sends a summary of the network flows through the NIC, in both directions, to a central ServerLock Manager system. This system allows administrators to view these network flows graphically and easily turn them into security roles and policies that can then be deployed. Policies can be deployed to a specific local IP address, to a collection of addresses (think Docker containers or VMs) called an “IP Set”, or to a host or host group. Deployed policies can be placed in Monitor or Enforce mode. Monitor mode allows all traffic to flow, but generates alerts for any traffic outside of the defined policies for a local IP address. In Enforce mode, ONLY traffic conforming to the defined policies is permitted; traffic outside of those policies generates an alert and is dropped. Once a network device begins to drop traffic on purpose, we say that the device is segmenting the network. So in Enforce mode, ServerLock smart NICs actively segment that server’s network by only passing traffic for supported applications, that is, only traffic for which a policy exists. This applies to traffic in both directions. For example, if an administrator walks into the data center, grabs a keyboard, and elects to Secure Copy (SCP) a file from a database server to his workstation, things will get interesting. If the ServerLock smart NIC in that database server doesn’t have a policy permitting SCP (port 22), his outbound request from the database server to his workstation will be dropped in the NIC. Likely unknown to him, an alert will be generated on the central ServerLock Manager console calling out the application, the database server, and his workstation, and he’ll have some explaining to do.

ServerLock begins shipping this fall, so while it’s too late for Equifax, it’s not too late for the next Equifax. How would this help moving forward? Simple: if every server, including web servers and database servers, had a ServerLock smart NIC, then every second those servers would report their flow data to the central ServerLock Manager for further analysis. Solarflare is working with Cloudwick to do real-time analysis of this layer-3 traffic, so that Cloudwick can proactively suggest, in real time, new roles and policies that ServerLock administrators can use to protect servers against all sorts of threats. More to come as this product is released.

9/11/17 Update – It was reported over the weekend that Equifax is now pointing the blame at an Apache Struts module. The exact module has yet to be disclosed, but it could be any one of several previously addressed vulnerabilities. On Saturday the Apache group replied, pointing to other sources that believe the breach might have been caused by exploiting a remote code execution bug in their REST plugin, as outlined in CVE-2017-9805. More to come.

9/12/17 Update – Alert Logic has the best analysis thus far.

Cloaked Data Lakes

When the bank robber Willie Sutton was asked why he robbed banks, he reportedly answered, “Because that’s where the money is.” Today a corporation’s most valuable asset, aside from its people, is its data. For those folks who are Star Trek fans, imagine if you could engage your data lake’s network cloaking device just before deployment. It would waver out of view, then totally disappear from your enterprise network to all but those who are responsible for extracting value from it. Your key data scientists and applications could still see and interact with your cloaked data lakes, but to others exploring and scanning the network it would be entirely invisible, as if it were not even there.

Imagine, if you will, that a Klingon Bird of Prey is cloaked and patrolling the Neutral Zone. Along comes the Federation starship Enterprise, also patrolling the Neutral Zone, but actively scanning the quadrant. Since the Klingon ship is cloaked, the Federation can’t detect it, but the moment the Enterprise’s scanners pass over the Bird of Prey it automatically jumps to red alert, energizes its weapons systems, and alters course to shadow the Federation ship. Imagine if the same could be true of an insider threat, or an internal breach via, say, a phishing attack, that is seeking out your company’s data. The moment someone pings a system or port-scans even one IP address of the servers within your data lake, alarm bells go off, and no reply is returned. The scanner sees no answer and assumes nothing exists; little do they know the hell that will soon rain down on them.

Your network administrators would then be notified that their new server orchestration system had raised an alert. They’ll quickly see that the scan came from another admin’s workstation, someone who has been suspected of being an insider threat but has been too cagey to nail down. Now it’s 9 PM, and he’s port-scanning the exact range of internal network addresses that were set aside a week earlier for this new data lake. He then moves on to softer targets, exfiltrating data from older systems. Little does he know that every server he’s touched over the past week has been tracking and reporting every network flow back to his workstation. Management was just waiting for the perfect piece of evidence, and this attempted port scan, along with all the other network flows, was the final straw.

His plan had been to finish out the week, then quit on Friday and sell all his company’s data to its competitors. He had decided to stay on an extra two weeks when he heard they were standing up a new Hadoop cluster; he figured it would make a juicy soft target with tons of the newest aggregated data, which could be enormously valuable. What he didn’t know, because he wasn’t invited to those planning meetings, was that the cluster included a new stealth security feature from Solarflare called Active Cloaking. He also wasn’t aware that this feature was the driving reason why many of his company’s servers over the past two weeks had been upgraded to new Solarflare 10GbE NICs with ServerLock.

Since he was a server administrator responsible for some of the older legacy systems, he wasn’t involved in the latest round of network upgrades. While he had noticed that some of the newer servers were no longer accessible to him via SSH, he wasn’t aware that every server he touched was now reporting his every move. What would prove even more damning was that some of those older servers, which had been upgraded with Solarflare ServerLock enabled NICs, had been left as internal SSH/SCP honeypots holding old legacy data with little if any real value, data that would provide damning evidence once compromised. Tonight proved to be his downfall: his manager and his VP, along with building security, had just entered his cubicle and stated that the police were on their way.

At Black Hat last month, both Solarflare and Cloudwick (CDL) demonstrated ServerLock and data lake cloaking. In September several huge enterprises will begin testing ServerLock, and if you’re an insider threat, consider yourself warned!

1st Ever Firewall in a NIC

Last week at Black Hat, Solarflare issued a press release debuting its SolarSecure solution, which places a firewall directly in your server’s Network Interface Card (NIC). This NIC-based firewall, called ServerLock, not only provides security, it also offers agentless, server-based network visibility. This visibility enables you to see all the applications running on every ServerLock enabled server, and you can then quickly and easily develop security policies to be used for compliance or enforcement. During the Black Hat show setup, we took a 10-minute break for an on-camera interview with Security Guy Radio that covered some of the key aspects of SolarSecure.

SolarSecure has several unique features not found in any other solution:

  • Security and visibility are handled entirely by the NIC hardware and firmware; there are NO server-side software agents, and as such the solution is entirely OS independent.
  • Once the NIC is bound to the centralized manager, it begins reporting traffic flows to the manager, which represents them graphically so admins can easily turn them into security policies. Policies can be created for specific applications, enabling application-level network segmentation.
  • Every NIC maintains separate firewall tables for each local IP address hosted on the NIC, to avoid potential conflicts between multiple VMs or containers sharing the same NIC.
  • Each NIC is capable of handling over 5,000 filter table rules, along with another 1,000 packet counters that can be attached to rules.
  • Packets transit the rules engine in 50 to 250 nanoseconds, so the latency hit is negligible.
  • The NIC filters both inbound and outbound packets. Packets dropped as the result of a match to a firewall rule generate an alert on the management console, and dropped inbound packets consume ZERO host CPU cycles.

Here is a brief animated explainer video which was produced prior to the show that sets up the problem and explains Solarflare’s solution. We also produced a one-minute demonstration of the management application and its capabilities.

What’s a Smart NIC?

While you are reading these words, it’s not just your brain doing the processing required to make this feat possible. We’ve all seen over- and under-exposed photos and can appreciate the decision making necessary to achieve a perfectly light-balanced photo. In the laboratory, we have observed that the optic nerve connecting the eye to the brain is responsible for measuring the intensity of the light hitting the back of your eye. In response to this data, each optic nerve dynamically adjusts the aperture of the iris in the eye it is connected to in order to optimize those levels. For those with some photography experience, you might recall that there is a direct relationship between aperture (f-stop) and focal length. It also turns out that your optic nerve, after years of training as a child, has come to realize you’re reading text up close, so it is also modifying the muscles around the eye to sharpen your focus on this text. All of this data processing is completed before your brain has even registered the first word of the title. Imagine if your brain were responsible for processing all the data and actions required for your body to function properly.

Much like your optic nerve offloads work from your brain, the difference between a standard Network Interface Card (NIC) and a Smart NIC is how much processing the Smart NIC offloads from the host CPU. Until recently, Smart NICs were designed around Field Programmable Gate Array (FPGA) platforms costing thousands of dollars. As their name implies, FPGAs are designed to accept localized programming that can easily be updated once installed. Now a new breed of Smart NIC is emerging that, while not nearly as flexible as an FPGA, contains several sophisticated capabilities not previously found in NICs costing only a few hundred dollars. These new, affordable Smart NICs can include a firewall for security, a layer 2/3 switch for traffic steering, several performance acceleration techniques, and network visibility with, possibly, remote management.

The firewall mentioned above filters all network packets against a table built specifically for each local Internet Protocol (IP) address under its control. An application processing network traffic is required to register a numerical network port, and that port then becomes the internal address used to send and receive its network traffic. Filtering at the application level then becomes a simple process of only permitting traffic for specific numeric network ports. The industry has labeled this “application network segmentation,” and in this instance it is done entirely in the NIC. So how does this assist the host x86 CPU? It turns out that by the point at which operating system software filtering kicks in, the host CPU has often expended over 10K CPU cycles to process a packet. If the packet is then dropped, those 10K host CPU cycles are lost. If the filtering is done in the NIC and the packet is dropped there, there is NO host CPU impact.
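Some rough arithmetic shows why that matters at scale. The packet rate and clock speed below are assumptions chosen only to illustrate the point; the 10K-cycle figure is the one cited above:

```python
# Rough arithmetic on what dropping unwanted packets in the NIC saves the host.
CYCLES_PER_OS_DROP = 10_000         # cycles already spent before an OS-level filter drops a packet
UNWANTED_PPS       = 1_000_000      # assumption: 1M unwanted packets per second hitting the server
CPU_HZ             = 3_000_000_000  # assumption: a 3 GHz core

cycles_per_second = CYCLES_PER_OS_DROP * UNWANTED_PPS
cores_consumed = cycles_per_second / CPU_HZ
print(f"OS-level drops would burn ~{cores_consumed:.1f} cores just discarding traffic")
print("NIC-level drops for the same traffic: ~0 host cores")
```

Under those assumptions, roughly three full cores’ worth of work simply never reaches the host when the NIC does the filtering.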

Smart NICs also often have an internal switch that is used to rapidly steer packets within the server. This steering enables the NIC to move packets to and from interfaces and virtual NIC buffers, which can be mapped to applications, virtual machines, or containers. Efficiently steering packets is another offload that can dramatically reduce host CPU overhead.

Improving overall server performance, often through kernel bypass, has been the province of High-Performance Computing (HPC) for decades. Now it’s available for generic Ethernet and can be applied to existing, off-the-shelf applications. As an example, Solarflare has labeled its family of kernel bypass acceleration techniques Universal Kernel Bypass (UKB). There are two classes of traffic to accelerate: network packet based and application sockets based. To speed up network packets, UKB includes an implementation of the Data Plane Development Kit (DPDK) and EtherFabric Virtual Interface (EF_VI), both designed to deliver high volumes of packets, well into the tens of millions per second, to applications written to these Application Programming Interfaces (APIs). For more standard off-the-shelf applications, there are several sockets-based acceleration libraries included with UKB: ScaleOut Onload, Onload, and TCPDirect. While ScaleOut Onload (SOO) is free and comes with all Solarflare 8000 series NICs, Onload (OOL) and TCPDirect require an additional license, as they provide microsecond and sub-microsecond 1/2 round trip network latencies. By comparison, SOO delivers 2-3 microsecond latency, but the real value proposition of SOO is the dramatic reduction in host CPU resources required to move network data. SOO is classified as “zero-copy” because network data is copied once, directly into or out of your application’s buffer. SOO saves the host CPU thousands of instructions, multiple memory copies, and one or more CPU context switches, all of which dramatically improve application performance, often 2-3X, depending on how network intense an application is.

Finally, Smart NICs can also securely report NIC network traffic flows and packet counts off the NIC to a centralized controller. This controller can then graphically display for network administrators everything that is going on within every server under its management. This is real enterprise visibility, and since only flow metadata and packet counts are shipped off the NIC over a secure TLS link, the impact on the enterprise network is negligible. Imagine all the NICs in all your servers reporting their traffic flows, and letting you manage and secure those streams in real time, with ZERO host CPU impact. That’s one Smart NIC!