Making the Fastest, Faster: Redis Performance Revisited

When you take something that is already considered to be the fastest and offer to make it another 50% faster, people think you’re a liar. Those who built that fast thing couldn’t possibly have left that much slack in their design. Not every engineer is a “miracle worker” or notorious sandbagger like Scotty from the Starship Enterprise. So how is this possible?

A straightforward way to achieve such unbelievable gains is to alter the environment around how that fast thing is measured. Suppose the thing we’re discussing is Redis, an in-memory database. The engineers who wrote Redis rely on the Linux kernel for all network operations. When those Redis engineers measured the performance of their application, what they didn’t know was that over one-third of the time a request spends in flight is consumed by the kernel, something they have no control over. What if they could regain that control?

Suppose we provided Redis direct access to the network. This would enable Redis to make calls to the network without any external software layers in the way. What sort of benefits might the Redis application see? There are three areas that would immediately see performance gains: latency, capacity, and determinism.

On the latency side, requests to the database would be processed faster. They are handled more quickly because the application receives data straight from the network, directly into Redis’s memory, without a detour through the kernel. This direct path reduces memory copies, eliminates kernel context switches, and removes other system overhead. The result is a dramatic reduction in both time and CPU cycles. Likewise, when Redis fulfills a database request, it can write that data directly to the network, again saving more time and reclaiming more CPU cycles.

As more CPU cycles are freed up due to decreased latency, those compute resources go directly back into processing Redis database requests. When the Linux kernel is bypassed using Solarflare’s Cloud Onload, Redis sees, on average, a 50% boost in the number of “Get” and “Set” commands it can process every second. Imagine Captain Kirk yelling down to Scotty to give him more power; Scotty flips a switch, and instantly another 50% more power comes online. That’s Solarflare Cloud Onload. Below is a graph of the free version of Redis doing database SET commands using a single 10GbE (blue), 25GbE (green) and 100GbE (tan) port. The light versions of the lines are Redis running through the Linux kernel, and the darker lines are using Solarflare Cloud Onload, Scotty’s magic switch. Note that we scaled the number of Redis instances along the X-axis from 1 to 32 (on an x86 system with 32 cores), and the Y-axis is 0-25 million requests/second.
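If you’d like to approximate this kind of measurement yourself, here is a minimal sketch using the redis-py client. It is not our benchmark harness; the host, port, payload size, and request count are illustrative assumptions, and to bypass the kernel you would typically launch both redis-server and the client under the Onload accelerator.

    # Minimal GET/SET throughput sketch with redis-py (pip install redis).
    # Assumptions: a local Redis on 127.0.0.1:6379 and a 256-byte payload.
    import time
    import redis

    r = redis.Redis(host="127.0.0.1", port=6379)
    payload = b"x" * 256
    N = 100_000

    start = time.perf_counter()
    for i in range(N):
        r.set(f"key:{i % 1000}", payload)   # SET path
        r.get(f"key:{i % 1000}")            # GET path
    elapsed = time.perf_counter() - start

    print(f"{2 * N / elapsed:,.0f} requests/second from a single client")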

Finally, there is the elusive attribute of determinism. While computers are great at doing a great many things, that breadth is also what makes them less than 100% predictable. Servers often have many sensors, fans, and a control system designed to keep them operating at peak efficiency. The problem is that these devices generate events that require near-immediate attention. When a thermal sensor generates an interrupt, the CPU is alerted, it pushes the state of the current process onto the stack, services the interrupt, perhaps by turning a fan on, then returns to the previous process. When the interrupt occurs and how long it takes the CPU to service it are both variables that hamper determinism. If a typical “Get” request takes a microsecond (a millionth of a second) to service, but the CPU core is called away mid-request by an interrupt, it could be 20 to 200 microseconds before it returns. Solarflare’s Cloud Onload communications stack moves these interrupts out of the critical path of Redis, thereby restoring determinism to the application.

So, if you’re looking to improve Redis performance by 50% on average, and up to 100% under specific circumstances, please consider Solarflare’s Cloud Onload running on one of their new X2 series NICs. Solarflare’s new X2 series NICs are available for 10GbE, 25GbE and now 100GbE. Recent testing with 100GbE has shown that a single server with 32 CPU cores, running a single Redis instance per core, can process well over 20 million Redis requests per second. Soon we will be posting our Benchmarking Performance Guide and our Cloud Onload for Redis Cookbook, which contain all the details. When these are available on Solarflare’s website, links will be added to this blog entry.

*Update: Someone asked if I could clarify the graph a bit more. First, we focused our testing on both the GET and SET requests, as those are the two most common in-memory database commands. GET is simply used to fetch a value from the database, while SET is used to store a value, really basic stuff. Both graphs are very similar. For a single 10GbE link, the size of the Redis GET and SET requests translates to about 4 million requests/second to fill the pipe. Scaling this to 25GbE means 10M req/sec, and for 100GbE it means 40M req/sec.
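For anyone who wants the back-of-the-envelope math behind those ceilings, a quick sketch follows; the per-request wire size is derived from the 4M req/sec figure above and is an approximation, since it lumps together payload, TCP/IP, and Ethernet overhead.

    # Line-rate math from the update above.
    # Assumption: ~4M req/sec saturates 10GbE, which implies roughly
    # 10e9 / 4e6 / 8 ≈ 312 bytes on the wire per request, and the other
    # link speeds scale linearly from there.
    WIRE_BYTES_PER_REQ = 10e9 / 4e6 / 8   # ≈ 312 bytes/request

    for name, bps in [("10GbE", 10e9), ("25GbE", 25e9), ("100GbE", 100e9)]:
        reqs = bps / 8 / WIRE_BYTES_PER_REQ
        print(f"{name}: ~{reqs / 1e6:.0f}M req/sec theoretical ceiling")
    # Prints ~4M, ~10M, and ~40M, matching the figures quoted above.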

It turns out that a quad-core server running four Redis instances can saturate a single 10GbE link; we’ve not tested multiple 10GbE links. Today the kernel appears to hit its limit at around 5M req/sec, as can be seen from our 25G testing. This is in line with testing we did over a decade ago: while doing packet capture using libpcap, we noticed that the kernel at that time had a limit of around 3M packets/sec. Over the years, with new Linux kernels, we’ve seen that number increase, so 5M requests today is reasonable. As mentioned above, the theoretical limit for 25GbE SET requests should be about 10M req/sec. Using Redis through the kernel over a 25GbE link, we do in fact hit and sustain the 5M req/sec limit, regardless of how many Redis instances or CPU cores are deployed. Here is where Cloud Onload shines, as it lifts that kernel limit from 5M and enables your server to service the link at its full 10M req/sec potential; note that it takes a dozen or more Redis instances on 12 cores to achieve this. Any Redis instances or CPU cores beyond this will be underutilized. The most important takeaway, though, is that Cloud Onload delivers a 100% capacity gain for Redis over using the kernel, so if your server has more than six cores, Cloud Onload will enable you to get the full value out of them.

At 100GbE things are still not fully understood. With 25GbE we saw the kernel hit its expected 5M req/sec limit, but in our 100GbE testing the kernel went well beyond this, in fact triple this number. We have some ideas about how this is possible, but more research is required. We’re currently exploring why this is happening, and also how Cloud Onload can do even better than the nearly 25M requests/second at 100GbE measured above.

**Note: Credit to John Laroco for leading the Redis testing, and for noticing and taking the opening picture at SJC airport earlier this month.

East-West Threat Made Real

Raspberry Pi 3B+ With Power over Ethernet Port in Plastic Case

Many in corporate America still don’t view East-West attacks as a real, let alone a significant, threat. Over the past several years, while meeting with corporate customers to discuss our future security product, it wasn’t uncommon to encounter the occasional ostrich. These are the 38% of people who responded to the June 2018 SANS Institute report stating that they’ve not yet been the victim of a breach. In security we have a saying: “There are only two types of companies, those that know they’ve been breached, and those that have yet to discover it.” While this sounds somewhat flippant, it’s a cold hard fact that thieves see themselves as the predators and view your company as the prey. Much like a pride of female lions roaming the African savanna for a large herd, black-hat hackers go where the money is. If your company delivers value into a worldwide market, then rest assured there is someone out there looking to make an easy buck from the efforts of your company. It could be contractors hired by a competitor or nation-state actors looking to steal your product designs, a ransomware attacker seeking to extort money, or merely a freelancer surfing for financial records to access your corporate bank account. These threats are real, and if you take a close look at the network traffic attempting to enter your enterprise, you’ll see the barbarians at your gate.

A few months back my team placed a test server on the Internet with a single “You shouldn’t be here” web page at a previously unused, unadvertised network address. This server had all its network ports secured in hardware so that only port 80 traffic was permitted. No data of any value existed on the system, and it wasn’t networked back into our enterprise. Within one week we’d recorded over 48,000 attempts to compromise the server. Several even leveraged a family of web exploits I’d discovered and reported back in 1997 to the Lotus Notes Domino development team (it warmed my heart to see these in the logs). This specific IP address was assigned to our company by AT&T, but it doesn’t show up in any public external registry as belonging to our company, so there was no apparent value behind it, yet 48,000 attempts were made. So what’s the gizmo in the picture above?

In the January 2019 issue of “2600 Magazine, The Hacker Quarterly,” a hacker with the handle “s0ke” wrote an article entitled “A Brief Tunneling Tutorial.” In it, s0ke describes how to use a Raspberry Pi to set up a persistent SSH tunnel to a remote box under his control. This enables the attacker to access the corporate network just as if he were sitting in the office. In many ways, this exploit is similar to sending someone a phishing email that installs a Remote Access Trojan (RAT) on their laptop or desktop, but it’s even better because the device is always on and available. Yesterday I took this one step further. Knowing that most corporate networks leverage IP phones for flexibility, and that IP phones require Power over Ethernet (PoE), I ordered a new Raspberry Pi accessory called a Pi PoE Switch Hat. This is a simple little board that snaps onto the top of the Pi and leverages the power found on the Ethernet port to power the entire server. The whole computer shown above is about the size of a pack of cigarettes with a good-sized matchbook attached. When the case arrives, I’ll use our 3D printer to make matching black panels that will be superglued in place to cover all the exposed ports and even the red cable. The only physically exposed connection will be a short black RJ45 cable designed to plug into a Power over Ethernet port, plus two tiny holes so light from the power and signal LEDs can escape (a tiny patch of black electrical tape will cover these once deployed).

When the Raspberry Pi software bundle is complete and functioning correctly, as outlined in s0ke’s article, I’ll layer in access to my remote box via The Onion Router (Tor) and push my SSH tunnel out through port 80 or 443. This should let it slip past most enterprise detection tools, and Tor should mask the address of my remote box from their logs. In case my Pi is discovered, I’ll also install some countermeasures to wipe it clean when a local console is attached. At this point, with IT’s approval, I may briefly test it in our office to confirm it’s working correctly. Then it becomes a show-and-tell box, with a single PowerPoint slide outlining that East-West threats are real and that a determined hacker with $100 in hardware and less than one minute of unaccompanied access in their facility can own their network. The actual hardware may be too provocative to display, so I’ll lead with the slide. If someone calls me on it, though, I may pull the unit out of my bag and move the discussion from the hypothetical to the real. If you think this might be a bit much, I’m always open to suggestions on better ways to drive a point home, so please share your thoughts.

Raspberry Pi 3B+ with Pi PoE Switch Hat

P.S. The build is underway; the Pi and Pi PoE Switch Hat have arrived. To keep the image as flexible as possible I’ve installed generic Raspbian on an 8GB Micro-SD card, applied all updates, and have begun putting on custom code; the system is generically named “printer” at this point. Also, a Power over Ethernet injector was ordered so the system could be tested in a “production-like” power environment. It should be completed by the end of the month, perhaps in time for testing in my hotel during my next trip. Updated: 2019-01-20

A persistent automated SSH tunnel has been set up between the “printer” and the “dropbox” system, and I’ve logged into the “printer” by connecting via “ssh -p 9091 scott@localhost” on the “dropbox”; this is very cool. There is a flaw in the Pi PoE Switch board, or its setup at this point, in that it is pulling the power off the Ethernet port but NOT switching the traffic, so for now the solution uses two Ethernet cables, one for power and the second for the signal. This will be resolved shortly. Updated: 2019-01-23

Raspberry Pi Zero on Index Finger

But why risk the Ethernet port not being a powered Ethernet jack, and who wants to leave behind such a cool Raspberry Pi 3B+ platform when something with less horsepower could easily do the job? So shortly after the above intrusion device was functional, I simply moved the Micro-SD card over to a Raspberry Pi Zero. A regular SD card is shown in the picture for scale. The Pi Zero is awesome if you require a low-power, small system-on-a-chip (SoC) platform. For those not familiar with the Pi Zero, it’s a $5 single-core 1GHz ARM platform that consumes on average 100mW, so it can run for days on a USB battery. Add to that a $14 Ethernet-to-MicroUSB dongle and again you have a single-cable hacking solution that only requires a generic Ethernet port. Of course it still needs a tight black case to keep it neat, but that’s what 3D printers are for.

Pi Zero, Ethernet Dongle
& USB Battery
(SD Card for Size Comparison)

Now, this solution will only last a couple of days on battery, but as a hacker, if you’ve not established a solid beachhead in that time then perhaps you should consider another line of work. Some might ask why I’m telling hackers how to do this, but frankly, they’ve known for years, ever since SoC computers first became mainstream. So IT managers beware: solutions like these are more common than you think, and they are leaking into pop culture through shows like Mr. Robot. This particular show has received high marks for technical excellence, and MythBusters would have a hard time finding a flaw. One need only rewatch Season 1, Episode 5 to see how a Raspberry Pi could be used to destroy tapes in a facility like Iron Mountain. Sounds unrealistic? Then you must watch this YouTube video where they validate that this specific hack is in fact plausible. The point is that no network is safe from a determined hacker, from the CAN bus in your car, to building HVAC systems, to industrial air-gapped control networks. Strong security processes and policies, strict enforcement, and honeypot detection inside the enterprise are all methods to thwart and detect skilled hackers. Updated: 2019-01-27

3D Printing, an Art Versus a Science

Creality Ender3D Printer

One of our cherished holiday traditions is to craft homemade gifts for family members. My now 22-year-old son, when he was ten, used his Erector Set to make a fishing pole for my dad, complete with working reel, fishing line, and lure. Dad was big into sport fishing, and although he passed back in 2012, that rod is still on display in my mom’s Florida Keys home. This year my son returned from college with his Creality Ender3D printer, an entry-level product he purchased this fall in kit form for about $250USD. He wanted to print several gifts he’d been designing but hadn’t had time to while in school. 3D design and printing is something he’s been into for the past six years on various projects, and he thought it was about time to have a printer of his own. If you’re not familiar with the Ender3D, it’s a basic design where the print surface moves in one dimension (forward and backward) and the single print head moves in the other two (left or right and up or down). Both the print bed and the print head have controllable temperatures, as well as cooling fans designed to solidify a print rapidly. This enables the printer to use the most common plastic, PLA, but also many other more exciting materials like TPU, a flexible, resilient thermoplastic polyurethane. On the surface this sounds awesome: we can print whatever we like, from things we’ve designed to those designed by others and posted to sites like Thingiverse. Who wouldn’t want this remarkable capability at home, right?

Phone Case with Un-removable Raft

It’s not quite that amazing, though, as 3D printing is still very much more art than science. For those not familiar with this subtle distinction: when something is a science it must ALWAYS be reproducible, while when something is an art there remain significant variables under the control of the artist, and sometimes nature, which make the process nondeterministic. Anything we print larger than an inch or two often takes several attempts to produce a final product. Sometimes even that final print is unusable as a result of the process settings (not the design). For example, last Monday morning, after two failed attempts, we finally succeeded in printing a mobile phone case using TPU. The two prior attempts failed due to a breakdown in the adhesion of the print job to the build plate, the most common class of failure. Since this is an entry-level printer, it isn’t aware of a failure, so it continues printing until it jams or the job completes.

Epic Print Error – Ball of PLA Behind Print Head

This lack of intelligence or feedback in the system results in some epic messes. The previous Friday night we returned home to find a PLA print job had broken loose from the build plate, and the print head had shoved it off onto my desk. It was only 1/4 complete, and a blob of PLA the size of a jawbreaker was riding the back side of the print head like a tick on a dog. Did I mention it was still printing, perhaps an hour or more after the failure? The phone case discussed above, which we’d been printing using TPU, finally finished successfully on the third attempt and came cleanly off the build surface. To increase our chance of success on that third attempt, we added a “raft.” A “raft” is several extra layers of print material laid down as part of the print process. The “raft” has a slightly larger footprint than the print itself, and that footprint has 100% coverage using a tightly woven pattern, ensuring that plenty of material is applied to the build surface to secure the print until the job is completed. We’ve never had a problem with PLA prints separating from their “rafts,” but as mentioned, TPU is far more flexible and resilient, which made it impossible to cleanly separate the “raft” from the case, rendering the case unusable.

One of Four Puzzle Pieces Making Up
5×7 Photo Frame Gift
Note Extra Tape to Avert Curling

As mentioned previously, build plate adhesion is the single biggest problem making 3D printing an art today. Various build plate materials are available, from aluminum to tempered glass and PVC, each with its own bonding characteristics for different build materials. Tempered glass works reasonably well with TPU, provided the build plate temperature is set correctly and kept consistent to within two degrees Celsius. With the third print, we opted to lay down a thin layer of diluted Elmer’s white glue to provide adequate binding of the print to the build plate, something we now do as a regular part of the process when printing with PLA.

The second problem, which impacts print adhesion to the build plate, is temperature control. Here there are three variables: the temperature of the head, the temperature of the build plate, and the airflow from fans designed to cool a print in process. PLA shrinks as it cools, so prints larger than an inch tend to curl if they cool too quickly. This is where the bed temperature needs to be just right at the start, then cool slowly as successive layers are applied. Peeling with PLA has been a problem for some time, so “rafts” and a little extra glue under the raft early on often solve it.

Stag Coral Bud Vase with Scaffolding
Looking Like Something a Spider Wove

Another problem is the supports created by the program that compiles your 3D model into the print file you then send to the printer. These supports, often called “scaffolding,” are required if a print extends much beyond the printed surface below it. 3D printers are akin to hot-glue guns with excellent control over print material placement. With that in mind, you can’t have the printer squeeze out a bead of plastic with nothing below it for very long before gravity takes over and introduces chaos into your print. The print programs are aware of this, and they have tricks to prevent it from happening. Imagine you are printing a capital “T” as you see it on this page. Normally you’d print this T lying flat on its back or even upside down to limit wasted material and ensure a successful print. For it to print properly standing up, the print program inserts temporary scaffolding under both arms of the “T” as it starts building the base. This provides a structure onto which the printer can then print the more solid arms of the “T.” This holds especially true if you’re a grandson who’s custom-designed a “stag coral bud vase” with many outstretched branches, as shown below, or a phone case where the protective lip touches the front glass. With PLA the scaffolding often removes pretty easily once the print is complete, and if not, a Dremel can be used to clean up any remaining mess. With TPU some scaffolding can never be removed.

Same Stag Coral Bud Vase
After Removing Scaffolding

More expensive dual-head printers, often starting around $800USD, can use a water-soluble plastic to print the scaffolding in parallel with the primary print material. Once a job has finished printing, you just remove it from the printer and drop it in a tank of water overnight. On returning in the morning you find a clean print with no supports. Single-head printers don’t have this technique available to them.

Other materials like wood, metal, and ceramics can also be printed on these printers, but these exotic materials require even more of a craft approach. We’ve attempted a wood product several times but have had no success to date, as it’s jammed up multiple print heads. This product is VERY finicky when it comes to temperature: too low and it jams up, too high and you end up burning the wood. As for metals, you can even print simple circuit boards; we’ve not yet experimented with those.

Exercise #5 of 20 in Learning 3D Design

Another craft aspect of 3D printing which I’ve not discussed yet is designing something to be printed. Design programs like SolidWorks or Fusion360 by Autodesk, the leader in this market, enable you to create practically anything you can dream up, sometimes even more. That doesn’t mean, though, that what you design is always in fact printable. Shown to the right is one of the twenty exercises my son assigned me to design in Fusion360, a little role reversal. With a single-head printer, if you want to create something that requires different materials or colors, you have to design it for assembly after printing. Most single-head printers don’t allow you to stop mid-job to change out a spool of black PLA for white PLA. Designing something for post-print assembly requires considerations during the design process that would not be necessary if you were using a dual-head printer. Most dual-head printers today pause the job and notify you when a spool needs replacing, like back in the day when your plotter needed another pen to draw with green.

Final Stag Coral Bud Vase

If you live on the bleeding edge, sometimes you get cut, or in my case lately a little scorched, by the technology you’re looking to wrangle. If you’d like to learn a bit more, please consider listening to this recent podcast we did on the subject. If you’re one of those out there who thinks 3D printers will soon be gracing the shelves of Target, then I’d love to hear your perspective.

Below, you can see two print jobs, both from the same file, one in white PLA and the other in black; both material spools were from the same company. Note that the white printed fine, while the black has a series of raised sections that make it unusable as one-quarter of a decorative picture frame. In fact, all four puzzle pieces printed in white PLA came out fine. We’ve reached out to the Reddit community to see if anyone has any idea why this would happen with the black PLA.

Two 5×7″ Picture Frame Puzzle Pieces, Both From the Same Print File,
Using PLA Plastic From the Same Company

02/17/19 UPDATE: Extensive work with Fusion360, Cura (the slicing/print utility) and the Creality Ender3D printer has led to an understanding that tightly controlling six variables is the most important factor in creating great, repeatable results when using PLA:

  1. Print head temperature: 200C.
  2. Use a tempered glass print bed.
  3. Print bed temperature: 70C.
  4. Print bed height in relation to the print head at the lowest point: 1/2 the thickness of an average business card, enough so that the first bead of printed material is slightly squished.
  5. A LEVEL print bed; this can be checked by moving the head over all four adjustment zones and verifying the spacing.
  6. Always print with a “raft”; it wastes a bit of material, but I’ve yet to have an issue once the above five variables were all dialed in.

Once all these issues were addressed, printing was almost as repeatable and flawless as printing on an ink-jet printer. So good luck, and happy printing.

*Note: to see any of the above pictures in more detail, just click on them.

Focus on the New Network Edge, the Server

For decades we’ve protected the enterprise at the network edges where the Internet meets our DMZ, and again where our DMZ touches our Intranet. These two distinct boundary layers and the DMZ in between make up what we perceived as the network edge. It should be pointed out, though, that these boundaries were architected long before phishing and click-bait existed as part of our lexicon. Today anyone in the company can open an email, click on an attachment or a web page, and open Pandora’s box. A single errant click can covertly launch a platform that turns the computer into a beachhead for the attacker. This beachhead then circumvents all your usual well-designed, edge-focused defenses as it establishes an encrypted tunnel enabling the attacker to access your network whenever they like.

Once an attacker has established their employee-hosted beachhead, they begin the search for a secondary, server-based vantage point from which to operate. A server affords them a more powerful hardware system, and often one with a higher level of access across the entire enterprise. Finally, if the exploit is discovered on that server, the attacker can quickly revert to their fallback position on the initial beachhead system and wait out the discovery.

This is why enterprises must act as if they’ve already been breached. Accept the fact that there are latent attackers already inside your network seeking out your corporate jewels. So how do you prevent access to your company’s most valuable data? Attackers are familiar with the defense-in-depth model, so once they’re on your corporate network, often all that stands between them and the data they desire is knowing where it is hidden and obtaining the minimum required credentials to access it. So how do they find the good stuff?

They start by randomly mapping your enterprise network in hopes that you don’t have internal honeypots or other mechanisms that might alert you to their activity. Once the network is mapped, they’ll use your DNS to attach names to the systems they’ve discovered in hopes that this might give them a clue to where the good stuff resides. Next, they’ll do a selective port scan against the systems that look like possible targets to determine what applications are running on them and fill in their attack plan further. At this point, the attacker has a detailed network map of your enterprise, complete with system names and the names of the applications running on those systems. The next step will be to determine the versions of the applications running on what appear to be the most critical systems, so they’ll know which exploits to leverage. It should be noted that even if your servers have a local OS-based firewall, you’re still vulnerable. The attackers at this point know everything they need to, so if you haven’t detected the attack by this stage, then you’re in trouble, because the next step is the exfiltration of data.

If we view each server within your enterprise as the new network edge, then how can we defend these systems? Solarflare will soon announce ServerLock, a system that leverages the Network Interface Card (NIC) in your server to provide a new defense-in-depth layer in hardware. A layer that not only shields the server from attack, but can also camouflage it and report attempts made to access it, two capabilities not found in OS-based software firewalls. Furthermore, since all security is handled entirely within the NIC, there is no attackable surface area. So how does ServerLock provide both camouflage and reporting?

When a NIC has ServerLock enforcement enabled, only network flows for which a defined policy exists are permitted to enter or exit that server. If a new connection request is made to that server which doesn’t align with a security policy, say from an invalid address or to an invalid port, then that network packet will be dropped, and optionally an alert can be generated. The attacker will not receive ANY response packet and will assume that nothing is there. Suppose you are enforcing a ServerLock policy on your database servers which ONLY accepts connections from a pool of application servers, and perhaps two administrative workstations, on specific numeric ports. If a file server were compromised and used as an attack position, then once it reached out to one of those database servers via a ping sweep or an explicit port scan it would get NOTHING back; the database server would appear as network dark space to the file server. On the ServerLock Manager console, alerts would be generated, and the administrator would know in an instant that the file server was compromised. Virtually every port on every NIC that is under ServerLock enforcement is turned into a zero-interaction honeypot.
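To make the “dark space” point concrete, here is a minimal sketch, not a ServerLock tool, of what a scanner experiences: a reachable-but-closed port answers immediately with a refusal, while a silently dropped packet simply times out. The host and port below are placeholders.

    # What a port scanner "sees": refused vs. silently dropped.
    import socket

    def probe(host: str, port: int, timeout: float = 2.0) -> str:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(timeout)
        try:
            s.connect((host, port))
            return "open (connection accepted)"
        except ConnectionRefusedError:
            return "closed (host answered with a reset)"
        except socket.timeout:
            return "filtered/dark (no response at all)"
        finally:
            s.close()

    # A server dropping packets in the NIC shows up as 'filtered/dark',
    # indistinguishable from empty address space.
    print(probe("192.0.2.10", 5432))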

So suppose the attacker has established themselves on that file server, and the server then gets upgraded to ServerLock and put under enforcement. The moment that attacker steps beyond the security policies executing in the NIC on that server, the jig is up. Once they attempt any outbound network access that falls outside the security policies, those packets will be dropped in the NIC, and an alert will be raised at the ServerLock Management console. No data exfiltration today.

Also, it should be noted that ServerLock is not only firmware in the NIC that enforces security policies; it is an entire tamper-resistant platform within the NIC. Three elements make up this tamper-resistant platform: first, only properly signed firmware can be executed; second, older firmware versions cannot be loaded; and third, any attempt to tamper with the hardware automatically destroys all the digital keys stored within the NIC. Valid NIC firmware must be signed with a 384-bit key using elliptic curve cryptography. The Solarflare NIC contains the necessary keys to validate this signature, and as mentioned earlier, tampering with the NIC hardware will blow fuses that corrupt the stored keys forever, rendering them both unusable and unreadable.

Today enterprises should act as though they’ve already been compromised and beef up their internal defenses to protect the new network edge, the server itself. In testing ServerLock, we put a web server protected by ServerLock directly on the Internet, outside the corporate firewall.

Compromised Server Supply Chains, Really?

2018 was shaping up nicely to become “The Year of the CPU Vulnerability.” With Meltdown, Spectre, TLBleed, and Foreshadow we had something going; then along came Bloomberg and “The Big Hack” story. Flawed CPU designs just weren’t enough; now we have to covertly install “system on a chip (SoC)” spy circuits directly into the server’s baseboard management controller (BMC) at the factory. As if this weren’t enough, today Bloomberg dropped its second story in the series, “New Evidence of Hacked Supermicro Hardware Found in U.S. Telecom,” which exposes compromised RJ45 connectors in servers.

We learned recently that Edward Snowden’s cache of secret documents from five years ago included the idea of adding an extra controller chip to motherboards for remote command and control. Is it astonishing that several years later a nation-state might craft just such a chip? Today we have consumer products like the Adafruit Trinket mini-microcontroller, pictured below; at $7USD the whole board is 27mm x 15mm x 4mm. The Trinket is an 8MHz 8-bit Atmel ATtiny85 minicomputer that can be clocked up to 20MHz, with 8K of flash, 512 bytes of SRAM and 512 bytes of EEPROM ($0.54USD for just the microcontroller chip) in a single 4mm x 5mm x 1.5mm package. In the pervasive Maker culture we live in today, these types of exploits aren’t hard to imagine. I’m sure we’ll see some crop up this fall using off-the-shelf parts like the one mentioned above.

In the latest Bloomberg story, one source, Yossi Appleboum, revealed that the SMC motherboards he found had utilized a compromised RJ45 Ethernet connector. This rogue connector was encased in metal, providing both camouflage for the hidden chip and a heat sink to dissipate the power it consumes. In this case, all one would need to do is craft a simple microcontroller with an eight-pin package, one pin for each conductor in the RJ45 connector. This controller would then draw its power directly from the network while also sniffing packets entering and leaving the BMC. Inconceivable? Hardly. The metal shell covering such a connector is somewhere around 12mm square, similar to the RJ45 on the Raspberry Pi shown to the right; that’s several times the footprint of the ATtiny85 package referenced above. Other microcontrollers, like the one powering the Raspberry Pi Zero, could easily fit into this footprint and deliver several orders of magnitude more processing power. The point is that if someone had suggested this five years ago, at the time of the Snowden breach, I’d have said it was possible but unlikely, as it would have required leading-edge technology in the form of custom-crafted chips costing perhaps ten million or more US dollars. Today, I could recommend a whole suite of off-the-shelf parts, and something like this could very likely be assembled in a matter of weeks on a shoestring budget.

Moving forward, OEMs need to consider how they might redesign, build, and validate to customers that they’ve delivered a tamper-proof server. Until then, for OCP-compatible systems you should consider Solarflare’s X2552 OCP-2 NIC, which can re-route the BMC through its network ports and which includes Solarflare’s ServerLock™ technology that can then filter ALL network traffic entering and leaving the server, provided of course that you’ve disconnected the server’s own Gigabit Ethernet ports. If you’d like a ServerLock™ sample white-list filter file that shows how to restrict a server to internal traffic only (10.x.y.z or 192.168.x.y), then please contact me to learn more.

UPDATE: This weekend I discovered the item shown to the right, which is offered both as a complete product called the “LAN Tap Pro” for $40 in a discreet square black case and as this throwing star kit for $15 with all the parts, some assembly and soldering required. This product requires NO external power source, and as such, it can easily be hidden. The chip that makes the product possible, but which is not shown, should answer the question of whether or not the above hacking scenario is a reality. While this product is limited to 10/100Mb and cannot do GbE, it has a trick up its sleeve: it can down-speed a connection so that the network can be easily tapped. Server monitoring/management ports often do not require high-speed connections, so it’s unlikely that down-speeding the connection would even be noticed. The point of all this rambling is that the second Bloomberg article is very likely true if the parts necessary to accomplish the hack are easily available through a normal retail outlet like the Hacker Warehouse.

IPv6, an Appropriate Glacier Turns 21

This month IPv6 hit its 21st anniversary, but outside of Google, cloud providers, cell phone companies and ISPs, who really cares? One would think by now it would be widely adopted or dead, as technologies rarely last two decades if they’re unsuccessful. Estimates are that between 5% and 25% of Internet traffic is IPv6, and adoption rates vary greatly between countries. So what about the other 75% to 95% of Internet traffic? It’s using IPv4, you know, those addresses of the form 192.168.1.1, a technology that is 35 years old! These addresses fit into four bytes, which provide 4.3 billion unique numbers that can be assigned to publicly routed devices.

Back in 1983, having a worldwide network with a potential capacity of 4.3 billion connected devices was inconceivable. The IPv4 system was designed to link supercomputers together across the globe, along with military installations, large company mainframes and perhaps minicomputers used by smaller companies. For those who didn’t live through 1983, Radio Shack ruled the personal computer market with the TRS-80 Model I, Model III, and the Color Computer. If you were online, you paid CompuServe and dialed in via a 300 baud (bits per second) modem. Personal computers from Apple and IBM were just hitting the market, and cell phones were nothing more than glorified two-way radios addressable via a phone number, while selfies were called Polaroids and you waited a minute or so to see a low-quality “instant” picture. Who could have imagined then that soon a significant percentage of the people on the planet would have networked computers in their pockets, strapped to their wrists, controlling their automobiles, refrigerators and septic systems?

Now, for those who might not be keeping up, we “officially” ran out of publicly available IPv4 address blocks back in January 2011. Yeah right, who knew? To be fair, the January 31, 2011 date is when we exhausted the top-level blocks which are doled out to the five regional Internet registries (RIRs). Going one level farther down, the last of the RIRs to consume its final block was North America’s, in September 2015. So how have we survived the end of available IPv4 addresses? Simple: since 1994 we’ve been playing a series of tricks to stretch the available address space beyond 4.3 billion. Suppose, like me, you have a single IPv4 Internet address on your home router; inside your home network, the router and all the devices use an address on that network of the form 192.168.1.X. Your router then uses a trick called Network Address Translation (NAT) to map all the 192.168.1.X devices on your network to that single IPv4 address. Brantley Coile founded a company in 1994 called Network Translation that patented NAT, then rolled it out in a device that Cisco later bought and rebranded as the PIX Firewall (Private Internet eXchange). Today the vast majority of Internet-connected devices are using a NAT’d address. These days it’s highly unlikely that a company will assign a publicly routed Internet address to a laptop or workstation. I joined IBM Research back in late 1983, and by 1987 was standing up servers with their own unique, publicly routed 9.X.X.X addresses. This was before the days of firewalls and security appliances. At the time IBM was one of roughly 100 entities worldwide with their own class A Internet address space (an IPv4 address starting with 126 or less; for example, General Electric has 3, IBM 9, HP 15, and Ford 19). If you control a class A address, you have 16 million publicly routable Internet addresses at your disposal.
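For the curious, here’s a toy sketch of the bookkeeping a NAT router does; it’s an illustration of the concept, not a real implementation, and the addresses and ports are placeholders.

    # Toy NAT illustration: many private 192.168.1.x sockets are multiplexed
    # onto one public IPv4 address by rewriting the source port and remembering
    # the mapping so replies can be translated back.
    PUBLIC_IP = "203.0.113.7"           # the single address your ISP assigns
    nat_table = {}                       # (private_ip, private_port) -> public_port
    next_port = 40000

    def translate_outbound(private_ip, private_port):
        global next_port
        key = (private_ip, private_port)
        if key not in nat_table:
            nat_table[key] = next_port
            next_port += 1
        return PUBLIC_IP, nat_table[key]

    print(translate_outbound("192.168.1.20", 51515))  # ('203.0.113.7', 40000)
    print(translate_outbound("192.168.1.31", 51515))  # ('203.0.113.7', 40001)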

For decades we administrators have trained ourselves to grasp a four-byte number like 10.5.17.23 so we could key it into another device and manage our networks. We invested time in knowing IPv4 and building networks to meet our needs. IPv6 addresses look like 2001:0db8:85a3:0000:0000:8a2e:0370:7334; this is not human-friendly. That is why inside large companies, where the administrators are familiar with IPv4, there’s resistance to moving to IPv6 addresses. IPv6 is designed for automated machine management. Personally, the first IPv6 address I ever took note of was assigned to me this spring when we converted to Google Fiber at home. So what was the first thing I did once the fiber was active? I requested an IPv4 address. It turns out there are several servers in my house which I need to reach remotely, and I wasn’t about to begin pasting in an IPv6 address whenever I needed to connect to them. There may be a better way, but I fall back on what I know and trust; it’s human nature.
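Python’s standard ipaddress module makes the readability gap easy to see; the two addresses below are the ones quoted above.

    # Both parse fine, but only one is something you'd retype from memory.
    import ipaddress

    v4 = ipaddress.ip_address("10.5.17.23")
    v6 = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")

    print(v4)           # 10.5.17.23
    print(v6)           # 2001:db8:85a3::8a2e:370:7334 (compressed form)
    print(v6.exploded)  # 2001:0db8:85a3:0000:0000:8a2e:0370:7334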

Today most of our new Internet edge devices, for example routers and smartphones, are intelligent enough to self-configure, and the whole issue of IPv4-to-IPv6 conversion will slowly fade into the background. Within the home or the enterprise, though, where devices need a human touch, IPv4 will live long and prosper.

Container Performance Doesn’t Need to Suck

Recently the OpenShift team at Red Hat, working with Solarflare Engineering, rolled out new code that was benchmarked by a third party, STAC Research, and demonstrated networking performance from within a container equivalent to that of a bare-metal server. We’re talking 1.2 microseconds for 99% of network traffic in a 1/2RT (half round trip); that’s a TCP receive into an application coupled with a TCP send from that application.

Network performance like this was considered leading edge in High-Performance Computing (HPC) a little more than a decade ago, when Myricom rolled out Myrinet10G, which debuted at 2.4 microseconds back in 2006. Both networks are 10Gbps, so it’s sort of an apples-to-apples comparison. Today, this level of performance is available for containerized applications using generic network socket calls. It should be noted that the above numbers were for zero-byte packets, a traditional HPC measurement. More realistic testing using 256-byte packets yielded a 1/2RT time for the 99th percentile of traffic that was still under 1.5 microseconds; that’s amazing! Everything was done to both the bare-metal server and the Pod configuration to optimize performance. A graph of the complete results of that testing is shown below.
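For readers who want to see what a measurement like this looks like in principle, here’s a minimal sketch: time a small TCP request/response pair against an echo server and halve it. This is not the STAC harness; the host, port, message size, and iteration count are placeholder assumptions, and a real harness would pin cores, warm up, and handle partial reads.

    # Rough half-round-trip (1/2RT) latency sketch against a TCP echo server.
    import socket, time

    HOST, PORT, MSG, N = "127.0.0.1", 7001, b"x" * 256, 100_000

    s = socket.create_connection((HOST, PORT))
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    samples = []
    for _ in range(N):
        t0 = time.perf_counter()
        s.sendall(MSG)
        s.recv(len(MSG))  # sketch assumes the full echo arrives in one recv
        samples.append((time.perf_counter() - t0) / 2)  # half of the round trip

    samples.sort()
    print(f"99th percentile 1/2RT: {samples[int(N * 0.99)] * 1e6:.2f} us")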

Anytime we create abstractions to simplify application execution or management, we introduce additional layers of code that can result in potentially unwanted delays, known as application latency. By running an application inside a container, then wrapping that container in a Pod, we increase the distance between what we intend to do and what is actually being executed. Docker containers are fast becoming all the rage, and methods for orchestrating them using tools like Kubernetes are extremely popular. If you dive into this OpenShift blog post, you’ll find ways to cut through these layers of code for performance while still retaining the primary management benefits.

What the FEC?
Auto-Detect Finally Here for 25G!

As technology marches forward, new challenges arise that were not previously an issue. Consider: as mankind moved from walking to horseback, we cleared trails where there were once brush-covered paths. As we transitioned from horseback to carriages, those paths needed to become dirt roads, and the carriages added suspension systems. With the move from carriages to automobiles, we further smoothed the surface traveled by adding gravel. As the automobiles moved faster, we added an adhesive to the gravel, creating paved roads. With the introduction of highways, we required engineered roads with multi-layered surfaces. Each generation reduced the variability in the road surface by utilizing new techniques that enabled greater speed and performance. The same holds true for computer networks.

Over the past three decades, as we transitioned from 10Mbps to 25Gbps Ethernet, we’ve required many innovations to support these greater speeds. The latest of these is Forward Error Correction (FEC). The intent of FEC is to reduce the bit error rate (BER) as the cable length increases. In 2017 we saw the ratification of the IEEE 25GbE specification, which provides two unique methods of FEC. There is BASE-R FEC (also known as Firecode) and RS-FEC (also known as Reed-Solomon). Both of these FEC algorithms introduce additional network latency as the signal is decoded: BASE-R adds about 80 nanoseconds while RS-FEC adds about 250 nanoseconds. The complexities don’t end here though; it turns out there are three different Direct Attach (DA) cable types with varying levels of quality. From good to best we have (a quick latency comparison in code follows the list):

  • CA-25G-L: up to 5m, requires RS-FEC
  • CA-25G-S: up to 3m, lower loss, requires either RS-FEC or BASE-R FEC
  • CA-25G-N: up to 3m, even lower loss, can work with RS-FEC, BASE-R FEC, or no FEC
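Here’s the back-of-the-envelope cost of that FEC choice; the per-link decode latencies come from the figures above, while the hop count is purely an assumption for illustration.

    # Rough per-link latency cost of the two FEC flavors, summed over a path.
    BASE_R_NS = 80    # BASE-R / Firecode decode latency per link
    RS_FEC_NS = 250   # RS-FEC / Reed-Solomon decode latency per link

    hops = 3  # assumed path, e.g. NIC -> ToR -> spine -> NIC
    print(f"BASE-R over {hops} links: {BASE_R_NS * hops} ns")
    print(f"RS-FEC over {hops} links: {RS_FEC_NS * hops} ns")
    print(f"Extra cost of RS-FEC:    {(RS_FEC_NS - BASE_R_NS) * hops} ns")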

But wait, there’s more: if you order now, we’ll throw in auto-negotiation (AN) and link training (LT), as both are required by the 25GbE IEEE standard (10GbE didn’t need these tricks). So what does AN actually negotiate? Two things: the link speed and which type of FEC, if any, will be utilized. It should be noted that existing 25GbE NICs already on the market likely only support one type of FEC. As for LT, it helps to improve the quality of the 25GbE link itself. It turns out, though, that the current generation of 25GbE switches came out before AN was worked out, so support is at best poor to mixed. Often manual switch and adapter configuration is required. Oh, and did I mention that optical modules don’t support AN/LT? Well, they don’t, but some will support short links with no FEC.

So where does this leave people who want to deploy 25GbE? You need to be careful that your network switch and server NICs will work well together. We strongly advise that you do a proof of concept prior to a full deployment. Not all 25G server NICs do both AN/LT, because their chips (ASICs) were designed and fabricated prior to the completion of the IEEE specification for 25GbE last year. Solarflare’s 25GbE X2522 server NICs, which debut next month, include support for all of the above; in fact, when initially powered up they will step through the following bring-up ladder (sketched in code after the list):

  • First, look at the cable: is it SFP or SFP28?
  • If it’s SFP28, attempt AN/LT, then 25G without AN/LT, then 10G
  • If a 25G link comes up, try to detect which FEC is being used by the switch
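Here is that ladder restated as a conceptual sketch. This is not Solarflare firmware; the helper functions are invented placeholders that stand in for the real link and FEC probes.

    # Conceptual bring-up ladder for a 25G-capable port.
    def try_link(mode: str) -> bool:
        # Placeholder: real hardware would attempt link bring-up here.
        return mode == "25G without AN/LT"

    def detect_switch_fec() -> str:
        # Placeholder: real hardware would probe the partner's FEC here.
        return "RS-FEC"

    def bring_up_link(cable_type: str) -> str:
        if cable_type != "SFP28":                 # plain SFP(+): no 25G, no AN/LT
            return "10G, no AN/LT"
        for attempt in ("25G with AN/LT", "25G without AN/LT", "10G fallback"):
            if try_link(attempt):
                if attempt.startswith("25G"):
                    return f"{attempt}, FEC={detect_switch_fec()}"
                return attempt
        return "link down"

    print(bring_up_link("SFP28"))   # e.g. "25G without AN/LT, FEC=RS-FEC"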

Additionally, the server administrator can manually override the defaults and select AN/LT and the FEC type and setting (auto, on, off).

I grew up in New York, and remember listening to Sy Syms on TV say “an educated consumer is our best customer…”

P.S. I’d like to give a special thanks to Martin Porter, Solarflare’s VP of Engineering, for pulling all this together into a few slides.

Visibility + Control = Orchestration

In Taekwondo, to win you watch your opponent’s center of gravity (CoG), because the eyes lie. For example, if the CoG moves toward their back foot you can expect a front kick, or if it begins a slight twist without moving forward or backward, then a punch from the arm in the direction of the twist is coming. These are mandatory anticipatory movements, precursors to a pending threat. If my opponent throws a punch or launches a kick without these movements, it will be ineffectual. A punch without a twist is a tap. Of course, the above is no secret. Skilled attackers lead with a feint to disguise their real intent, but that’s for another time. Cybersecurity is no different: you need to detect a threat, see it, classify it, then act on it. Detecting and seeing the threat is commonly referred to as Visibility. Classifying then acting on the threat is called Orchestration.

Imagine if you could watch the CoG of every server in your data center. In cyber terms, that CoG might be every data flow into and out of the server. Placing boundaries and alerts on those flows is the primary role of orchestration. Placing these boundaries is now called micro-segmentation. Recently we suggested that the New Network Edge is the server itself. Imagine if you could watch every data flow from every server, set up zero-trust policies to govern in advance which flows are permitted, and have the system generate alerts to security operations when other flows are attempted. With solid governance comes the capability to quarantine applications or systems that have gone rogue. All of this is done within the server’s own NICs, without any host agents and without consuming any local x86 CPU cycles; that’s Solarflare ServerLock.

Below is a screenshot of ServerLock displaying seven groups of hosts, in the dark grey bubbles, with all the flows between those hosts in red. The Database servers group is highlighted, and all the network flows for this group are shown. Note this is a demonstration network. Click on the image below to see a larger version of it. 

The New Network Edge – The Server

Today, cleverly crafted spear-phishing emails and drive-by downloads make it almost trivial for a determined attacker to infect a corporate workstation or laptop. Wombat’s “State of the Phish 2018” report shows that 76% of InfoSec professionals experienced phishing attacks in 2017. Malware Remote Access Toolkits (RATs) like Remcos for Windows can easily be rebuilt with a new name and bound to legitimate applications, documents or presentations. Apple Mac users, myself included, are typically a smug group when it comes to malware, so for them there’s MacSpy, which is nearly as feature-rich. A good RAT assumes total control over the workstation or server on which it is installed, then leverages a secure HTTPS connection back to its command-and-control server. Furthermore, these toolkits employ their own proprietary encryption techniques to secure their traffic before HTTPS is applied. This prevents commercial outbound web proxies designed to inspect HTTPS traffic from gaining any useful insight into the toolkit’s nefarious activities. With the existence of sophisticated RATs, we must reconsider our view of the enterprise network. Once a laptop or workstation on the corporate network is compromised in the above fashion, all the classic network defenses, firewalls, IDS, and IPS, are rendered useless. These toolkits force us to recognize that the New Network Edge is the server itself, and that requires a new layer in our Defense in Depth model.

The data on our enterprise servers are the jewels that attackers are paid a hefty sum to acquire. Whether it’s a lone hacker for hire by a competitor, a hacktivist group or a rogue nation-state, there are bad actors looking to obtain your company’s secrets. Normally the ONLY defenses on the corporate network between workstations and servers are the network switches and the software firewalls that exist on both ends. The network switches enforce sub-networks (subnets) and virtual local area networks (VLANs) that impose a logical structure on the physical network. Access Control Lists (ACLs) then define how traffic is routed across these logical boundaries. These ACLs are driven by the needs of the business and are meant to reflect how information should flow between different parts of the enterprise. By contrast, the software firewalls on both the workstations and servers also define what is permitted to enter and leave these systems. As defenses, both these methods fall woefully short, but today they’re the last line of defense. We need something far more rigorous that can be centrally managed to defend the New Network Edge, our servers.

As a representation of the business’s processes, switch ACLs are often fairly loose when permitting systems on one network access to those on another. For example, someone on the inside sales team sitting in their cubicle on their workstation has access to the Customer Relationship Management (CRM) system, which resides on a server that is physically somewhere else. The workstation and server are very likely on different subnets or VLANs within the same enterprise, but ACLs exist that enable the salesperson’s workstation to access customer data provided by the CRM system. Furthermore, that CRM system is actually pulling the customer data from a third system, a database server. The CRM server and the database server may be on the same physical server, or perhaps in the same server rack, but very possibly on the same logical network. The question is, is there a logical path from the inside salesperson’s workstation to the database server? The answer should be no, but guess what? It doesn’t matter. Once the inside salesperson is successfully spear-phished, it’s only a matter of time before the attacker has access to the database server through the CRM server.

The attacker will first enable the keylogger, then watch the salesperson’s screen to see what they are doing, harvest all their user IDs and passwords, perhaps turn on the microphone and listen to their conversations, and inspect all the outgoing network connections. Next, the attacker will use what they’ve harvested and learned to begin their assault, first on the CRM server. Their goal at this point is to establish a secondary beachhead with the greatest potential reach from which to launch their primary assault, while keeping the inside salesperson’s workstation as their fallback position. From the CRM server, they should be able to easily access many of the generic service machines: DNS, DHCP, NTP, print, file, and database systems. The point here is that where external attackers often have to actively probe a network to see how it responds, internal RAT-based attacks can passively watch and enumerate all the ports and addresses typically used. In doing so they avoid any internal dark space honeypots, tripwires, or sweep detectors. So how do we protect the New Network Edge, the server itself?

A new layer needs to be added to our defense-in-depth model called micro-segmentation or application segmentation. This enforces a strict set of policies on the boundary layer between the server and the network. Cisco, Arista, and other switch providers, with a switch-based view of the world, would have you believe that doing it in the switch is the best idea. VMware, with its hypervisor view of the world, would have you believe that its new NSX product is the solution. Others like Illumio and Tufin would have you believe that a server-based agent is the silver bullet for micro-segmentation. Then there’s Solarflare, a NIC company, with its NIC-based view of the world, and its new entrant in the market called ServerLock.

Cisco sells a product called Tetration designed to orchestrate all the switches within your enterprise and provide finely grained micro-segmentation of your network traffic. It requires that additional Cisco servers be installed to receive traffic flow data from all the switches, process the data, and then provide network admins with both visibility into and orchestration of the security policies across all the switches. There are several downsides to this approach: it is complex, expensive, and can very possibly be limited by the ACL storage capabilities of the top-of-rack switches. As we scale to hundreds of VMs per system or thousands of containers, these ACLs will likely be stretched beyond their limits.

VMware NSX includes an advanced virtual switch and a firewall, both of which require host CPU cycles to operate. Again, as we scale to hundreds of VMs per system, the CPU demands placed on the system by the virtual switch and the NSX firewall will become significant and measurable. Also, being an entirely software-based solution, NSX has a large attackable surface area that could eventually be compromised, especially given the Meltdown and Spectre vulnerabilities recently reported in Intel processors. Finally, VMware NSX is a commercial product with a premium price tag.

This brings us to the agent-based solutions like Illumio and Tufin. We’ll focus on Illumio, which comes with two components: the Policy Compute Engine (PCE) and the Virtual Enforcement Node (VEN). The marketing literature states that the VEN is attached to a workload, but it’s an agent installed on every server under Illumio’s control; it reports network traffic flow data into the PCE while also controlling the local OS software firewall. The PCE then provides visualization and a platform for orchestrating security policies. The Achilles heel of the VEN is that it’s a software agent, which means that it both consumes x86 CPU cycles and presents a large attackable surface area. Large in the sense that the agent and the OS-based firewall on which it depends can both be easily circumvented. An attacker need only escalate their privileges to root/admin to hamstring the OS firewall or disable or blind the VEN. Like VMware NSX, Illumio and Tufin are premium products.

Finally, we have Solarflare’s NIC-based solution called ServerLock. Unlike NSX and Illumio, which rely on Intel CPU cycles to handle firewall filtering, Solarflare executes its packet filtering engine entirely within the chip on the NIC. This means that when an inbound network packet is denied access and dropped, it takes zero host CPU cycles, compared to the 15K-plus x86 cycles required by software firewalls like NSX or iptables. ServerLock NICs also establish a TLS-based domain of trust with a central ServerLock Manager, similar to Illumio’s PCE. The ServerLock Manager receives flow data from all the ServerLock NICs under management and provides Visibility, Alerting and Policy Management. Unlike Illumio, though, the flow data coming from the ServerLock NICs requires no host CPU cycles to gather and transmit; these tasks are done entirely within the NIC. Furthermore, once the Solarflare NIC is bound to a ServerLock Manager, the local control plane for viewing and managing the NIC’s hardware filter table is torn down, so even if an application were to obtain root privilege there is no physical path to view or manage the filter table. At that point, it can only be changed from the specific ServerLock Manager to which it is bound. All of the above comes standard with new Solarflare X2-based NICs, which are priced at or below competitive Intel NIC price points. ServerLock itself is enabled as an annual service sold as a site license.
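To put that 15K-cycle figure in perspective, here’s a quick back-of-the-envelope calculation; the clock speed and attack rate below are assumptions chosen purely for illustration.

    # Cost of dropping unwanted packets in host software versus in the NIC.
    CYCLES_PER_DROP = 15_000          # software firewall path, per the paragraph above
    CPU_HZ = 3.0e9                    # assume a 3 GHz core
    ATTACK_PPS = 1_000_000            # assume a 1M packet/sec flood

    cores_burned = CYCLES_PER_DROP * ATTACK_PPS / CPU_HZ
    print(f"Software drop: {cores_burned:.1f} CPU cores consumed just saying 'no'")
    print("NIC drop: 0 host CPU cycles")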

So when you think of micro-segmentation would you rather it be done in hardware or software?

P.S. Someone asked why there is a link to a specific RAT, or why I’ve included a link to an article about them. Simple: it validates that these toolkits are in fact real and readily accessible. For some people, threats aren’t real until they can actually see them. Also, another person asked, what if we’re using Salesforce.com? That’s ok; as an attacker, instead of hitting the CRM server I’ll try the file servers, intranet websites, print servers, or whatever else that inside salesperson has access to. Eventually, if I’m determined and the bounty is high enough, I’ll have access to everything.