SmartNICs vs. DPUs, Who Wins?

August 25, 2020 (updated February 15, 2021) by scottcschweitzer. Filed under: 25GbE, Accelerators, FPGA, Infiniband, networking, TCP Offload Engine. Tagged: Accelerators, Broadcom, DPU, Fungible, IPU, Networking, NVIDIA, Pensando, SmartNICs, Xilinx

Last week I hosted an IEEE Hot Interconnects Panel with the above title. We were lucky enough to secure some time from the following luminaries, and it made for an excellent panel:

Andy Gospodarek, Broadcom, Open Sourcerer
Pradeep Sindhu, Fungible, CEO
Michael Kagan, NVIDIA, CTO
Vipin Jain, Pensando, CTO
Gordon Brebner, Xilinx, CTO Staff & Fellow

The full 90-minute panel discussion is available on YouTube under the title below. For those interested only in the highlights, read on for some of the interesting points pulled from our discussion.

**IEEE Hot Interconnects Panel: “SmartNICs vs. DPUs, Who Wins?”**

Here are some of the most valuable points from the panel discussion:

SmartNICs provide a second computing domain inside the server that could be used for security, orchestration, and control plane tasks. While some refer to this as an air-gapped domain, it isn’t one; it is, however, far more secure than running inside the same x86 system domain. This can be used to securely enable bare-metal as a service. – Michael Kagan
Several vendors are actively collaborating on a Portable NIC Architecture (PNA) designed to execute P4 code. Once available, it would be possible to deliver containers of P4 code that could run on any NIC supporting this PNA model. – Vipin Jain
The control plane needs to execute in the NIC for two reasons: first, to relieve the host CPU of what is quickly becoming a 30% overhead for processing network traffic; and second, to improve the determinism of the applications running on the server. – Vipin Jain
App stores are inevitable; the question is when. While some think it could be years, others believe it will happen within a year. Xilinx has partnered with a company that already has one for FPGA accelerators, so the leap to SmartNICs shouldn’t be that challenging. – Gordon Brebner
The ISA is unimportant; it’s the micro-architecture that matters. Fungible selected MIPS-64 because of its support for simultaneous multi-threaded execution with fine-grained context switching. – Pradeep Sindhu. Others, however, feel that the ecosystem of tools and broad access to developers matter most, which is why they’ve selected ARM.
It should be noted that the ARM cores are normally NOT in the data plane.

The first 18 minutes are introductions and marketing messages; while educational, they are somewhat canned. The purpose of the panel discussion was to ask questions the panelists hadn’t seen in advance, so we could draw out honest perspectives and feedback from their years of experience.

IMHO, here are some of the interesting comments, who made them, and where to find them in the video:

18:50 Michael – The SmartNIC is a different computational domain, a computer in front of a computer, and ideal for security. It can supervise all system I/O; the key thing is that it is a real computer.

23:00 Gordon – Offloading the host CPU to the SmartNIC and enabling programmability of the device is critically important. We’ll also see functions and attributes of switches being merged into these SmartNICs.

24:50 Andy – Not only data plane offload, but control plane offload from the host is also critically important. Also, hardware, in the form of on-chip logic, should be applied to data plane offload whenever possible so that ARM cores are NOT placed in the data plane.

26:00 Andy – Dropped the three-letter acronym that makes most of us hardware providers cringe when we hear it: SDK. He stressed the importance of providing one. It should be noted that, as far as I know, Broadcom is at this point the only SmartNIC OEM that provides a customer-facing SmartNIC SDK.

26:50 Vipin – A cloud-based device that is autonomous from the system and remotely manageable. It has its own brain, which truly runs independently of the host CPU.

29:33 Pradeep – There is no golden rule or rule of thumb, such as the 1 Gb/sec per core that AMD has cited. It’s important to determine which computations should be done in the DPU; multiplexing and stateful applications are ideal. General-purpose CPUs are built to run single-threaded applications very fast, but they are horrible at multiplexing.

33:37 Andy – 1 Gb/core is really low; I wouldn’t be comfortable with that. I would consider DPDK or XDP, which would blow that metric away. People shouldn’t settle for this metric.

35:24 Michael – The network needs to take care of the network on its own, so zero cores for an infinite number of gigabits.

36:45 Gordon – The SmartNIC is kind of a filtering device, where sophisticated functions like IPS can be offloaded into the NIC.

40:57 Andy – The TruFlow logic delivers a 4-5X improvement in packet processing. Very few people are really concerned with hitting line-rate packets per second at these speeds. In the data center, these PPS requirements are not realistic.

42:25 Michael – I support what Andy said, these packet rates are not realistic in the data center.

44:20 Pradeep – We’re having this discussion because general-purpose CPUs can no longer keep up. This is not black and white, but a continuum: where does general processing end and the SmartNIC pick up? gRPC, as an example, needs to be offloaded. The correct interface is not TCP or RDMA; both are too low-level. gRPC is a modern level for this communication interface. We need architectural innovation because scale-out is here to stay!

46:00 Gordon – One thing about being FPGA-based is that we can support tons of I/O. With FPGAs we don’t think in terms of cores, we look at I/O volumes. Several years ago we started looking at 100GbE, figured out how to do it, then extended it to 400GbE. We can see the current approach scaling well into the terabit range. While we could likely provide terabit-range performance today, it would be far too costly and nobody would buy it; it’s a price-point issue.

48:35 Michael – CPUs don’t manage data efficiently. We have dedicated hardware engines and TCAM along with caches to service these engines, that’s the way it works.

49:45 Pradeep – The person asking the question perhaps meant control flow rather than flow control; while they sound similar, they mean different things. Control flow is what a CPU does, flow control is what networking does. A DPU or SmartNIC needs to do both well to be successful. [It appears, and I could be wrong, that Pradeep uses “pipeline” to refer to consecutive stages of execution on a single macro resource, such as a DPU, and “chain” to refer to a collection of pipelines that together provide a complete solution.]

54:00 Vipin – If you stick with fixed-function execution, then line rate is possible. We need to move away from focusing on processing TCP packets and shift focus to messages with a run-to-completion model. It is a general-purpose program running in the data path.

57:20 Vipin – When it came to selecting our computational architecture it was all about ecosystem, and widely available resources and tooling. We [Pensando] went with ARM.

58:20 Pradeep – The ISA is an utter detail; it’s the macro-architecture that matters, not the micro instruction architecture. We chose MIPS because of the implementation, a simultaneous multi-threaded one, which offers far better fine-grained context switching. Much, much better than anything else out there. There is also the economics of price/performance to be considered.

1:00:12 Michael – I agree with Vipin; it’s a matter of ecosystem, and we need to provide a platform for people to develop on. We’re not putting ARMs on the data path, so the performance consideration Pradeep mentioned is not relevant. The key is providing an ecosystem that attracts as many developers as possible and makes it easier for them to produce great value on the device.

1:01:08 Andy – I agree 100%; that’s why we selected ARM, ecosystem drove our choice. With ARM there are enough Linux distributions, and you could be running containers on your NIC. The transition to ARM is trivial.

1:02:30 Gordon – Xilinx mixes ARM cores with programmable FPGA logic, and hard IP cores for things like encryption.

1:03:49 Pradeep – The real problem is the data path, but clearly ARM cores are not in the data path, so they are doing control plane functions. Everyone says they are using ARM cores because of the rich ecosystem, but I’d argue that x86 has a richer ecosystem. If that’s the case, why NOT keep the control plane in the hosts? Why does the control plane need to be embedded inside the chip?

1:04:45 Vipin – The data path is NOT in ARM. We want it on a single die; we don’t want it hopping across many wires and killing performance. The kind of integration I can do by subsuming the ARM cores into my die is tremendous. That’s why it cannot be on Intel. [Once you go off die, performance suffers, so what I believe Vipin means is that he can configure on the die whatever collection of ARM cores and hard logic he wants, and wire it together however best meets the needs of his customers. He can’t license x86 cores and integrate them on the same die as he can with ARM cores.] Plus, if he did throw an x86 chip on the card, it would blow his power budget [a PCIe x16 slot supplies at most 75W on its own].

1:06:30 Michael – We don’t have as tight an integration between the data path and the ARMs as Pensando. If you want to segregate computing domains between the application tier and the infrastructure tier, you need another computer, and putting an x86 on a NIC just isn’t practical.

1:07:10 Andy – The air-gapped, bare-metal-as-a-service use case is a very popular one. Moving control plane functions off the x86 to the NIC frees up x86 cores and enables a more deterministic environment for my applications.

1:08:50 Gordon – Having that programmable logic alongside the ARM cores gives you both control plane offload and the ability to dynamically modify the data plane locally.

1:10:00 Michael – We are all for users programming the NIC; we are providing an SDK and working with third parties to host their applications and services on our NICs.

1:10:15 Andy – One of the best things we do is outreach, where we provide NICs to university developers; they disappear for a few months, then return with completed applications or new use cases. Broadcom doesn’t want to tightly control how people use its devices; it isn’t open if it is limited by what’s available on the platform.

1:13:20 Vipin – Users should be allowed to own and define their own SDK to develop on the platform.

1:14:20 Pradeep – We provide programming stacks [libraries?] that are available to users through REST APIs.

1:15:38 Gordon – We took an early lead in helping define the P4 language for programming network devices. P4 became best known through Barefoot Networks’ switch chips, but we’ve embraced it since very early on. We actually have a P4-to-Verilog compiler, so you can turn your P4 code into logic. Xilinx’s main SmartNIC functions are written in P4, and there are plug-ins where others can add their own P4 functions into the pipeline.

1:17:35 Michael – Yes, an app store for our NIC, certainly. It’s a matter of how it is organized. For me it is somewhere users can go to safely download containerized applications or services that can then run on the SmartNIC.

1:18:20 Vipin – The App Store is still a little ways out, but it is a good idea. We are working in the P4 community towards standards. He mentions PNA, the Portable NIC Architecture, as an abstraction. [OMG, this is huge, and I wish I hadn’t been juggling balls trying to keep the panel moving, as this would have been awesome to dig into. A PNA could enable containerized P4 applications that could potentially run across multiple vendors’ SmartNICs.] He also mentioned that you will need NIC-based applications, and a fabric with infrastructure applications, so that NICs on opposite sides of a fabric can be coordinated.

1:21:30 Pradeep – An App Store at this point may be premature. In the long term, something like an App Store will happen.

1:22:25 Michael – Things are moving much faster these days; maybe just another year for SmartNICs and an App Store.

1:23:45 Gordon – We’ve been working with Pensando and others on the PNA concept with P4 for some time.

1:28:40 Vipin – ...more coming as I listen again on Wednesday.

For those curious, the final vote was three for DPU and two for SmartNIC, but in the end the customer is the real winner.

SmartNICs, the Next Wave in Server Acceleration

April 4, 2020 by scottcschweitzer. Filed under: 25GbE, Accelerators, FPGA, HFT, networking, Security, TCP Offload Engine

As system architects, we seriously contemplate and research the components to include in our next server deployment. First, we break the problem being solved into its essential parts; then, we size the components necessary to address each element. Is the problem compute-, memory-, or storage-intensive? How much of each element will be required to craft a solution today? How much of each will be needed in three years? As responsible architects, we have to design for the future, because our team will still be responsible for what we purchase today three years from now. Accelerators complicate this issue because they can both dramatically breathe new life into existing deployed systems and significantly skew the balance when designing new solutions.

Today foundational accelerator technology comes in four flavors: Graphical Processing Units (GPUs), Field Programmable Gate Arrays (FPGAs), Multi-Processor Systems on a Chip (MPSoCs), and most recently Smart Network Interface Cards (SmartNICs). In this market, GPUs are the 900-pound gorilla, but FPGAs have made serious market progress over the past few years, with significant deployments in Amazon Web Services (AWS) and Microsoft Azure. MPSoCs, and now SmartNICs, blend many different computational components into a single chip package, often utilizing a mix of ARM cores, GPU cores, Artificial Intelligence (AI) engines, FPGA logic, Digital Signal Processors (DSPs), as well as memory and network controllers. For now, we’re going to skip MPSoCs and focus on SmartNICs.

SmartNICs place acceleration technology at the edge of the server, as close as possible to the network. When computational processing of network-intensive workloads can be accomplished at the network edge, within a SmartNIC, it can often relieve the host CPU of many mundane networking tasks. Normal server processes require the host CPU to spend, on average, 30% of its time managing network traffic; this is jokingly referred to as the data center tax. Imagine how much more you could get out of a server if just that 30% were freed up, and what if more could be made available?
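To put a number on “just that 30%”: if the whole networking tax is offloaded, the cycles left for applications rise from 70% to 100% of the CPU, which is roughly 43% more application capacity rather than 30%. The minimal sketch below works through that arithmetic. It assumes application throughput scales linearly with available CPU cycles, which real workloads only approximate, and the function name is purely illustrative.

```python
def capacity_gain(network_tax: float, offloaded: float = 1.0) -> float:
    """Relative increase in CPU capacity left for applications.

    network_tax: fraction of host CPU spent on networking (e.g. 0.30).
    offloaded:   fraction of that tax moved to the SmartNIC (1.0 = all of it).
    Assumption: application throughput scales linearly with the CPU
    cycles available to it.
    """
    if not 0.0 <= network_tax < 1.0 or not 0.0 <= offloaded <= 1.0:
        raise ValueError("network_tax must be in [0, 1) and offloaded in [0, 1]")
    before = 1.0 - network_tax                     # share of cycles apps get today
    after = 1.0 - network_tax * (1.0 - offloaded)  # share of cycles apps get after offload
    return after / before - 1.0


print(f"{capacity_gain(0.30):.1%}")       # full offload of a 30% tax: 42.9%
print(f"{capacity_gain(0.30, 0.5):.1%}")  # offloading only half of it: 21.4%
```

The gain is measured against the 70% the applications had before, which is why freeing 30% of the CPU yields more than a 30% improvement.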

SmartNICs that leverage ARM cores and/or FPGA logic cells exist today from a growing list of companies like Broadcom, Mellanox, Netronome, and Xilinx. SmartNICs can be designed to fit into a Software-Defined Networking (SDN) architecture. They can accelerate tasks like Network Function Virtualization (NFV), Open vSwitch (OVS), or overlay network tunneling protocols like Virtual eXtensible LAN (VXLAN) and Network Virtualization using Generic Routing Encapsulation (NVGRE). I know, networking alphabet soup, but the key here is that complex routing and packet encapsulation tasks can be handed off from the host CPU to a SmartNIC. In virtualized environments, significant amounts of host CPU cycles can be consumed by these tasks. While they are not necessarily computationally intensive, they can be volumetrically intense. With data center networks moving to 25GbE and 50GbE, it’s not uncommon for host CPUs to process millions of packets per second. Today this processing happens in the kernel or hypervisor networking stack. With a SmartNIC, packet routing and encapsulation can be handled at the edge, dramatically limiting the impact on the host CPU.
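To see why “millions of packets per second” is no exaggeration, the sketch below computes the theoretical maximum packet rate of an Ethernet link. It relies on standard Ethernet framing overhead: every frame is preceded by an 8-byte preamble and start-of-frame delimiter and followed by a minimum 12-byte inter-frame gap, so a minimum-size 64-byte frame occupies 84 bytes on the wire. Real traffic mixes frame sizes, so these are upper bounds, not measurements.

```python
# Per-frame bytes on the wire that are not counted in the frame size itself.
PREAMBLE_AND_SFD_BYTES = 8
MIN_INTERFRAME_GAP_BYTES = 12


def line_rate_pps(link_bits_per_sec: float, frame_bytes: int) -> float:
    """Theoretical maximum frames per second for a given frame size.

    frame_bytes is the Ethernet frame from destination MAC through FCS
    (64 bytes minimum, 1518 bytes for a standard 1500-byte MTU).
    """
    if frame_bytes < 64:
        raise ValueError("Ethernet frames are at least 64 bytes")
    wire_bytes = frame_bytes + PREAMBLE_AND_SFD_BYTES + MIN_INTERFRAME_GAP_BYTES
    return link_bits_per_sec / (wire_bytes * 8)


for gbps in (10, 25, 50):
    small = line_rate_pps(gbps * 1e9, 64)
    large = line_rate_pps(gbps * 1e9, 1518)
    print(f"{gbps}GbE: {small / 1e6:6.2f} Mpps at 64 B, {large / 1e6:5.2f} Mpps at 1518 B")
```

With minimum-size frames this works out to roughly 14.9, 37.2, and 74.4 million packets per second for 10GbE, 25GbE, and 50GbE; even full-size frames at 25GbE arrive about two million times per second, each one a unit of work for the host’s networking stack.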

If all you were looking for from a SmartNIC is to offload networking from the host CPU, thereby eliminating the 30% data center networking tax, that alone might be enough to justify the expense. Most of the SmartNIC product offerings from the companies mentioned above run in the $2K to $4K price range. So suppose you’re considering a SmartNIC that costs $3K, and with the proper software and under load testing you’ve found that it returns 30% of your host CPU cycles; at what point does the ROI make sense? A simplistic approach would suggest that $3K divided by 30% yields a break-even system cost of $10K, since 30% of a $10K server is worth $3K. So if the cost of your servers is north of $10K, then adding a $3K SmartNIC is a wise decision, but wait, there’s more.
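The simplistic break-even rule above fits in a few lines of arithmetic. The reasoning is that the SmartNIC pays for itself once the share of the server it gives back is worth at least its price, that is, when reclaimed share × server cost ≥ NIC cost. The sketch uses the $3K and 30% figures from the text; like the text, it ignores power, rack space, software, and operating costs, and the function names are illustrative only.

```python
def break_even_server_cost(nic_cost: float, reclaimed_percent: float) -> float:
    """Server cost at which the CPU share a SmartNIC reclaims equals its price.

    Simplistic model from the text: value returned = reclaimed share x server cost.
    """
    if reclaimed_percent <= 0:
        raise ValueError("the SmartNIC must reclaim some CPU to ever pay off")
    return nic_cost * 100 / reclaimed_percent


def smartnic_pays_off(server_cost: float, nic_cost: float, reclaimed_percent: float) -> bool:
    """True when the reclaimed CPU share is worth at least the SmartNIC's price."""
    return server_cost * reclaimed_percent / 100 >= nic_cost


print(break_even_server_cost(3_000, 30))     # 10000.0
print(smartnic_pays_off(15_000, 3_000, 30))  # True: 30% of $15K is $4.5K
print(smartnic_pays_off(8_000, 3_000, 30))   # False: 30% of $8K is $2.4K
```

Percentages are passed as whole numbers so the $10K break-even comes out exactly rather than as a floating-point approximation.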

SmartNICs can also handle many complex tasks like key-value stores, encryption and decryption (IPsec, MACsec, soon even SSL/TLS), next-generation firewalls, electronic trading, and much more. Frankly, the NIC industry is at an inflection point similar to when video cards evolved into GPUs to support the gaming and virtualization market. While Sony coined the term GPU with the introduction of the PlayStation in 1994, it was Nvidia, five years later in 1999, that popularized the GPU with the introduction of the GeForce 256. I doubt that in the late 1990s, while Nvidia was designing the NV10 chip, the heart of the GeForce 256, its engineers were also pondering how it might be used a decade later in high-performance computing (HPC) applications that had nothing to do with graphics rendering. Today we can look at all the ground covered by GPU and FPGA accelerators over the past two decades and quickly see a path forward for SmartNICs, where they may even begin offloading the primary computational tasks of a server. It’s not inconceivable to envision a server with a half dozen SmartNICs all tasked with encoding video, acting as key-value stores or web caches, or even trading stocks on various exchanges. I can see a day soon when the importance of SmartNIC selection will eclipse server CPU selection when designing a new solution from the ground up.

User Level Networking (ULN) is Becoming an Over-Night Success

April 1, 2019 by scottcschweitzer. Filed under: 25GbE, HFT, Infiniband, networking, Onload, RDMA, RoCE, TCP Offload Engine

Rarely is an overnight success actually overnight. Often success comes as a result of years or even decades of hard work, refinement, and maturity. ULN is just such a technology. While it is only now becoming fashionable, as word leaks out that Google and Tencent have adopted it internally after proving significant performance gains, it has been nearly 25 years in the making. Since the mid-1990s we have seen many efforts that have advanced kernel bypass, otherwise known as ULN.

With the advent of both Gigabit Ethernet (GbE) and the Linux operating system, we saw the emergence of large (1,024 or more) clusters of high-performance servers. These clusters were often designed to focus on particular computing tasks, typically single applications representing complex computational problems. These problems were particularly thorny because they involved very chatty, sophisticated programs that modeled fluid dynamics (e.g. Boeing and airflow over a wing), finite element analysis (e.g. Ford and GM with simulated car crash models), or seismic analysis (e.g. Saudi Aramco and oil production). Don’t get me wrong, there were many more, like modeling nuclear weapons storage, but the above are just a few of dozens of classes of problems. So the HPC crowd was seeking networking that was even faster and more efficient than generic Transmission Control Protocol (TCP) over GbE. They’d also realized that the Linux kernel was beginning to bottleneck their overall performance, so they started to explore options for bypassing the kernel altogether.

This June the most popular kernel bypass communications stack, the Message Passing Interface (MPI), will celebrate its 25th anniversary. MPI represented the dawn of a new approach to networking, a ULN communications stack. For MPI to achieve its desired performance objectives, it required a lower-level networking device driver. In those early days, you could use the Virtual Interface Architecture (VIA) promoted by Intel, Microsoft and Compaq, which eventually became Infiniband’s Remote Direct Memory Access (RDMA), or Myrinet promoted by Myricom. It should be noted that these weren’t the only two options, just the two most widely used at the time. Since then Myrinet has faded away, and Infiniband has dominated HPC.

In parallel with the maturing of ULN, we’ve had an explosion in CPU core counts. This year Intel will begin rolling out premium server processors supporting up to 48 cores, while AMD counters with 64. On the surface, this is excellent news, but it further complicates other system-wide server performance issues, most notably access to the network. Since most servers are dual-socket, this brings the potential maximum core counts to 96 and 128 respectively. What we’ve noticed through internal testing, though, is that as the total number of processing cores on a server increases beyond ten, the operating system often becomes the networking performance bottleneck. As mentioned previously, the High-Performance Computing (HPC) market anticipated this issue long ago.

In 2010 there was a move by several companies to bring HPC technology to markets outside HPC. With this, we saw the introduction of Myricom’s Datagram Bypass Layer (DBL), Solarflare’s OpenOnload, and Voltaire’s Messaging Accelerator (VMA). Both DBL and VMA were born from fifteen years of MPI experience, and they were crafted to provide kernel bypass on Linux. Initially, DBL only supported the User Datagram Protocol (UDP), and it took Myricom nearly two more years to add Transmission Control Protocol (TCP) support. While Myricom was able to morph their Myrinet eXpress (MX) stack into DBL, the fact remained that they didn’t have their own ULN TCP stack and were torn between licensing one and building their own. An interesting side note: the initial customer motivation to create DBL came from a storage company called SANBlaze, but Myricom quickly realized that it could also use DBL to accelerate stock market data for Chicago traders.

At that time, 10GbE Network Interface Cards (NICs) had a half-round-trip latency for UDP-based market data of about 10-15 microseconds. The initial version of DBL brought that down to under five microseconds. In financial trading, there is a direct correlation between time and money, and saving 5-10 microseconds on market data delivery can mean the difference between winning and losing a bid. At nearly the same time, Solarflare also appeared in Chicago promoting its new OpenOnload, which accelerated not only UDP but also the more complex TCP sessions. While market data comes in on UDP packets, orders into the exchanges are submitted using TCP. In parallel, Voltaire, one of the two biggest HPC Infiniband players and later acquired by Mellanox, had crafted its own ULN called VMA. It too had realized that the lucrative financial markets were demanding ULN technology, and that the time was right to apply its kernel bypass solution to this problem as well.

For four years, it was a three-way horse race between DBL, OpenOnload, and VMA for the best ULN solution on Linux supporting both UDP and TCP. Since 2010, ULN for both UDP and TCP has gone into production at nearly all of the world’s financial exchanges, institutional banks, and high-frequency traders. While DBL and VMA still exist today, they account for less than 5% of ULN usage among financial customers. It turns out that in the fall of 2012 Myricom privately demonstrated to Google the value of using DBL to accelerate Memcached, a Web2.0 application used extensively throughout Google. By March of 2013 Google had acquired the necessary people and intellectual property from Myricom to bring both DBL and Myricom’s latest NIC technology in-house. With the core DBL development team gone, DBL’s use within the financial markets waned, and those customers have moved on to OpenOnload. Since then Google has dramatically expanded its use of this ULN technology in-house. Roughly four years ago, with VMA adoption falling to less than 2%, Mellanox open-sourced VMA and moved it to GitHub. Quietly, over the past several years, as other cloud providers recognized Google’s ULN moves, they have begun spawning their own ULN projects.

Also in 2013, as word leaked out that Google had its own internal ULN project, Intel released its Data Plane Development Kit (DPDK). With DPDK it became much easier for applications to gain direct access to the raw networking device. This did not go unnoticed by China’s Tencent Cloud team, who started with the open source FreeBSD stack, carved out what they needed from it, then ported that on top of DPDK. The resulting project was called F-Stack, and it can be found on GitHub today. Other projects, like the OpenFastPath Foundation driven by Nokia, ARM, Cavium, and Marvell, are advancing their own ULN. So today, if you’re seeking a ULN partner that supports both UDP and TCP, your top five options are Solarflare’s Cloud Onload, VMA, F-Stack, OpenFastPath, and Seastar. Only one of these, though, is commercially available and fully supported: Solarflare’s Onload.

As you consider how you might accelerate your network intensive Web2.0 applications like web servers, software load balancers, in-memory databases, micro-service frameworks, and distributed compute grids you should consider Solarflare’s Cloud Onload. With Cloud Onload we’ve seen performance gains ranging from 50%-400% depending on how network intensive an application is. Over the past decade, Solarflare’s Onload technology has accelerated electronic trading worldwide, and today over 90% of all exchanges, institutional banks, and high-frequency trading shops have installed Onload. The only other ULN technology that even comes close to the worldwide adoption of Onload is MPI, but that’s a ULN stack designed for HPC messaging and it does not support UDP or TCP. If your enterprise relies on any of the Web2.0 classes mentioned above, consider reaching out to Solarflare to learn how they can accelerate your network traffic.

Gone in 98 Nanoseconds

October 18, 2017 (updated March 23, 2018) by scottcschweitzer. Filed under: HFT, networking, TCP Offload Engine

Imagine a daily race with hundreds of Top Fuel dragsters all lined up, rumbling along in parallel, waiting for the same green Christmas tree light before launching off the line. In some electronic markets, with specific products, this is exactly what happens every weekday morning. It’s a race where being the fastest is the primary attribute that determines whether you’re going to do business. On any given day only the top finishers are rewarded with trades. Those who transmit their first orders of the day the fastest receive a favorable position at the head of the queue and are likely to do some business that day. In this market, EVERY nanosecond (a billionth of a second) of delay matters and can be monetized. Last week the new benchmark was set at 98 nanoseconds plus your trading algorithm; in some cases, 150 nanoseconds total tick to trade.

“Latency” is the industry term for the unavoidable network delays, and “Tick to Trade Latency” aggregates the network travel time for a UDP market data signal to arrive at a trading system and for that trading system to transmit a TCP order into the exchange. Last year Solarflare introduced Application Nanosecond TCP Send (ANTS) and lowered the “Tick to Trade Latency” bar to 350 nanoseconds. ANTS executes in collaboration with Solarflare’s Application Onload Engine (AOE), based on an Altera Stratix FPGA. Solarflare further advanced this high-speed trading platform to achieve 250 nanoseconds. Then in the spring of 2017 Solarflare collaborated with LDA Technologies. LDA brought their Lightspeed TCP cores to the table and replaced the AOE with a Xilinx FPGA board, once again lowering the “Tick to Trade Latency”, this time to 120 nanoseconds. Now, through further advances and a move to the latest Penguin Computing Skylake platform, all three partners have just announced a STAC-T0 qualified benchmark of 98 nanoseconds “Tick to Trade Latency!”

There was even a unique case in this STAC-T0 testing where the latency was measured at negative 68 nanoseconds, meaning that a trade could be injected into the exchange before the market data from the exchange had even been completely received. Traditional trading systems require the whole market data network packet to be received before ANY processing can be done; these advanced FPGA systems instead receive the market data in four-byte chunks and can begin processing that data while it is still arriving. Imagine showing up in the kitchen before your wife even finishes calling your name for dinner. There could be both good and bad side effects of such rapid action: you have a moment or two to taste a few things before the table is set, or you may get some last-minute chores. The same holds true for such aggressive trading.
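The idea behind that negative number is cut-through processing: act on the fields you need as soon as they arrive instead of waiting for the whole packet. The production systems do this in FPGA logic; the Python sketch below only models the control flow in software. The message layout (instrument id in bytes 0-3, price in bytes 4-7), the trading rule, and all function names are hypothetical assumptions for illustration, not how the Solarflare and LDA Technologies systems actually work.

```python
import struct
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical market-data message layout (an assumption for illustration):
#   bytes 0-3 : instrument id, big-endian unsigned 32-bit
#   bytes 4-7 : price in ticks, big-endian unsigned 32-bit
#   bytes 8.. : other fields the trading decision does not need
DECISION_FIELDS = struct.Struct(">II")
CHUNK_BYTES = 4  # the text describes data arriving in four-byte chunks


def arriving_chunks(message: bytes, size: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Simulate a message arriving off the wire a few bytes at a time."""
    for offset in range(0, len(message), size):
        yield message[offset:offset + size]


def should_buy(instrument: int, price: int, watch_id: int, limit: int) -> bool:
    """Hypothetical trading rule: buy the watched instrument at or below a limit."""
    return instrument == watch_id and price <= limit


def cut_through(chunks: Iterable[bytes], watch_id: int, limit: int) -> Tuple[Optional[bool], int]:
    """Decide as soon as the decision fields are complete.

    Returns (decision, bytes received when the decision was made);
    decision is None if the message ended before the fields arrived.
    """
    received = bytearray()
    for chunk in chunks:
        received += chunk
        if len(received) >= DECISION_FIELDS.size:
            instrument, price = DECISION_FIELDS.unpack_from(received, 0)
            return should_buy(instrument, price, watch_id, limit), len(received)
    return None, len(received)


def store_and_forward(chunks: Iterable[bytes], watch_id: int, limit: int) -> Tuple[Optional[bool], int]:
    """Conventional approach: buffer the entire message before examining it."""
    message = b"".join(chunks)
    if len(message) < DECISION_FIELDS.size:
        return None, len(message)
    instrument, price = DECISION_FIELDS.unpack_from(message, 0)
    return should_buy(instrument, price, watch_id, limit), len(message)


msg = DECISION_FIELDS.pack(42, 1_000) + bytes(56)          # a 64-byte message
print(cut_through(arriving_chunks(msg), 42, 1_005))        # (True, 8)
print(store_and_forward(arriving_chunks(msg), 42, 1_005))  # (True, 64)
```

Both approaches reach the same decision, but the cut-through version has it after two 4-byte chunks while the store-and-forward version waits for all 64 bytes. At 10 Gb/s each byte takes 0.8 ns to arrive, so not waiting for the remaining 56 bytes saves roughly 45 ns, which is how a decision can be made before the packet has fully landed.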

Last week, in a podcast of the same name, we had a discussion with Vahan Sardaryan, CEO of LDA Technologies, where we went into this in more detail.

Penguin Computing is also productizing the complete platform, including Solarflare’s ANTS technology and NIC, LDA Technologies’ Lightspeed TCP, and a high-performance Xilinx FPGA, to provide the Ultimate Trading Machine.

The Ultimate Trading Machine

Thinning the 10GbE Herd

February 26, 2013 (updated September 2, 2017) by scottcschweitzer. Filed under: networking, TCP Offload Engine

This article was originally published in January of 2009 at 10GbE.net.

In 2007 over one million 10GbE network ports were purchased. Many of those were for switch-to-switch interconnects, but some were to connect servers to networks via 10GbE. Natural selection is now taking effect in the 10GbE NIC market as the big dogs, Intel & Broadcom, start thrashing around in an effort to secure market share as 10GbE matures. Both want to dominate the 10GbE LAN on Motherboard (LoM) market. In the NIC market, four companies likely supply over 80% of the 10GbE NICs purchased: Chelsio, Intel, Myricom, and Neterion. The remaining 20% of NIC sales fall to companies like Broadcom, SMC, NetXen, ServerEngines, Tehuti, AdvancedIO, Endace, Napatech, etc. One might wonder why Broadcom is in the second group; it’s because Broadcom’s focus is on selling 10GbE silicon to OEMs like IBM and HP for LoM projects, positioning its silicon on high-end server motherboards rather than retailing NIC cards.

Officially the first documented victim is NetEffect, the leader in iWarp (Infiniband for 10GbE) NICs. NetEffect rose from the ashes of a failed Infiniband company, Banderacom, earlier this decade to apply its silicon development skills and Infiniband algorithms to the more stable Ethernet market as a new feature called iWarp. NetEffect in fact led the iWarp charge; it was the self-proclaimed leader in low-latency iWarp 10GbE NICs. In August NetEffect filed for reorganization in US Bankruptcy Court. With the failure of NetEffect, the market has cast its vote and driven a stake through the heart of iWarp, hopefully terminating this feature.

Rumors have been swirling around Teak Technologies, a maker of 10GbE NICs and a switch, for some time. It appears that Teak has not weathered the storm and has since faded away; its domain name no longer resolves to an IP address. The domain was never transferred from the founder, and this spring the founder announced on LinkedIn that he had moved on some time ago. Is it conclusive evidence? No, but would you buy technology from a tech company whose URL won’t resolve to a server?

It is a tough economic climate for start-up NIC companies, particularly those in the bottom 20%, as they have likely never had a quarter in the black. Now is a challenging time to be out seeking another round of capital from one’s VCs. Several have been without an injection of new funding for over two years and lack the sales volume required to sustain themselves much beyond year end. As such, we’ve directly questioned one firm to see whether it is still alive, and another that is widely rumored in the industry to be in trouble, but their marketing departments are still bailing.