Bitcoin Isn’t Anonymous

There is a common misconception that one of Bitcoin's key features as a currency is anonymity. Nothing could be further from the truth. In fact, it is even less anonymous than using your credit card, because every transaction is posted publicly on the Bitcoin blockchain. Last Wednesday, October 16th, 338 people across 38 countries learned this firsthand. That day the US Department of Justice unsealed indictments against "Welcome to Video" (WTV) and its partners, distributors, and customers. With over one million users, WTV was the largest child pornography site ever shut down by law enforcement. Think "Plato's Boys" and Ron, of "Ron's Coffee," in the June 2015 "Mr. Robot" pilot, only three times bigger! WTV executed the complete dark web playbook for conducting illicit activity: it leveraged Tor, The Onion Router network, to distribute content, and Bitcoin to obfuscate the flow of funds. What its operators didn't know was that companies like Chainalysis exist that crawl through the Bitcoin blockchain and build transaction dependency graphs.

Chainalysis Graph

Bitcoin was the first and is still the most popular digital currency, which makes it easy to use, but it was never designed for anonymity. Think about it: you share the same public wallet ID repeatedly, in the clear, to accept or send a payment. How can that be anonymous? While a wallet ID isn't as cut and dried as a credit card number or a bank account and routing number, it is easily traceable through the blockchain. In real life the proceeds from illicit transactions eventually need to be spent on goods and services; otherwise, what's the point? Doing so involves an exchange that turns Bitcoin into a fiat currency, like US Dollars or UK Pounds Sterling. These exchanges hold the key to translating a public wallet ID into a name and a financial institution.

Law enforcement, working in concert with charities focused on eliminating human trafficking, obtained the public wallet IDs used by WTV. Then, through Chainalysis's dependency graph, they could trace customer payments made to WTV as well as payments WTV made to its content suppliers and distributors. WTV suggested six different Bitcoin exchanges to its customers and partners. The unsealed indictment provides samples from at least three of those exchanges translating public wallet IDs into the end user's name and banking details. Just another case of following the money. Now I'm not saying that NO digital currency is anonymous. There are at least five newer privacy-focused coins, like Monero, Dash, ZCash, Verge and Bitcoin Private, that exist to provide anonymity, but they're a story for another day.

Blockchain is the Next Internet

The Internet came into our homes through the phone line, then later cable, and more recently fiber. Over 35 years, our digital home connection has grown from 300 bits per second to one billion. Along the way, we've moved from time-share services, which were painful to watch as text scrolled down the screen, to 4K streaming video. Now we download tens of billions of bytes of data in a few minutes with no regard for bandwidth, unless it feels "slow." The Internet has changed our lives, and guess what? Starting next year, digital currencies will do it again. Think Bitcoin, but not Bitcoin itself; rather, other competing technologies.

Like the Internet, we've been eased into digital currencies one step at a time. First with credit cards, then debit cards, and later PayPal to handle our eBay sales and purchases. Have you ever left the proceeds from an eBay sale on account with PayPal before eventually spending them or moving them into your bank? If you did, then you were using PayPal as a digital wallet, a store of value. Soon after came Apple Pay, Google Pay and Venmo, all more formalized digital wallets specifically designed to store value. The unit of measure within all these systems, at least in the US, is the dollar, and they are all digital currency systems. If we look outside the US to China, we find WeChat and Alipay, two wildly popular payment platforms.

Alipay is like Apple Pay, but in 2018 it processed nearly 200 billion transactions; by contrast, Visa processed only 182 billion transactions worldwide. In June, Apple Pay was averaging a measly one billion transactions a month. WeChat is a payment system built on top of a text messaging platform. It also includes a digital wallet element enabling near-frictionless payments between people or businesses, and it processed an amazing 460 billion transactions in 2018. The twist is that both WeChat and Alipay use China's fiat currency, the Yuan, and both applications are required by Chinese cyber laws to retain all data for six months and to provide backdoors that enable the government to collect whatever data it wants. China doesn't officially allow access to Facebook and has taken a very hard line on digital currencies like Bitcoin. As you can see, we've all been properly prepared and programmed to begin accepting other forms of payment. Is it too much of a jump to think you may soon pay an online vendor in something other than dollars?

The catalyst for this epic shift toward digital currencies comes in the form of Facebook, which is poised to deliver Libra, its new "stable digital currency," sometime in 2020. A stable digital currency is one that is pegged to one or more fiat currencies. In Libra's case, deposits are accepted in US Dollars, Euros, British Pounds, and Japanese Yen. This is designed to ensure that the value of Libra doesn't swing wildly the way Bitcoin's has over the past several years. Even with these fiat currency tie-downs, governments around the globe are fighting Facebook's release of Libra, knowing the danger it represents.
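To make the "pegged to one or more fiat currencies" idea concrete, here is a minimal sketch of how a basket-pegged token's value can be computed. The weights and exchange rates below are invented for illustration; they are not Libra's actual composition.

```python
# Hypothetical illustration of a basket-pegged "stable" token.
# The basket weights and USD exchange rates are assumptions for
# this example, NOT Libra's real reserve composition.
def token_value_usd(basket_weights, usd_rates):
    """Value of one token in USD: the weighted sum of its fiat components."""
    return sum(amount * usd_rates[ccy] for ccy, amount in basket_weights.items())

# One token defined as a fixed bundle of fiat amounts (hypothetical).
basket = {"USD": 0.50, "EUR": 0.18, "GBP": 0.11, "JPY": 14.0}
# Spot rates to USD (hypothetical).
rates = {"USD": 1.00, "EUR": 1.10, "GBP": 1.25, "JPY": 0.0093}

print(round(token_value_usd(basket, rates), 4))  # USD value of one token
```

Because the token is a fixed bundle of several currencies, its USD value drifts only as the underlying exchange rates drift, which is exactly why it can't swing wildly the way Bitcoin does.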

The principal tool a country leverages to control its citizens is its currency. Countries can track currency through the banking and taxing infrastructure already in place. We've all watched police dramas where the investigator pulls up the suspect's financial records; they literally follow the money. Decentralized anonymous digital currency systems provide NO capability to track or control the users of that currency. This is why Facebook will be standing toe to toe with the US Federal government in a series of hearings starting October 23rd. Libra has other companies and countries around the globe scrambling to secure a toehold in the digital currency market before Facebook dominates everything. In a prior post we talked about Facebook and Libra, so you can visit that if you'd like to learn a bit more. JPMorgan Chase, Wells Fargo, Fidelity, Amex, and Amazon are all rolling out or supporting digital currency or payment systems, while countries such as China and the US, along with the EU, are moving fast to issue alternative digital currencies and bring online payment systems to market to secure their positions before Libra has a chance to become the fiat (default) currency of the new economy.

Five large companies setting the pace for digital currencies are JPMorgan Chase (JPMC), Wells Fargo, Fidelity, American Express (Amex) and Amazon. In June, JPMC announced it was beginning trials of its "JPM Coin." This new coin is a private variant of Ethereum developed by JPMC and running on the Quorum blockchain. JPM Coin is designed to be a "stable cash" coin, with JPMC depositing cash to back the tokens. Real-world trials will begin in a few months. Initially, it will be used internally to process transactions and to settle bond and commodity trades. "Wells Fargo Digital Cash," like JPM Coin, is a stable cash token, and it will also be used internally to settle transactions throughout the Wells Fargo global network. Wells Fargo has announced that later this year it will enable JPMC to gain access to this network as well. The real value for both Wells and JPMC is the underlying blockchain secure ledger technology. Amex is also running down the blockchain path as a significant player in the Hyperledger project. One of Amex's largest expenses is its rewards program, so it is moving to Hyperledger as a means of tracking rewards and dynamically executing rewards promotions. While JPMC and Wells are pushing coins and blockchains, Fidelity has gotten even more creative. Fidelity is rolling out Kn0x, which is insurance for digital currency a customer or institution has stored with Fidelity. This is insurance against theft or loss of the currency, not against any decline in its value; one of the biggest concerns around digital currencies is the theft or loss of the tokens themselves. By contrast, Amazon went old school in September by teaming up with none other than Western Union to roll out "Amazon PayCode" in the US (it was available internationally prior to September). This gives Amazon users who only have hard cash a way to pay for Amazon products.
It should be noted that since 2017 Amazon has offered customers "Amazon Cash," an Amazon digital wallet that stores a cash balance. This wallet can then be used to pay for Amazon purchases. Amazon has also been buying up digital-cash-related domain names in an effort to hedge its bets should it roll out a coin of its own, along the lines of Overstock's Ravencoin. Countries are also jumping on the digital currency bandwagon to head off Libra.

While the United States is NOT rolling out a digital dollar anytime soon, it is delivering a "real-time payment service" called FedNow, designed to settle bank-to-bank transactions in near real time. This is in some ways like what Wells and JPMC are doing, and it will eventually replace the overnight ACH system currently run by the Federal Reserve. In Europe, there is the Bank for International Settlements, or BIS, which is somewhat shrouded in secrecy; it is doing something, but exactly what is still TBD, though something akin to a Eurocoin is rumored to be in the works. Meanwhile, China is poised to release its Digital Currency Electronic Payment (DCEP) system, which is really nothing more than a digital Yuan. It is expected that Alipay and WeChat will jump right on board and may already have completed their integration efforts. The real question is: what is China waiting for?

Digital currencies and electronic payment systems are rapidly becoming pervasive. Today there are nine digital wallets on my phone. Apple Pay, Venmo, and PayPal exclusively use dollars, at least while I'm in the US. The other six wallets support currencies like Bitcoin, Ravencoin, Ethereum, and a few others. With a bit of effort, I can move value between dollars and these other currencies, forever obfuscating the source of those funds. Several of these wallets are gaining value on their own through mining digital currencies, so the source of those funds was secure from their inception, but that's a blog for another day. The point is that digital currencies are here to stay, and they'll become as pervasive as the Internet over the coming decade.

Digital Currency: The Intrinsic Value Argument

US Treasury Silver Certificate

“Money hasn’t been real since we went off the gold standard. It’s become virtual, software, the operating system of our world.”

Mr. Robot, Season 1 eps1.0_hellofriend.mov, June 24, 2015

When friends say that Bitcoin “has no value because there is nothing behind it,” you’re being sucked into the “intrinsic value argument.” Intrinsic value means that the token you’re discussing has an obvious value which requires no faith or belief in ANYTHING. Coins were once a good example of a form of money with intrinsic value, because the metal in the coin was gold or silver and could easily be melted down to make something else of value. Today, most citizens have been seduced by the illusion that their country’s currency, in our case the dollar, has intrinsic value. This is a lie; a dollar is worth nothing more than what we collectively believe it is. This has been true since 1971, when the US completed going off the gold standard. As a boy, I remember coming across US dollar bills with a “Silver Certificate” banner across the top (like the one pictured above). At one time you could take those specific notes to a Federal Reserve Bank and exchange them for an equal amount of silver coin; this was the final vestige of the dollar having some sort of intrinsic value. Today those notes are rarely if ever encountered in circulation, as most of them have been destroyed or collected. If you were to hand a silver certificate to a twenty-something bank teller today and demand real silver, they’d either give you a dumb stare or check to see if you’d arrived in a smoking DeLorean.

Now some might argue that you can take a dollar bill and exchange it for quarters or half-dollar coins that have intrinsic value. Years ago this was true: quarters minted before 1964 were silver, but from 1964 on the US moved to alloys, mixing in less valuable metals like nickel, copper, and zinc. Today a US quarter is roughly 92% copper and 8% nickel, so if you were to melt four of them down, separate out each metal, and sell it off at current market prices, you’d have lost about 86% of the perceived value of your original dollar bill. And that doesn’t account for your time spent doing all this work, or the energy required to melt down the coins. This means that even our coins have only marginal intrinsic value. If you’re reading this in another country, guess what: all of this applies as well; no country today is on any form of precious metal standard. The ONLY value ANY currency has is our collective belief, our confidence, that it does, in fact, have value.
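The melt-value arithmetic above is easy to check for yourself. A back-of-the-envelope sketch, using the ~92/8 copper-nickel split from the text, the standard 5.67 g weight of a quarter, and assumed (illustrative, not current) spot prices for the metals:

```python
# Rough melt-value check for four modern US quarters (one dollar face value).
# Composition per the text: ~92% copper, ~8% nickel; a quarter weighs 5.67 g.
# The metal prices are ASSUMED spot prices for illustration (USD per kg).
COPPER_USD_PER_KG = 6.0   # assumption
NICKEL_USD_PER_KG = 15.0  # assumption

quarter_grams = 5.67
coins = 4
total_kg = coins * quarter_grams / 1000  # 0.02268 kg of alloy

melt_value = total_kg * (0.92 * COPPER_USD_PER_KG + 0.08 * NICKEL_USD_PER_KG)
loss_pct = (1.00 - melt_value) / 1.00 * 100

print(f"melt value: ${melt_value:.2f}, loss vs. face value: {loss_pct:.0f}%")
```

With these assumed prices the four quarters yield roughly fifteen cents of metal, a loss in the mid-80-percent range, which is in line with the figure quoted above.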

Furthermore, we regularly buy and sell stocks, bonds, and other securities that also have no intrinsic value. Does that make them any more or less “real” than Bitcoin? Today the market capitalization of Bitcoin is $150 billion, making it as valuable as Citigroup. You need only check your wallet to see just how real Citigroup is, as you’re likely carrying around a Citigroup card. Is Bitcoin any less real than that seemingly worthless piece of plastic? Companies are legal constructs that can be created or destroyed overnight; we need only look at Enron and Thomas Cook, which were once legal constructs just like Citigroup. Our monetary system is based on confidence, nothing more.

“In the fallout of the Great Depression, FDR closed all the banks for a bank holiday and then he reopened them in stages when they were reported to be sound. Later, historians discovered what we in this room now know; that those reports, they were mostly lies. Nevertheless, it worked, it worked because the public believed the government had everything under control. You see? That is the business model for this great nation of ours. Every business day when our market bells ring, we con people into believing in something: The American Dream, family values…; could be freedom fries for all I care. It doesn’t matter! As long as the con works and people buy, sell whatever it is that we want them to.”

CEO Phillip Price, Mr. Robot, eps2.0_unm4sk-pt2.tc, July 13, 2016

While this quote is somewhat dark, it crisply demonstrates a historic example of how trust, and by extension confidence, is the foundation of our monetary system. So the next time someone says Bitcoin is worthless, ask them if they own any stock.

P.S. Mr. Robot returns Sunday night October 6th for its fourth and final season.

Digital Currency: Money, Our Second Social Network

This is the first in a series designed to dispel the mystique of digital currency. Think Bitcoin, but trust me, we’ll go way beyond that. My goal is to explain in common language everything you’ll need to know about digital currency, so you can confidently answer questions when you sit down with your first social network: your family.

As a species, our most significant evolutionary trait is our capacity for building social networks. While social networks are found in many other higher-order mammals, we really do take them to the next level. Let’s face it: for our first 36 months outside the womb we’re pretty much a defenseless bag of water rolling around wherever we’re placed. We can’t effectively flee from even the most basic predator without assistance. From the moment we’re born we establish strong bonds with our parents and siblings, who become our first social network, our family. In our formative years, this network fills all our basic needs. It is this first network that introduces us to the second network we join, often very early in life: the network of money.

Initially, we learn how to spend our first social network’s money as we roll down the aisle of the market and point out the foods we like. Soon members of our first network are sharing their money with us in return for our time when we do a task or achieve a milestone. Many of you may never have viewed money as a social network; until recently it hadn’t occurred to me either, but in fact it is. On its face, money is nothing more than a worthless token designed exclusively to be exchanged. It is these exchanges that form a network of commerce. If you closely examine your tightest social bonds outside of your first network, they very likely were started or fueled by money. Outside of family, one of my closest friends exists because over a decade ago I exchanged money in return for joining another social network. While I haven’t been associated with that network for years, this friend still shows up whenever I need him most, and it’s no longer about money.

Terminology is critical to our understanding; mentally, we often have trouble grasping something until we can assign it a name. For years you’ve carried money around in your pocket, but have you ever considered it your fiat currency? Here in the US, our fiat currency is the dollar. It’s very possible you’ve taken it for granted so long you never even viewed it this way. A fiat currency is the one officially sanctioned and managed by the government, another network, under which you live. If you only ever need one currency, in my case the dollar, then the concept of fiat becomes moot, but what happens when you begin to use more than one?

Sitting here in North Carolina a month ago, for the first time in my life I needed to spend some Bitcoin I’d been gifted a few years earlier. Bitcoin is the most well-known digital currency, but it is NOT a fiat currency because it hasn’t been issued by a government. The vendor in China I wished to purchase a small digital currency mining rig from ONLY accepted Bitcoin, NOT my US dollars via credit card. At the time Bitcoin was trading around $10,000 USD, so buying something for $250 USD meant I was spending a fraction of a Bitcoin. I’m only aware of a single fractional unit of a Bitcoin, called a Satoshi, which is equal to one hundred-millionth of a Bitcoin. So, I shelled out roughly 2.5M Satoshi for my rig and anxiously awaited its arrival. We’ll dive more into Bitcoin and Satoshi in future posts, but I thought it a prudent example of a case where the fiat currency wasn’t accepted. Next year we’ll have Libra, Facebook’s digital currency, and that’s already giving those who manage our fiat currencies serious concerns.
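The dollars-to-satoshi arithmetic from that purchase is simple enough to sketch in a few lines (function and variable names here are just for illustration):

```python
SATOSHI_PER_BTC = 100_000_000  # one satoshi is 1/100,000,000 of a Bitcoin

def usd_to_satoshi(usd_amount, btc_price_usd):
    """Convert a USD amount to satoshi at a given BTC/USD price."""
    return round(usd_amount / btc_price_usd * SATOSHI_PER_BTC)

# The rig from the text: $250 with Bitcoin trading near $10,000
print(usd_to_satoshi(250, 10_000))  # → 2500000, i.e. roughly 2.5M satoshi
```

At a $10,000 Bitcoin price, $250 works out to 0.025 BTC, which is exactly the 2.5 million satoshi mentioned above.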

In computer science, there’s a concept called Metcalfe’s law, which states that the value of a network is proportional to the square of the number of nodes, or members, in that network. My extended family has roughly fifty members, so its potential value is fifty squared, or 2,500. My friend has nearly 100 in his extended family, so his family’s value is 10,000. Metcalfe’s law came along with computer networking, long after Alexander Graham Bell invented the telephone, but Bell was aware that the value of his invention would only truly be realized once it had been widely adopted. Within 10 years of its invention, over 100,000 phones had been installed in the US. Bell died 46 years after his invention, knowing that his new network had changed the world.

Facebook is the largest social network our species has ever created. With over two billion active users, one person in four uses this network monthly. Right now, the clear majority of those two billion people live their lives in their local fiat currency, and unless they travel they rarely, if ever, deal with another. Next year Facebook will issue its own currency, Libra, and it will change everything. Now, it should be noted that Facebook isn’t a country, so the concept of it issuing currency has raised serious concerns among those who do issue currency. Countries around the globe are taking Facebook’s Libra head-on because they know it represents a Pandora’s box of problems for their own monetary systems, and here’s the main reason why.

Looking at Facebook as a network, we could say its value is two squared, or four; for now, let’s drop all the billions, as they just make the numbers incomprehensibly large. The US has a population of 0.327 billion, so its value as a network (again without the billions) is 0.1, and China has a population of 1.386 billion, so the value of its network is 1.9. If we view these countries’ populations as networks, we see that Facebook’s value is twice that of the US and China combined. Extend that to currencies and you can see why all the fuss, and why you might need to understand digital currency.
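The comparison above can be reproduced in a few lines, working in billions of users as the text does:

```python
def metcalfe_value(users_billions):
    """Metcalfe's law: network value grows with the square of its members.
    We work in billions, as the text does, to keep the numbers small."""
    return users_billions ** 2

facebook = metcalfe_value(2.0)    # 4.0
usa      = metcalfe_value(0.327)  # ~0.107
china    = metcalfe_value(1.386)  # ~1.921

print(facebook, round(usa, 3), round(china, 3))
print(round(facebook / (usa + china), 2))  # Facebook vs. US + China combined
```

The final ratio comes out just under 2, which is the "twice that of the US and China combined" claim in the text.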

Next, we’ll dismiss the intrinsic value argument; that’s where grandpa says Bitcoin is worthless because there’s nothing behind it. You can then counter that the US went off the gold standard decades ago, so why isn’t the dollar worthless? We’ll provide you with that side of the argument.

Size Matters, Especially in Computing

Yes, this is a regular size coffee cup

The only time someone says size doesn’t matter is when they have an abundance of whatever is being discussed. Back in the 1980s some of us took logic design and used discrete 7400 series chips to build out our projects. A 7400 has four two-input NAND gates, with four corresponding outputs, as well as power and ground pins. It is a simple 14-pin package, about three-quarters of an inch long and maybe a quarter-inch wide, that contains a grand total of sixteen transistors. Many of the basic gates we needed for our designs used that same exact package form factor, which made for great fun. Thankfully we had young eyes back then, because oftentimes we’d be up till all hours of the night breadboarding our projects. We knew it was too late when someone would invariably slip up, insert a chip backward, and we’d all enjoy the faint whiff of burnt silicon.

Earlier this month Xilinx set a new world record by producing a field-programmable gate array (FPGA) chip, a distant cousin of the 7400, called the Virtex UltraScale+ VU19P. Instead of 16 transistors, it has 35 billion, with a “B.” And instead of four simple two-input, one-output logic gates, it has nine million programmable system logic cells. A system logic cell is a “box” with six inputs and one output that is fully configurable and highly networked. Each individual little “box” is programmed by providing a logic table that maps all the possible six-input combinations to the single output. So why does size matter?
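To see what "programmed by providing a logic table" means, here is a simplified software sketch of such a cell. Real logic cells have extra features (carry chains, registers, routing), so treat this as a conceptual model only; here we configure one cell to behave as a six-input AND gate:

```python
# Simplified sketch of an FPGA system logic cell: a 6-input lookup table (LUT).
# The "program" is a 64-entry truth table mapping every possible input
# combination to one output bit.
class LogicCell:
    def __init__(self, truth_table):
        assert len(truth_table) == 64  # 2**6 possible input combinations
        self.table = truth_table

    def evaluate(self, inputs):
        """inputs: a sequence of six 0/1 values."""
        index = 0
        for bit in inputs:           # pack the six bits into a table index
            index = (index << 1) | bit
        return self.table[index]

# Truth table for AND: output 1 only when all six inputs are 1 (index 63).
and_table = [0] * 63 + [1]
cell = LogicCell(and_table)
print(cell.evaluate([1, 1, 1, 1, 1, 1]))  # 1
print(cell.evaluate([1, 0, 1, 1, 1, 1]))  # 0
```

Reload the table and the same cell becomes an OR, XOR, or any other six-input function, which is exactly why these cells are "fully configurable."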

Imagine you gave one child a quart-sized Ziploc bag of Legos and another several huge tackle boxes of pre-sorted bricks, including Lego’s own robotics kit. Assuming both children have similar abilities and creativity, which do you think will create the more compelling model? The first child’s creation wouldn’t be much larger than an apple, and entirely static. While it could be revolutionary, it is limited by the constraints of the set of blocks provided. By contrast, the second child could produce a two-foot-tall robot that senses distance and moves freely about the room without bumping into walls. Which solution would you find compelling? In this case size matters in both the number and the type of bricks available to the builder.

The system logic cells mentioned above are much like small Lego bricks in that they can easily replicate the capability of more complex bricks by combining several smaller ones. FPGAs are also like Legos in that you can quickly tear down a model and re-use the building blocks to assemble a new one. For the past 30 years, FPGAs have had limitations that prevented them from going mainstream. First it was their speed and size, then it was the complexity of programming them. FPGAs were hard to configure, but the companies behind the technology learned from the Graphics Processing Unit (GPU) market and realized they needed tools to make programming FPGAs easier. Today new tools exist to port C/C++ programs into FPGA bitstreams. Some might say that the 2010s were the age of the GPU, while the 2020s are shaping up to become the age of the FPGA.

x86 Has Hit the Wall, and Now Come the Accelerators – Part 3

TV’s Original A-Team

Accelerators are like calling in a special forces team to address a serious competitive threat. By design, a special forces team, known as an “A Detachment” or “A-Team,” consists of two officers and ten sergeants, all of whom are cross-trained in five different skill areas: weapons, engineering, medical, communications, and operations intelligence. This enables the detachment to survive for months or even years in a hostile area without operational support. Accelerators are the computational equivalent.

A well-designed accelerator has different blocks of silicon to address each of the four primary computational workloads we discussed in part two:

  • Scalar, working with integers and letters
  • Floating-point, the real numbers with decimal points
  • Vector, one-dimensional arrays of floating-point numbers
  • Artificial Intelligence (AI), vectors with low precision floating point mixed with integers
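The four workload classes above can be sketched in a few lines of code (using NumPy for the array types; the names and values are purely illustrative):

```python
# Minimal illustrations of the four workload classes.
import numpy as np

# Scalar: integers and text, e.g. a key lookup in a table
addresses = {"Alice": "12 Oak St"}
scalar_result = addresses["Alice"]

# Floating-point: real numbers with decimal points
fp_result = 3.14159 * 2.0 ** 0.5

# Vector: one operation applied across a whole 1-D array at once
a = np.array([1.0, 2.0, 3.0, 4.0])
vector_result = a * 2.5

# AI: the same vector math, but in low-precision types (here float16
# and int8) that trade accuracy for speed and silicon area
weights = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float16)
ai_result = weights * np.int8(3)
```

Each class stresses different silicon, which is why a well-designed accelerator carries a dedicated block for each rather than forcing them all through one general-purpose core.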

If workload types are much like special forces skills, then what types of physical computational cores can we leverage in an accelerator design that are optimized for these specific tasks?

For scalar problems, Intel’s x86 platform has led for decades, as far back as the early 1980s. Quietly, over the last 25 years, the ARM architecture has evolved, and in the past five years ARM has demonstrated everything necessary to be a serious data center player. Add to that ARM’s architecture licensing model, which has led to third parties developing their own instruction-set-compatible cores. Both of these factors have resulted in at least a dozen companies, from Apple to Samsung, developing their own ARM core designs. Today ARM cores can be found in everything from Nest thermostats to Apple iPhones. The most popular architecture for workload acceleration is ARMv8-A, specifically the Cortex-A72 design, which supports both 32-bit and 64-bit computing with one to four computational cores. Today the Broadcom Stingray, Mellanox BlueField, NXP Layerscape, and Xilinx Versal all use the ARM Cortex-A72.

When it comes to accelerating floating point, the current trend has been toward Graphics Processing Units (GPUs). While GPUs have been around far longer, it wasn’t until the NVIDIA Tesla debuted that they were viewed as a real computational accelerator. GPUs are also suitable for the third workload model, vector processing; in essence, GPUs can kill two birds with one computational stone. Another solution that can accelerate certain types of floating-point operations is the digital signal processing (DSP) engine. DSPs are very good at real-world computational problems that have a high degree of multiply-accumulates and matrix operations. Here is where some accelerator boards are stronger than others. The Broadcom Stingray has only a cryptographic engine designed to handle single-pass hashing and encryption/decryption (both scalar tasks); it lacks any added acceleration for floating-point math. Mellanox’s BlueField chip also doesn’t include any silicon specifically dedicated to floating point. What Mellanox does promote is that BlueField provides GPUDirect, so the processor can communicate directly with GPUs on another PCIe card. NXP has only ARM cores, so no additional floating-point support is provided. By contrast, Xilinx’s Versal architecture includes anywhere from 472 to 3,984 DSP engines, depending on the chip series and model.

Artificial intelligence (AI) workloads leverage vector processing, but instead of high-precision floating point they require only low-precision floating-point or integer numbers. Again, Broadcom, Mellanox, and NXP all fall short, as they don’t include any silicon to process these workloads directly. Mellanox, as mentioned earlier, does support GPUDirect for passing AI workloads to another PCIe board, but that’s a far cry from dedicated on-chip silicon. Xilinx’s Versal architecture includes anywhere from 128 to 400 AI engines for accelerating these workloads.

Finally, the most significant differentiator is the inclusion of FPGA logic, also known as adaptable engines. This is unique to Xilinx accelerator cards: the capability to take frequently called routines written in C/C++ and port them over to dedicated logic, which can improve a routine’s performance by at least 8X.

In the case of Xilinx’s new Versal architecture, the senior officer is an ARM Cortex-R5 for real-time workloads. The junior officer, and the one who does much of the work, is a quad-core ARM Cortex-A72. These two ARM engines are primarily for control plane functions. Versal then has AI cores, DSP engines, and adaptable engines (FPGA logic) to accelerate the volumetric workloads. When it comes to application acceleration in hardware, the Xilinx Versal is the A-Team!

x86 Has Hit the Wall, and Now Come the Accelerators – Part 2

Before we return to accelerators as a solution, we need to make a pit stop and explore the how behind the why. The why is simple: we buy a product or service to solve a problem. We intellectually evaluate stories and experiences, distill out the solutions that apply, then affix those to tangible objects or services we can acquire. Rarely does someone buy an iPad just to own an iPad; they have a specific use case in mind as justification for the expense. The same holds for servers and accelerator cards. At this point in our technological evolution, the how remains a mystery for most, and needs some explanation.

When a technician visits your home to fix a broken appliance, they don’t just walk in with a lone flat-bladed screwdriver. They carry a pretty large toolbox, explicitly assembled for repairing appliances. The contents of that toolbox are different from those of a carpenter’s or an automotive mechanic’s. While all three might have a screwdriver, only the carpenter would have a wood chisel, and only the mechanic a torque wrench. Different problems demand different tools. For the past several decades, many of us have viewed the x86 architecture as the computational tool to solve ALL our information processing problems. Guess what: a great many things don’t optimize well to the x86 model, but if you throw enough clock cycles and CPU cores at most problems, a solution will eventually be reached.

The High-Performance Computing (HPC) market realized this many years ago, so they built heterogeneous computing environments with schedulers for each type of problem. They classified problems into scalar, floating-point, and vector. Since then we've added Artificial Intelligence (AI), also known as Machine Learning (ML). Scalar problems are the ones that deal with integers (numbers without a decimal point), which is often how we represent text. So, for example, a database lookup of your name to fetch your address is entirely a scalar problem. Next, we have floating-point, or calculations with a decimal point, the real numbers. These require different computational routines, and as early as 1983, we introduced special numerical co-processors (early accelerators) in our PCs to handle this specific class of problems (e.g., the Intel 8087). Today we can farm this class of problems out to Graphics Processing Units (GPUs), as they have many parallel cores explicitly designed for this purpose.

Then there's the mysterious class called vector computing. A vector is a one-dimensional array of numbers. Some might argue that vectors are just a special case of floating-point problems, and they are, but their treatment at the processing level sets them far apart. Consider the Pythagorean theorem. Solving for C when you know A and B requires not only a floating-point processor but many steps to arrive at the value for C. For illustration, let's say it takes ten CPU instructions to arrive at a value for C; it's probably more. Now imagine you have a set of 256 values for A and a corresponding set of 256 values for B; this would take 2,560 instructions to produce the complete solution set C. A vector processor will load the entire set of A and B values into CPU registers at the same time, square them in one instruction, sum the squares in another, square-root that result in another, then present the solution set C in a final instruction: a few instructions instead of 2,560. Problems like weather forecasting map extremely well onto the vector processing model.
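The contrast above can be modeled in a few lines of plain Python. This is a conceptual sketch only: the lists stand in for 256-wide vector registers, and real hardware would issue SIMD instructions rather than list comprehensions.

```python
import math

A = list(range(1, 257))  # 256 values for A
B = list(range(1, 257))  # 256 values for B

# Scalar model: each hypotenuse is computed on its own, so 256 results
# cost 256 separate passes through the multiply/add/sqrt sequence.
scalar_C = []
for a, b in zip(A, B):
    scalar_C.append(math.sqrt(a * a + b * b))

# Vector model: each "instruction" operates on the entire 256-wide set,
# so the whole solution set C takes just four vector operations.
sq_A = [a * a for a in A]                   # one vector multiply
sq_B = [b * b for b in B]                   # one vector multiply
sums = [x + y for x, y in zip(sq_A, sq_B)]  # one vector add
vector_C = [math.sqrt(s) for s in sums]     # one vector square root

assert scalar_C == vector_C  # same answers, far fewer instructions issued
```

The work done per element is identical; what changes is how many instructions the processor has to fetch, decode, and retire to get there.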

Finally, there is the fourth, relatively new, class of problems that fall into the realm of AI or ML. Here the math being done is vector-based, a mix of both integer (scalar) and real numbers, but with intentionally low precision. The difference is that the value computed doesn't always need to be perfect, just close enough. Much like when you do your taxes and leave off the cents: the IRS is okay with whole numbers because they're good enough. Your self-driving car can drift an inch or so in any direction, and it won't make any difference, as it will still be more accurate than your Grandma Nat behind the wheel.
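As a toy illustration of "close enough," here is a sketch in Python of the 8-bit quantization trick that ML hardware relies on. The weight values and the scale are made up for the example.

```python
# Map real-valued weights onto signed 8-bit integers, trading precision
# for smaller, faster hardware. Values and scale are hypothetical.
def quantize(values, scale):
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.731, -0.224, 0.058, -0.912]
scale = 1 / 127  # spread the range [-1, 1] across the 8-bit integers

restored = dequantize(quantize(weights, scale), scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

# Every value lands within half a quantization step of the original:
# not perfect, but close enough, just like rounding off the cents.
assert all(e <= scale / 2 + 1e-12 for e in errors)
```

Shrinking each number from a 32-bit float to an 8-bit integer lets an accelerator pack roughly four times as many multipliers into the same silicon and move a quarter of the data.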

So now, back to the problem at hand: how do we accelerate today's complicated workloads? For the past three decades, we've been taking a scalar platform, the x86 processor with floating-point capabilities, and using it as a double-ended screwdriver with both a flat and a Phillips head to address every problem we have. How do we move forward?

Stay tuned for part three, where we cover hardware acceleration platforms.

x86 Has Hit the Wall, and Now Come the Accelerators

“… when you have access to the vastness of space, you realize there’s only one resource worth fighting over… even killing for: More time. Time is the single most precious commodity in the universe.”

— Kalique Abrasax, Jupiter Ascending (2015)

Computing is humanity's purest quest to convert time into work. In 2000 IBM demonstrated slicing one second into 10 billion units (10GHz) and then squeezing computational work out of each unit. At the time IBM had defined a new 130-nanometer process they called "CMOS 9S," planned for future-generation PowerPC chips. In parallel, IBM was ramping up production of the POWER4 at 1.9GHz. Now you may be asking yourself, "But wait a minute, I've never seen any production 10GHz CPUs, especially not 20 years ago," and you're correct. IBM's POWER6 was as close as we've gotten, with one version of that chip advertised at 5GHz and 6GHz achieved in the lab. I've also heard IBM reps brag about 7GHz with POWER8 if you turn half the cores off. So why has computing hit a wall at 4-5GHz instead of reaching 10GHz over the last twenty years?

Intel explained this five years ago in the blog post "Why has CPU frequency ceased to grow?" The problem has a name: the "conveyor level." Imagine a CPU as a conveyor-belt-driven assembly line with four workstations labeled A through D. Since an assembly line is a serial process, the worker at station B can't start until the worker at station A finishes. Ideally, each station is designed to take the same amount of time to finish its work, so the following station isn't impacted. The slowest worker then defines the speed of the conveyor on any given day. So if the most time-consuming stage in the CPU pipeline takes 250 picoseconds, then the clock frequency is 4GHz. There is also the issue of heat.
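In other words, the clock period can never be shorter than the slowest pipeline stage. A quick back-of-the-envelope in Python; the four stage times are invented for illustration, with only the 250-picosecond bottleneck taken from the example above.

```python
# The conveyor can only move as fast as its slowest worker: the clock
# period must cover the most time-consuming pipeline stage.
def max_frequency_ghz(stage_times_ps):
    slowest_ps = max(stage_times_ps)  # bottleneck stage, in picoseconds
    return 1000.0 / slowest_ps        # 1 GHz corresponds to a 1,000 ps period

# Four stations A-D with made-up stage times; B is the bottleneck at 250 ps.
stages_ps = {"A": 220, "B": 250, "C": 235, "D": 240}
print(max_frequency_ghz(stages_ps.values()))  # -> 4.0 (GHz), as in the example
```

Note that speeding up stations A, C, and D buys nothing; only shortening the bottleneck stage (or splitting it into two shorter stages, which is how deep pipelines arose) raises the clock.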

As an electron races through a computer circuit, it experiences a form of friction known as resistance. Just like rubbing your hands together on a cold day produces heat, so does an electron zipping through a computer circuit. When designing any chip, heat is the enemy. The smaller the chip geometry, today it's seven nanometers, the more devices you can pack into a given space on a chip. More devices mean more heat. That same square centimeter of space at 7nm still has the same thermal limitations it did at 130nm twenty years ago. Sure, we can use fancy liquid systems to rapidly wick heat away from the chip instead of relying on airflow over an area-limited heat sink, but at the end of the day, every watt of power the chip consumes becomes heat. There are also individual circuits throughout the chip specifically designed to detect and respond to overheating; the last thing anyone wants is a smoldering piece of silicon where their CPU once was. In the 7GHz example above, the IBM representative said that if you viewed the POWER8 chip as a big chessboard and turned off all the CPU cores on the white squares, then all the cores on the black squares could be clocked at nearly twice the speed, or 7GHz. Why is this interesting?

For some computational problems it's much better to have two consecutive computations in the same unit of time than two unrelated ones. Electronic trading, also known as high-frequency trading (HFT), is the premier market-driven problem that benefits most from increasing clock frequency. Traders often ascribe a dollar value to a millionth of a second, and it varies from market to market based on the rules and volumes of each market. In the end, though, it always boils down to the trader's speed and response to a market signal. If I'm faster than you at making the right decision, then I win the business and book the profit. Sticking with HFT, where do accelerators fit in?

Traders lease connections to exchanges. The closer and faster they can respond to signals from those connections, the more competitive they will be. Suppose my trading platform requires signals from the market to travel through my server, then a switch on my private network, back through a second server, and finally out to the market. The networking alone, even with kernel bypass, through two servers and a switch could easily be several microseconds. Add a few more microseconds for trading logic in both servers, and you could be looking at almost ten microseconds to submit a trade in response to a signal. Two years ago Solarflare, with LDA Technologies, demonstrated a 98-nanosecond tick-to-trade. This was using accelerator technology, and compared to the trading platform described above, it is roughly two orders of magnitude (about 100X) faster. That's the difference between walking from NYC to LAX and taking a commercial jet. Time matters, and acceleration is not just for HFTs anymore. Why do you think Google bought Myricom, Amazon picked up Annapurna Labs, Nvidia purchased Mellanox, or Xilinx acquired Solarflare?
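To see where the microseconds go, here is a rough latency budget in Python. Every line item below is an invented placeholder, not a measured figure; only the ~10-microsecond total and the 98-nanosecond result come from the discussion above.

```python
# Hypothetical tick-to-trade budget for the two-server path described above;
# all per-hop numbers are illustrative assumptions, in nanoseconds.
software_path_ns = {
    "market signal -> server 1 (kernel bypass)": 1500,
    "server 1 trading logic":                    2000,
    "server 1 -> switch -> server 2":            1500,
    "server 2 trading logic":                    2000,
    "server 2 -> exchange":                      1500,
}
total_ns = sum(software_path_ns.values())  # 8,500 ns, near ten microseconds

accelerated_ns = 98  # the Solarflare/LDA demonstration
print(total_ns / accelerated_ns)  # ~87X: roughly two orders of magnitude
```

The point of the exercise: no single hop is outrageous, but five reasonable-looking hops add up, which is why collapsing the whole path into one accelerator is so effective.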

Please stay tuned; more to come in part two. In the meantime, feel free to check out previous articles on this topic:

The Importance of “Local”

Binary translates to “Local”

We've all attended large international industry trade conferences hosting tens of thousands of people. These are spectacles designed to raise brand awareness, educate attendees about industry advances, and enable networking with colleagues you haven't seen in a spell, all while promoting new products and services. By contrast, there are also smaller regional industry trade shows that are scaled-down versions of these larger events with many of the same objectives, and then there are Security BSides events.

For those not familiar with BSides, they were started in 2009 to further educate folks on cybersecurity at the city and regional level. Think Blackhat, but on a Saturday at the local civic center, and with perhaps 200 people instead of 19,000. Let's face it, most security engineers are introverts, so socializing at enormous events like Blackhat is uncomfortable, while bringing a few coworkers or friends to a BSides event on a Saturday can be downright fun. Who doesn't want to sit for 20-30 minutes in the lock-pick village with their friends, testing their skills on some of MasterLock, Schlage, or Kwikset's most common products? It's heartwarming to teach a NOOB (a newbie) how to pick a lock, then watch their excitement when the hasp clicks open for the first time.

Then there's always the Capture the Flag (CTF), or the wireless CTF, for when you're not interested in the session(s) being offered. If you've never played a security capture-the-flag event before, you really are missing something. It is a challenging series of puzzles served up Jeopardy-style: say, 10 points if you can decrypt this phrase, 20 points if you can determine who's attacking your machine on five different ports, or perhaps another 50 points if you can write a piece of code that reads a web page, unscrambles five words, and posts the five proper words back to the website within three seconds, before the clock expires and the words are no longer valid. It's an intellectual problem-solving competition at its finest, and did I mention there's a leaderboard? Often projected high on the wall for all to see throughout the day are the teams with the highest scores. It really warms the heart when your team is second on the board and stays in the top five most of the day; while we were second on the board at BSides Asheville, we didn't stay in the top five for long.

More seriously though, for a $20 entry fee (which includes a T-shirt), these BSides events offer an affordable local option for cybersecurity engineers and hobbyists. BSides gives socially challenged people the opportunity to step out of their shell and reach out to like-minded individuals while networking in a comfortable, technical space. You can bond over lock-picking or a CTF challenge, during lunch or between sessions. Bring one of your nerd friends as a wingman, or better yet several to form a CTF team, and make a day of it. If you'd like to check out an online CTF, one of our favorites is RingZer0. If you want to see the hacker side of the Technology Evangelist, W3bMind5, or read about his team's experiences at BSides Asheville, they can be found at RedstoneCTF.

The RedstoneCTF team may be attending BSidesCLT on September 28th and BSidesRDU on October 19th.

7nm, Miniaturization to Integration

Last night while channel surfing I came across Men in Black III and was dropped right into the scene where a 1969 Tommy Lee Jones was placing Will Smith into the Neuralyzer pictured on the left. For those not familiar with the original 1997 MiB franchise, a Neuralyzer is a cigar-sized plot device, normally carried inside a jacket pocket, for wiping people's memories of an alien encounter. The writers were clearly poking fun at miniaturization and how much humanity has come to take it for granted.

Those of us who grew up in the 1960s and 70s lived through the miniaturization wave as the Japanese led the industry by shrinking radios and televisions from cabinet-sized living room appliances to handheld devices. One year for Father's Day in the late 70s we bought my dad a portable black and white TV with a radio that ran on batteries so he could watch it on the boat in the evenings. It was roughly the size of three laptops stacked on top of one another. It may sound corny now, but it was amazing back then. Today we watch theater-quality movies in color, on a much larger screen, from a device that drops into our pocket, and we don't think twice about it. We've grown accustomed to technology improving at a rapid rate, and it's now expected, but what happens when that rate is no longer sustainable?

Last year the industry began etching chips with a new seven-nanometer process, which is equivalent to Intel's 10nm process. Apple's A12 Bionic chip that powers the XR and XS series iPhones is one of the first to use this new 7nm process. The chip contains 6.9 billion transistors and is arguably one of the most advanced devices ever produced by mankind. By contrast, my first computer in 1983 was a TRS-80 Model III powered by the Zilog Z80 processor. The Z80 used a 4,000nm process and contained only 8,500 transistors. So in 35 years we've reduced the process size by three orders of magnitude, resulting in a transistor count improvement of six orders of magnitude, wow! How do we top that, and where are we in the grand scheme of the physics of miniaturization?
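The back-of-the-envelope math behind those orders of magnitude, using the figures above:

```python
import math

# Process geometry and transistor counts quoted in the article.
z80_process_nm, z80_transistors = 4000, 8_500        # Zilog Z80, 1983
a12_process_nm, a12_transistors = 7, 6_900_000_000   # Apple A12, 2018

# Orders of magnitude = base-10 logarithm of each ratio.
shrink_orders = math.log10(z80_process_nm / a12_process_nm)
count_orders = math.log10(a12_transistors / z80_transistors)

print(round(shrink_orders, 1))  # 2.8 -> roughly three orders of magnitude
print(round(count_orders, 1))   # 5.9 -> roughly six orders of magnitude
```

The count improvement outpaces the linear shrink because density grows with the square of the shrink, and dies also got physically larger over those 35 years.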

In a 1965 paper, Gordon Moore, a co-founder of Fairchild Semiconductor and later CEO of Intel, stated that the density of integrated circuits would double every year, a prediction now known as Moore's Law. From 1970 through 2014 this "law" essentially held true. Before Intel's current 10nm geometry, their prior generation was 14nm, achieved in 2014, so it has taken them five years to accomplish 10nm. Not exactly Moore's Law, but that's just the tip of the iceberg. As the industry goes from 14nm to 7nm/10nm, physics is once again throwing up a roadblock; this isn't the first one, but it could be the last. Chips are made from silicon, and silicon atoms have a diameter of about 0.2 nanometers. So at a seven-nanometer node size, we're talking 35 or so silicon atoms across, which isn't a very large number. It turns out that below seven nanometers, as we have fewer and fewer silicon atoms to manage electron flows, things get dicey. Chips begin to experience quantum effects; most notably, those pesky electrons, which are about a millionth of a nanometer in size, begin to exhibit something called quantum tunneling. They no longer behave as they're supposed to, moving between devices etched into the silicon with a sort of reckless disregard for the "normal" rules of physics. This has been known for some time, though.
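For context, here is what doubling cadences imply, sketched as a toy model in Python (a hypothetical calculation based on the 1965 prediction, not actual fab data):

```python
# Moore's original 1965 observation: density doubles every year, so n
# years of progress should yield a 2**n improvement in density.
def moores_law_factor(years, doubling_period_years=1.0):
    return 2 ** (years / doubling_period_years)

print(moores_law_factor(10))      # 1024X in a decade at one doubling per year
print(moores_law_factor(5, 2.0))  # ~5.7X in five years at the relaxed
                                  # two-year cadence of the revised law
```

Against either cadence, a single process step (14nm to 10nm, roughly a 2X density gain) stretched over five years shows just how far the curve has flattened.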

Back in 2016 a team at Lawrence Berkeley National Laboratory demonstrated a one-nanometer transistor, but it leveraged carbon nanotubes to manage electron flow and stave off the quantum tunneling effect. For those not familiar with carbon nanotubes, think teeny tiny diamond straws where the wall of the straw is one atom thick. While using carbon nanotubes to solve the problem is ingenious, it doesn't fit into how we make chips today, as you can't etch a carbon nanotube using conventional chip fabrication processes. So while it's a solution to the problem, it's one that can't easily be utilized, and we may be working at 7nm for some time to come. That only means one aspect of miniaturization has ground to a halt. While I've used the term "chip" above to represent an integrated circuit, the more precise term is actually a "die."

Until recently it was common practice to place a single die inside a package. A package is what most of us think of as the chip, as it has a bunch of metal pins coming out of the bottom or sides. In recent years the industry has developed new techniques that allow us to layer multiple dies onto one another within the same physical package, enabling the creation of very complex chips. This is similar to a seven-layer cake, where a different type of cake can be in each layer and the icing conveys flavors across the layers. This means that a chip can contain several, and eventually many, dies or layers. A recent example of this is Xilinx's new Versal chip line.

Within the Versal chip package there are multiple dies that contain two different pairs of ARM CPU cores, hundreds of Artificial Intelligence (AI) engines, thousands of Digital Signal Processors (DSPs), a huge Field Programmable Gate Array (FPGA) area, several classes of memory, and multiple programmable memory, PCIe, and Ethernet controllers. The Versal platform is a flexible toolbox of computational power, with the ARM cores handling traditional CPU and real-time processing tasks. The AI cores churn through new machine learning workloads, while the DSPs are leveraged for advanced signal processing, think 5G, and the FPGA can be used as the versatile computational glue that pulls all these complex engines together. Finally, we have the memory, PCIe, and Ethernet controllers to interface with the real world. So while Intel and AMD focus on scaling the number of CPU cores on the chip, and NVidia works to improve Graphics Processing Unit (GPU) density, Xilinx is the first to go all-in on chip-level workload integration. This is the key to accelerating the data center going forward.

So until we solve the quantum tunneling problem with new fabrication techniques, we can utilize advances in integration, as shown above, to move the industry forward.