
Before we return to accelerators as a solution, we need to make a pit stop and explore the how behind the why. The why is simple: we buy a product or service to solve a problem. We intellectually evaluate stories and experiences, distill out the solutions that apply, then affix those to tangible objects or services we can acquire. Rarely does someone buy an iPad just to own an iPad; they have a specific use case in mind as the justification for the expense. The same holds for servers and accelerator cards. At this point in our technological evolution, though, the how remains a mystery to most and needs some explanation.
When a technician visits your home to fix a broken appliance, they don’t just walk in with a lone flat-bladed screwdriver. They carry a large toolbox assembled explicitly for repairing appliances. The contents of that toolbox differ from those of a carpenter’s or an automotive mechanic’s. While all three might carry a screwdriver, only the carpenter would have a wood chisel, and only the mechanic a torque wrench. Different problems demand different tools. For the past several decades, many of us have viewed the x86 architecture as the computational tool to solve ALL our information processing issues. The trouble is, a great many problems don’t map well to the x86 model; throw enough clock cycles and CPU cores at them, though, and a solution will eventually be reached.
The High-Performance Computing (HPC) market realized this many years ago, so it built heterogeneous computing environments with schedulers for each type of problem. Problems were classified as scalar, floating-point, or vector. Since then we’ve added a fourth class: Artificial Intelligence (AI), also known as Machine Learning (ML). Scalar problems are the ones that deal with integers (numbers without a decimal point), which is often how we represent text. So, for example, a database lookup of your name to fetch your address is entirely a scalar problem. Next, we have floating-point: calculations with a decimal point, the real numbers. These require different computational routines, and as early as the 1980s we added special numerical co-processors (early accelerators) to our PCs to handle this specific class of problems (e.g., the Intel 8087). Today we can farm this class of problem out to Graphics Processing Units (GPUs), which have many parallel cores explicitly designed for the purpose.
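To make the distinction concrete, here is a minimal C sketch contrasting the two classes. The `record` struct, the `directory` table, and its values are hypothetical, purely for illustration: the name lookup is integer and pointer work for the scalar units, while the interest calculation exercises the floating-point unit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record table for a scalar (integer/text) lookup. */
struct record { const char *name; const char *address; };

static const struct record directory[] = {
    { "Ada",   "1815 Byron Lane" },
    { "Grace", "1906 Navy Way"   },
};

int main(void) {
    /* Scalar problem: compare text, follow pointers -- integer units only. */
    for (size_t i = 0; i < sizeof directory / sizeof directory[0]; i++) {
        if (strcmp(directory[i].name, "Grace") == 0)
            printf("Address: %s\n", directory[i].address);
    }

    /* Floating-point problem: real-number math, historically handled by a
       co-processor like the 8087, today by the CPU's FPU or a GPU. */
    double principal = 1000.0, rate = 0.05;
    printf("One year of interest: %.2f\n", principal * rate);
    return 0;
}
```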

Then there’s the mysterious class called vector computing. A vector is a one-dimensional array of numbers. Some might argue that vectors are just a special case of floating-point problems, and they are, but their treatment at the processing level sets them far apart. Consider the Pythagorean theorem: solving for C when you know A and B requires not only a floating-point processor but many steps to arrive at the value of C. For illustration, let’s say it takes ten CPU instructions to arrive at a value for C (it’s probably more). Now imagine you have a set of 256 values for A and a corresponding set of 256 values for B; producing the complete solution set C would take 2,560 instructions. A vector processor instead loads whole runs of A and B values into wide CPU registers at once, squares them in one instruction, sums the squares in another, takes the square root in a third, and stores the results in a fourth, a handful of instructions instead of 2,560. Problems like weather forecasting map extremely well onto the vector processing model.
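Here is a minimal sketch of that idea in C, assuming an x86 CPU with AVX support (a simplification of the 256-at-a-time illustration above: AVX’s 256-bit registers hold eight single-precision floats, so the 256-element problem collapses from 256 scalar loop trips to 32 vector trips). The function names are hypothetical.

```c
#include <immintrin.h>  /* AVX intrinsics (compile with -mavx) */
#include <math.h>
#include <stdio.h>

#define N 256

/* Scalar version: 256 trips through the loop, one C value per trip. */
static void hypot_scalar(const float *a, const float *b, float *c) {
    for (int i = 0; i < N; i++)
        c[i] = sqrtf(a[i] * a[i] + b[i] * b[i]);
}

/* Vector version: eight C values per trip. Each iteration is load,
   square, add, square-root, store -- applied to 8 values at once. */
static void hypot_vector(const float *a, const float *b, float *c) {
    for (int i = 0; i < N; i += 8) {
        __m256 va  = _mm256_loadu_ps(a + i);                /* load 8 As   */
        __m256 vb  = _mm256_loadu_ps(b + i);                /* load 8 Bs   */
        __m256 sum = _mm256_add_ps(_mm256_mul_ps(va, va),   /* A*A + B*B   */
                                   _mm256_mul_ps(vb, vb));
        _mm256_storeu_ps(c + i, _mm256_sqrt_ps(sum));       /* sqrt, store */
    }
}

int main(void) {
    float a[N], b[N], c1[N], c2[N];
    for (int i = 0; i < N; i++) { a[i] = 3.0f; b[i] = 4.0f; }
    hypot_scalar(a, b, c1);
    hypot_vector(a, b, c2);
    printf("c[0] scalar=%f vector=%f\n", c1[0], c2[0]);  /* both 5.0 */
    return 0;
}
```

The two loops produce the same answers; the difference is that the vector loop issues one instruction per operation per eight elements, which is exactly the economy the weather forecasters are after.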
Finally, there is the fourth, relatively new, class of problems that fall into the realm of AI or ML. Here the math is vector based, a mix of both integer (scalar) and real numbers, but with intentionally low precision. The difference is that the value computed doesn’t always need to be perfect, just close enough. Much like when you do your taxes and leave off the change: the IRS is okay with whole numbers because they’re good enough. Your self-driving car can drift an inch or so in any direction and it won’t make any difference, as it will still be more accurate than your Grandma Nat behind the wheel.
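A toy sketch of that trade-off, assuming a simple linear 8-bit quantization scheme of the kind ML inference engines commonly use (the range and scale factor here are hypothetical): the recovered value is slightly wrong, and that is the point.

```c
#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Toy linear quantization: map a float in [-4, 4] to a signed 8-bit
   integer and back, trading precision for speed and memory. */
#define SCALE (127.0f / 4.0f)

static int8_t quantize(float x)    { return (int8_t)lrintf(x * SCALE); }
static float  dequantize(int8_t q) { return q / SCALE; }

int main(void) {
    float weight = 1.6180339f;        /* full-precision value           */
    int8_t q = quantize(weight);      /* stored in 8 bits instead of 32 */
    printf("original %.7f, recovered %.7f\n", weight, dequantize(q));
    /* Off by a fraction of a percent -- "close enough" for many
       AI workloads, at a quarter of the storage and bandwidth. */
    return 0;
}
```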
So now, back to the problem at hand: how do we accelerate today’s complicated workloads? For the past three decades, we’ve been taking a scalar platform, the x86 processor with floating-point capabilities, and using it as a double-ended screwdriver, with both a flat and a Phillips head, to address every problem we have. How do we move forward?
Stay tuned for part three, where we cover hardware acceleration platforms.