I’ll return to the “Expanded Role of HPC Accelerators” in my next post, but before doing so, we need to take a step back and look at how data is stored to understand how best to operate on and accelerate that data. When you look under the hood of your car, you’ll find that there are at least six different types of mechanical fasteners holding things together.
Hoses often require a slotted screwdriver to remove, while others a Phillips’s head. Some panels require a star-like Torx wrench while others a hexagonal Allen, but the most common one you’ll find are hexagonal headed bolts and nuts in both English and Metric sizes. So why can’t the engineers who design these products select a single fastener style? Simple, each problem has unique characteristics, and these engineers are often choosing the most appropriate solution. The same is true for data types within a computer. Ask almost anyone, and they’ll say that data is stored in a computer in bytes, just like your engine has fasteners.
Computers process data in many end-user formats, from simple text and numbers to sounds, images, video, and much more. Ultimately, it all becomes bits organized, managed, and stored as bytes. We then wrap abstraction layers around these collections of bytes as we shape them into numbers that our computer can then process. We know that some numbers are “natural,” they’re also called “integers,” meaning they have no fractional component, for example, one, two, and three. Simultaneously, other “real” numbers contain a decimal point that can have anywhere from one to an infinite collection of digits to the decimal’s right. Computers process data using both of these numerical formats.
Textual data is relatively easy to grasp as it is often stored as an integer without a positive or negative sign. For example, the letter “A” is stored using a standard that has assigned it the value 65, which as a single byte is “01000001.” In Computer Science, how a number is represented can be just as important as the number itself’s value. One can store the number three as an integer, but when it is divided by the integer value two, some people would say the result is 1.5, but they could be wrong. Depending on the division operator used, some languages support several, and the data type assigned to the product there could be several different answers, all of which would be correct within their context. As you move from integer to real numbers or, more specifically, floating-point numbers, these numerical representations expand considerably based on the degree of precision required by the problem at hand. Today, some of the more common numerical data types are:
- Integers, which often take a single byte of storage, and can be signed or unsigned. Integers can come in at least seven different distinct lengths from a nibble stored as four bits, a byte (eight bits), a half-word (sixteen bits), word (thirty-two bits), double word (sixty-four bits), octaword (one hundred and twenty-eight bits), and an n-bit value.
- Half Precision floating-point, also known as FP16 is where the real number is expressed in 16 bits of storage. The numerical value is stored in ten bits; then there are five bits for the exponent and a single sign bit.
- Single Precision, also known as floating-point 32 or FP32, for 32 bits. Here we have 23 bits for the fractional component, eight for the exponent, and one for the sign.
- Double Precision, also known as floating-point 64 or FP64, for 64 bits. Finally, we have 52 bits for the fractional component, 11 for the exponent, and one for the sign.
There are other types, but these are the major ones. How computational units process data differs broadly based on the architecture of the processing unit. When we talk about processing units, we’re specifically talking about the Central Processing Unit (CPUs), Graphical Processing Unit (GPUs), Digital Signal Processor (DSPs), and Field Programmable Gate Array (FPGAs). CPUs are the most general. They can process all of the above data types and much more; they are the BMW X5 sport utility vehicle of computation. They’ll get you pretty much anywhere you want; they’ll do it in style and provide good performance. GPUs are that tricked out Jeep Wrangler Rubicon designed to crawl over pretty much anything you can throw at it while still carrying a family of four. On the other hand, DSPs are dirt bikes, they’ll get one person over rough terrain faster than anything available, but they’re not going to shine when they hit the asphalt. Finally, we have FPGAs, and they’re the Dodge Challenger SRT Demon in the pack; they never leave the asphalt; they just leave everything else on it behind. All that to say that GPUs and DSPs are designed to operate on floating-point data, while FPGAs do much better with integer data. So why does this matter?
Every problem is not suited to a slot headed fastener; sometimes you’ll need a Torx, while others a hexagonal headed bolt. The same is true in computer science. If your system’s primary application is weather forecasting, which is floating-point intense, you might want vast quantities of GPUs. Conversely, if you’re doing genetics sequencing, where data is entirely integer-based, you’ll find that FPGAs may outperform GPUs, delivering up to a 100X performance per watt advantage. Certain aspects of Artificial Intelligence (AI) have benefited more from using FP16 based calculations over using FP32 or FP64. In this case, DSPs may outshine GPUs in calculations per watt. As AI emerges in key computational markets moving forward, we’ll see more and more DSPs applications; one of these will be electronic trading.
Today the cutting edge of electronic trading platforms utilize FPGA boards that have many built-in ultra-high-performance networking ports. These boards, in some cases, have upwards of 16 external physical network connections. Trading data and orders into markets are sent via network messages, which are entirely character; hence integer, based. These FPGAs contain code blocks that rapidly process this integer data, but computations slow down considerably as some of these integers are converted to floating-point for computation. For example, some messages use a twelve-character format for the price where the first six digits are the whole number, and the second six digits represent the decimal number. So, a price of $12.34 would be represented as the character string “000012340000.” Other fields also use twelve-character values for a number, but the first ten digits are the whole number, and the last two the decimal value. In this case, 12,572.75 shares of a stock would be represented as “000001257275.” Now, of course, doing financial computations maintaining the price or quantity as characters is possible; it would be far more efficient if each were recast as single-precision (FP32) numbers. Then computation could be rapidly processed. Here’s where a new blend of FPGA processing, to rapidly move around character data, and DSP computing for handling financial calculations using single precision math will shine. Furthermore, DSP engines are an ideal platform for executing trained AI-based algorithms that will drive financial trading moving forward into the future.
Someday soon, we’ll see trading platforms that will execute entirely on a single high-performance chip. This chip will contain a blend of large blocks of adaptable FPGA logic; we’re talking millions of logic tables, along with thousands of DSP engines and dozens of high-performance network connections. This will enable intelligent trading decisions to be made and orders generated in tens of a billionth of a second!