The Jetsons, Video & Acceleration

Fifty-eight years ago, last month “The Jetsons” zipped into pop-culture in a flying car and introduced us to many new fictional technologies like huge flat screen TVs, tablet-based computing and video chat. Some of these technologies, had been popularized long before “The Jetson’s,” specifically video chat known previously as videotelephony was popularized as far back as the 1870s, but it took technology over a century to deliver the first commercially viable product. Today with the pandemic separating many of us from our loved ones and our workplaces, video chat has become an instrumental part of our lives. What most people don’t realize though is that video is extremely data intensive, especially at the viewing resolutions we’ve all become accustomed. Doing the computational work requires translating a high definition video (1080p) into a half dozen different resolutions, known as transcoding, to support most devices, including mobile, and to support various bandwidths. This is often done in real time and typically requires one or more CPU cores on a standard server. Now consider all the digital video you consume daily, both in video chats, and via streaming services, all this content needs to be transcoded for your consumption.

Transcoding video can be very CPU intensive, and the program typically used, FFmpeg, is very efficient at utilizing all the computational resources available. On Linux Ubuntu 16.04 using FFmpeg 2.8.17 my experience is that unconstrained FFmpeg will consume 92% of all the available compute power of the system. My test system is an AMD Ryzen 5 3300G clocked at 4.2Ghz with a Xilinx Alveo U30 Video Accelerator card. This is a hyperthreaded quad-core system. For the testing I produced two sample videos, one from Trevor Noah’s October 15^th, 2020 “Cow Hugging” episode and the other was John Oliver’s October 11^th, 2020 “Election 2020” episode. Using the code mentioned above here are the results in seconds of three successive runs using both files running through the AMD processor, and offloading transcoding into the Xilinx Alveo U30.

Raw data from testing. — Raw Data – Transcoding on AMD Ryzen 5 Versus Xilinx U30 Alveo Video Accelerator

From this, one can make several conclusions, but the one I see most fitting is that the Xilinx Alveo U30 can transcode content 8X faster than a single AMD Ryzen 5 core at 4.2Ghz. Now, this is still development level code; the general availability code has not shipped yet. It is also only utilizing one of the two encoding engines on the U30, so additional capacity is available for parallel encoding. As more is understood, this blog post will be updated to reflect new developments.

Updates:

10/20/20 – It has been suggested that I share the options used when calling ffmpeg for both the AMD CPU execution and the Xilinx Alveo U30. Here are the two sets of command line options used.

The script that calls the AMD processor to do the work used the following options with ffmpeg:

-f rawvideo -b:v 10M -c:v h264 -y /dev/null

The script that calls the Xilinx Alveo U30 used the following options:

-f rawvideo -b:v 10M -maxbitrate 10M -c:v mpsoc_vcu_h264 -y /dev/null

Dropping out the “-maxbitrate 10M” on one Alveo run later in the day yesterday didn’t seem to change much, but this will be further explored. Also it has been suggested that I look into the impact of using “-preset” which affects quality, and how that might perform differently on both platforms.