Solarflare’s adapter offers 1,024 virtualized network interfaces (VNICs) on each 10GbE port, each with its own receive and transmit hardware. Traditionally, Solarflare’s OpenOnload mapped application sockets to these VNICs, but a new version allows sockets to be dynamically moved between OpenOnload stacks. This lets a multi-threaded application like Memcached, which uses a single listener thread and many worker threads, easily map each worker thread to its own VNIC and move traffic between them.
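As a rough illustration of that per-thread mapping, a Memcached instance might be launched under OpenOnload along these lines. This is a sketch only: the post does not give the exact flags or tuning used in the benchmark, and `EF_STACK_PER_THREAD` assumes a recent OpenOnload release.

```shell
# Hedged sketch, not the benchmark's actual command line.
# EF_STACK_PER_THREAD=1 asks OpenOnload to create a separate Onload
# stack per application thread, so each Memcached worker thread's
# sockets get their own VNIC receive/transmit resources.
EF_STACK_PER_THREAD=1 onload --profile=latency \
    memcached -t 5 -m 4096 -u memcache -p 11211
```

The `-t 5` here matches the five-cores-per-instance configuration described below; the memory size and port are placeholders.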
The test bed was two Dell PowerEdge R620 servers, each with two 10-core Intel E5-2660 v2 processors and 32GB of memory, running RHEL 6.5 with Hyper-Threading enabled. The first server, the traffic generator, ran memslap and was configured with a pair of Solarflare SFN6122F adapters to create sufficient load. The second server, the one testing Memcached performance, had both an Intel X520-DA2 and a Solarflare SFN7122F with OpenOnload. To balance performance and scalability, each Memcached instance was limited to five cores (10 threads), so with 20 cores on the server we loaded four Memcached instances, two per CPU. We then tested from one core per instance up to five cores per instance. Here are some interesting data points:
- From four cores to 20 cores, scaling was nearly linear: Solarflare went from 5 Mops to 21.9 Mops, while Intel went from 1.7 Mops to 7.4 Mops.
- Latency for a single get request improved substantially: Solarflare went from 750us (4 cores) down to 180us (20 cores), while Intel went from 3,000us down to 425us.
- For batches of 48 get requests, the latency numbers were also compelling: Solarflare went from 10,000us (4 cores) down to 2,000us (20 cores), compared with Intel at 30,000us down to 6,000us.
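A quick back-of-the-envelope check of the throughput scaling claim, using only the figures quoted above (Mops = millions of Memcached operations per second):

```python
# Throughput at 4 cores and 20 cores, as cited in the post.
results = {
    "Solarflare SFN7122F + OpenOnload": (5.0, 21.9),
    "Intel X520-DA2": (1.7, 7.4),
}

for adapter, (mops_4, mops_20) in results.items():
    speedup = mops_20 / mops_4
    # Perfectly linear scaling over 4 -> 20 cores would be 5.0x.
    efficiency = speedup / 5.0
    print(f"{adapter}: {speedup:.2f}x speedup, "
          f"{efficiency:.0%} scaling efficiency")
```

Both adapters come in near 4.4x over a 5x core increase (roughly 87-88% scaling efficiency), which supports the "pretty linear" characterization, with Solarflare holding a roughly 3x absolute throughput advantage throughout.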
If you’re serious about Memcached, the full 10-page whitepaper is worth reading, as it goes into much deeper detail.
