In my last blog post, I investigated a Linux performance issue affecting a specific customer workload. In this post, I will introduce the tool I created to drive that investigation.
Recently, a customer was running a test where data was loaded into MemSQL via LOAD DATA. The customer’s third-party benchmarking tool found that MemSQL took twice as long as a competing database to load the same amount of data; however, the numbers reported by this tool did not make sense. Local tests had shown MemSQL could process ~200 MB/s of data on a single machine, but the third-party tool reported MemSQL was only processing ~60 MB/s. Looking at raw TCP socket data on both sides of the connection, I realized that the benchmarking tool was inadvertently throttling itself when running against MemSQL. When I used the standard mysql client to run the LOAD DATA command directly, performance doubled: MemSQL saturated the network card and matched competitor performance at 128 MB/s.
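That 128 MB/s figure lines up with the footnote about being bottlenecked on the 1 Gigabit Ethernet card. A quick back-of-the-envelope check (my arithmetic, not from the original post) shows the wire itself tops out around 125 MB/s of payload, so the reported throughput is right at the ceiling once you allow for rounding:

```python
# 1 Gigabit Ethernet carries at most 10**9 bits per second on the wire.
bits_per_second = 10**9

# Convert to megabytes per second (8 bits per byte, 10**6 bytes per MB).
line_rate_mb_s = bits_per_second / 8 / 10**6
print(line_rate_mb_s)  # 125.0
```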
The problem here was that the customer was passively running their tool and only looking at the final result. I follow a methodology known as active benchmarking, a technique for validating the results of a benchmark. The original description summarizes the active benchmarking process quite well:
While the benchmark is running, analyze the performance of all components involved using other tools, to identify the true limiter of the benchmark.
Every active benchmarking session is a fully-fledged performance investigation that takes more time and effort. But it turns up the actual bottleneck and reveals the maximum possible performance.
When I first joined MemSQL, I found myself frequently writing one-off scripts to simulate complicated workloads for these active benchmarking investigations. Having previously used fio to test filesystems, I wanted a similar tool for defining a custom workload and running it efficiently. I found the existing tools to be fairly restrictive:
- mysqlslap is only really designed to execute one type of query at a time, and sysbench only supports a very small set of workloads as first-class citizens. I wanted to test more complicated workloads.
- Both mysqlslap and sysbench cap the number of concurrent connections at the configured number of threads if the database cannot keep up with the workload. I wanted to test and observe queueing in the database (not in my benchmarking tool) as queries pile up.
- Both mysqlslap and sysbench are designed first and foremost to be stand-alone tests: they want to create their own managed tables with their own data.
- HammerDB is a graphical tool that does not appear to offer an easy way to define workloads outside the GUI. I wanted to use simple text files for defining workloads (à la fio) so I could share them via email or check them into a git repository.
In addition, I had concerns about the performance overhead of these thread-based workload drivers as they generate queries and data. I commonly run the workload generator on the database server itself, where it is unacceptable for the generator to compete with the database for resources. I needed the tool to be as lightweight as possible.
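The difference between a fixed thread pool and a rate-driven issue loop is easiest to see in code. The toy Python sketch below (my own illustration, not dbbench’s actual implementation, which is written in Go) fires each query on its own thread at a fixed rate: because the issue loop never waits for a query to finish, a slow database accumulates a backlog server-side instead of the generator silently capping concurrency the way a fixed thread pool does.

```python
import threading
import time

def run_job(query_fn, rate_per_s, duration_s):
    """Fire query_fn at a fixed rate for duration_s seconds, one thread per query.

    Returns (queries issued, queries completed). With a slow query_fn,
    issued outpaces completed -- the queueing happens in the database,
    not in the generator.
    """
    issued = 0
    done = []                      # one marker appended per completed query
    lock = threading.Lock()

    def worker():
        query_fn()
        with lock:
            done.append(None)

    interval = 1.0 / rate_per_s
    next_fire = time.monotonic()
    deadline = next_fire + duration_s
    while next_fire < deadline:
        # Sleep until the next scheduled fire time, regardless of
        # whether earlier queries have finished.
        time.sleep(max(0.0, next_fire - time.monotonic()))
        threading.Thread(target=worker, daemon=True).start()
        issued += 1
        next_fire += interval
    return issued, len(done)
```

A thread-pool driver would block its workers on the slow queries and quietly issue fewer queries per second; this loop keeps issuing at the configured rate and lets the pile-up show in the database’s own metrics.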
Ultimately, these concerns motivated me to write a new tool: dbbench, a fast, lightweight database workload generator that executes a workload defined in a flexible configuration file. dbbench can behave like the other stand-alone testing tools, but it really shines as a tool for actively benchmarking complicated workloads. The following configuration file describes a common analytics workload: analytics queries running against a table with streaming writes, where old rows are periodically deleted:
[streaming inserts]
; Insert 100 rows per second.
query=insert into t (value) select rand() * 1000

[clean old rows from table]
; Every 100s, clean old rows from the table.
query=delete from t where insert_date < now() - interval 1 hour

[read nearest values]
; 8 clients reading as fast as possible.
query=select count(*) from t where value between 900 and 1000
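The job file assumes a table t with at least a value column and an insert_date column. The post doesn’t show the schema, but a hypothetical MySQL-dialect DDL along these lines would fit the queries above:

```sql
-- Hypothetical schema matching the workload queries (not from the original post).
CREATE TABLE t (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  value DOUBLE NOT NULL,
  insert_date DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
  KEY (insert_date),  -- supports the periodic delete
  KEY (value)         -- supports the range-scan reads
);
```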
Given this configuration file, dbbench will run forever and log some simple statistics about each job. While it is running, I can very easily observe interesting performance behaviors, such as how the performance of read queries changes dramatically after the table is cleaned up. In the snippet from the logs below, the average latency of the “read nearest values” job drops from 225ms to 56ms after the table is cleaned:
2016/01/20 02:56:06 streaming inserts: latency 3.078086ms±1.173286ms; 100 transactions (100.177 TPS); 100 rows (100.177 RPS)
2016/01/20 02:56:06 read nearest values: latency 225.831701ms±63.85481ms; 39 transactions (28.282 TPS); 39 rows (28.282 RPS)
2016/01/20 02:56:06 clean old rows from table: latency 7.824990134s±0; 2 transactions (0.155 TPS); 3259108 rows (253306.618 RPS)
2016/01/20 02:56:07 streaming inserts: latency 2.785273ms±931.039µs; 100 transactions (100.870 TPS); 100 rows (100.870 RPS)
2016/01/20 02:56:07 read nearest values: latency 56.268517ms±5.004933ms; 143 transactions (131.732 TPS); 143 rows (131.732 RPS)
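For longer runs it is handy to reduce these log lines to numbers you can plot. Here is a small parser for the log format shown above (my own helper for post-processing, not part of dbbench itself):

```python
import re

# Matches lines like:
# 2016/01/20 02:56:06 read nearest values: latency 225.831701ms±63.85481ms; 39 transactions (28.282 TPS); ...
LOG_RE = re.compile(
    r"^\S+ \S+ (?P<job>.+?): latency (?P<lat>[\d.]+)(?P<unit>µs|ms|s)±"
    r".*?\((?P<tps>[\d.]+) TPS\)"
)

UNIT_TO_MS = {"µs": 1e-3, "ms": 1.0, "s": 1e3}

def parse_line(line):
    """Extract job name, mean latency (in ms), and TPS from one dbbench log line."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    return {
        "job": m.group("job"),
        "latency_ms": float(m.group("lat")) * UNIT_TO_MS[m.group("unit")],
        "tps": float(m.group("tps")),
    }
```

Feeding each log line through parse_line yields a time series per job, which makes regressions like the post-cleanup latency drop easy to graph.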
I recently used a very similar configuration to test a customer workload and noticed that write query throughput dropped precipitously after the table cleanup was executed. It turned out that the background log merger thread on the MemSQL instance was misconfigured to be overly aggressive. When the table cleanup executed, the background thread kicked in and consumed so many disk IOPS that the transaction log was unable to keep up with incoming writes.
dbbench was instrumental in this investigation because it provided a simple way to describe the workload and a reliable way to execute it. By monitoring and questioning the data throughout the execution of the benchmark, I was able to observe and fix a complicated performance anomaly.
dbbench version 0.1 is publicly available under an Apache license. I’m actively improving and stabilizing dbbench; MemSQL is already using it for many customer proofs of concept and for performance testing. Try out or contribute to dbbench on GitHub today.
- During the investigation, I found an easy way to improve this to 1GB/s ↩
- We were bottlenecked on the 1 Gigabit Ethernet card ↩
- Including in the benchmarking tool itself ↩
- For example, the new sysbench 0.5 Lua interface has a noticeable CPU overhead ↩