Performance analysis is often bottlenecked by a lack of visibility. At MemSQL, we architected our database so that its inner workings are easy to observe. That observability lets our engineers quickly identify the components that need to be faster, and faster components mean a faster database. These tools also enable support engineers to react quickly and precisely to customer needs. In the spirit of using the best tools available, the performance team is currently evaluating next-generation tracing tools that have only recently become available in Linux.

The newest tool for observing the Linux operating system is the Berkeley Packet Filter (BPF). BPF allows users to run a small piece of code quickly and safely inside the operating system. Originally built for packet filtering, it has since been extended beyond its eponymous use case to support dynamic tracing of the Linux operating system. For example, it is possible to write a small BPF program that prints a message every time a particular file is accessed.
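To make that concrete, here is a minimal sketch using the BCC toolchain (introduced below). It is illustrative rather than taken from the original post, and for simplicity it traces every openat system call rather than filtering for one particular file:

```python
#!/usr/bin/env python
# Illustrative sketch: print a trace line each time any process enters
# the openat() syscall. Requires the BCC toolchain and root privileges.
from bcc import BPF

b = BPF(text="""
int trace_openat(void *ctx) {
    bpf_trace_printk("file opened\\n");
    return 0;
}
""")
# get_syscall_fnname resolves the kernel's arch-specific syscall symbol.
b.attach_kprobe(event=b.get_syscall_fnname("openat"), fn_name="trace_openat")
b.trace_print()  # stream trace output until Ctrl-C
```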

The power of the Berkeley Packet Filter, when used with Userland Statically Defined Tracepoints (USDT), expands beyond the operating system to the database. USDT probes are well-defined locations in the database where BPF programs run, allowing engineers to ask questions that were previously unanswerable. For example, engineers can now examine the interactions between the database and the operating system by running BPF programs in both at the same time.

Adding a USDT tracepoint is as easy as a single macro call, which declares the probe and its arguments. This probe fires each time a query is executed and records the query string:
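The snippet below is a hedged sketch of what such a declaration might look like, using the DTRACE_PROBE macros from systemtap's sys/sdt.h. The provider and probe names (memsql, query__start, query__done), the surrounding function, and run_query are all illustrative, not MemSQL's actual source:

```c
#include <sys/sdt.h>  /* provides the DTRACE_PROBEn macros */

void run_query(const char *query_string);  /* hypothetical */

/* Hypothetical query execution path. */
void execute_query(const char *query_string) {
    /* Fires as query execution begins and records the query text. */
    DTRACE_PROBE1(memsql, query__start, query_string);

    run_query(query_string);

    /* A matching probe at the end lets tools compute per-query latency. */
    DTRACE_PROBE1(memsql, query__done, query_string);
}
```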

To use this USDT probe, we need to attach a BPF program to it. We write our program in C and use the BPF Compiler Collection (BCC) to compile it to BPF and attach it to our probes. The following BPF script traces queries and records their latencies:
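What follows is a hedged reconstruction of such a program; the map and function names are illustrative, the probe names match the sketch above, and the real script may differ:

```c
// query_latency.c: record a log2 histogram of query latencies.
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);   // thread id -> query start timestamp (ns)
BPF_HISTOGRAM(latency);      // log2 histogram of query latency (usecs)

// Attached to the memsql:query__start USDT probe.
int probe_query_start(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();  // low 32 bits: thread id
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

// Attached to the memsql:query__done USDT probe.
int probe_query_done(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;  // we missed the start of this query
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    latency.increment(bpf_log2l(delta_us));
    start.delete(&tid);
    return 0;
}
```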

We run query_latency.py, a script that wraps the above BPF program using the BCC toolchain, and get a nice histogram of query latencies:
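A minimal sketch of such a wrapper, under the same naming assumptions as above:

```python
#!/usr/bin/env python
# query_latency.py sketch (illustrative; the real script may differ):
# attach the C program above to a running process's USDT probes.
# Usage: sudo ./query_latency.py <pid>
import sys
from time import sleep
from bcc import BPF, USDT

usdt = USDT(pid=int(sys.argv[1]))
usdt.enable_probe(probe="query__start", fn_name="probe_query_start")
usdt.enable_probe(probe="query__done", fn_name="probe_query_done")

b = BPF(src_file="query_latency.c", usdt_contexts=[usdt])
print("Tracing queries... hit Ctrl-C to print the histogram.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["latency"].print_log2_hist("usecs")
```

Run it against a live memsqld process (the process name here is an assumption), e.g. `sudo ./query_latency.py $(pgrep -o memsqld)`; BCC's print_log2_hist renders the recorded latencies as an ASCII histogram over power-of-two microsecond buckets.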

Once engineers have the ability to trace when a thread is executing a MemSQL query, they can ask more interesting questions about how the database interacts with Linux. For example, engineers can investigate how long queries spend acquiring locks. With BPF, engineers are able to instrument the start and end of queries as above, and additionally instrument the futex system call itself (used in Linux to acquire and release locks) to trace how long it takes to acquire locks while executing a query:
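One hedged way to sketch this: mark a thread as "inside a query" from the USDT probes, then time the futex syscall only for marked threads. Probe, map, and script names are illustrative as before:

```python
#!/usr/bin/env python
# Sketch: histogram of futex (lock) wait time inside queries.
# Usage: sudo ./query_futex.py <pid>
import sys
from time import sleep
from bcc import BPF, USDT

bpf_text = """
#include <uapi/linux/ptrace.h>

BPF_HASH(in_query, u32, u64);     // tids currently executing a query
BPF_HASH(futex_start, u32, u64);  // tid -> futex() entry timestamp
BPF_HISTOGRAM(futex_wait);        // log2 histogram of futex wait (usecs)

int probe_query_start(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 one = 1;
    in_query.update(&tid, &one);
    return 0;
}

int probe_query_done(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    in_query.delete(&tid);
    return 0;
}

int futex_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    if (in_query.lookup(&tid) == 0)
        return 0;                 // not inside a traced query
    u64 ts = bpf_ktime_get_ns();
    futex_start.update(&tid, &ts);
    return 0;
}

int futex_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = futex_start.lookup(&tid);
    if (tsp == 0)
        return 0;
    futex_wait.increment(bpf_log2l((bpf_ktime_get_ns() - *tsp) / 1000));
    futex_start.delete(&tid);
    return 0;
}
"""

usdt = USDT(pid=int(sys.argv[1]))
usdt.enable_probe(probe="query__start", fn_name="probe_query_start")
usdt.enable_probe(probe="query__done", fn_name="probe_query_done")

b = BPF(text=bpf_text, usdt_contexts=[usdt])
futex = b.get_syscall_fnname("futex")
b.attach_kprobe(event=futex, fn_name="futex_entry")
b.attach_kretprobe(event=futex, fn_name="futex_return")

print("Tracing futex waits inside queries... hit Ctrl-C to print.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["futex_wait"].print_log2_hist("usecs")
```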

What about how a query spends its time? On- and off-CPU flamegraphs are helpful, but they are too coarse for query investigations. We instrumented the kernel scheduler tracepoints to conditionally collect information for the threads that queries run on. This tracing tells us how long a query's thread spends in various states (waiting, running, blocked, I/O, and sleeping).
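A hedged sketch of the scheduler side of this, with the same illustrative USDT probes marking query threads: it attaches to the sched:sched_switch tracepoint and splits each marked thread's off-CPU time into "runnable, waiting for a CPU" versus "blocked or sleeping," based on the state the thread was in when it was switched out (the full state breakdown described above would distinguish more cases):

```python
#!/usr/bin/env python
# Sketch: where do query threads spend their off-CPU time?
# Usage: sudo ./query_sched.py <pid>
import sys
from time import sleep
from bcc import BPF, USDT

bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

BPF_HASH(in_query, u32, u64);      // tids currently executing a query
BPF_HASH(offcpu_ts, u32, u64);     // tid -> timestamp at switch-out
BPF_HASH(offcpu_state, u32, u64);  // tid -> task state at switch-out
BPF_HISTOGRAM(runnable_wait);      // runnable: waiting for a CPU
BPF_HISTOGRAM(blocked_wait);       // blocked or sleeping (locks, I/O, ...)

int probe_query_start(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 one = 1;
    in_query.update(&tid, &one);
    return 0;
}

int probe_query_done(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    in_query.delete(&tid);
    return 0;
}

TRACEPOINT_PROBE(sched, sched_switch) {
    u64 ts = bpf_ktime_get_ns();

    // Thread switched out: remember when, and in what state.
    u32 prev = args->prev_pid;
    if (in_query.lookup(&prev) != 0) {
        u64 state = args->prev_state;
        offcpu_ts.update(&prev, &ts);
        offcpu_state.update(&prev, &state);
    }

    // Thread switched back in: account for its off-CPU interval.
    u32 next = args->next_pid;
    u64 *tsp = offcpu_ts.lookup(&next);
    if (tsp != 0) {
        u64 delta_us = (ts - *tsp) / 1000;
        u64 *statep = offcpu_state.lookup(&next);
        // TASK_RUNNING (0) at switch-out means the thread was preempted
        // and has been waiting on the run queue; any other state means
        // it blocked or slept.
        if (statep != 0 && *statep == 0)
            runnable_wait.increment(bpf_log2l(delta_us));
        else
            blocked_wait.increment(bpf_log2l(delta_us));
        offcpu_ts.delete(&next);
        offcpu_state.delete(&next);
    }
    return 0;
}
"""

usdt = USDT(pid=int(sys.argv[1]))
usdt.enable_probe(probe="query__start", fn_name="probe_query_start")
usdt.enable_probe(probe="query__done", fn_name="probe_query_done")

b = BPF(text=bpf_text, usdt_contexts=[usdt])
print("Tracing scheduler states for query threads... hit Ctrl-C to print.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
print("runnable (waiting for a CPU):")
b["runnable_wait"].print_log2_hist("usecs")
print("blocked or sleeping:")
b["blocked_wait"].print_log2_hist("usecs")
```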

The power of BPF allows us to inspect our database at runtime and ask precise questions. The increased observability BPF provides speeds up our performance work and improves how our engineers support customers running MemSQL. Overall, BPF provides the observability necessary to build a transparent, easily diagnosable modern in-memory database.

Scripts, documentation, and additional reference information on BCC and BPF are available in the iovisor/bcc repository: https://github.com/iovisor/bcc.

After joining the MemSQL performance team this summer, Kyle Laracey will be returning to Brown University in the fall. At Brown, he studies computer science, is a teaching assistant for CS167: Operating Systems, and is expected to graduate in May 2017.