This is a repost of an article by Ankur Goyal, VP of Engineering, published on Medium ⇒
The terms rowstore and columnstore have become household names for database users. The general consensus is that rowstores are superior for online transaction processing (OLTP) workloads and columnstores are superior for online analytical processing (OLAP) workloads. This is close but not quite right — we’ll dig into why in this article and provide a more fundamental way to reason about when to use each type of storage.
One of the nice things about SQL-based databases is the separation of logical and physical concepts. You can express logical decisions as schemas (for example, the use of 3NF) and SQL code (for example, to implement business logic), and for the most part avoid thinking about physical implementation details or runtime considerations. Then, based on what your workload is trying to do, you can make a series of physical decisions which optimize performance and cost for your workload. These physical decisions include where to put indexes, what kind of indexes to use, how to distribute data, how to tune the database, and even which database product to use (if you use ANSI SQL). Importantly, making physical decisions does not require changing SQL code.
Until recently, indexes were almost always backed by B-Trees and occasionally hash tables. This started to change when Sybase IQ and then more popularly C-Store/Vertica hit the market and provided incredible cost-savings and performance for data-warehouse workloads with columnstore indexes. Columnstores have hit the mainstream and are the primary storage mechanism in modern data-warehouse technology (e.g. Redshift, Vertica, HANA) and are present in mainstream databases (Oracle, SQL Server, DB2, MemSQL). Nowadays, one of the key physical decisions for a database workload is whether to use a rowstore or columnstore index.
Let us frame the discussion about when to use a rowstore or columnstore by boiling down the fundamental difference in performance. It’s actually quite simple:
- Rowstores are better at random reads and random writes.
- Columnstores are better at sequential reads and sequential writes.
Feeling déjà vu? This is a fairly familiar concept in computer science, and it’s pretty similar to the standard tradeoff between RAM and disk. This reasoning also obviates several myths around rowstores and columnstores:
- Is a RAM-based rowstore faster than a disk-based columnstore? Not necessarily — if the workload has sequential reads (e.g. an analytical workload with lots of scans) a columnstore can be significantly faster.
- Are writes slow in a columnstore? Not necessarily — if the writes are mostly ordered and you don’t need to run updates, then a columnstore can be as fast or even faster to write into than a rowstore, even for relatively small batches.
- Are columnstores bad at concurrent writes? It depends on the type of disk. Both rotational and solid-state disks are good at sequential writes, but solid-state disks tend to be significantly faster to write into concurrently; therefore, columnstores running on SSDs can be very fast at concurrent writes.
Rowstores for Analytics, Columnstores for Transactions
Let’s look at a few use cases which violate the common belief that rowstores are superior for transactions and columnstores are superior for analytics. These are based on workloads that we’ve seen at MemSQL, but these observations are not specific to MemSQL.
In analytical workloads, a common design choice is whether to append (aka log-oriented insertion) or update/upsert. For operational analytics, the upsert pattern is especially common because by collapsing overlapping rows together with an update, you partially-aggregate the result set as you write each row, making reads significantly faster. These workloads tend to require single-row or small-batch random writes, so a rowstore is a significantly better choice as columnstores can’t handle this pattern of writes at any reasonable volume. As an aside, the read pattern is often still scan-oriented, so if the data were magically in a columnstore, reads could be a lot faster. It still makes sense to use a rowstore, however, because of the overwhelming difference in write performance.
Another example of rowstores in analytical workloads is as dimension tables in a traditional star schema analytics workload. Dimension tables often end up on the inside of a join and are seeked into while scanning an outer fact table. We’ve seen a number of customer workloads where we beat columnstore-only database systems simply because MemSQL can back dimension tables with a rowstore and benefit from very fast seeks (MemSQL even lets you use a lock-free hashtable as a type of rowstore index, so you have a pre-built hash join). In this case, rowstores are the superior choice because dimension tables need to be randomly, not sequentially, read.
Finally, columnstores can be used for transactional workloads as well, in particular workloads that are computationally analytic but have operational constraints. A common use case in ad-tech is to leverage a dataset of users and groups (which users belong to) to compute overlap on-demand, i.e. the number of users who are in both Group A and Group B. Doing so requires scanning every row for all users in both Group A and Group B, which can be millions of rows. This computation is significantly faster in a columnstore than a rowstore because the cost of executing the query is dominated by the sequential scan of user ids. Furthermore, sorting by group id not only makes it easy to find the matching user ids but also to scan them with high locality (since all user ids in a group end up stored together). With MemSQL, we were able to get this query to consistently return within 100 ms over 500 billion rows stored in the columnstore. When your workload is operational and fundamentally boils down to a sequential scan, then it can run significantly better in a columnstore. As a side benefit, columnstores offer exceptional data compression so this workload has a relatively small hardware footprint of less than ten nodes on Amazon and fit in SSD.
Because these workloads bleed the traditional definitions of OLTP and OLAP, they are often referred to as Hybrid Transactional/Analytical Processing (HTAP) workloads.
Conclusions and Caveats
The main takeaway is pretty straightforward. Rowstores excel at random reads/writes and columnstores excel at sequential reads/writes. You can look at almost any workload and determine which bucket it falls into. Of course, there are still a few caveats:
- If you have both random and sequential operations, it’s usually better to go with a rowstore. Rowstores tend to be better at sequential operations than columnstores are at random operations. However, if storage space is a primary design constraint, you may want to still consider using a columnstore.
- Some databases, namely MemSQL and Oracle, have hybrid tables with both rowstore and columnstore functionality. In MemSQL, there is a small rowstore sitting alongside every columnstore table that is used for small-batch writes (e.g. singleton inserts and updates). Oracle redundantly stores rowstore data in an in-memory columnstore to transparently speed up reads which would be faster against column-oriented data (see A Case for Fractured Mirrors). Of course, this comes at the expense of write performance since you have to pay the cost of inserting into both stores every time, rather than whichever is cheapest.
- Sorting in columnstores is a big deal, especially for workloads with tight requirements on performance, because it enables consistent performance with low latency. Generally, commercial columnstores (Redshift, Vertica, MemSQL) support sorting but open source and SQL-on-Hadoop databases do not (Kudu is a notable exception).
Thanks to Gary Orenstein, Andy Pavlo, and Nikita Shamgunov for reviewing drafts of this post.