Engineering

dbbench database workloads

New Release of dbbench Streamlines Database Workload Testing

Our performance engineering team is committed to delivering high-quality tools. Since we released dbbench 7 months ago, it has been widely adopted across our engineering and sales teams as the definitive tool for testing database workloads. Today we are announcing the availability of a new version of dbbench, as well as a package of high-level tools to enhance it. In this latest release, we improved both the flexibility and the ease of use of the tool. We augmented capabilities of dbbench and added a...


Third Normal Form, Star Schema, and a Performance Centric Data Strategy

Keeping it Straight: Data value comes from sharing, so staying organized and providing common data access methods across different groups can bring big payoffs. Companies struggle daily to keep data formats consistent across applications, departments, people, divisions, and new software systems installed every year. Passing data between systems and applications is called ETL, which stands for Extract, Transform, and Load. It is the process everyone loves to hate. There is no glamour in...


RBAC security

MemSQL 5.1 Enhances Security for Real-Time Enterprises

Enterprises seek real-time data and analytics solutions to stay current in competitive, fast-evolving markets. Companies dealing in private information, such as healthcare organizations, financial institutions, and the public sector, have historically been limited in their pursuit of real-time results, given stringent security requirements. Today, we announce the availability of MemSQL 5.1. This release adds Role-Based Access Control (RBAC) to the already powerful MemSQL 5, unlocking the gateway...


real-time monitoring

Monitoring A/B Experiments In Real Time

This post originally appeared on the Pinterest Engineering Blog by Bryant Xiao. As a data-driven company, we rely heavily on A/B experiments to make decisions on new products and features. How efficiently we run these experiments strongly affects how fast we can iterate. By providing experimenters with real-time metrics, we improve our chances of running experiments successfully and move faster. We have daily workflows to compute hundreds of metrics for each experiment. While these daily metrics...


Should You Use a Rowstore or a Columnstore?

This is a repost of an article by Ankur Goyal, VP of Engineering, published on Medium. The terms rowstore and columnstore have become household names for database users. The general consensus is that rowstores are superior for online transaction processing (OLTP) workloads and columnstores are superior for online analytical processing (OLAP) workloads. This is close but not quite right; we’ll dig into why in this article and provide a more fundamental way to reason about when...


dbBench

dbbench: Bringing Active Benchmarking to Databases

In my last blog post, I investigated a Linux performance issue affecting a specific customer workload. In this post, I will introduce the tool I created to drive that investigation. Recently, a customer was running a test where data was loaded into MemSQL via LOAD DATA. The customer’s third-party benchmarking tool found that MemSQL took twice as long to load the same amount of data as a competing database; however, the numbers reported by this tool did not make sense. Local tests had shown...


Investigating Linux Performance

Investigating Linux Performance with Off-CPU Flame Graphs

The Setup: As a performance engineer at MemSQL, one of my primary responsibilities is to ensure that customer Proofs of Concept (POCs) run smoothly. I was recently asked to assist with a big POC, where I was surprised to encounter an uncommon Linux performance issue. I was running a synthetic workload of 16 threads (one for each CPU core). Each one simultaneously executed a very simple query (select count(*) from t where i > 5) against a columnstore table. In theory, this ought to be a CPU...


Streamliner Python

Introducing a Performance Boost for Spark SQL, Plus Python Support

This month’s MemSQL Ops release includes performance features for Streamliner, our integrated Apache Spark solution that simplifies creation of real-time data pipelines. Specific features in this release include the ability to run Spark SQL inside of the MemSQL database, in-browser Python programming, and NUMA-aware deployments for MemSQL. We sat down with Carl Sverre, MemSQL architect and technical lead for Ops development, to talk about the latest release. Q: What’s the coolest thing...


Oracle and MemSQL Together

Using Oracle and MemSQL Together

Photo: Martin Taylor. We often hear “How can I use MemSQL together with my Oracle database?” As a relational database, MemSQL is similar to an Oracle database, and can serve as an alternative to Oracle in certain scenarios. Here is what sets MemSQL apart: MemSQL is a distributed system, designed to run on multiple machines with a massively parallel processing architecture. An Oracle database, on the other hand, resides on a single large machine or on a smaller cluster of fixed size. MemSQL has...


Technical Deep Dive into MemSQL Streamliner

MemSQL Streamliner, an open source tool available on GitHub, is an integrated solution for building real-time data pipelines using Apache Spark. With Streamliner, you can stream data from real-time data sources (e.g. Apache Kafka), perform data transformations within Apache Spark, and ultimately load data into MemSQL for persistence and application serving. Streamliner is a great tool for developers and data scientists since little to no code is required – users can instantly build their...