Author: Wayne Song

Technical Deep Dive into MemSQL Streamliner

MemSQL Streamliner, an open source tool available on GitHub, is an integrated solution for building real-time data pipelines using Apache Spark. With Streamliner, you can stream data from real-time data sources (e.g., Apache Kafka), perform data transformations within Apache Spark, and ultimately load data into MemSQL for persistence and application serving. Streamliner is a great tool for developers and data scientists since little to no code is required – users can instantly build their...


How to Deploy MemSQL on the Mesosphere DCOS

The Mesosphere Datacenter Operating System (DCOS) is a distributed operating system designed to span all machines in a datacenter. It provides mechanisms for deploying applications across the entire system with a few simple commands. MemSQL is a great fit for deployment on DCOS because of its distributed, memory-optimized design. For example, users can scale computation and storage capacity by simply adding nodes. MemSQL deploys across commodity hardware and the cloud, giving users the flexibility...


MemSQL Spark Connector

Run Real-Time Applications with Spark and the MemSQL Spark Connector

Apache Spark is one of the most powerful distributed computing frameworks available today. Its combination of fast, in-memory computing with an architecture that’s easy to understand has made it popular for users working with huge amounts of data. While Spark shines at operating on large datasets, it still requires a solution for data persistence. HDFS is a common choice, but while it integrates well with Spark, its disk-based nature can impact performance in real-time applications (e.g....


Load Files from Amazon S3 and HDFS with the MemSQL Loader

One of the most common tasks with any database is loading large amounts of data into it from an external data store. Both MemSQL and MySQL provide the LOAD DATA command for this task; this command is very powerful, but by itself, it has a number of restrictions: It can only read from the local filesystem, so loading data from a remote store like Amazon S3 requires first downloading the files you need. Since it can only read from a single file at a time, loading from multiple files requires...