hadoop streaming

With Hadoop Summit Europe underway today, we wanted to share some thoughts on how MemSQL fits into the Hadoop ecosystem.

While MemSQL and Hadoop are both data stores, they fill different roles in the data processing and analytics stack. The Hadoop Distributed File System (HDFS) enables businesses to store large volumes of immutable data, but by design, it is used almost exclusively for batch processing. Moreover, newer execution frameworks that are faster and storage-agnostic are challenging MapReduce as businesses’ batch processing interface of choice.

Lambda Architecture

A number of MemSQL customers have implemented systems using the Lambda Architecture (LA). LA is a common design pattern for stream-based workloads in which hot, recent data requires fast updates and analytics, while long-term history is kept on cheaper storage. Using MemSQL as the real-time path and HDFS as the historical path has been a winning combination for many companies. MemSQL serves as a real-time analytics serving layer, ingesting and processing millions of streaming data points per second and giving analysts immediate access to operational data via SQL. Long-term analytics and longer-running, batch-oriented workflows are pushed to Hadoop.
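To make the two paths concrete, here is a minimal sketch of the pattern in Python. It is illustrative only: `RealTimeStore`, `HistoricalLog`, `ingest`, and the `channel` field are hypothetical names, and both stores are in-memory stand-ins rather than actual MemSQL or HDFS clients. The point is simply that each incoming event is written twice, once to a mutable store that can be queried immediately and once to an append-only log that is read later in batches.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RealTimeStore:
    """Hypothetical stand-in for the real-time path: updated in place, queried immediately."""
    counts: dict = field(default_factory=lambda: defaultdict(int))

    def upsert(self, event: dict) -> None:
        # Maintain a running aggregate so reads never have to scan raw events.
        self.counts[event["channel"]] += 1

    def viewers(self, channel: str) -> int:
        return self.counts.get(channel, 0)


@dataclass
class HistoricalLog:
    """Hypothetical stand-in for the historical path: append-only, read in batches."""
    records: list = field(default_factory=list)

    def append(self, event: dict) -> None:
        # Store a copy so later changes to the caller's dict cannot alter history.
        self.records.append(dict(event))

    def batch_count(self, channel: str) -> int:
        # A full scan, standing in for a longer-running offline job.
        return sum(1 for record in self.records if record["channel"] == channel)


def ingest(event: dict, hot: RealTimeStore, cold: HistoricalLog) -> None:
    """Write one copy of the event to each path."""
    hot.upsert(event)
    cold.append(event)


hot, cold = RealTimeStore(), HistoricalLog()
for channel in ("news", "sports", "news"):
    ingest({"channel": channel}, hot, cold)
```

In a real deployment the two writes would go to separate systems with different failure modes, so a production version would also need to decide what happens when one write succeeds and the other does not. That concern is left out of this sketch.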

Use Case: Real-Time Analytics at Comcast

As an example, MemSQL customer Comcast focuses on real-time operational analytics. By using MemSQL and Hadoop together, Comcast can proactively diagnose potential issues from real-time intelligence and deliver the best possible video experience. Their Lambda Architecture writes one copy of the data to a MemSQL instance and another to Hadoop.

[Figure: Hadoop with MemSQL]

MemSQL enables Comcast to run lightning-fast real-time analytics on large, changing datasets and makes their analytics infrastructure more performant overall. Instead of just logging all Xfinity data and analyzing it hours or days later, Comcast can get both viewership and infrastructure monitoring metrics in real time. HDFS provides a quasi-infinite data store where they can run machine learning jobs and other “offline” analytics.

Watch the recorded session from Strata+Hadoop World to learn more about how MemSQL helps Comcast improve their Xfinity platform to serve millions of users, process enormous volumes of data, and, at the same time, perform advanced real-time analytics.

If you’re interested in test driving an in-memory database, give MemSQL a try for free, or give us a ring at (855) 463-6775.