Big data, as a concept and practice, has been around for quite some time now. Most companies have responded to the influx of data by adapting their data management strategy. However, managing data in real time still poses a challenge for many enterprises. Some have successfully incorporated streaming or processing tools that provide instant access to real-time data, but most traditional enterprises are still exploring options. Complicating the matter further, most enterprises need access to both historical and real-time data, which require distinct considerations and solutions.
Of the many approaches to managing real-time and historical data concurrently, the Lambda Architecture is by far the most talked about today. Like the shape of the Greek letter it is named for, the Lambda Architecture forks into two paths: a streaming (real-time) path and a batch path. It therefore accommodates a high-speed, real-time data service alongside an immutable data lake. Oftentimes a serving layer sits on top of the streaming path to power applications or dashboards.
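The fork described above can be sketched in a few lines of Python. This is only an illustrative model, not how MemSQL or any particular product implements it: the `Event` shape, the `LambdaPipeline` class, and its method names are all hypothetical, and in-memory Python objects stand in for the real-time store and the data lake.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One immutable input record (hypothetical shape)."""
    user: str
    amount: float


@dataclass
class LambdaPipeline:
    # Batch path: append-only log standing in for the immutable data lake.
    batch_log: list = field(default_factory=list)
    # Streaming path: running totals updated as each event arrives.
    speed_view: dict = field(default_factory=lambda: defaultdict(float))

    def ingest(self, event: Event) -> None:
        """Fork the input: every event goes down both paths."""
        self.batch_log.append(event)                 # batch path
        self.speed_view[event.user] += event.amount  # streaming path

    def recompute_batch_view(self) -> dict:
        """Rebuild totals from the full log, as a periodic batch job would."""
        totals = defaultdict(float)
        for event in self.batch_log:
            totals[event.user] += event.amount
        return dict(totals)
```

A serving layer would read `speed_view` for up-to-the-moment answers, while `recompute_batch_view()` can always rebuild the same totals from the untouched log.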
A Fork in the Road
Many Internet-scale companies, such as Pinterest, Zynga, Akamai, and Comcast, have chosen MemSQL to deliver the high-speed data component of the Lambda Architecture. Some customers have chosen to fork the input stream in order to push data into MemSQL and a data lake, such as HDFS, in parallel.
Here is an example of the Comcast Lambda Architecture:
The great thing about MemSQL is that it can fulfill both sides of the Lambda architecture, not just the real-time component. Some customers use the MemSQL in-memory rowstore to support real-time streaming, and then use the disk-based columnstore as the batch service and data lake.
Here is an example of a financial services customer using MemSQL for both layers:
Real Time Analytics
In this era of ubiquitous big data, it is not enough for companies to merely process data. Analyzing that data to detect patterns, which can be immediately applied to maximizing operational efficiency, is the real driver of business value. MemSQL delivers real-time analytics on a rapidly changing data set, making it an ideal match for the characteristics of the Lambda Architecture speed service. Other data stores have limitations that inhibit high-speed data ingestion, lack analytical capabilities, or cannot scale affordably. MemSQL delivers a complete solution: the ability to handle millions of transactions per second while simultaneously performing complex multi-table join queries. Let’s dig into some of the features that make MemSQL a great solution for implementing the Lambda architecture.
MemSQL uses a distributed, shared-nothing architecture that scales on commodity hardware and local storage, supporting petabytes of data. MemSQL is a memory-first relational database that also offers a disk-based columnstore. In-memory optimization delivers high-speed data ingestion while simultaneously delivering analytics on the changing data set. The disk-based columnstore manages historical data and exposes historical trends, which can be combined with the “hot” in-memory data to deliver real-time analytics.
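To illustrate why the two storage layouts complement each other, here is a toy Python sketch. It is an assumption-laden model, not MemSQL's storage engine: the `rowstore` and `columnstore` structures, the trade fields, and the function names are all hypothetical.

```python
from statistics import mean

# Row-oriented layout: one dict per record, keyed by primary key.
# A point insert or update touches a single entry.
rowstore = {}

# Column-oriented layout: one list per column, positions aligned.
# An aggregate over one column scans a single list.
columnstore = {"symbol": [], "price": []}


def upsert(trade_id: int, symbol: str, price: float) -> None:
    """Write one hot record into the row-oriented store."""
    rowstore[trade_id] = {"symbol": symbol, "price": price}


def flush_to_columnstore() -> None:
    """Move all hot rows into the columnar history (hypothetical batch step)."""
    for row in rowstore.values():
        columnstore["symbol"].append(row["symbol"])
        columnstore["price"].append(row["price"])
    rowstore.clear()


def average_price(symbol: str) -> float:
    """Average over historical (columnar) plus hot (row) data for one symbol."""
    historical = [
        price
        for sym, price in zip(columnstore["symbol"], columnstore["price"])
        if sym == symbol
    ]
    hot = [row["price"] for row in rowstore.values() if row["symbol"] == symbol]
    # Raises statistics.StatisticsError if the symbol has no data at all.
    return mean(historical + hot)
```

The point of the sketch is the last function: a single query answers over both the historical columns and the rows that have only just arrived.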
MemSQL supports the ingestion of structured, semi-structured, and unstructured data. Real-time analytics requires that data have a structure as soon as it arrives, which MemSQL supports through a fully relational model, so a structure can be aligned to the data to suit the analytics the business requires. Unstructured and semi-structured (JSON) data can also be ingested as key-value pairs.
Full ANSI SQL support makes MemSQL readily accessible to data analysts, business analysts, and data scientists, reducing application code requirements. Plugging data visualization and query tools into the analytics architecture delivers immediate value from data to the business.
MemSQL also extends SQL with JSON support. Traversing a JSON document is similar to writing ordinary SQL, with extensions for navigating the key-value pairs.
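As a rough illustration of what traversing key-value pairs means, the Python sketch below walks a JSON document one key at a time. It does not reproduce MemSQL's SQL syntax, which is not shown in this article; the `json_path` helper and the sample `event` document are hypothetical.

```python
import json
from typing import Any


def json_path(document: str, *keys: str) -> Any:
    """Follow a sequence of keys into a JSON document; None if any key is missing."""
    node = json.loads(document)
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Sample semi-structured record (made up for illustration).
event = '{"user": {"id": 42, "geo": {"country": "US"}}, "type": "click"}'
country = json_path(event, "user", "geo", "country")
```

A SQL extension for JSON plays the same role inside a query: it lets a `SELECT` or `WHERE` clause reach a nested value by naming the path to it.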
Open Source Connectors
MemSQL offers users several connectors for smooth integration with other data sources. One example is MemSQL Streamliner, a fully integrated Apache Spark solution. Streamliner provides easy deployment of Spark, a critical component for building real-time data pipelines that deliver advanced data enrichment and transformation. Another important connector is the MemSQL Loader, which can import data from HDFS, as well as import and synchronize data from Amazon S3.
Customers are investing in MemSQL as they realize the value of data in real time along with the power of SQL to analyze it. Pinterest, Akamai, Zynga, Comcast, and Tapjoy have all deployed MemSQL to power mission-critical applications. Customers from many industries have invested for performance improvement, the power and familiarity of SQL, or the low cost to scale (shared-nothing commodity servers and storage). These industries include financial services, advertising technology, energy, automotive, and retail, among others.
To learn more about implementing MemSQL for Lambda Architecture, watch David Abercrombie, Data Analytics Engineer at Tapjoy, share his story: