spark

In-Memory and Apache Spark

Video: The State of In-Memory and Apache Spark

Strata+Hadoop World was full of activity for MemSQL. Our keynote explained why real-time is the next phase for big data. We showcased a live Pinterest application that combines Spark and MemSQL to ingest and analyze real-time data. And we gave away dozens of prizes to Strata+Hadoop attendees who proved their latency-crushing skills in our Query Kong game. During the event, Mike Hendrickson of O’Reilly Media sat down with MemSQL CEO Eric Frenkiel to discuss: The state of in-memory...


Pinterest Apache Spark Demo

How Pinterest Measures Real-Time User Engagement with Spark

Setting the Stage for Spark

With Spark on track to replace MapReduce, enterprises are flocking to the open source framework in an effort to take advantage of its superior distributed data processing power. IT leads who manage the infrastructure and data pipelines of high-traffic websites are running Spark, in particular Spark Streaming, which is ideal for structuring real-time data on the fly, to reliably capture and process event data and write it in a format that can immediately be queried by...


Operationalize Spark

Operationalizing Spark with MemSQL

In Short: Combining the data processing prowess of Spark with a real-time database for transactions and analytics, where both are memory-optimized and distributed, leads to powerful new business use cases. Links to the MemSQL Spark Connector are at the end of this post.

Data Appetite and Evolution

Our generation of, and appetite for, data continue unabated. This drives a critical need for tools to quickly process and transform data. Apache Spark, the new memory-optimized data processing framework, fills this...


MemSQL Spark Connector

Run Real-Time Applications with Spark and the MemSQL Spark Connector

Apache Spark is one of the most powerful distributed computing frameworks available today. Its combination of fast, in-memory computing with an architecture that’s easy to understand has made it popular with users working with huge amounts of data. While Spark shines at operating on large datasets, it still requires a solution for data persistence. HDFS is a common choice, but while it integrates well with Spark, its disk-based nature can hurt performance in real-time applications (e.g....