Spark

Spark Summit 2017

The Machine Learning Track at Spark Summit

Spark Summit 2017 kicks off in less than two weeks with a program that includes more than 175 talks led by top experts in the Apache Spark ecosystem. The program ranges from developer tutorials and research demos to real-world case studies and data science applications, and these five sessions will take your machine learning skills to the next level.

Five machine learning talks to check out at Spark Summit 2017:

Apache Spark MLlib’s Past Trajectory and New Directions (Joseph Bradley, Databricks) – This talk...


ArcGIS, Spark & MemSQL Integration

This is a guest post by Mansour Raad of Esri. We were fortunate to catch up with him at Strata + Hadoop World San Jose. This post is replicated from Mansour’s Thunderhead Explorer blog.

ArcGIS, Spark & MemSQL Integration

Just got back from the fantastic Strata + Hadoop 2017 conference, where the topics ranged from big data and Spark to lots of AI/ML, with not so much on Hadoop explicitly, at least not in the sessions that I attended. I think that is why the conference is renamed Strata + Data from...


Spark Summit Boston

MemSQL at Spark Summit East 2017

Last week we announced the release of the MemSQL Spark 2 Connector with support for both Apache Spark 2.0 and 2.1. At Spark Summit East 2017 in Boston next week, we will showcase our new connector, which operationalizes advanced analytics.

February 7-9
John B. Hynes Convention Center
900 Boylston Street, Boston, MA 02115
https://spark-summit.org/east-2017/

MemSQL CTO and co-founder Nikita Shamgunov and product manager Steven Camiña will also deliver the following talks at the conference....


PowerStream

Using MemSQL and Spark for Machine Learning

At Spark Summit in San Francisco, we highlighted our PowerStream showcase application, which processes and analyzes data from over 2 million sensors on 200,000 wind turbines installed around the world. We sat down with one of our PowerStream engineers, John Bowler, to discuss his work on our integrated MemSQL and Apache Spark solutions.

Q: What is the relationship between MemSQL and Spark?

At its core, MemSQL is a database engine, and Spark is a powerful option for writing code to transform data....


PowerStream Spark Demo

IoT at Global Scale: PowerStream Wind Farm Analytics with Spark

At Spark Summit East in New York, we are unveiling PowerStream, an Internet of Things (IoT) simulation with visualizations and alerts based on real-time data from 2 million sensors across global wind farms. Renewable energy, such as wind power, is a viable alternative to traditional sources. For example, Danish wind turbines set a new world record for generating energy in 2015. According to recently published data, wind power now accounts for 42.1% of the total electricity consumption in Denmark. As...


lambda architecture

Rethinking Lambda Architecture for Real-Time Analytics

Big data, as a concept and practice, has been around for quite some time now. Most companies have responded to the influx of data by adapting their data management strategy. However, managing data in real time still poses a challenge for many enterprises. Some have successfully incorporated streaming or processing tools that provide instant access to real-time data, but most traditional enterprises are still exploring options. Complicating the matter further, most enterprises need access to...


Streamliner Python

Introducing a Performance Boost for Spark SQL, Plus Python Support

This month’s MemSQL Ops release includes performance features for Streamliner, our integrated Apache Spark solution that simplifies the creation of real-time data pipelines. Specific features in this release include the ability to run Spark SQL inside of the MemSQL database, in-browser Python programming, and NUMA-aware deployments for MemSQL. We sat down with Carl Sverre, MemSQL architect and technical lead for Ops development, to talk about the latest release.

Q: What’s the coolest thing...


Coinalytics Taps MemSQL to Fuel Blockchain Analytics

Bitcoin has dominated the headlines of technology and business publications over the past several years. The concept of digital currency, or cryptocurrency, rocked the financial industry, and public opinion about the applications of Bitcoin continues to ebb and flow. Today, Bitcoin is being overshadowed by another technology: blockchain. Blockchain is a public ledger for Bitcoin and other cryptocurrencies. This is where the real money is, say payment industry experts. Blockchain is a distributed...


Technical Deep Dive into MemSQL Streamliner

MemSQL Streamliner, an open source tool available on GitHub, is an integrated solution for building real-time data pipelines using Apache Spark. With Streamliner, you can stream data from real-time data sources (e.g., Apache Kafka), perform data transformations within Apache Spark, and ultimately load data into MemSQL for persistence and application serving. Streamliner is a great tool for developers and data scientists since little to no code is required – users can instantly build their...