Real-Time Data Warehousing for the Real-Time Economy

In the age of manual decision making based on predictable data formats, data feeds, and batch processing times, enterprise businesses stayed current with ad hoc analyses and periodic reports. To generate analyses and reports, businesses relied on the traditional data warehouse. Using extraction, transformation, and load batch processes, the traditional data warehouse standardized disparate data into normalized schemas and pre-computed cubes. With the data shaped into pre-configured dimensions...


Arrays - A Hidden Gem in MemSQL

Released this March, MemSQL 6 Beta 1 introduced MemSQL Procedural SQL (MPSQL). MPSQL supports the creation of User-Defined Functions (UDFs), Stored Procedures (SPs), Table-Valued Functions (TVFs), and User-Defined Aggregate Functions (UDAFs). A Hidden Gem: Array Types. There’s a hidden gem in MemSQL 6 Beta 1 that we didn’t document at first: array types! These make programming much more convenient. Since we compile your extensions to machine code, the performance is fantastic. And you...


Durable Storage for Real-Time Analytics with MemSQL and Spark

Apache Spark has made a name for itself as a powerful data processing engine for transforming large datasets in a swift, distributed manner. After using Spark to complete such transformations, you often want to store your data in a persistent and efficient format for long-term access. The common solution of storing data in HDFS solves the issue of persistence, but suffers from efficiency issues as a result of the HDFS disk-based architecture. The MemSQL Spark Connector solves both of these issues by...


Database Multi-Tenancy in the Cloud and Beyond

In today’s wave of Enterprise Cloud applications, having trust in a data store behind your software-as-a-service (SaaS) application is a must. Thus, multi-tenancy support is a critical feature for any enterprise-grade database. This blog post will cover the ways to implement multi-tenancy, and best practices for doing so in MemSQL. As customer table sizes grow, you will need to scale out your multi-tenant database across dozens of machines. To support rich analytics about your customers or...


Video: Scoring Machine Learning Models at Scale

At Strata+Hadoop World, MemSQL Software Engineer John Bowler shared two ways of building production data pipelines in MemSQL: 1) using Spark for general-purpose computation, and 2) through a transform defined in a MemSQL pipeline for general-purpose computation. In the video below, John runs a live demonstration of MemSQL and Apache Spark for entity resolution and fraud detection across a dataset composed of a hundred thousand employees and fifty million customers. John uses MemSQL and writes a Spark job...


The Analytics Race Amongst The World's Largest Companies

Data is fueling the world’s most valuable companies. Today the list is topped by Apple, Google, Microsoft, Amazon, and Facebook. These top companies harness data to drive outsized value. While the companies are unique, they share a more common approach to analytics than you might expect. The Rapid Rise of Data Capture for Analytics: In a short span, entire industries have been born that didn’t exist previously. Each of these...


Protecting Against the Insider Threat - Verify Your Trusted Users

Continuing from the blog post Protecting Against the Insider Threat, whose theme was the “Separation of Administrative Duties” as a way to disintermediate the Database Administrator (DBA) from the data, this post focuses on the need to “Trust, but Verify” your users. The usual assumption is that employees will act in the best interest of the organization and its customers. But what if they don’t? Unfortunately, this can be the reality in some organizations...


Video: Real-Time Analytics at UBER Scale

At Strata+Hadoop World, James Burkhart, technical lead on real-time data infrastructure at Uber, shared how Uber supports millions of analytical queries daily across real-time data with Apollo, Uber’s internal analytics querying language. James covers architectural decisions and lessons learned from building an exactly-once ingest pipeline that captures raw events across in-memory row storage and on-disk columnar storage. He also details how Uber uses a custom metalanguage and query layer by...


Real-Time and The Rise of Nano-Marketing

The tracking and targeting of our online lives is no secret. Once we browse to a pair of shoes on a website, we are reminded about them in a retargeting campaign. Lesser-known efforts happen behind the scenes to accumulate data and scan through it in real time, delivering the perfect personalized campaign. Specificity and speed are converging to deliver nano-marketing. If you are a business leader, you’ll want to stay versed in these latest approaches. If not, as a consumer, you’ll likely...


From Big to Now: The Changing Face of Data

Data is changing. You knew that. But the dialog over the past 10 years around big data and Hadoop is rapidly moving to data and real-time. We have tackled how to capture big data at scale. We can thank the Hadoop Distributed File System for that, as well as cloud object stores like AWS S3. But we have not yet tackled the instant results part of big data. For that we need more. But first, some history. Turning Point for the Traditional Data Warehouse: Internet-scale workloads that emerged in the...