Data Warehouse Rescue

Seeking a Rescue from a Traditional RDBMS

In the Beginning

Years ago, organizations used transactional databases to run analytics. Database administrators struggled to set up and maintain OLAP cubes or tune report queries. Monthly reporting cycles would slow or otherwise degrade application performance because all the data was in one system. The introduction of custom-hardware, appliance-based solutions helped mitigate these issues, and the resulting solutions were fast transactional databases with column store engines. Stemming from...


real-time data warehousing

Real-Time Data Warehousing for the Real-Time Economy

In the age of manual decision making based on predictable data formats, data feeds, and batch processing times, enterprise businesses stayed current with ad hoc analyses and periodic reports. To generate analyses and reports, businesses relied on the traditional data warehouse. Using extraction, transformation, and load batch processes, the traditional data warehouse standardized disparate data into normalized schemas and pre-computed cubes. With the data shaped into pre-configured dimensions...


Arrays a Hidden Gem in MemSQL

Arrays - A Hidden Gem in MemSQL

Released this March, MemSQL 6 Beta 1 introduced MemSQL Procedural SQL (MPSQL). MPSQL supports the creation of User-Defined Functions (UDFs), Stored Procedures (SPs), Table-Valued Functions (TVFs), and User-Defined Aggregate Functions (UDAFs).

A Hidden Gem: Array Types

There’s a hidden gem in MemSQL 6 Beta 1 that we didn’t document at first: array types! These make programming much more convenient. Since we compile your extensions to machine code, the performance is fantastic. And you...


Durable Storage for Real-Time Analytics with MemSQL and Spark

Apache Spark has made a name for itself as a powerful data processing engine for transforming large datasets in a swift, distributed manner. After using Spark to complete such transformations, you often want to store your data in a persistent and efficient format for long-term access. The common solution of storing data in HDFS solves the issue of persistence, but suffers from efficiency issues as a result of the HDFS disk-based architecture. The MemSQL Spark Connector solves both of these issues by...


Database Multi-Tenancy in the Cloud and Beyond

In today’s wave of Enterprise Cloud applications, having trust in a data store behind your software-as-a-service (SaaS) application is a must. Thus, multi-tenancy support is a critical feature for any enterprise-grade database. This blog post will cover the ways to implement multi-tenancy, and best practices for doing so in MemSQL. As customer table sizes grow, you will need to scale out your multi-tenant database across dozens of machines. To support rich analytics about your customers or...


machine learning at scale

Video: Scoring Machine Learning Models at Scale

At Strata+Hadoop World, MemSQL Software Engineer John Bowler shared two ways of building production data pipelines in MemSQL: 1) using Spark for general-purpose computation, and 2) using a transform defined in a MemSQL pipeline for general-purpose computation. In the video below, John runs a live demonstration of MemSQL and Apache Spark for entity resolution and fraud detection across a dataset composed of a hundred thousand employees and fifty million customers. John uses MemSQL and writes a Spark job...


The Analytics Race Amongst The World's Largest Companies

The Analytics Race Amongst The World’s Most Valuable Companies

Data is fueling the world’s most valuable companies. Today the list is topped by Apple, Google, Microsoft, Amazon, and Facebook. These top companies harness data to drive outsized value. While the companies are unique, they share a more common approach to analytics than you might expect.

The Rapid Rise of Data Capture for Analytics

In a short span, entire industries have been born that didn’t exist previously. Each of these...


verify trusted users

Protecting Against the Insider Threat - Verify Your Trusted Users

Continuing from the blog post “Protecting Against the Insider Threat,” whose theme was the “Separation of Administrative Duties” as a way to disintermediate the Database Administrator (DBA) from the data, this post focuses on the need to “Trust, but Verify” your users. The usual assumption is that employees will act in the best interest of the organization and its customers. But what if they don’t? Unfortunately, this can be the reality in some organizations...


real-time analytics at Uber

Video: Real-Time Analytics at Uber Scale

At Strata+Hadoop World, James Burkhart, technical lead on real-time data infrastructure at Uber, shared how Uber supports millions of analytical queries daily across real-time data with Apollo, Uber’s internal analytics querying language. James covers architectural decisions and lessons learned from building an exactly-once ingest pipeline that captures raw events across in-memory row storage and on-disk columnar storage. He also details how Uber uses a custom metalanguage and query layer by...


real-time nano marketing

Real-Time and The Rise of Nano-Marketing

The tracking and targeting of our online lives is no secret. Once we browse to a pair of shoes on a website, we are reminded about them in a retargeting campaign. Lesser-known efforts happen behind the scenes to accumulate data and scan through it in real time, delivering the perfect personalized campaign. Specificity and speed are converging to deliver nano-marketing. If you are a business leader, you’ll want to stay versed in these latest approaches. If not, as a consumer, you’ll likely...