August 13, 2015

The Resurgence of Scala for Big Data

Emily Friedman

Marketing Communications for SingleStore.

Big Data Scala by the Bay, Aug 16-18, is shaping up to be an engaging event, and will bring together top data engineers, data scientists, developers, and data managers who use the Scala language to build big data pipelines.

At the SingleStore booth, we will showcase how enterprises can streamline pipeline development by building their own real-time data pipelines using Apache Kafka, Apache Spark, and operational databases. Many of our customers are moving to this kind of real-time data pipeline: a simplified Lambda Architecture that minimizes overhead while delivering remarkably fast analytics on changing datasets.

To provide more perspective on the intersection of Scala and in-memory databases, we sat down with Ben Campbell, our in-house Scala expert.

Q: Describe the technical underpinnings of Scala.
Scala is notable, and has achieved widespread use, largely because of the way it combines two distinct programming paradigms: object-oriented and functional. Object-oriented programming is, of course, familiar to anyone who has worked with C++ or Java, and nearly all programmers have some familiarity with one or both of those languages. Functional programming, on the other hand, is less well-known, having historically been consigned largely to academic theory and niche applications. By combining the two approaches, Scala has been able to do what its functional predecessors have not: achieve widespread adoption by a community largely reared on the object-oriented paradigm.

There’s an interesting analogy between Scala and C++, which was the breakout object-oriented language. C++ was not the first object-oriented language, nor was it a pure object-oriented language. However, C++ became widely adopted because it bridged the gap between C, a non-object-oriented language in widespread use at the time, and the object-oriented approach. Scala has done something similar: because it is built on the Java platform (it makes use of Java libraries and compiles to Java bytecode that runs on the Java virtual machine), it has been relatively easy to adopt for a generation raised on the object-oriented paradigm. But Scala can also be used in a highly functional manner, so programmers coming from a Java background tend to embrace more of Scala’s functional features over time.

Q: What is the functional programming paradigm, and how does it differ from alternatives?
Functional programming treats computation as a problem of evaluating mathematical functions. By contrast, the imperative style in which most object-oriented code is written treats computation as a series of changes in state. Functional programming avoids such state changes, and hence there is no requirement for mutable data. Scala is an interesting hybrid of these two approaches: it can be written in a functional style, or in a more traditional Java-like style, with mutable state.
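To make the contrast concrete, here is a minimal sketch of the same calculation written both ways. The object and method names are hypothetical and chosen only for illustration; they do not come from any library.

```scala
object StyleComparison {
  // Java-like style: a mutable accumulator is reassigned inside a loop.
  def sumOfSquaresImperative(xs: List[Int]): Int = {
    var total = 0
    for (x <- xs) total += x * x
    total
  }

  // Functional style: nothing is reassigned; the result is a single
  // expression built from map and sum.
  def sumOfSquaresFunctional(xs: List[Int]): Int =
    xs.map(x => x * x).sum
}
```

Both methods return the same value for the same input; the difference is that the second one never changes any state along the way.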

Q: Why is functional programming, and hence Scala, important for Big Data?
As background, object-oriented programming is useful for projects that involve creating increasingly elaborate objects from simpler primitives. Similarly, functional programming is well-suited for applications that compose increasingly elaborate functions from simpler functional primitives. This is often the case in data science, explaining the growing interest in functional programming approaches.
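As an illustrative sketch of composing an elaborate function from simpler ones, the snippet below chains three small functions with the standard `andThen` method on Scala functions. The names `trim`, `tokenize`, `countWords`, and `wordCount` are hypothetical and exist only for this example.

```scala
object Pipeline {
  // Three simple functional primitives.
  val trim: String => String = _.trim
  val tokenize: String => List[String] = _.split("\\s+").toList
  val countWords: List[String] => Int = _.length

  // A more elaborate function composed from the primitives;
  // andThen applies them left to right.
  val wordCount: String => Int =
    trim.andThen(tokenize).andThen(countWords)
}
```

Calling `Pipeline.wordCount("  big data pipelines  ")` runs the three steps in order on the input string.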

As for Big Data, the term implies a set of problems that are too large to handle with conventional approaches — which generally entails a certain amount of parallelism. However, parallel processing is plagued by changes in state: if two parallel processes are attempting to change the same data, the result might be delayed (at best) or unpredictable (at worst). By reducing or eliminating mutability, functional approaches tend to lead to programs that naturally and simply handle concurrency and scalability.
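One way to picture this, as a rough sketch rather than a production pattern, is to split an immutable collection into chunks and sum each chunk in its own `Future` from the Scala standard library. The `ParallelSum` object, the `parallelSum` method, and the ten-second timeout are assumptions made for the example.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSum {
  // The input Vector is immutable, so every task can read it safely.
  def parallelSum(xs: Vector[Int], chunkSize: Int): Int = {
    // Each Future computes a partial sum over its own chunk.
    val partials: List[Future[Int]] =
      xs.grouped(chunkSize).map(chunk => Future(chunk.sum)).toList
    // The partial results are combined without mutating any shared variable.
    val total: Future[Int] = Future.sequence(partials).map(_.sum)
    Await.result(total, 10.seconds)
  }
}
```

Because no task writes to data another task reads, there is nothing to lock and the result does not depend on scheduling order.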

Q: What are some of the important use cases for Scala?
Scala gained a lot of publicity in 2009, when Twitter announced it would be adopting the language for much of its backend. Since then, a number of other large enterprises have followed suit. But perhaps the biggest development in Scala has been Apache Spark, the big data processing framework. As something of a successor to Apache Hadoop (whose MapReduce model was itself loosely based on functional processing), Spark is seeing enormous growth in adoption and interest, and the fact that it is written in Scala is drawing many to the language. Other notable projects built with Scala include the message queue Kafka and several mathematical and machine learning libraries (e.g., ScalaNLP and BIDMach).
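The functional roots of the MapReduce model can be sketched with nothing more than Scala's standard collections. The word count below is an illustration only: it runs on a single machine and does not use the Spark or Hadoop APIs, and the `WordCount` object is a hypothetical name.

```scala
object WordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+")) // "map" step: split lines into words
      .filter(_.nonEmpty)                   // drop empty tokens
      .groupBy(identity)                    // group identical words together
      .map { case (word, occurrences) =>    // "reduce" step: count each group
        word -> occurrences.size
      }
}
```

Each step is a pure transformation of an immutable collection, which is the property that lets frameworks distribute this kind of computation across many machines.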

Q: What does the success of Scala bode for the future of the programming landscape?
With its hybrid object-oriented / functional approach, Scala will serve as something of a gateway drug, helping to gradually shift the landscape toward more functional approaches. Although Scala is fully object-oriented, tools like Slick allow it to interface with relational databases in more of a functional-relational style. The increasing interest in scalable functional programming thus dovetails with a resurgence of interest in scalable relational database technologies, such as SingleStore.

We hope to see you at Big Data Scala! For conference details, visit http://bigdatascala.bythebay.io/.
