November is nearly upon us, with the spotlight on Election 2016. This election has been amplified by millions of digital touchpoints. In particular, Twitter has risen in popularity as a forum for voicing individual opinions as well as tracking statements directly from the candidates. Pew Research Center states that “In January 2016, 44% of U.S. adults reported having learned about the 2016 presidential election in the past week from social media, outpacing both local and national print newspapers.” The first 2016 Presidential debate “between Donald Trump and Hillary Clinton was the most-tweeted debate ever. All told, there were 17.1 million interactions on Twitter about the event.”

By now, most people have probably seen both encouraging and deprecating tweets about two candidates: Hillary Clinton and Donald Trump. Twitter has become a real-time voice for the public watching along with debates and campaign announcements. We wanted to hone in on the sentiments expressed in real time. Using Apache Kafka, MemSQL, Machine Learning and our Pipelines Twitter Demo as a base, we are bringing real-time analytics to Election 2016.

Hillary vs Trump Real-Time Twitter Sentiment

Click here to access the live demo

Post-Election, we have shut down the demo. View a screencap of it running at the bottom of this post.

Introducing our latest live demonstration, Election 2016: Real-Time Twitter Analytics. We analyze the sentiment –attitude, emotion, or feeling– of every tweet about Clinton and Trump as it is tweeted. Now, anyone can see how high or low in the negative or positive tweets are trending at any given point. We’re giving everyone access to the broader scope of how each candidate is doing according to the Twittersphere.

How it Works

First, we wrote a python script to collect tweets and retweets that contain the words Hillary, hillary, Trump, or trump directly from Twitter.com. We picked the words “Hillary” and “Trump” as descriptors since they are the most used for the candidates. The script pushes this content to an Apache Kafka queue in real time. Messages in this Kafka queue are then streamed using MemSQL Pipelines. Released in September 2016 at Strata+Hadoop World, Pipelines features a brand new SQL command CREATE PIPELINE, enabling native ingest from Apache Kafka and creation of real-time streaming pipelines.

The CREATE PIPELINE statement looks like this:

The CREATE TABLE statement for the tweets table in MemSQL is shown below:

Note: we create tweets as a columnstore table so it can handle large amounts of data for analytics. We also utilize persisted computed columns in MemSQL to parse JSON data for categorizing each tweet by candidate. MemSQL natively supports the JSON data format.

When the twitter_pipeline is run, data in the tweets table looks like this:

Next, we created a second pipeline that pulled from the same Kafka topic, but instead of storing directly into a table, we perform real-time sentiment analysis with a MemSQL Pipelines transform that leverages the Python Natural Language Toolkit (nltk) Vader module. The CREATE PIPELINE statement for the second pipeline looks like this:

Combining data from these two MemSQL pipelines, we can perform analytics using SQL. For example, we can create a histogram of tweet sentiment through the following query:

Lastly we constructed a User Interface (UI). We built the graph using WebSockets and React to visualize the rolling average tweet sentiment for both candidates, drawn in real time.

Click here to access the live demo

Post-Election, we have shut down the demo. View a screencap of it at the bottom of this post.

hillary vs clinton real-time chart

Stream Twitter data into MemSQL as shown here by following these instructions: https://github.com/memsql/pipelines-twitter-demo

Try MemSQL Pipelines today! Download the latest version of MemSQL at memsql.com/download

election2016