Image Recognition at the Speed of Memory Bandwidth

Michael Harris

Senior Software Engineer

SingleStore is a real-time data warehouse and a perfect system for large scale operational analytics. SingleStore provides millisecond response times for analytical queries and is a part of the critical path for real-time applications.

We often hear from our customers that they want to do various types of artificial intelligence (AI) and machine learning (ML) model evaluations for IoT data, as well as imagery, in real time.

A good example is finding similar images in a large corpus of image data. For instance, you point a camera at a person and quickly determine whether that person is in a database. This is referred to as real-time facial recognition.

From Images to Feature Vectors

Efficiently extracting feature vectors from face images using deep learning is a subject of ongoing research. Here is a reference to a modern approach: http://www.robots.ox.ac.uk/~vgg/software/vgg_face/.

For the purpose of this post, we will assume that this is a somewhat solved problem and we can efficiently extract feature vectors from any incoming image. Once those feature vectors are produced, all you need to do is insert them into a SingleStore table with the following simple schema.

CREATE TABLE features (
    id bigint(11) NOT NULL AUTO_INCREMENT,
    feature_vector binary(4096) DEFAULT NULL,
    KEY id (id) USING CLUSTERED COLUMNSTORE
);

A typical way to insert the vectors is to use Apache Spark, which enables quick parallel data transfer into SingleStore.
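Whichever loader you use, each vector has to fit the binary(4096) column. Here is a minimal sketch of one way to serialize a vector on the client side. It assumes the column holds 1024 little-endian 32-bit floats (1024 × 4 bytes = 4096 bytes); the helper names are hypothetical, and the exact layout DOT_PRODUCT expects should be checked against the documentation.

```python
import math
import struct

# Assumption: 1024 float32 values * 4 bytes each = 4096 bytes, matching binary(4096).
DIMENSIONS = 1024


def normalize(vector):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def pack_features(vector):
    """Serialize a vector as little-endian float32 values (assumed layout)."""
    if len(vector) != DIMENSIONS:
        raise ValueError(f"expected {DIMENSIONS} features, got {len(vector)}")
    return struct.pack(f"<{DIMENSIONS}f", *vector)


def unpack_features(blob):
    """Inverse of pack_features, useful for checking a stored value."""
    return list(struct.unpack(f"<{DIMENSIONS}f", blob))
```

The resulting bytes object can be passed as a bound parameter for feature_vector by whatever client library you use.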

Similarity Search

There are two frequently used approaches to measuring the similarity between vectors: cosine similarity (cosine of the angle between the vectors) and Euclidean distance. Cosine similarity is defined as the dot product of the vectors, divided by the product of the vector norms (length of the vectors). If the vectors are normalized, the cosine similarity is simply the dot product of the vectors (since the product of the norms is 1).
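The definition can be checked with a few lines of plain Python. This is only an illustration of the math, not of how SingleStore evaluates it; the function names are our own.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector norms."""
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


a = [3.0, 4.0]
b = [4.0, 3.0]

# For normalized vectors the product of the norms is 1,
# so the plain dot product gives the same value as cosine similarity.
similarity = cosine_similarity(a, b)
similarity_normalized = dot(normalize(a), normalize(b))
```

Both values come out at 0.96 (up to floating-point rounding), which is why storing normalized vectors lets the query use DOT_PRODUCT alone.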

(Yes, SingleStore is the database that does dot product and cosine similarity – the term one of our customers used in the Google search that ended with their using SingleStore for a large deployment.)

To search using cosine similarity we can simply run this query to find similar images.

SELECT
    id
FROM
    features
WHERE
    DOT_PRODUCT(feature_vector, <Input>) > 0.9

Input is a feature vector extracted from an incoming image, and 0.9 is a constant that was experimentally tuned, which corresponds to an angle of less than 26 degrees between the feature vector and the input.
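To see where the 26 degrees comes from, and what the filter does row by row, here is a small sketch in plain Python. The similar_ids helper is a hypothetical stand-in for the WHERE clause and assumes all vectors are already normalized.

```python
import math


def angle_degrees(cosine):
    """Angle between two unit vectors, given their dot product."""
    return math.degrees(math.acos(cosine))


def similar_ids(rows, query, threshold=0.9):
    """Brute-force equivalent of the query: keep ids whose dot product exceeds the threshold.

    rows is an iterable of (id, vector) pairs with normalized vectors.
    """
    return [
        row_id
        for row_id, vector in rows
        if sum(x * y for x, y in zip(vector, query)) > threshold
    ]


# A dot product of 0.9 corresponds to an angle just under 26 degrees.
threshold_angle = angle_degrees(0.9)

rows = [
    (1, [1.0, 0.0]),    # identical to the query
    (2, [0.0, 1.0]),    # orthogonal to the query
    (3, [0.96, 0.28]),  # unit vector close to the query
]
matches = similar_ids(rows, [1.0, 0.0])
```

Here matches contains ids 1 and 3, since their dot products with the query (1.0 and 0.96) exceed 0.9.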

Euclidean distance is also frequently used to measure similarity. It is defined as the norm of the vector resulting from the subtraction of two input vectors. The EUCLIDEAN_DISTANCE built-in can also be used to efficiently measure the similarity between vectors.
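For normalized vectors the two measures are interchangeable, because the squared distance equals 2 minus twice the dot product. The sketch below illustrates that relationship in plain Python; the function names are our own, and this is not a description of the EUCLIDEAN_DISTANCE built-in's internals.

```python
import math


def euclidean_distance(a, b):
    """Norm of the difference between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_threshold(dot_threshold):
    """Distance cutoff equivalent to a dot-product cutoff, for unit vectors only.

    Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b = 2 - 2 a.b when both norms are 1.
    """
    return math.sqrt(2.0 - 2.0 * dot_threshold)


a = [1.0, 0.0]
b = [0.96, 0.28]  # unit vector whose dot product with a is 0.96

distance = euclidean_distance(a, b)
from_dot = distance_threshold(0.96)  # same value, derived from the dot product
cutoff = distance_threshold(0.9)     # about 0.447
```

Under this assumption, filtering on a dot product above 0.9 selects the same rows as filtering on a distance below roughly 0.447.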

The DOT_PRODUCT query above performs a full table scan, which might seem slow, but here is how we perform this computation at memory bandwidth speed.

Performance

Here is our set of assumptions:

  • Memory Speed: 50GB/sec
  • Each image feature vector contains 1024 features, resulting in 4KB/vector

So, if we are limited by memory bandwidth, we can search 12.5 million images a second per node, or roughly 1.25 billion images a second on a 100-node cluster. Let’s verify that’s actually true. I developed a simple test by creating a SingleStore columnstore table with the schema above and populating it with 12.5 million random 4KB normalized feature vectors. The machine I used has a 6-core Xeon E5 processor. When I ran the search query, I got a 0.25 second response time.
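The per-node and cluster figures follow directly from the two assumptions. A quick back-of-the-envelope check, treating 4KB as 4,000 bytes the way the estimate does (1024 32-bit floats are exactly 4,096 bytes, which lowers the number slightly):

```python
MEMORY_BANDWIDTH_BYTES_PER_SEC = 50 * 10**9  # 50GB/sec, from the assumptions
BYTES_PER_VECTOR_ROUNDED = 4000              # "4KB" as rounded in the estimate
BYTES_PER_VECTOR_EXACT = 1024 * 4            # 1024 float32 features = 4096 bytes
NODES = 100

# Vectors a single node can stream through memory each second.
vectors_per_sec_per_node = MEMORY_BANDWIDTH_BYTES_PER_SEC // BYTES_PER_VECTOR_ROUNDED

# Same estimate scaled across the cluster, assuming perfect parallelism.
vectors_per_sec_cluster = vectors_per_sec_per_node * NODES

# Slightly lower figure when using the exact 4096-byte vector size.
vectors_per_sec_per_node_exact = MEMORY_BANDWIDTH_BYTES_PER_SEC // BYTES_PER_VECTOR_EXACT

print(vectors_per_sec_per_node)        # 12500000
print(vectors_per_sec_cluster)         # 1250000000
print(vectors_per_sec_per_node_exact)  # 12207031
```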

At 50GB/sec, scanning 50GB of uncompressed vectors should take about one second. How can SingleStore run this faster than memory bandwidth? The answer is columnstore compression. Because the random vectors were normalized, SingleStore could compress them from 50GB down to a size that can be read from memory in less than 0.25 seconds.

This shows that the DOT_PRODUCT computation can be done faster than 50GB/sec, and if no compression is applied, memory bandwidth is the limiting factor.
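The same numbers give a rough bound on what compression must have achieved. This is an inference from the figures in this post, assuming the scan really is limited by the 50GB/sec memory bandwidth; it is not a measured compression ratio.

```python
ROWS = 12_500_000
BYTES_PER_VECTOR = 4000               # "4KB" as rounded in the estimate
RESPONSE_SECONDS = 0.25               # measured query time
MEMORY_BANDWIDTH_BYTES_PER_SEC = 50e9  # assumed memory speed

# Uncompressed size of the feature vectors: 50GB.
logical_bytes = ROWS * BYTES_PER_VECTOR

# Rate at which uncompressed data was effectively processed: 200GB/sec.
effective_rate = logical_bytes / RESPONSE_SECONDS

# Most data that could be read from memory in the measured time: 12.5GB.
max_compressed_bytes = MEMORY_BANDWIDTH_BYTES_PER_SEC * RESPONSE_SECONDS

# Compression ratio needed for the scan to fit within memory bandwidth: 4x.
min_compression_ratio = logical_bytes / max_compressed_bytes
```

In other words, under these assumptions DOT_PRODUCT kept up with roughly 200GB/sec of logical data, and the compressed table was at most about a quarter of its original size.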

SingleStore uses a fast vectorized table scan that leverages Intel’s latest instruction set extensions: AVX2 and AVX-512. SingleStore also uses these extensions to compute DOT_PRODUCT itself.

Conclusion

Because you can perform image recognition at in-memory speed, your bottleneck for similarity computation is not necessarily compute. We realize that there are other algorithms that gain efficiency by avoiding the full table scan and only lose a small amount of accuracy. However, you can achieve good practical results with a very straightforward implementation.

Future Work

Currently, we are adding more primitives to enable more machine learning use cases. We are also exploring GPUs, which have much higher memory bandwidth (up to 1TB/sec) to enable real-time scoring for more complex AI/ML problems.

Try It For Yourself

If you want to try real-time image recognition out for yourself, you can download the newest version of the SingleStoreDB Self-Managed 6 beta, and look at the documentation for the DOT_PRODUCT function.

