Arrays a Hidden Gem in MemSQL

Arrays – A Hidden Gem in MemSQL

Eric Hanson
Eric Hanson

Released this March, MemSQL 6 Beta 1 introduced MemSQL Procedural SQL (MPSQL). MPSQL supports the creation of:

  • User-Defined Functions (UDFs)
  • Stored Procedures (SPs)
  • Table-Valued Functions (TVFs)
  • User-Defined Aggregate Functions (UDAFs)

A Hidden Gem: Array Types

There’s a hidden gem in MemSQL 6 Beta 1 that we didn’t document at first — array types!  These make programming much more convenient. Since we compile your extensions to machine code, the performance is fantastic. And you don’t have to leave the comfort of your database language to get it.

To declare and initialize a new array of 10 integers, you simply say

declare a array(int) = create_array(10);

You might we wondering, how does MemSQL know what kind of array to create when you call the create_array() function?  The secret is type inference. Our compiler knows the data type of the variable you’re assigning the result of create_array() to. It uses that to determine the data type of the array.

Our arrays are zero-based, meaning the positions of the array of 10 integers above are numbered 0..9.

Real Examples

Let’s try some real examples. Here’s a function to create an array of size n containing a pseudo-random sequence of numbers:

delimiter //
create function random_array(n bigint)
  returns array(bigint not null) as
declare
      result array(bigint not null) = create_array(n);
      a bigint not null = 573829473;
      b bigint not null = 837562837;
      m bigint not null = 100000000;
      crnt bigint not null = 17;
begin
      for i in 0 .. (n - 1) loop
          crnt *= a;
          crnt += b;
          crnt %= m;
          result[i] = crnt;
      end loop;
      return result;
end //
delimiter ;

Suppose you want to display the array — you’ll need to convert it to text:

delimiter //
create function to_text(a array(bigint not null))
  returns longtext as
declare
        result longtext = "[";
        comma bool = false;
begin
    for i in 0 .. (length(a) - 1) loop
            if comma then
                  result = concat(result, ", ");
            end if;
            comma = true;
            result = concat(result, a[i]);
    end loop;
    return concat(result, "]");
end //
delimiter ;

Let’s put it all together:

memsql> select to_text(random_array(5));
+----------------------------------------------------+
| to_text(random_array(5)) |
+----------------------------------------------------+
| [92663878, 16439131, 15870800, 37651237, 23070938] |
+----------------------------------------------------+
1 row in set (0.37 sec)

MemSQL dynamically compiles code in functions. The first time (or sometimes first few times) that you run them, they can take a little longer to run due to compilation. Hence the 0.37 sec time above. Let’s see what happens if we run it again:

memsql> select to_text(random_array(5));
+----------------------------------------------------+
| to_text(random_array(5)) |
+----------------------------------------------------+
| [92663878, 16439131, 15870800, 37651237, 23070938] |
+----------------------------------------------------+
1 row in set (0.00 sec)

So, after we compile the function, it runs in milliseconds!

Speaking of performance, in our tests, a merge-sort UDF written in MPSQL can sort an array of 200,000 numbers in 0.2 seconds. Try doing that in another database language!

Multidimensional Arrays

You can declare multidimensional arrays too, like so:

declare b array(array(int));

Here’s an example function that creates a two-dimensional array and sums the contents of all the elements:>

delimiter //
create or replace function md_array_sum() returns int as
declare
      b array(array(int));
      s int = 0;
begin
      b = [[0, 100], [100, 0]];
      for i in 0 .. length(b) - 1 loop
        for j in 0 .. length(b[i]) - 1 loop
            s += b[i][j];
        end loop;
      end loop;
      return s;
end //
delimiter ;

Notice that it creates an array literal [[0, 100], [100, 0]] and assigns it to b. This is another way to make an array, as an alternative to create_array(). Our multi-dimensional arrays are called “ragged arrays” because individual sub-arrays can contain different numbers of elements, although we didn’t do that in this example.

Let’s take it for a spin:

memsql> select md_array_sum();
+----------------+
| md_array_sum() |
+----------------+
|      200       |
+----------------+
1 row in set (0.00 sec)

Try MemSQL 6 Beta

Arrays are an amazing addition to MemSQL. You can learn more about arrays and MemSQL Procedural SQL in our online documentation. Download and try MemSQL 6 Beta 1 today. Enjoy!



MemSQL, Multidimensional Arrays, Array Types, MemSQL Beta Introduction