Big Array

Big arrays are designed for advanced users working on big data analysis. A regular array is stored in contiguous memory; if no contiguous block of sufficient size is available, an out-of-memory exception occurs. A big array consists of many small memory blocks instead of one large block, which relieves the memory fragmentation problem. The trade-off is a slight performance penalty for certain operations. Users who don’t need to worry about memory fragmentation should use regular arrays instead of big arrays.
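To make the block-based layout concrete, here is a minimal conceptual sketch in Python. It is not the system's actual implementation: the class name ChunkedArray and the tiny block size of 4 are assumptions chosen purely for illustration. It shows how data can live in many small allocations, and where the extra index-translation step comes from.

```python
class ChunkedArray:
    """Toy model (hypothetical) of an array stored as many fixed-size blocks."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = []   # each block is a separate, small allocation
        self.length = 0

    def append(self, value):
        # Allocate a new small block only when the last one is full.
        if self.length % self.block_size == 0:
            self.blocks.append([])
        self.blocks[-1].append(value)
        self.length += 1

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Extra step a contiguous array does not need:
        # translate the logical index into (block number, offset in block).
        block, offset = divmod(index, self.block_size)
        return self.blocks[block][offset]


arr = ChunkedArray(block_size=4)
for v in range(1, 11):
    arr.append(v)

print(len(arr.blocks))  # 3 blocks hold the 10 elements
print(arr[0], arr[9])   # 1 10
```

No single allocation in this model is larger than one block, so it never needs one large contiguous region. The price is the divmod lookup on every element access.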

A big array’s minimum size is 16 MB. Users can declare a big array with the function bigarray. Functions and operations on regular arrays also apply to big arrays.

When the array function is called, the system creates a big array instead of a regular array if no contiguous memory block of sufficient size is available, or if the memory the array would occupy exceeds a certain threshold (512 MB by default). To override the default threshold, set the attribute regularArrayMemoryLimit in the configuration file to a different value.
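The fallback rule can be sketched as a small decision function. This is a hedged Python illustration, not the system's code: the function name choose_array_kind, the contiguous_free_bytes parameter, the reading of 1 MB as 2^20 bytes, and the strict "greater than" comparison are all assumptions made for the example.

```python
MB = 1024 * 1024  # assumption: 1 MB is taken as 2**20 bytes


def choose_array_kind(element_bytes, length, limit_mb=512,
                      contiguous_free_bytes=None):
    """Return 'big' or 'regular' for a requested array (hypothetical helper).

    limit_mb mirrors the role of regularArrayMemoryLimit;
    contiguous_free_bytes stands in for the largest free contiguous block.
    """
    requested = element_bytes * length
    # Rule 1: the array would exceed the configured threshold.
    if requested > limit_mb * MB:
        return "big"
    # Rule 2: no contiguous block is large enough for the request.
    if contiguous_free_bytes is not None and requested > contiguous_free_bytes:
        return "big"
    return "regular"


print(choose_array_kind(8, 100_000_000))                            # big
print(choose_array_kind(4, 1_000_000))                              # regular
print(choose_array_kind(4, 1_000_000, contiguous_free_bytes=1 * MB))  # big
print(choose_array_kind(8, 100_000_000, limit_mb=1024))             # regular
```

The last call shows the effect of raising the threshold: the same request that became a big array under the default limit stays a regular array.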

Syntax

bigarray(dataType, initialSize, [capacity], [defaultValue])

or

bigarray(template, [initialSize], [capacity], [defaultValue])

Examples

// For most data types, the default value is 0. For String and Symbol, the default value is NULL.
$ x=bigarray(int,10,10000000);
$ x;
[0,0,0,0,0,0,0,0,0,0]

// default value is set to 1
$ x=bigarray(int,10,10000000,1);
$ x;
[1,1,1,1,1,1,1,1,1,1]

// start with an empty big array of capacity 10,000,000, then append the values 1 to 100
$ x=bigarray(int,0,10000000).append!(1..100);
$ x[0];
1
$ sum x;
5050
$ x[x>50&&x<60];
[51,52,53,54,55,56,57,58,59]

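// the array function can return a big array automatically (see the threshold rule above); typestr then reports a HUGE vector
$ x=array(double, 40000000);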
$ typestr x;
HUGE DOUBLE VECTOR

For sequential operations, regular arrays and big arrays perform nearly identically (within a few percent in the tests below).

// build regular arrays x and y, then big-array copies bx and by holding the same data
$ n=20000000;
$ x=rand(10000, n);
$ y=rand(1.0, n);
$ bx=bigarray(int, 0, n).append!(x);
$ by=bigarray(double, 0, n).append!(y);

$ timer(100) wavg(x,y);
Time elapsed: 4869.74 ms
$ timer(100) wavg(bx,by);
Time elapsed: 4762.89 ms

$ timer(100) x*y;
Time elapsed: 7525.22 ms
$ timer(100) bx*by;
Time elapsed: 7791.83 ms

For random access, big arrays incur a slight performance penalty (roughly 20% in the test below).

$ indices = shuffle 0..(n-1);
$ timer(10) x[indices];
Time elapsed: 2942.29 ms
$ timer(10) bx[indices];
Time elapsed: 3547.22 ms
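
To put the timings above on a common scale, the following Python snippet computes the relative overhead of the big array for each test. It uses only the elapsed times reported above; the helper name overhead_percent is introduced here for illustration. The figures come from single runs, so treat them as indicative only.

```python
# (regular array ms, big array ms) taken from the timer outputs above
timings_ms = {
    "wavg": (4869.74, 4762.89),
    "x*y": (7525.22, 7791.83),
    "x[indices]": (2942.29, 3547.22),
}


def overhead_percent(regular_ms, big_ms):
    """Extra time taken by the big array, as a percentage of the regular array's time."""
    return round((big_ms - regular_ms) / regular_ms * 100, 1)


overheads = {name: overhead_percent(r, b) for name, (r, b) in timings_ms.items()}
for name, pct in overheads.items():
    print(f"{name}: {pct:+.1f}%")
# wavg: -2.2%
# x*y: +3.5%
# x[indices]: +20.6%
```

The two sequential tests differ by only a few percent in either direction, while random access is about 20% slower on the big array.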