bigarray

Syntax

bigarray(dataType|template, [initialSize], [capacity], [defaultValue])

Arguments

dataType is the data type for the big array.

template is an existing big array. The existing big array serves as a template and its data type determines the new big array’s data type.

initialSize is the initial size (in terms of the number of elements) of the big array. If the first parameter is a data type, initialSize is required; if the first parameter is an existing big array, initialSize is optional.

capacity is the amount of memory (in terms of the number of elements) allocated to the big array. When the number of elements exceeds capacity, the system first allocates new memory of 1.2 to 2 times the current capacity, copies the data to the new memory space, and then releases the original memory.

defaultValue is the default value of the elements of the big array. For many data types, the default value is 0. For string and symbol, the default value is NULL.
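
The two forms of the first argument can be sketched as follows. This is a minimal illustration, assuming the template form accepts an explicit initialSize exactly as the syntax line suggests; the sizes 10 and 5 are arbitrary, and outputs are omitted because the sketch has not been checked against a running server.

```
// x is created from a data type, so initialSize (10) is required
$ x=bigarray(int,10,10000000);
// y uses x as a template, so its data type is taken from x;
// initialSize (5) is optional here and is given only for illustration
$ y=bigarray(x,5);
$ typestr y;
$ size y;
```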

Details

Big arrays are designed for advanced users in big data analysis. Regular arrays use contiguous memory. If there is not enough contiguous memory, an out-of-memory exception occurs. A big array consists of many small memory blocks instead of one large block of memory, so big arrays help relieve the memory fragmentation issue. This, however, may come with a slight performance penalty for certain operations. Most users, who do not need to worry about memory fragmentation, should use regular arrays instead of big arrays.

A big array’s minimum size or capacity is 16 MB. Users can declare a big array with the function bigarray. Functions and operations on regular arrays also apply to big arrays.

When the array function is called, if there are not enough contiguous memory blocks available, or if the memory occupied by the array exceeds a certain threshold (the default threshold is 256 MB), the system creates a big array instead.
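
One way to see which kind of vector the system actually created is to inspect it with typestr. The sketch below is illustrative only: the sizes are arbitrary, the variable names a and b are hypothetical, and the outputs are omitted because the result for a depends on the threshold and on the memory available on the server.

```
// a is small, so the array function is expected to return a regular array
$ a=array(int,10);
// b is explicitly declared as a big array
$ b=bigarray(int,10,10000000);
$ typestr a;
$ typestr b;
```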

Examples

$ x=bigarray(int,10,10000000);
$ x;
[0,0,0,0,0,0,0,0,0,0]

// default value is set to 1
$ x=bigarray(int,10,10000000,1);
$ x;
[1,1,1,1,1,1,1,1,1,1]

$ x=bigarray(int,0,10000000).append!(1..100);
$ x[0];
1
$ sum x;
5050
$ x[x>50&&x<60];
[51,52,53,54,55,56,57,58,59]

$ x=array(double, 40000000);
$ typestr x;
HUGE DOUBLE VECTOR

Performance comparison of arrays and big arrays:

// for sequential operations, the performance of arrays and that of big arrays are nearly identical.
$ n=20000000
$ x=rand(10000, n)
$ y=rand(1.0, n)
$ bx= bigarray(int, 0, n).append!(x)
$ by= bigarray(double,0,n).append!(y);

$ timer(100) wavg(x,y);
Time elapsed: 4869.74 ms
$ timer(100) wavg(bx,by);
Time elapsed: 4762.89 ms

$ timer(100) x*y;
Time elapsed: 7525.22 ms
$ timer(100) bx*by;
Time elapsed: 7791.83 ms

// for random access, big arrays have a slight performance penalty.
$ indices = shuffle 0..(n-1);
$ timer(10) x[indices];
Time elapsed: 2942.29 ms
$ timer(10) bx[indices];
Time elapsed: 3547.22 ms