Vector

A vector is a container that holds a list of objects in a given order. The index of elements in a vector always starts from 0. Vectors can be modified or augmented. Data in a vector can be of different types, but performance is best when all elements in a vector are of the same type. This section discusses vectors with scalar elements of the same data type (also called “typed vectors”). The next section, Tuple (ANY Vector), discusses vectors with elements of different data types or with elements that are not scalars (also called “tuples” or “ANY vectors”).

Creating Vectors

There are multiple ways of creating a vector:

1. Space-separated elements

$ x = 3 6 1 5 9;
$ x;
[3,6,1,5,9]
// typed vectors are always displayed within square brackets. We will see later that tuples are always displayed within round brackets.

$ x=10 2.5;
$ typestr x;
FAST DOUBLE VECTOR
// note this is a DOUBLE vector, not a tuple. The system interprets 10 as a DOUBLE, not an INT.

$ x = 3 NULL 6 1 NULL 9;
$ x;
[3,,6,1,,9]
$ typestr x;
FAST INT VECTOR

There is a convenient way to create a STRING vector with backquotes (`) if all of its elements are one word:

$ x=`IBM`MS`GOOG`YHOO`EBAY;
$ x;
["IBM","MS","GOOG","YHOO","EBAY"]

If an element of a STRING vector contains a space, it must be enclosed in double quotes or single quotes.

$ x="Goldman Sachs" 'Morgan Stanley';
$ x;
["Goldman Sachs","Morgan Stanley"]

2. Comma-separated elements in square brackets.

$ x = [3,6,1,5,9];
$ x;
[3,6,1,5,9]
$ size x;
5

// compare x with the following:
$ y = [3 6 1 5 9];
$ y;
([3,6,1,5,9])
$ size y;
1
// since we don't use commas within the square brackets of y, y is a one-element tuple whose only element is the vector [3,6,1,5,9].

3. A vector from a sequence (seq).

$ x=1..10;
$ x;
[1,2,3,4,5,6,7,8,9,10]

4. A vector from a random sequence. The example below generates a random vector with function rand.

$ x=rand(3,10);
// generate a random vector of size 10 with 0,1 and 2.
$ x;
[0,0,2,1,1,2,1,1,2,0]

$ timestamp=09:30:00+rand(60,5);
$ timestamp;
[09:30:25,09:30:00,09:30:40,09:30:19,09:30:53]

$ price=5.0+rand(100.0,5);
$ price;
[32.826156,13.066499,52.872136,70.885178,104.408126]

5. Use the array function.

$ x = array(int, 0);
// initialize an empty integer vector

$ x;
[]
$ x.append!(1..10);
// after a vector is created, its length can be extended with the append! function.
$ x;
[1,2,3,4,5,6,7,8,9,10]

DolphinDB offers a “capacity” parameter in the array function. If used properly, it can enhance the performance. For details, please see array function.

6. A vector from a matrix column. For example, the first column of matrix m:

$ m=1..6$2:3;
$ m;

#0 #1 #2
-- -- --
1  3  5
2  4  6

$ m[0];
[1,2]

7. A vector from a table column. For example, trades.qty indicates column qty from table trades.

$ trades=table(`A`B`C as sym, 100 200 300 as qty);
$ trades.qty;
[100,200,300]

Accessing Vectors

We can access a vector with X[Y], where Y can be an integer, a boolean vector, an integer vector or a pair.

  • Accessing a vector by position. To access the i-th element in vector X, use X[i]. i starts from 0.

$ x=3 6 1 5 9;
$ x[1];
6
  • Accessing a subsequence of a vector with a boolean index vector or a boolean expression. The boolean index vector must have the same size as the original vector.

$ x=3 6 1 5 9;
$ y= true false true false false;
$ x[y];
[3,1]

$ x[x>3];
[6,5,9]

$ x[x%3==0];
[3,6,9]
  • Accessing a sub sequence of vector X with an indexing vector, which contains the indexes of the values being retained.

$ x=3 6 1 5 9;
$ y=4 0 2;
$ x[y];
[9,3,1]
  • Accessing a sub sequence of vector X with a pair in the format of X[a:b], where 0<=a, b<=size(X). This indexing is upper bound exclusive.

$ x=3 6 1 5 9;
$ x[1:3];
[6,1]

$ x[3:1];
[1,6]
// in reverse order

$ x[1:];
[6,1,5,9]
// accessing the vector elements starting from position 1

$ x[:3];
[3,6,1]
// accessing the first 3 elements

$ x[:];
[3,6,1,5,9]
// accessing all elements of x
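
The pair rules above can be modeled in a few lines of Python. This is only an illustrative sketch, not DolphinDB code: slice_pair is a hypothetical helper, a Python list stands in for the vector, and open-ended pairs such as x[1:] are not covered.

```python
def slice_pair(xs, a, b):
    """Model of X[a:b]: upper bound exclusive, reversed when a > b."""
    if a <= b:
        return xs[a:b]
    # a > b: take the same elements as xs[b:a], but in reverse order
    return xs[b:a][::-1]


x = [3, 6, 1, 5, 9]
print(slice_pair(x, 1, 3))  # [6, 1]
print(slice_pair(x, 3, 1))  # [1, 6]
```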

Function size returns the number of elements of a vector, while count returns the number of non-NULL elements.

$ x=3 6 NULL 5 9;
$ size x;
5
$ count x;
4

Modifying Vectors

The elements of a vector can be modified, and new elements can be appended. However, elements cannot be inserted or deleted at any position other than the end. To delete the last element, use the function pop!.

$ x = 3 6 1 5 9;

$ x[1]=4;
$ x;
[3,4,1,5,9]

$ x.append!(10 11 12);
[3,4,1,5,9,10,11,12]
$ x;
[3,4,1,5,9,10,11,12]

$ x.pop!();
12
$ x;
[3,4,1,5,9,10,11]

A new copy of x is assigned to y with statement y=x. As a result, statement “y=x; y[i]=a;” does not change vector x. This is different from Python.

$ x = 3 6 1 5 9;
$ y=x;
$ y;
[3,6,1,5,9]

$ y[1]=5;
$ y;
[3,5,1,5,9]

$ x;
[3,6,1,5,9]

If we use &y=x, both x and y point to the same object. Modifying either x or y will update the other.

$ x = 3 6;
$ &y=x;
$ y;
[3,6]

$ y[1]=5;
$ x;
[3,5]

$ x[0]=6;
$ y;
[6,5]
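
For readers coming from Python, the contrast can be sketched as follows. The mapping is an analogy only: a Python list stands in for a DolphinDB vector, and the variable names y_copy and y_alias are made up for this illustration.

```python
x = [3, 6, 1, 5, 9]

# DolphinDB `y=x` behaves like an explicit copy in Python.
y_copy = list(x)
y_copy[1] = 5      # only the copy changes

# DolphinDB `&y=x` behaves like plain Python assignment:
# both names refer to the same object.
y_alias = x
y_alias[0] = 6     # x changes as well

print(x)       # [6, 6, 1, 5, 9]
print(y_copy)  # [3, 5, 1, 5, 9]
```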

Replacing a sub vector through statement: x[begin:end] = a vector or a scalar; note that the upper bound is exclusive.

$ x = 3 4 1 5 9;
$ x[3:5]=7..8;
$ x;
[3,4,1,7,8]

$ x[2:]=1;
$ x;
[3,4,1,1,1]

$ x[:]=2;
$ x;
[2,2,2,2,2]

Replacing a sub vector with a boolean expression: x[boolean expression] = y

$ x=`IBM`MS`GOOG`YHOO`EBAY;
$ x[x==`MS]=`GS;
$ x;
["IBM","GS","GOOG","YHOO","EBAY"]

$ x=1..10;
$ x[x%3==0]=99;
$ x;
[1,2,99,4,5,99,7,8,99,10]

$ x=6 4 2 0 2 4 6;
$ x[x>3];
[6,4,4,6]

$ shares=500 1000 1000 600 2000;
$ prices=25.5 97.5 19.2 38.4 101.5;
$ prices[shares>800];
[97.5,19.2,101.5]

Appending a vector with different data types

What happens if we append data of a different type to a typed vector? The operation succeeds only if the new elements can be converted to the vector’s data type; otherwise it fails. For example, we cannot append a STRING to an INT vector.

// append an INT vector with STRING
$ x=1 2 3;
$ typestr x;
FAST INT VECTOR
$ x.append!(`orange);
Incompatible type. Expected: INT, Actual: STRING

// append an INT vector with DOUBLE
$ x.append!(4.3);
[1,2,3,4]
$ typestr x;
FAST INT VECTOR

// append an INT vector with BOOL
$ x.append!(false);
[1,2,3,4,0]
$ typestr x;
FAST INT VECTOR

// append a STRING vector with INT
$ x=`C `GS `MS;
$ x.append!(4);
["C","GS","MS","4"]
$ x[3];
4
$ typestr x[3];
STRING

Manipulating Vectors

Function reverse returns a new vector in the reverse order.

$ x=1..10;
$ y=reverse x;
$ y;
[10,9,8,7,6,5,4,3,2,1]
$ x;
[1,2,3,4,5,6,7,8,9,10]

Function shuffle returns a new vector that contains the elements of the original vector in random order.

$ x=1..10;
$ shuffle x;
[9,2,10,3,1,6,8,4,5,7]
$ x;
[1,2,3,4,5,6,7,8,9,10]

Function shuffle!() changes the input variable by randomly reorganizing the order of its elements.

$ x=1..10;
$ shuffle!(x);
[8,10,1,3,2,4,7,5,6,9]
$ x;
[8,10,1,3,2,4,7,5,6,9]

Function join concatenates two vectors and returns a new vector.

$ x=1..3;
$ y=4..6;
$ z=join(x, y);
$ z;
[1,2,3,4,5,6]
$ x join y join y;
[1,2,3,4,5,6,4,5,6]

Function cut(X, a) divides a vector into sub vectors, where a is the size of the sub vectors. To merge a list of vectors, use function flatten.

$ x=1..10;
$ x cut 2;
([1,2],[3,4],[5,6],[7,8],[9,10])  // this is a tuple.

$ x cut 3;
([1,2,3],[4,5,6],[7,8,9],[10])   // the remaining element "10" forms a vector.

$ x cut 5;
([1,2,3,4,5],[6,7,8,9,10])

$ x cut 9;
([1,2,3,4,5,6,7,8,9],[10])

$ flatten (x cut 9);
[1,2,3,4,5,6,7,8,9,10]
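
The chunking rule (fixed-size pieces, with a shorter last piece when the size does not divide evenly) can be modeled in Python. This is a sketch of the behaviour shown above, not DolphinDB's implementation; nested lists stand in for the tuple of vectors.

```python
def cut(xs, size):
    """Split xs into consecutive chunks of `size`; the last chunk may be shorter."""
    return [xs[i:i + size] for i in range(0, len(xs), size)]


def flatten(chunks):
    """Merge a list of chunks back into one flat list."""
    return [v for chunk in chunks for v in chunk]


x = list(range(1, 11))
print(cut(x, 3))           # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
print(flatten(cut(x, 9)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```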

Function take(X, n) takes n elements from vector X starting with the first element. If n is larger than the size of X, it restarts from the first element.

$ x=3 6 1 5 9;
$ x take 3;
[3,6,1]
$ take(x,12);
[3,6,1,5,9,3,6,1,5,9,3,6]
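
The wrap-around behaviour amounts to indexing modulo the vector size. A minimal Python model, assuming a non-empty list and a non-negative n (the helper is illustrative, not the DolphinDB function):

```python
def take(xs, n):
    """Return n elements of xs, cycling back to the start when n > len(xs)."""
    return [xs[i % len(xs)] for i in range(n)]


x = [3, 6, 1, 5, 9]
print(take(x, 3))   # [3, 6, 1]
print(take(x, 12))  # [3, 6, 1, 5, 9, 3, 6, 1, 5, 9, 3, 6]
```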

The following 3 functions make it very convenient to deal with lead-lag relations in time series analysis.

prev(X): move all elements of a vector one position to the right

next(X): move all elements of a vector one position to the left

move(X,k): move all elements of a vector k positions to the right when k is a positive integer; move them |k| positions to the left when k is a negative integer.

$ x=3 6 1 5 9;
$ y=prev x;
$ y;
[,3,6,1,5]
$ z = next x;
$ z;
[6,1,5,9,]

$ v=x move 2;
$ v;
[,,3,6,1]
$ x move -2;
[1,5,9,,]
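
The shifting rule can be modeled in Python, with None standing in for NULL. This is a sketch of the behaviour shown above rather than DolphinDB's implementation; prev and next_ are written as special cases of move (next_ has a trailing underscore only to avoid shadowing Python's built-in next).

```python
def move(xs, k):
    """Shift xs right by k (left if k < 0), filling vacated slots with None."""
    n = len(xs)
    if k >= 0:
        k = min(k, n)
        return [None] * k + xs[:n - k]
    k = min(-k, n)
    return xs[k:] + [None] * k


def prev(xs):
    return move(xs, 1)


def next_(xs):
    return move(xs, -1)


x = [3, 6, 1, 5, 9]
print(prev(x))      # [None, 3, 6, 1, 5]
print(next_(x))     # [6, 1, 5, 9, None]
print(move(x, 2))   # [None, None, 3, 6, 1]
print(move(x, -2))  # [1, 5, 9, None, None]
```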

Searching in Vectors

For each element in X, function in(X,Y) checks whether it exists in vector Y.

$ x = 4 5 16;
$ y = 3 6 1 5 9 4 19 31 2 8 7 2;
$ x in y;
[1,1,0]
// both 4 and 5 are in y; 16 is not in y;

$ in(x,y);
[1,1,0]

Function at returns the positions of elements of vector X that meet certain conditions.

$ x=1 2 3 2 1;
$ at(x==2);
[1,3]

$ at(x>2);
[2]

Function find(X, Y) returns the position(s) of the first occurrence(s) of each element of Y in vector X, where Y could be a scalar or vector. If an element in Y does not exist in X, the function returns -1.

$ x = 8 6 4 2 0 2 4 6 8;
$ find(x,2);
3

$ y= 6 0 7;
$ x find y;
[1,4,-1]

When function find searches a large vector for the elements of another large vector, the system builds a dictionary to speed up the lookup. When only a few values are searched against a large vector, it does not build one. The decision is made dynamically. If the vector is already sorted and only a small amount of data needs to be searched, binsrch is a better fit, because building a dictionary for a very large data set can take a significant amount of time and memory.

Function binsrch(X, Y) returns the index(es) of the first occurrences of the elements of Y in vector X, where Y could be a scalar or vector. X has to be sorted in an ascending order. If an element in Y is not in X, then it returns -1.

$ x = 1..10;
$ x binsrch 2;
1
$ y= 4 5 12;
$ x binsrch y;
[3,4,-1]
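
The trade-off between the two lookup strategies can be illustrated in Python. This is a sketch under stated assumptions, not DolphinDB's internals: find_with_dict and the Python binsrch below are hypothetical stand-ins, with a dict playing the role of the lookup dictionary and the standard bisect module performing the binary search.

```python
from bisect import bisect_left


def find_with_dict(xs, ys):
    """Build a value -> first-position dictionary once, then look up each y."""
    first_pos = {}
    for i, v in enumerate(xs):
        first_pos.setdefault(v, i)  # keep only the first occurrence
    return [first_pos.get(y, -1) for y in ys]


def binsrch(sorted_xs, ys):
    """Binary search in an ascending list; no extra dictionary is built."""
    out = []
    for y in ys:
        i = bisect_left(sorted_xs, y)
        found = i < len(sorted_xs) and sorted_xs[i] == y
        out.append(i if found else -1)
    return out


print(find_with_dict([8, 6, 4, 2, 0, 2, 4, 6, 8], [6, 0, 7]))  # [1, 4, -1]
print(binsrch(list(range(1, 11)), [4, 5, 12]))                 # [3, 4, -1]
```

The dictionary costs time and memory proportional to the size of the searched vector before the first lookup, which pays off only when many values are looked up. The binary search needs no preparation but requires sorted input.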

Function searchK(X, k) returns the k-th smallest element of vector X, where X must be a vector and k must be a scalar; k starts from 1. X does not need to be sorted.

$ x=9 9 6 6 6 3 0 0;
$ searchK(x,1);
0
$ searchK(x,2);
0
$ searchK(x,3);
3
$ searchK(x,4);
6
$ searchK(x,5);
6
$ searchK(x,6);
6
$ searchK(x,7);
9
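
The same selection can be modeled in Python without sorting the whole input. The sketch below uses heapq.nsmallest from the standard library; search_k is a hypothetical helper name, and the 1-based k follows the example above.

```python
import heapq


def search_k(xs, k):
    """Return the k-th smallest element of xs (k is 1-based)."""
    if not 1 <= k <= len(xs):
        raise ValueError("k must be between 1 and len(xs)")
    # nsmallest returns the k smallest values in ascending order
    return heapq.nsmallest(k, xs)[-1]


x = [9, 9, 6, 6, 6, 3, 0, 0]
print([search_k(x, k) for k in range(1, 8)])  # [0, 0, 3, 6, 6, 6, 9]
```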

When we calculate the price-to-earnings ratio of a stock on a certain date, we use the most recently announced earnings rather than earnings announced on that date, as corporate earnings are announced only 4 times a year. In such situations, the asof function comes in handy.

Assume a vector X has been sorted in ascending order. For each element y of Y, function asof(X, Y) returns the index of the last element in X that is no greater than y. In practice, both X and Y are often temporal types. Also see “asof-join”.

// data about phone number changes
$ x=[2005.12.01, 2007.03.15, 2010.12.24, 2013.08.31];
$ y="(201) 1234-5678" "(212) 8467-5523" "(312) 1258-6679" "(212) 4544-8888";

// get the phone numbers on dates 2005.12.01 and 2010.12.25
$ y[asof(x, [2005.12.01,2010.12.25])];
["(201) 1234-5678","(312) 1258-6679"]
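
The rule "index of the last element in X that is no greater than y" corresponds to a right-biased binary search minus one. A Python sketch using the standard bisect module follows. It is a model of the definition above, not DolphinDB's implementation; returning -1 when y precedes every element of X is a convention chosen for this sketch.

```python
from bisect import bisect_right
from datetime import date


def asof(sorted_xs, ys):
    """For each y, index of the last element of sorted_xs that is <= y."""
    return [bisect_right(sorted_xs, y) - 1 for y in ys]


changed_on = [date(2005, 12, 1), date(2007, 3, 15),
              date(2010, 12, 24), date(2013, 8, 31)]
phones = ["(201) 1234-5678", "(212) 8467-5523",
          "(312) 1258-6679", "(212) 4544-8888"]

idx = asof(changed_on, [date(2005, 12, 1), date(2010, 12, 25)])
print(idx)                       # [0, 2]
print([phones[i] for i in idx])  # ['(201) 1234-5678', '(312) 1258-6679']
```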

Sorting elements in Vectors

Function sort(X, [boolean]) sorts the values of a vector. It returns a new sorted vector.

$ x = 3 6 1 5 9;
$ y = sort x;
$ y;
[1,3,5,6,9]
$ y = sort(x, false);
// sort x in descending order
$ y;
[9,6,5,3,1]

Function isort(X, [boolean]) returns the indexes of the elements of sort(X) in X. X[isort X] is equivalent to sort(X).

$ isort x;
[2,0,3,1,4]
// 2 0 3 1 4 are the indexes of 1 3 5 6 9 in x

$ isort(x, false);
[4,1,3,0,2]

$ x[isort x];
[1,3,5,6,9]

In contrast, rank(X, [boolean]) returns the indexes of the elements of X in sort(X).

$ rank x;
[1,3,0,2,4]

$ rank(x, false);
[3,1,4,2,0]
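
isort and rank are inverse permutations of each other, which the following Python sketch makes explicit. It models the definitions above for vectors with distinct values only. How ties are ranked is not covered here, and the helper names simply mirror the DolphinDB functions for readability.

```python
def isort(xs, ascending=True):
    """Positions in xs of the elements of the sorted result."""
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=not ascending)


def rank(xs, ascending=True):
    """Position of each element of xs within the sorted result."""
    ranks = [0] * len(xs)
    for pos, i in enumerate(isort(xs, ascending)):
        ranks[i] = pos  # the element at index i lands at sorted position pos
    return ranks


x = [3, 6, 1, 5, 9]
print(isort(x))         # [2, 0, 3, 1, 4]
print(rank(x))          # [1, 3, 0, 2, 4]
print(isort(x, False))  # [4, 1, 3, 0, 2]
print(rank(x, False))   # [3, 1, 4, 2, 0]
```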

Function sort! produces an in-place sort.

$ x= 3 6 1 5 9;
$ sort!(x);
[1,3,5,6,9]
$ x;
[1,3,5,6,9]

Using Vectors with Operators

All operators can take vectors as input arguments. For more examples and details, please refer to Chapter 4: Operators and Chapter 13: Functions and Commands.

$ x = 1 2 3;
$ y = 4 5 6;

$ x * y;
[4,10,18]

$ x / y;
[0,0,0]
// when both x and y are integers, "/" means integer division, which is equivalent to applying floor function after division.

$ x \ y;
[0.25,0.4,0.5]

$ 3 * x;
[3,6,9]

$ x ** y;
32
// inner product: 1*4 + 2*5 + 3*6

In most scenarios, an operation with a NULL operand produces NULL, but there are exceptions. Please see Null Value Operations for details.

$ x = 3 NULL 1 NULL 9;
$ y = 4 5 2 3 NULL;
$ x+y;
[7,,3,,]
$ x>0;
[1,0,1,0,1]

Using Vectors in Functions

The following are examples of applying functions to vectors. For details, please refer to Chapter 13: Functions and Commands.

$ x= 3 6 1 5 9;

$ avg x;
4.8

$ med x;
5

$ sum x;
24

$ std x;
3.03315

$ log x;
[1.098612,1.791759,0,1.609438,2.197225]

$ exp x;
[20.085537,403.428793,2.718282,148.413159,8103.083928]

$ x pow 2;
[9,36,1,25,81]

$ x = 1 2 3;
$ y = 1 2 2;
$ x wavg y;
2.2
// calculate weighted average for vector x: (1*1+2*2+3*2)/(1+2+2)

$ x = 3 NULL NULL NULL 9;
$ avg x;
6
// calculate the average of vector x, ignoring NULL values.

$ y = 1 2 3 4 5;
$ x wavg y;
8
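
The last two results follow from skipping NULL entries before aggregating: avg uses only 3 and 9, and wavg uses only the pairs (3,1) and (9,5), giving (3*1+9*5)/(1+5) = 8. A Python model of that arithmetic, with None standing in for NULL, is sketched below. The helpers are illustrative and assume that a pair is dropped when either its value or its weight is NULL.

```python
def avg(xs):
    """Average of the non-None values."""
    vals = [v for v in xs if v is not None]
    return sum(vals) / len(vals) if vals else None


def wavg(xs, ws):
    """Weighted average over pairs where both value and weight are present."""
    pairs = [(v, w) for v, w in zip(xs, ws) if v is not None and w is not None]
    total_w = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_w if total_w else None


x = [3, None, None, None, 9]
y = [1, 2, 3, 4, 5]
print(avg(x))      # 6.0
print(wavg(x, y))  # 8.0
```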

Vector of functions

Vectors can be the container for a list of functions. This is especially useful when running the same inputs through multiple functions that return the same types of data.

$ x=[3.1, 4.5, 6];
$ desp=[std, avg, sum, median, max, min];
$ desp(x);
[1.450287,4.533333,13.6,4.5,6,3.1]

// use both element-wise and aggregation functions at the same time.
$ desp=[log, std, avg, sum, median, max, min];
$ desp(x);

log      std      avg      sum  med max min
-------- -------- -------- ---- --- --- ---
1.131402 1.450287 4.533333 13.6 4.5 6   3.1
1.504077 1.450287 4.533333 13.6 4.5 6   3.1
1.791759 1.450287 4.533333 13.6 4.5 6   3.1