indexedTable

Syntax

indexedTable(keyColumns, X, [X1], [X2], …..)

or

indexedTable(keyColumns, capacity:size, colNames, colTypes)

or

indexedTable(keyColumns, table)

Arguments

keyColumns is a string scalar or vector indicating the name(s) of the primary key column(s).

For the first scenario: X, X1, …. are vectors.

For the second scenario:

capacity is the amount of memory (in terms of the number of rows) allocated to the table. When the number of rows exceeds capacity, the system will first allocate memory of 1.2~2 times of capacity, copy the data to the new memory space, and release the original memory. For large tables, these steps may use significant amount of memory.

size is the initial size (in terms of the number of rows) of the table. For this scenario, size must be 0.

colNames is a string vector of column names.

colTypes is a string vector of data types.

For the third scenario, table is a table. Please note that keyColumns cannot have duplicate values.

Details

Create an indexed table, which is a special type of in-memory table with primary key. The primary key can be one column or multiple columns.

When appending to the indexed table, if a new row has the same primary key value as an existing row, the existing row will be overwritten with the new row.

If the filtering conditions in a SQL statement don’t use “or” operator, contain the first key column, the filtering condition for the first key column uses “=” or “in” operator, and there are at most 2 “in” operators, query performance on an indexed table is optimized and is better than that on an ordinary in-memory table. In comparison, for optimized query performance on a keyed table, the filtering conditions must include all key columns.

Please refer to the optimized SQL query in keyed table .

Examples

Example 1. Create an indexed table.

The first scenario:

$ sym=`A`B`C`D`E
$ id=5 4 3 2 1
$ val=52 64 25 48 71
$ t=indexedTable(`sym`id,sym,id,val)
$ t;

sym

id

col1

A

5

52

B

4

64

C

3

25

D

2

48

E

1

71

The second scenario:

$ t=indexedTable(`sym`id,1:0,`sym`id`val,[SYMBOL,INT,INT])
$ insert into t values(`A`B`C`D`E,5 4 3 2 1,52 64 25 48 71);

The third scenario:

$ tmp=table(sym, id, val)
$ t=indexedTable(`sym`id, tmp);

Create an indexed in-memory partitioned table:

$ t=indexedTable(`sym`id,sym,id,val)
$ db=database("",VALUE, sym)
$ pt=db.createPartitionedTable(t,`pt,`sym).append!(t);

Example 2. Update an indexed table.

$ t=indexedTable(`sym,1:0,`sym`datetime`price`qty,[SYMBOL,DATETIME,DOUBLE,DOUBLE])
$ insert into t values(`APPL`IBM`GOOG,2018.06.08T12:30:00 2018.06.08T12:30:00 2018.06.08T12:30:00,50.3 45.6 58.0,5200 4800 7800)
$ t;

sym

datetime

price

qty

APPL

2018.06.08T12:30:00

50.3

5200

IBM

2018.06.08T12:30:00

45.6

4800

GOOG

2018.06.08T12:30:00

58

7800

Insert a new row with duplicate primary key value as an existing row. The existing row will be overwritten:

$ insert into t values(`APPL`IBM`GOOG,2018.06.08T12:30:01 2018.06.08T12:30:01 2018.06.08T12:30:01,65.8 45.2 78.6,5800 8700 4600)
$ t;

sym

datetime

price

qty

APPL

2018.06.08T12:30:01

65.8

5800

IBM

2018.06.08T12:30:01

45.2

8700

GOOG

2018.06.08T12:30:01

78.6

4600

The primary key cannot be updated:

$ update t set sym="C_"+sym;
Can't update a key column.

Example 3. Query on an indexed table.

In some cases, queries on an indexed table are optimized. In this section we will compare the performance of queries on indexed tables vs ordinary in-memory tables.

For the following examples, we first create an ordinary in-memory table t and an indexed table t1 with 1 million records each:

$ id=shuffle(1..1000000)
$ date=take(2012.06.01..2012.06.10, 1000000)
$ type=take(0..9, 1000000)
$ val=rand(100.0, 1000000)
$ t=table(id, date, type, val)
$ t1=indexedTable(`id`date`type, id, date, type, val);

Use the first key column in the filtering condition:

$ timer(100) select * from t where id=500000;
Time elapsed: 177.286 ms

$ timer(100) select * from t1 where id=500000;
Time elapsed: 1.245 ms

$ timer(100) sliceByKey(t1, 500000)
Time elapsed: 0.742 ms

$ timer(100) select * from t where id in [500000, 600000, 700000];
Time elapsed: 1134.429 ms

$ timer(100) select * from t1 where id in [500000, 600000, 700000];
Time elapsed: 1.377 ms

If the filtering condition for the first key column does not use “=” or the “in” operator, then the performance of a query on an indexed table is not optimized:

$ timer(100) select * from t where id between 500000:500010;
Time elapsed: 641.544 ms

$ timer(100) select * from t1 where id between 500000:500010;
Time elapsed: 599.752 ms

Use the first key column and the third key column in the filtering conditions:

$ timer(100) select * from t where id=500000, type in [3,6];
Time elapsed: 172.808 ms

$ timer(100) select * from t1 where id=500000, type in [3,6];
Time elapsed: 1.664 ms

If the filtering conditions do not use the first key column, then the performance of a query on an indexed table is not optimized:

$ timer(100) select * from t where date in [2012.06.03, 2012.06.06];
Time elapsed: 490.182 ms

$ timer(100) select * from t1 where date in [2012.06.03, 2012.06.06];
Time elapsed: 544.015 ms

$ timer(100) select * from t where date=2012.06.03, type=8;
Time elapsed: 205.443 ms

$ timer(100) select * from t1 where date=2012.06.03, type=8;
Time elapsed: 204.532 ms

With more than 2 “in” operators in the filtering conditions, the performance of a query on an indexed table is not optimized:

$ timer(100) select * from t where id in [100,200], date in [2012.06.03, 2012.06.06], type in [3,6];
Time elapsed: 208.714 ms

$ timer(100) select * from t1 where id in [100,200], date in [2012.06.03, 2012.06.06], type in [3,6];
Time elapsed: 198.674 ms

Related function: keyedTable