keyedTable

Syntax

keyedTable(keyColumns, X, [X1], [X2], …..)

or

keyedTable(keyColumns, capacity:size, colNames, colTypes)

or

keyedTable(keyColumns, table)

Arguments

keyColumn is a string scalar or vector indicating the name(s) of the primary key column(s).

For the first scenario: X, X1, …. are vectors.

For the second scenario:

capacity is the amount of memory (in terms of the number of rows) allocated to the table. When the number of rows exceeds capacity, the system will first allocate memory of 1.2~2 times of capacity, copy the data to the new memory space, and release the original memory. For large tables, these steps may use significant amount of memory.

size is the initial size (in terms of the number of rows) of the table. For this scenario, size must be 0.

colNames is a string vector of column names.

colTypes is a string vector of data types.

For the third scenario, table is a table. Please note that keyColumns in table cannot have duplicate values.

Details

Create an keyed table, which is a special type of in-memory table with primary key. The primary key can be one column or multiple columns.

When appending to the keyed table, if a new row has the same primary key value as an existing row, the existing row will be overwritten with the new row.

If the filtering conditions in a SQL statement don’t use “or” operator, contain all keyColumns, and each condition uses “=” or “in” operator, and there are at most 2 “in” operators, query performance on a keyed table is optimized and is better than that on an ordinary in-memory table.

Please refer to the optimized SQL query in indexed table .

Please refer to Example 3 for details.

Examples

Example 1. Create a keyed table.

The first scenario:

$ sym=`A`B`C`D`E
$ id=5 4 3 2 1
$ val=52 64 25 48 71
$ t=keyedTable(`sym`id,sym,id,val)
$ t;

id

x

val

A

5

52

B

4

64

C

3

25

D

2

48

E

1

71

The second scenario:

$ t=keyedTable(`sym`id,1:0,`sym`id`val,[SYMBOL,INT,INT])
$ insert into t values(`A`B`C`D`E,5 4 3 2 1,52 64 25 48 71);

The third scenario:

$ tmp=table(sym, id, val)
$ t=keyedTable(`sym`id, tmp);

Create a keyed in-memory partitioned table:

$ sym=`A`B`C`D`E
$ id=5 4 3 2 1
$ val=52 64 25 48 71
$ t=keyedTable(`sym`id,sym,id,val)
$ db=database("",VALUE,sym)
$ pt=db.createPartitionedTable(t,`pt,`sym).append!(t);

Example 2. Update a keyed table.

$ t=keyedTable(`sym,1:0,`sym`datetime`price`qty,[SYMBOL,DATETIME,DOUBLE,DOUBLE])
$ insert into t values(`APPL`IBM`GOOG,2018.06.08T12:30:00 2018.06.08T12:30:00 2018.06.08T12:30:00,50.3 45.6 58.0,5200 4800 7800)
$ t;

sym

datetime

price

qty

APPL

2018.06.08T12:30:00

50.3

5200

IBM

2018.06.08T12:30:00

45.6

4800

GOOG

2018.06.08T12:30:00

58

7800

Insert a new row with duplicate primary key value as an existing row. The existing row will be overwritten:

$ insert into t values(`APPL`IBM`GOOG,2018.06.08T12:30:01 2018.06.08T12:30:01 2018.06.08T12:30:01,65.8 45.2 78.6,5800 8700 4600)
$ t;

sym

datetime

price

qty

APPL

2018.06.08T12:30:01

65.8

5800

IBM

2018.06.08T12:30:01

45.2

8700

GOOG

2018.06.08T12:30:01

78.6

4600

Insert new rows among which there are duplicate primary key values:

$ insert into t values(`MSFT`MSFT,2018.06.08T12:30:01 2018.06.08T12:30:01,45.7 56.9,3600 4500)
$ t;

sym

datetime

price

qty

APPL

2018.06.08T12:30:01

65.8

5800

IBM

2018.06.08T12:30:01

45.2

8700

GOOG

2018.06.08T12:30:01

78.6

4600

MSFT

2018.06.08T12:30:01

56.9

4500

The primary key cannot be updated:

$ update t set sym="C_"+sym;
Can't update a key column.

Example 3. Query on a keyed table.

In some cases, queries on a keyed table are optimized. In this section we will compare the performance of queries on keyed tables and ordinary in-memory tables.

For the following examples, we first create a keyed table and an ordinary in-memory table with 1 million records each:

$ id=shuffle(1..1000000)
$ date=take(2012.06.01..2012.06.10, 1000000)
$ type=rand(9, 1000000)
$ val=rand(100.0, 1000000)
$ t=table(id, date, type, val)
$ kt=keyedTable(`id`date`type, id, date, type, val);

Example 3.1

$ timer(100) select * from t where id=500000, date=2012.06.01, type=0;
Time elapsed: 161.574 ms

$ timer(100) select * from kt where id=500000, date=2012.06.01, type=0;
Time elapsed: 1.483 ms

$ timer(100) sliceByKey(t1, (500000, 2012.06.01, 0))
Time elapsed: 0.705 ms

Example 3.2

$ timer(100) select * from t where id in [1, 500000], date in 2012.06.01..2012.06.05, type=5;
Time elapsed: 894.241 ms

$ timer(100) select * from kt where id in [1, 500000], date in 2012.06.01..2012.06.05, type=5;
Time elapsed: 2.322 ms

With more than 2 “in” operators in the filtering conditions, however, a query on a keyed table is not optimized.

Example 3.3

$ timer(100) select * from t where id in [1, 500000], date in 2012.06.01..2012.06.05, type in 1..5;
Time elapsed: 801.347 ms

$ timer(100) select * from kt where id in [1, 500000], date in 2012.06.01..2012.06.05, type in 1..5;
Time elapsed: 834.184 ms

If the filtering conditions do not include all key columns, a query on a keyed table is not optimized.

Example 3.4

$ timer(100) select * from t where id=500000, date in 2012.06.01..2012.06.05;
Time elapsed: 177.113 ms

$ timer(100) select * from kt where id=500000, date in 2012.06.01..2012.06.05;
Time elapsed: 163.265 ms

Related function: indexedTable