createPartitionedTable

Syntax

createPartitionedTable(dbHandle, table, tableName, [partitionColumns], [compressMethods])

Arguments

dbHandle the distributed database where the partitioned table will be saved. The database can be either in the local file system, or in the distributed file system. If the first parameter directory for dbHandle is unspecified, the result is an in-memory partitioned table.

table a table or a list of tables. The schema of table will be used to construct the new partitioned table.

tableName a string indicating the name of the distributed table to be saved on disk, or the name of the in-memory partitioned table.

partitionColumns a string or a string vector indicating the partitioning column(s). For a composite partition, partitionColumns is a string vector.

compressMethods a dictionary indicating which compression methods are used for specified columns. The keys are columns name and the values are compression methods (“lz4” or “delta”). If unspecified, use LZ4 compression method. Please note that the delta compression method can only be used for SHORT, INT, LONG or temporal data types.

Details

Create an empty partitioned table on disk or in memory with the same schema as the specified table. To create a table on disk, parameter table must be a table. To create an in-memory partitioned table, parameter table can be a table or a tuple of tables.

  • If the parameter table is a table: generate an empty partitioned table on disk with the schema of the model table. This function is used with append! or tableInsert to generate a partitioned table. It cannot be used to create a partitioned table with sequential domain.

  • If the parameter table is a list of tables: create an in-memory partitioned table. The number of tables given by the parameter table must be the same as the number of partitions in the database.

Note:

  • Only the schema of table is used. None of the rows in table is imported to the newly created partitioned table.

  • In a distributed OLAP database, the maximum number of handles (including temporary handles*) to partitioned tables is 8,192 per node.

*temporary handles: If no handle is specified when you create a partitioned table in a distributed database with createPartitionedTable, each database creates a temporary handle to hold the return value. If you create multiple tables under the same database, the temporary handle for the database is overwritten each time.

Examples

Example 1. Create a DFS table

$ n=1000000;
$ t=table(2020.01.01T00:00:00 + 0..(n-1) as timestamp, rand(`IBM`MS`APPL`AMZN,n) as symbol, rand(10.0, n) as value)
$ db = database("dfs://rangedb_tradedata", RANGE, `A`F`M`S`ZZZZ)
$ Trades = db.createPartitionedTable(table=t, tableName="Trades", partitionColumns="symbol", compressMethods={timestamp:"delta"});

At this point, the table Trades is empty. The schema of Trades is the same as the schema of table t. Next, we append table t to table Trades.

$ Trades.append!(t);

Now the contents of table Trades have been updated on disk. In the local file system, the system does not dynamically refresh the contents of tables. We need to load the table into memory before we can work with it interactively.

$ Trades=loadTable(db,`Trades);
$ select min(value) from Trades;
0

The in-memory table t is saved as a DFS table Trades on disk.

After appending data to a DFS table, we don’t need to use function loadTable to load the table before querying the table, as the distributed file system automatically refreshes the table after appending operations. After system restarts, however, we need to use loadTable to load a DFS table before querying the table.

Example 2. Create in-memory partitioned tables

Example 2.1. Create a partitioned in-memory table

$ n = 200000
$ colNames = `time`sym`qty`price
$ colTypes = [TIME,SYMBOL,INT,DOUBLE]
$ t = table(n:0, colNames, colTypes)
$ db = database(, RANGE, `A`D`F)
$ pt = db.createPartitionedTable(t, `pt, `sym)

$ insert into pt values(09:30:00.001,`AAPL,100,56.5)
$ insert into pt values(09:30:01.001,`DELL,100,15.5)

Example 2.2. Create a partitioned keyed table

$ n = 200000
$ colNames = `time`sym`qty`price
$ colTypes = [TIME,SYMBOL,INT,DOUBLE]
$ t = keyedTable(`time`sym, n:0, colNames, colTypes)
$ db = database(, RANGE, `A`D`F)
$ pt = db.createPartitionedTable(t, `pt, `sym)

$ insert into pt values(09:30:00.001,`AAPL,100,56.5)
$ insert into pt values(09:30:01.001,`DELL,100,15.5)

Example 2.3. Create a partitioned stream table.

Please note that when creating a partitioned stream table, the second parameter of createPartitionedTable must be a tuple of tables, and its length must be equal to the number of partitions. Each table in the tuple represents a partition. In the following example, trades_stream1 and trades_stream2 form a partitioned stream table trades. We can’t directly write data to trades. Instead, we need to write to trades_stream1 and trades_stream2.

$ n=200000
$ colNames = `time`sym`qty`price
$ colTypes = [TIME,SYMBOL,INT,DOUBLE]
$ trades_stream1 = streamTable(n:0, colNames, colTypes)
$ trades_stream2 = streamTable(n:0, colNames, colTypes)
$ db=database(, RANGE, `A`D`F)
$ trades = createPartitionedTable(db,[trades_stream1, trades_stream2], "", `sym)

$ insert into trades_stream1 values(09:30:00.001,`AAPL,100,56.5)
$ insert into trades_stream2 values(09:30:01.001,`DELL,100,15.5)

$ select * from trades;

time

sym

qty

price

09:30:00.001

AAPL

100

56.5

09:30:01.001

DELL

100

15.5

Example 2.4. Create a partitioned MVCC table.

Similar to creating a partitioned stream table, to create a partitioned MVCC table, the second parameter of createPartitionedTable must be a tuple of tables, and its length must be equal to the number of partitions. Each table in the tuple represents a partition. In the following example, trades_mvcc1 and trades_mvcc2 form a partitioned MVCC table trades. We can’t directly write data to trades. Instead, we need to write to trades_mvcc1 and trades_mvcc2.

$ n=200000
$ colNames = `time`sym`qty`price
$ colTypes = [TIME,SYMBOL,INT,DOUBLE]
$ trades_mvcc1 = mvccTable(n:0, colNames, colTypes)
$ trades_mvcc2 = mvccTable(n:0, colNames, colTypes)
$ db=database(, RANGE, `A`D`F)
$ trades = createPartitionedTable(db,[trades_mvcc1, trades_mvcc2], "", `sym)

$ insert into trades_mvcc1 values(09:30:00.001,`AAPL,100,56.5)
$ insert into trades_mvcc2 values(09:30:01.001,`DELL,100,15.5)

$ select * from trades;

time

sym

qty

price

09:30:00.001

AAPL

100

56.5

09:30:01.001

DELL

100

15.5