createDistributedInMemoryTable

Syntax

createDistributedInMemoryTable(tableName, colNames, colTypes, globalPartitionType, globalPartitionScheme, globalPartitionColumn, [localPartitionType], [localPartitionScheme], [localPartitionColumn])

Details

Create a distributed in-memory table and store it on different nodes based on the specified partitioning scheme. This function can only be executed on a data node or compute node.

The distributed in-memory table is stored on different nodes (data nodes/compute nodes) based on the global partitioning scheme. Each node can only hold one partition. Concurrent reads and writes are supported in a distributed in-memory table. Therefore, it is suitable for scenarios that require distributed computing on in-memory tables.

A distributed in-memory table can be partitioned on 2 different levels:

  • global (between nodes): a distributed in-memory table is partitioned and the partitions are stored on different nodes.

  • local (within node): data within each node can be partitioned again.

If both global and local partitioning are adopted, the partitioning scheme of the distributed in-memory table is equivalent to a composite domain.

Note:

(1) Distributed in-memory tables can only be created in cluster mode.

(2) The number of global partitions must be greater than or equal to 2 and less than the total number of data nodes and compute nodes.

(3) Before accessing the distributed in-memory table from other nodes, use function loadDistributedInMemoryTable first to load the table.

(4) Drop the distributed in-memory table with command dropDistributedInMemoryTable.

(5) Transactions are not supported currently for distributed in-memory tables.

Arguments

tableName is a STRING scalar indicating the name of a distributed in-memory table.

colNames is a STRING vector of column names.

colTypes is a vector indicating data types of columns specified by colNames.

globalPartitionType is a required parameter indicating the partition type for a table in a cluster. Only RANGE, HASH, VALUE, and LIST are supported.

globalPartitionScheme is a required parameter indicating the partitioning scheme that describes how the partitions are created in a cluster.

The partition types and partitioning schemes are as follows:

Partition Type    Partition Type Symbol    Partitioning Scheme

range domain      RANGE                    A vector. Any two adjacent elements of the vector define the range for a partition.

hash domain       HASH                     A tuple. The first element is the data type of partitioning column. The second element is the number of partitions.

value domain      VALUE                    A vector. Each element of the vector defines a partition.

list domain       LIST                     A vector. Each element of the vector defines a partition.
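
As a minimal sketch of a RANGE scheme, the call below uses the boundary vector 0 20 40, in which each pair of adjacent elements defines one partition, giving two global partitions on the id column. The table name dtRange is hypothetical, and the sketch assumes a cluster with enough data nodes and compute nodes to hold two global partitions.

$ // hypothetical table name; boundaries 0 20 40 define two range partitions on id
$ ptRange = createDistributedInMemoryTable(`dtRange, `time`id`value, `DATETIME`INT`LONG, RANGE, 0 20 40, `id)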

globalPartitionColumn is a required parameter. It is a STRING scalar indicating the partitioning column for a table in a cluster.

The above parameters are specified for a cluster. The distributed in-memory table is partitioned and the partitions are stored on each node based on the above partitioning schemes.

The data within each node can be partitioned again through the following parameters.

localPartitionType is an optional parameter indicating the partition type within a node. Only RANGE, HASH, VALUE, and LIST are supported.

localPartitionScheme is an optional parameter indicating the partitioning scheme that describes how the partitions are created within a node.

localPartitionColumn is an optional parameter. It is a STRING scalar indicating the partitioning column within a node.
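
The following sketch combines both levels, following the syntax given above: a HASH scheme with two global partitions on id, and a local RANGE scheme on value within each node. The table name dtCompo and the boundary values are illustrative assumptions, not taken from the original examples.

$ // global: 2 hash partitions on id; local: range partitions on value within each node
$ ptCompo = createDistributedInMemoryTable(`dtCompo, `time`id`value, `DATETIME`INT`LONG, HASH, [INT, 2], `id, RANGE, 0 50 100, `value)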

Examples

  • Create a distributed in-memory table

The cluster in this example has two data nodes: node1 and node2. Create a distributed in-memory table on node1. The number of partitions should be less than the total number of data nodes and compute nodes, so HASH partitioning is recommended: its partitioning scheme specifies the number of partitions directly.

$ pt = createDistributedInMemoryTable(`dt, `time`id`value, `DATETIME`INT`LONG, HASH, [INT, 2], `id)

  • Load a distributed in-memory table

Load a distributed in-memory table on node2 and insert data into it.

$ time = take(2021.08.20 00:00:00..2021.08.30 00:00:00, 40);
$ id = 0..39;
$ value = rand(100, 40);
$ tmp = table(time, id, value);

$ pt = loadDistributedInMemoryTable(`dt)
$ pt.append!(tmp);

$ select * from pt;

  • Check whether a distributed in-memory table exists

$ objs(true)

  • Delete a distributed in-memory table

$ dropDistributedInMemoryTable(`dt)