createTable

Syntax

createTable(dbHandle, table, tableName, [compressMethods], [sortColumns], [keepDuplicates=ALL])

Arguments

dbHandle is a distributed database handle.

table is a table object. An empty dimension table is created based on the schema of table.

tableName is a string indicating the name of the dimension table to be created.

compressMethods a dictionary indicating which compression methods are used for specified columns. The keys are columns name and the values are compression methods (“lz4” or “delta”). If unspecified, use LZ4 compression method. Please note that the delta compression method can only be used for SHORT, INT, LONG or temporal data types.

sortColumns is a STRING scalar/vector that specifies the columns to sort the ingested data within each partition. The sort columns must be of integral, temporal, string, or symbol type. Note that sortColumns is not necessarily consistent with the partitioning column.

  • If multiple columns are specified for sortColumns, the last column must be a time column. The preceding columns are used as the sort keys and they cannot be of TIME, TIMESTAMP, NANOTIME, or NANOTIMESTAMP type.

  • If only one column is specified for sortColumns, the column is used as the sort key, and it can be a time column or not.

  • It is recommended to specify frequently-queried columns for sortColumns and sort them in the descending order of query frequency, which ensures that frequently-used data is readily available during query processing.

  • The number of sort key entries (which are unique combinations of the values of the sort keys) within each partition may not exceed 1000 for optimal performance. This limitation prevents excessive memory usage and ensures efficient query processing.

keepDuplicates specifies how to deal with records with duplicate sortColumns values. It can have the following values:

  • ALL: keep all records;

  • LAST: only keep the last record;

  • FIRST: only keep the first record.

Details

This function creates an empty dimension table. A dimension table is a non-partitioned table in a DFS database, used to store small datasets with infrequent updates.

After a dimension table is loaded into memory with function loadTable, the memory taken up by the table will not be released by the system automatically. It is not recommended to store more than 2 million records in a dimension table. To clear the cached data, you can manually call command clearCachedDatabase().

Just like partitioned tables, a dimension table can have multiple replicas (determined by the configuration parameter dfsReplicationFactor).

To enable concurrent writes, updates or deletes on a dimension table, set the configuration parameter enableConcurrentDimensionalTableWrite to true.

Examples

Example1

$ db=database("dfs://db1",VALUE,1 2 3)
$ timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
$ sym = `C`MS`MS`MS`IBM`IBM`C`C`C
$ price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
$ qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
$ t = table(timestamp, sym, qty, price);

$ dt=db.createTable(t,`dt).append!(t);
$ select * from dt;

timestamp

sym

qty

price

09:34:07

C

2200

49.6

09:36:42

MS

1900

29.46

09:36:51

MS

2100

29.52

09:36:59

MS

3200

30.02

09:32:47

IBM

6800

174.97

09:35:26

IBM

5400

175.23

09:34:16

C

1300

50.76

09:34:26

C

2500

50.32

09:38:12

C

8800

51.29

Example 2

$ db = database("dfs://demodb", VALUE, 1..10)
$ t=table(take(1, 86400) as id, 2020.01.01T00:00:00 + 0..86399 as timestamp, rand(1..100, 86400) as val)
$ dt = db.createTable(t, "dt", {timestamp:"delta", val:"delta"})
$ dt.append!(t)

Related functions: createPartitionedTable