Example 2. Save data in partitioned database




Partitioning makes large tables more manageable by enabling users to manage and access subsets of data quickly and efficiently, while maintaining the integrity of a data collection. Maintenance operations on subsets of data are also more efficient as these operations target only the required data instead of the entire table.



Example 2a: save a range partitioned database table


We can use ranges on a column of a table to partition the table. The following example saves table t as a partitioned table pt in the distributed database db with a range partition on the column ID.



n=1000000

ID=rand(10, n)

x=rand(1.0, n)

t=table(ID, x);


db=database("dfs://rangedb", RANGE,  0 5 10)

pt = db.createPartitionedTable(t, `pt, `ID)

pt.append!(t);


ID x

-- -


pt=loadTable(db,`pt)

select count(*) from pt;


count

-------

1000000



In the example above, the database db has 2 partitions: [0,5) and [5,10). The saved table pt also has 2 partitions.


Example 2b: save a value partitioned database table


We can make a partition of a table for each element of a column. The following example saves table t as a partitioned table pt in the distributed database db with a value partition.


n=1000000

month=take(2000.01M..2016.12M, n)

x=rand(1.0, n)

t=table(month, x);


db=database("dfs://valuedb", VALUE, 2000.01M..2016.12M)

pt = db.createPartitionedTable(t, `pt, `month)

pt.append!(t);


month x

----- -


pt=loadTable(db,`pt)

select count(x) from pt;


count_x

-------

1000000



The example above defines a database db with 204 partitions. Each partition is a month between January 2000 and December 2016.



Example 2c: save a hierarchically partitioned database table


A database can have 2 or 3 levels of partitions.


n=1000000

ID=rand(100, n)

dates=2017.08.07..2017.08.11

date=rand(dates, n)

x=rand(10.0, n)

t=table(ID, date, x);


dbDate = database(, VALUE, 2017.08.07..2017.08.11)

dbID=database(, RANGE, 0 50 100)

db = database("dfs://hierDB", HIER, [dbDate, dbID])

pt = db.createPartitionedTable(t, `pt, `date`ID)

pt.append!(t);


ID date x

-- ---- -


pt=loadTable(db,`pt)

select count(x) from pt;


count_x

-------

1000000



The example above defines a database db with 2 levels of partitions. The first level of partition is of the value domain and has 5 partitions for 5 days. The second level of partition is of the range domain. Each partition in the first level is further partitioned into 2 partitions based on ID.


Download source code here.