multiTableRepartitionDS

Syntax

multiTableRepartitionDS(query, [column], [partitionType], [partitionScheme], [local=true])

Arguments

query is metacode of SQL statements or a tuple of metacode of SQL statements.

column is a string indicating a column name in query. multiTableRepartitionDS delineates data sources based on column.

partitionType is the partition type used to delineate the data sources. It can be VALUE or RANGE.

partitionScheme is a vector indicating the partitioning scheme. For details please refer to Distributed Computing.

local is a Boolean value indicating whether to move the data sources to the local node for computing. The default value is true.

Details

Generate a tuple of data sources from multiple tables with a new partitioning design.

If query is metacode of SQL statements, the parameter column must be specified. partitionType and partitionScheme can be omitted for a partitioned table with a COMPO domain. In that case, the data sources are determined by the original partitionType and partitionScheme of column.

If query is a tuple of metacode of SQL statements, column, partitionType and partitionScheme can be left unspecified, as in Example 1. The data sources are then delineated by the original partitioning of each table. The function returns a tuple of data sources, each derived from one piece of metacode in query.

Examples

$ n=100000
$ date=rand(2019.06.01..2019.06.05,n)
$ sym=rand(`AAPL`MSFT`GOOG,n)
$ price=rand(1000.0,n)
$ t1=table(date,sym,price)
$ db=database("dfs://value",VALUE,2019.06.01..2019.06.05)
$ db.createPartitionedTable(t1,`pt1,`date).append!(t1);

$ n=100000
$ date=rand(2019.06.01..2019.06.05,n)
$ sym=rand(`AAPL`MSFT`GOOG,n)
$ price=rand(1000.0,n)
$ qty=rand(500,n)
$ t2=table(date,sym,price,qty)
$ db1=database("",VALUE,2019.06.01..2019.06.05)
$ db2=database("",VALUE,`AAPL`MSFT`GOOG)
$ db=database("dfs://compo",COMPO,[db1,db2])
$ db.createPartitionedTable(t2,`pt2,`date`sym).append!(t2);

$ pt1=loadTable("dfs://value","pt1")
$ pt2=loadTable("dfs://compo","pt2");

Example 1. Delineate data sources based on the original partitioning scheme. column, partitionType and partitionScheme are unspecified.

$ ds=multiTableRepartitionDS([<select * from pt1>,<select date,sym,price from pt2>]);
(DataSource< select [7] * from pt1 [partition = /value/20190601] >,DataSource< select [7] * from pt1 [partition = /value/20190602] >, ...... ,DataSource< select [7] date,sym,price from pt2 [partition = /compo/20190605/GOOG] >,DataSource< select [7] date,sym,price from pt2 [partition = /compo/20190605/MSFT] >)

Example 2. Delineate data sources based on stock symbols.

$ ds=multiTableRepartitionDS([<select * from pt1>,<select date,sym,price from pt2>],`sym,VALUE,`AAPL`MSFT`GOOG);
(DataSource< select [4] * from pt1 where sym == "AAPL" >,DataSource< select [4] * from pt1 where sym == "MSFT" >,DataSource< select [4] * from pt1 where sym == "GOOG" >,DataSource< select [4] date,sym,price from pt2 where sym == "AAPL" >,DataSource< select [4] date,sym,price from pt2 where sym == "MSFT" >,DataSource< select [4] date,sym,price from pt2 where sym == "GOOG" >)
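A minimal sketch of how the data sources from Example 2 might be consumed, assuming they are passed to DolphinDB's mr (map-reduce) function. countRows is a hypothetical helper introduced here for illustration. It is not part of the original example.

$ // countRows is a hypothetical map function: it receives the table behind one data source
$ def countRows(t){ return rows(t) }
$ // apply countRows to each data source, then add up the partial counts
$ total=mr(ds, countRows, add)

If the six data sources together cover every row of pt1 and pt2 exactly once, total should match the combined row count of the two tables.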

Example 3. Delineate data sources based on date ranges. Each range includes its lower bound and excludes its upper bound.

$ ds=multiTableRepartitionDS([<select * from pt1>,<select date,sym,price from pt2>],`date,RANGE,2019.06.01 2019.06.03 2019.06.05);
(DataSource< select [4] * from pt1 where date >= 2019.06.01,date < 2019.06.03 >,DataSource< select [4] * from pt1 where date >= 2019.06.03,date < 2019.06.05 >,DataSource< select [4] date,sym,price from pt2 where date >= 2019.06.01,date < 2019.06.03 >,DataSource< select [4] date,sym,price from pt2 where date >= 2019.06.03,date < 2019.06.05 >)
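The output above shows the last range ending at date < 2019.06.05, so rows dated 2019.06.05 are not covered by any data source. A possible variant, assuming the goal is to cover all five days, extends the last boundary by one day. The boundary value 2019.06.06 is an assumption chosen for this sketch.

$ // the last boundary is moved to 2019.06.06 so that the second range also covers 2019.06.05
$ ds=multiTableRepartitionDS([<select * from pt1>,<select date,sym,price from pt2>],`date,RANGE,2019.06.01 2019.06.03 2019.06.06);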

Related function: repartitionDS