window join
Syntax
Window join:
wj(leftTable, rightTable, window, aggs, matchingCols, [rightMatchingCols])
Prevailing window join:
pwj(leftTable, rightTable, window, aggs, matchingCols, [rightMatchingCols])
Arguments
leftTable and rightTable are the tables to be joined.
window a pair of integers indicating the left bound and the right bound (both are inclusive) of the window relative to the records in the left table.
aggs metacode or a tuple of metacode indicating an aggregate function or a list of aggregate functions. For details please refer to the section of Metaprogramming.The parameters of these aggregate functions must be numeric columns of the right table.
matchingCols a string scalar/vector indicating matching columns.
rightMatchingCols a string scalar/vector indicating all the matching columns in rightTable . This optional argument must be specified if at least one of the matching columns has different names in leftTable and rightTable . The joining column names in the result will be the joining column names from the left table.
Details
Window join is a generalization of asof join. For each row in leftTable, window join applies aggregate functions on a matching interval of rows in rightTable .
Similar to asof join, if there is only 1 joining column, the window join function assumes the right table is sorted on the joining column. If there are multiple joining columns, the window join function assumes the right table is sorted on the last joining column within each group defined by the other joining columns. The right table does not need to be sorted by the other joining columns. If these conditions are not met, unexpected results may be returned. The left table does not need to be sorted.
If window=w1:w2, for each row in leftTable with the value of the last column in matchCols equal to t, find the rows in rightTable with the value of the last column in matchCols between (t+w1) and (t+w2) conditional on all other columns in matchCols are matched, then apply aggs to the selected rows in rightTable .
The differences between wj
and pwj
are:
If rightTable doesn’t have a matching value for t+w1 (the left boundary of the window),
wj
will treat it as a NULL element in the window, whereaspwj
will include the last value before t+w1 in the window.If rightTable has multiple matching values for t+w1,
wj
will include all of them whilepwj
will only include the last row.
The following aggregate functions in window join are optimized for better performance:
avg, beta, count, corr, covar, first, last, max, med, min, percentile, std, sum, sum2, var wavg, kurtosis, prod, skew, stdp, varp, atImin, atImax
Note:
When specifying atImax
or atImin
in parameter aggs of window join functions, if there are multiple identical extreme values in a window, the last record with extreme value is used for calculation by default.
Examples
t1 = table(`A`A`B as sym, 09:56:06 09:56:07 09:56:06 as time, 10.6 10.7 20.6 as price)
t2 = table(take(`A,10) join take(`B,10) as sym, take(09:56:00+1..10,20) as time, (10+(1..10)\10-0.05) join (20+(1..10)\10-0.05) as bid, (10+(1..10)\10+0.05) join (20+(1..10)\10+0.05) as offer, take(100 300 800 200 600, 20) as volume);
t1;
sym |
time |
price |
---|---|---|
A |
09:56:06 |
10.6 |
A |
09:56:07 |
10.7 |
B |
09:56:06 |
20.6 |
t2;
sym |
time |
bid |
offer |
volume |
---|---|---|---|---|
A |
09:56:01 |
10.05 |
10.15 |
100 |
A |
09:56:02 |
10.15 |
10.25 |
300 |
A |
09:56:03 |
10.25 |
10.35 |
800 |
A |
09:56:04 |
10.35 |
10.45 |
200 |
A |
09:56:05 |
10.45 |
10.55 |
600 |
A |
09:56:06 |
10.55 |
10.65 |
100 |
A |
09:56:07 |
10.65 |
10.75 |
300 |
A |
09:56:08 |
10.75 |
10.85 |
800 |
A |
09:56:09 |
10.85 |
10.95 |
200 |
A |
09:56:10 |
10.95 |
11.05 |
600 |
B |
09:56:01 |
20.05 |
20.15 |
100 |
B |
09:56:02 |
20.15 |
20.25 |
300 |
B |
09:56:03 |
20.25 |
20.35 |
800 |
B |
09:56:04 |
20.35 |
20.45 |
200 |
B |
09:56:05 |
20.45 |
20.55 |
600 |
B |
09:56:06 |
20.55 |
20.65 |
100 |
B |
09:56:07 |
20.65 |
20.75 |
300 |
B |
09:56:08 |
20.75 |
20.85 |
800 |
B |
09:56:09 |
20.85 |
20.95 |
200 |
B |
09:56:10 |
20.95 |
21.05 |
600 |
wj(t1, t2, -5s:0s, <avg(bid)>, `sym`time);
sym |
time |
price |
avg_bid |
---|---|---|---|
A |
09:56:06 |
10.6 |
10.3 |
A |
09:56:07 |
10.7 |
10.4 |
B |
09:56:06 |
20.6 |
20.3 |
wj(t1, t2, -5:-1, <[wavg(bid,volume), wavg(offer,volume)]>, `sym`time);
sym |
time |
price |
wavg_bid |
wavg_offer |
---|---|---|---|---|
A |
09:56:06 |
10.6 |
10.295 |
10.395 |
A |
09:56:07 |
10.7 |
10.32 |
10.42 |
B |
09:56:06 |
20.6 |
20.295 |
20.395 |
t3=t2
t3.rename!(`time, `second)
wj(t1, t3, -2:2, <[wavg(bid,volume), wavg(offer,volume)]>, `sym`time, `sym`second);
sym |
time |
price |
wavg_bid |
wavg_offer |
---|---|---|---|---|
A |
09:56:06 |
10.6 |
10.595 |
10.695 |
A |
09:56:07 |
10.7 |
10.645 |
10.745 |
B |
09:56:06 |
20.6 |
20.595 |
20.695 |
Window join is a generalization of asof join:
wj(t1, t2, -100:0, <[last(bid) as bid, last(offer) as offer]>, `sym`time);
sym |
time |
price |
bid |
offer |
---|---|---|---|---|
A |
09:56:06 |
10.6 |
10.55 |
10.65 |
A |
09:56:07 |
10.7 |
10.65 |
10.75 |
B |
09:56:06 |
20.6 |
20.55 |
20.65 |
select sym, time, price, bid, offer from aj(t1, t2, `sym`time);
sym |
time |
price |
bid |
offer |
---|---|---|---|---|
A |
09:56:06 |
10.6 |
10.55 |
10.65 |
A |
09:56:07 |
10.7 |
10.65 |
10.75 |
B |
09:56:06 |
20.6 |
20.55 |
20.65 |
Prevailing window join:
delete from t2 where 09:56:04<=time<=09:56:06;
t2;
sym |
time |
bid |
offer |
volume |
---|---|---|---|---|
A |
09:56:01 |
10.05 |
10.15 |
100 |
A |
09:56:02 |
10.15 |
10.25 |
300 |
A |
09:56:03 |
10.25 |
10.35 |
800 |
A |
09:56:07 |
10.65 |
10.75 |
300 |
A |
09:56:08 |
10.75 |
10.85 |
800 |
A |
09:56:09 |
10.85 |
10.95 |
200 |
A |
09:56:10 |
10.95 |
11.05 |
600 |
B |
09:56:01 |
20.05 |
20.15 |
100 |
B |
09:56:02 |
20.15 |
20.25 |
300 |
B |
09:56:03 |
20.25 |
20.35 |
800 |
B |
09:56:07 |
20.65 |
20.75 |
300 |
B |
09:56:08 |
20.75 |
20.85 |
800 |
B |
09:56:09 |
20.85 |
20.95 |
200 |
B |
09:56:10 |
20.95 |
21.05 |
600 |
wj(t1, t2, -1:1, <[first(bid), avg(offer)]>, `sym`time);
sym |
time |
price |
first_bid |
avg_offer |
---|---|---|---|---|
A |
09:56:06 |
10.6 |
10.65 |
10.75 |
A |
09:56:07 |
10.7 |
10.65 |
10.8 |
B |
09:56:06 |
20.6 |
20.65 |
20.75 |
Dynamically pass values to aggs in tuple format.
$ aggs = array(ANY, 3) //Specify aggs as a tuple with metacode
$ aggs[0] = <min(bid)>
$ aggs[1] = <min(offer)>
$ aggs[2] = <min(volume)>
$ wj(t1, t2, -5s:0s, aggs, `sym`time);
sym |
time |
price |
min_bid |
min_offer |
min_volume |
---|---|---|---|---|---|
A |
09:56:06 |
10.6 |
10.05 |
10.15 |
100 |
A |
09:56:07 |
10.7 |
10.15 |
10.25 |
100 |
B |
09:56:06 |
20.6 |
20.05 |
20.15 |
100 |
pwj(t1, t2, -1:1, <[first(bid), avg(offer)]>, `sym`time);
sym |
time |
price |
first_bid |
avg_offer |
---|---|---|---|---|
A |
09:56:06 |
10.6 |
10.25 |
10.55 |
A |
09:56:07 |
10.7 |
10.25 |
10.65 |
B |
09:56:06 |
20.6 |
20.25 |
20.55 |
In the example above, for sym “A” at 09:56:06 in the left table, window join uses the row of sym “A” at 09:56:07 in the right table to calculate first(bid) and avg(offer), whereas prevailing window join uses the rows of sym “A” at 09:56:03 and 09:56:07 in the right table.