asof join
Syntax
aj(leftTable, rightTable, matchingCols, [rightMatchingCols])
Arguments
leftTable and rightTable are the tables to be joined.
matchingCols a string scalar/vector indicating matching columns.
rightMatchingCols a string scalar/vector indicating all the matching columns in rightTable . This optional argument must be specified if at least one of the matching columns has different names in leftTable and rightTable . The joining column names in the result will be the joining column names from the left table.
Details
The asof join function is used in non-synchronous join. It is similar to the left join function witht the following differences:
Assume the last matching column is “time”. For a row in the left table with time=t, among the rows in the right table that match all other matching columns, if there is not a record with time=t, select the last row before time=t.
If there is only 1 joining column, the asof join function assumes the right table is sorted on the joining column. If there are multiple joining columns, the asof join function assumes the right table is sorted on the last joining column within each group defined by the other joining columns. The right table does not need to be sorted by the other joining columns. If these conditions are not met, unexpected results may be returned. The left table does not need to be sorted.
The data type of the last matching column is usually of temporal types. It can also be of integral types, UUID or IPADDR type.
If either the left table or the right table is a partitioned table, The joining columns except the last one must include all of the partitioning columns of the partitioned table.
Examples
t1 = table(2015.01.01+(0 31 59 90 120) as date, 1.2 7.8 4.6 5.1 9.5 as value)
t2 = table(2015.02.01+(0 15 89 89) as date, 1..4 as qty);
t1;
date |
value |
---|---|
2015.01.01 |
1.2 |
2015.02.01 |
7.8 |
2015.03.01 |
4.6 |
2015.04.01 |
5.1 |
2015.05.01 |
9.5 |
t2;
date |
qty |
---|---|
2015.02.01 |
1 |
2015.02.16 |
2 |
2015.05.01 |
3 |
2015.05.01 |
4 |
select * from lsj(t1, t2, `date);
date |
value |
qty |
---|---|---|
2015.01.01 |
1.2 |
|
2015.02.01 |
7.8 |
1 |
2015.03.01 |
4.6 |
|
2015.04.01 |
5.1 |
|
2015.05.01 |
9.5 |
3 |
select * from aj(t1, t2, `date);
date |
value |
t2_date |
qty |
---|---|---|---|
2015.01.01 |
1.2 |
||
2015.02.01 |
7.8 |
2015.02.01 |
1 |
2015.03.01 |
4.6 |
2015.02.16 |
2 |
2015.04.01 |
5.1 |
2015.02.16 |
2 |
2015.05.01 |
9.5 |
2015.05.01 |
4 |
select * from aj(t1, t2, `date) where t1.date>=2015.03.01;
date |
value |
t2_date |
qty |
---|---|---|---|
2015.03.01 |
4.6 |
2015.02.16 |
2 |
2015.04.01 |
5.1 |
2015.02.16 |
2 |
2015.05.01 |
9.5 |
2015.05.01 |
4 |
A common usage of asof join is to join on the time field to retrieve the latest information. Suppose we have the following 3 tables, where the data have all been sorted on the column minute .
minute = 09:30m 09:32m 09:33m 09:35m
price = 174.1 175.2 174.8 175.2
t1 = table(minute, price)
minute = 09:30m 09:31m 09:33m 09:34m
price = 29.2 28.9 29.3 30.1
t2 = table(minute, price)
minute =09:30m 09:31m 09:34m 09:36m
price = 51.2 52.4 51.9 52.8
t3 = table(minute, price);
t1;
minute |
price |
---|---|
09:30m |
174.1 |
09:32m |
175.2 |
09:33m |
174.8 |
09:35m |
175.2 |
t2;
minute |
price |
---|---|
09:30m |
29.2 |
09:31m |
28.9 |
09:33m |
29.3 |
09:34m |
30.1 |
t3;
minute |
price |
---|---|
09:30m |
51.2 |
09:31m |
52.4 |
09:34m |
51.9 |
09:36m |
52.8 |
t2 = aj(t2, t3, `minute);
t2;
minute |
price |
t3_minute |
t3_price |
---|---|---|---|
09:30m |
29.2 |
09:30m |
51.2 |
09:31m |
28.9 |
09:31m |
52.4 |
09:33m |
29.3 |
09:31m |
52.4 |
09:34m |
30.1 |
09:34m |
51.9 |
aj(t1, t2, `minute);
minute |
price |
t2_minute |
t2_price |
t3_minute |
t3_price |
---|---|---|---|---|---|
09:30m |
174.1 |
09:30m |
29.2 |
09:30m |
51.2 |
09:32m |
175.2 |
09:31m |
28.9 |
09:31m |
52.4 |
09:33m |
174.8 |
09:33m |
29.3 |
09:31m |
52.4 |
09:35m |
175.2 |
09:34m |
30.1 |
09:34m |
51.9 |
Note that there are no matching records for t2 and t3 at 09:32m, so the prices at 09:31m from table t2 and t3, which are the latest, are applied. The price from table t3 at 09:33m and the prices from table t2 and t3 at 09:35m also use the latest prices to replace the missing records. This is extremely useful when we populate data for certain time points. For example, certain information is only updated weekly or monthly. In designing a daily trading strategy, we can use asof join to populate the daily datasets with less frequently updated information.
The last joining column is of type UUID:
t1 = table(2015.01.01 2015.02.01 2015.03.01 2015.04.01 2015.05.01 as date, uuid(["5d212a78-cc48-e3b1-4235-b4d91473ee81", "5d212a78-cc48-e3b1-4235-b4d91473ee83", "5d212a78-cc48-e3b1-4235-b4d91473ee85", "5d212a78-cc48-e3b1-4235-b4d91473ee87", "5d212a78-cc48-e3b1-4235-b4d91473ee89"]) as uid)
t2 = table(2015.01.15 2015.01.20 2015.01.25 2015.03.01 as date,uuid(["5d212a78-cc48-e3b1-4235-b4d91473ee81", "5d212a78-cc48-e3b1-4235-b4d91473ee83", "5d212a78-cc48-e3b1-4235-b4d91473ee85", "5d212a78-cc48-e3b1-4235-b4d91473ee87"]) as uid)
select * from aj(t1, t2, `uid);
date |
uid |
t2_date |
t2_uid |
---|---|---|---|
2015.01.01 |
5d212a78-cc48-e3b1-4235-b4d91473ee81 |
2015.01.15 |
5d212a78-cc48-e3b1-4235-b4d91473ee81 |
2015.02.01 |
5d212a78-cc48-e3b1-4235-b4d91473ee83 |
2015.01.20 |
5d212a78-cc48-e3b1-4235-b4d91473ee83 |
2015.03.01 |
5d212a78-cc48-e3b1-4235-b4d91473ee85 |
2015.01.25 |
5d212a78-cc48-e3b1-4235-b4d91473ee85 |
2015.04.01 |
5d212a78-cc48-e3b1-4235-b4d91473ee87 |
2015.03.01 |
5d212a78-cc48-e3b1-4235-b4d91473ee87 |
2015.05.01 |
5d212a78-cc48-e3b1-4235-b4d91473ee89 |
2015.03.01 |
5d212a78-cc48-e3b1-4235-b4d91473ee87 |