asof join

Syntax

aj(leftTable, rightTable, matchingCols, [rightMatchingCols])

Arguments

leftTable and rightTable are the tables to be joined.

matchingCols a string scalar/vector indicating matching columns.

rightMatchingCols a string scalar/vector indicating all the matching columns in rightTable . This optional argument must be specified if at least one of the matching columns has different names in leftTable and rightTable . The joining column names in the result will be the joining column names from the left table.

Details

The asof join function is used in non-synchronous join. It is similar to the left join function witht the following differences:

  • Assume the last matching column is “time”. For a row in the left table with time=t, among the rows in the right table that match all other matching columns, if there is not a record with time=t, select the last row before time=t.

  • If there is only 1 joining column, the asof join function assumes the right table is sorted on the joining column. If there are multiple joining columns, the asof join function assumes the right table is sorted on the last joining column within each group defined by the other joining columns. The right table does not need to be sorted by the other joining columns. If these conditions are not met, unexpected results may be returned. The left table does not need to be sorted.

The data type of the last matching column is usually of temporal types. It can also be of integral types, UUID or IPADDR type.

If either the left table or the right table is a partitioned table, The joining columns except the last one must include all of the partitioning columns of the partitioned table.

Note: If the left table of asof join is not a DFS table, its right table cannot be a DFS table either.

Examples

t1 = table(2015.01.01+(0 31 59 90 120) as date, 1.2 7.8 4.6 5.1 9.5 as value)
t2 = table(2015.02.01+(0 15 89 89) as date, 1..4 as qty);
t1;

date

value

2015.01.01

1.2

2015.02.01

7.8

2015.03.01

4.6

2015.04.01

5.1

2015.05.01

9.5

t2;

date

qty

2015.02.01

1

2015.02.16

2

2015.05.01

3

2015.05.01

4

select * from lsj(t1, t2, `date);

date

value

qty

2015.01.01

1.2

2015.02.01

7.8

1

2015.03.01

4.6

2015.04.01

5.1

2015.05.01

9.5

3

select * from aj(t1, t2, `date);

date

value

t2_date

qty

2015.01.01

1.2

2015.02.01

7.8

2015.02.01

1

2015.03.01

4.6

2015.02.16

2

2015.04.01

5.1

2015.02.16

2

2015.05.01

9.5

2015.05.01

4

select * from aj(t1, t2, `date) where t1.date>=2015.03.01;

date

value

t2_date

qty

2015.03.01

4.6

2015.02.16

2

2015.04.01

5.1

2015.02.16

2

2015.05.01

9.5

2015.05.01

4

A common usage of asof join is to join on the time field to retrieve the latest information. Suppose we have the following 3 tables, where the data have all been sorted on the column minute .

minute = 09:30m 09:32m 09:33m 09:35m
price = 174.1 175.2 174.8 175.2
t1 = table(minute, price)

minute = 09:30m 09:31m 09:33m 09:34m
price = 29.2 28.9 29.3 30.1
t2 = table(minute, price)

minute =09:30m 09:31m 09:34m 09:36m
price = 51.2 52.4 51.9 52.8
t3 = table(minute, price);

t1;

minute

price

09:30m

174.1

09:32m

175.2

09:33m

174.8

09:35m

175.2

t2;

minute

price

09:30m

29.2

09:31m

28.9

09:33m

29.3

09:34m

30.1

t3;

minute

price

09:30m

51.2

09:31m

52.4

09:34m

51.9

09:36m

52.8

t2 = aj(t2, t3, `minute);
t2;

minute

price

t3_minute

t3_price

09:30m

29.2

09:30m

51.2

09:31m

28.9

09:31m

52.4

09:33m

29.3

09:31m

52.4

09:34m

30.1

09:34m

51.9

aj(t1, t2, `minute);

minute

price

t2_minute

t2_price

t3_minute

t3_price

09:30m

174.1

09:30m

29.2

09:30m

51.2

09:32m

175.2

09:31m

28.9

09:31m

52.4

09:33m

174.8

09:33m

29.3

09:31m

52.4

09:35m

175.2

09:34m

30.1

09:34m

51.9

Note that there are no matching records for t2 and t3 at 09:32m, so the prices at 09:31m from table t2 and t3, which are the latest, are applied. The price from table t3 at 09:33m and the prices from table t2 and t3 at 09:35m also use the latest prices to replace the missing records. This is extremely useful when we populate data for certain time points. For example, certain information is only updated weekly or monthly. In designing a daily trading strategy, we can use asof join to populate the daily datasets with less frequently updated information.

The last joining column is of type UUID:

t1 = table(2015.01.01 2015.02.01 2015.03.01 2015.04.01 2015.05.01 as date,  uuid(["5d212a78-cc48-e3b1-4235-b4d91473ee81", "5d212a78-cc48-e3b1-4235-b4d91473ee83", "5d212a78-cc48-e3b1-4235-b4d91473ee85", "5d212a78-cc48-e3b1-4235-b4d91473ee87", "5d212a78-cc48-e3b1-4235-b4d91473ee89"]) as uid)
t2 = table(2015.01.15 2015.01.20 2015.01.25 2015.03.01 as date,uuid(["5d212a78-cc48-e3b1-4235-b4d91473ee81", "5d212a78-cc48-e3b1-4235-b4d91473ee83", "5d212a78-cc48-e3b1-4235-b4d91473ee85", "5d212a78-cc48-e3b1-4235-b4d91473ee87"]) as uid)
select * from aj(t1, t2, `uid);

date

uid

t2_date

t2_uid

2015.01.01

5d212a78-cc48-e3b1-4235-b4d91473ee81

2015.01.15

5d212a78-cc48-e3b1-4235-b4d91473ee81

2015.02.01

5d212a78-cc48-e3b1-4235-b4d91473ee83

2015.01.20

5d212a78-cc48-e3b1-4235-b4d91473ee83

2015.03.01

5d212a78-cc48-e3b1-4235-b4d91473ee85

2015.01.25

5d212a78-cc48-e3b1-4235-b4d91473ee85

2015.04.01

5d212a78-cc48-e3b1-4235-b4d91473ee87

2015.03.01

5d212a78-cc48-e3b1-4235-b4d91473ee87

2015.05.01

5d212a78-cc48-e3b1-4235-b4d91473ee89

2015.03.01

5d212a78-cc48-e3b1-4235-b4d91473ee87