Order of Execution
The order of execution of a SQL query in DolphinDB is similar to most other systems. Please pay special attention to the order of execution of csort
keyword and limit
/ top
clause when context by
clause is used.
The order of execution of a SQL query in DolphinDB:
1. from
clause
The FROM clause, and subsequent table joins are first executed to determine the total working set of data that is being queried. This includes subqueries in this clause.
2. on
clause
The on
clause is used to specify the join conditions, i.e., define how the records should be matched.
3. table joiners (including lj, ej, etc.)
When using outer joins, the unmatched records are populated with NULL values in the temporary table generated by the ON clause.
If the data is selected from more than two tables, each table is processed with step 1 to 3.
4. where
conditions
Rows that do not satisfy the where conditions are discarded.
The
where
conditions can only access columns directly from the tables specified in thefrom
clause. Columns in theselect
clause that are not specified in thefrom
clause cannot be referenced, such as columns renamed withas
or computed columns.In the
where
clause, if the conditions are separated by a comma (“,”), the data is filtered by the first condition, then the next condition is applied to the filtered data, and so on; if the conditions are separated by “and”, the data is filtered separately on each condition and the intersection of the filtering results is returned.
5. group by
/ context by
/ pivot by clause
The remaining rows after the where
conditions are applied are then grouped based on common values in the column specified in the group by / context by / pivot by clause
6. csort keyword (only used in conjunction with the context by
clause)
The context by
clause is often used together with time-series functions such as cumsum, mavg, and deltas, etc. The results of these functions are affected by the order of rows within each context by
group. The csort keyword (that is used only with the context by
clause) sorts the rows within each context by
group.
7. having
conditions (only used in conjunction with the group by
/ context by
clause)
If the query has a group by
/ context by
clause, then the having conditions are applied to the grouped rows, discard the grouped rows that don’t satisfy the conditions. Like the where
conditions, aliases are also not accessible from this step.
8. select
clause
Any expressions in the select
clause are finally computed.
9. distinct
clause
The distinct
clause eliminates all the duplicate records and return distinct values.
10. limit
/ top clause (if the context by
clause is used)
If the query uses the context by
clause, the limit
/ top clause applies to each group. If there are n context by
groups and the query uses “limit m" or “top m", then return at most n*`m` rows.
The limit
/ top
clause in this scenario is executed before the order by
clause; in all other scenarios the limit
/ top
clause is executed after the order by
clause.
11. order by
clause
If an order is specified by the order by
clause, the rows are then sorted by the specified data in either ascending or descending order. Since all the expressions in the select
clause of the query have been computed, you can reference aliases in this clause.
12. limit
/ top
clause (if there is no context by
clause)
Finally, if the context by
clause is not used, the rows that fall outside the range specified by the limit
/ top
clause are discarded, leaving the final set of rows to be returned from the query.
Please note that if the SQL statement uses the cgroup by
clause, the execution sequence is as follows: first apply the filtering conditions (if any), and then conduct the calculations in the select
clause within the groups determined by the cgroup by
and group by
(if any) columns, then sort the grouping calculation results according to the required order by columns (must belong to the cgroup by
and group by
(if any) columns), and finally calculate the cumulative value of the grouping calculation results. If group by
is used, the cumulative value is calculated within each group by
group.
Examples
t = table(1 1 1 1 1 2 2 2 2 2 as id, 09:30:00+1 3 2 5 4 5 2 4 3 1 as time, 1 2 3 4 5 6 5 4 3 2 as x);
t;
id |
time |
x |
---|---|---|
1 |
09:30:01 |
1 |
1 |
09:30:03 |
2 |
1 |
09:30:02 |
3 |
1 |
09:30:05 |
4 |
1 |
09:30:04 |
5 |
2 |
09:30:05 |
6 |
2 |
09:30:02 |
5 |
2 |
09:30:04 |
4 |
2 |
09:30:03 |
3 |
2 |
09:30:01 |
2 |
select *, deltas(x) from t context by id;
id |
time |
x |
deltas_x |
---|---|---|---|
1 |
09:30:01 |
1 |
|
1 |
09:30:03 |
2 |
1 |
1 |
09:30:02 |
3 |
1 |
1 |
09:30:05 |
4 |
1 |
1 |
09:30:04 |
5 |
1 |
2 |
09:30:05 |
6 |
|
2 |
09:30:02 |
5 |
-1 |
2 |
09:30:04 |
4 |
-1 |
2 |
09:30:03 |
3 |
-1 |
2 |
09:30:01 |
2 |
-1 |
select *, deltas(x) from t context by id csort time;
id |
time |
x |
deltas_x |
---|---|---|---|
1 |
09:30:01 |
1 |
|
1 |
09:30:02 |
3 |
2 |
1 |
09:30:03 |
2 |
-1 |
1 |
09:30:04 |
5 |
3 |
1 |
09:30:05 |
4 |
-1 |
2 |
09:30:01 |
2 |
|
2 |
09:30:02 |
5 |
3 |
2 |
09:30:03 |
3 |
-2 |
2 |
09:30:04 |
4 |
1 |
2 |
09:30:05 |
6 |
2 |
The example above shows that with “csort time”, the calculation of deltas(x) is conducted after sorting the rows along column “time” within each group.
select *, deltas(x) from t context by id csort time limit 3;
id |
time |
x |
deltas_x |
---|---|---|---|
1 |
09:30:01 |
1 |
|
1 |
09:30:02 |
3 |
2 |
1 |
09:30:03 |
2 |
-1 |
2 |
09:30:01 |
2 |
|
2 |
09:30:02 |
5 |
3 |
2 |
09:30:03 |
3 |
-2 |
The example above shows that if the limit clause is used together with the context by
clause, the limit
clause is applied to each group instead of the entire final result.
select *, deltas(x) from t context by id csort time order by id, deltas_x desc;
id |
time |
x |
deltas_x |
---|---|---|---|
1 |
09:30:04 |
5 |
3 |
1 |
09:30:02 |
3 |
2 |
1 |
09:30:03 |
2 |
-1 |
1 |
09:30:05 |
4 |
-1 |
1 |
09:30:01 |
1 |
|
2 |
09:30:02 |
5 |
3 |
2 |
09:30:05 |
6 |
2 |
2 |
09:30:04 |
4 |
1 |
2 |
09:30:03 |
3 |
-2 |
2 |
09:30:01 |
2 |
The example above shows that as the order by
clause is executed after the select clause, the order by
clause can use the calculation result “deltas_x” in the select
clause.
select *, deltas(x) from t context by id csort time order by id, deltas_x desc limit 3;
id |
time |
x |
deltas_x |
---|---|---|---|
1 |
09:30:02 |
3 |
2 |
1 |
09:30:03 |
2 |
-1 |
1 |
09:30:01 |
1 |
|
2 |
09:30:02 |
5 |
3 |
2 |
09:30:03 |
3 |
-2 |
2 |
09:30:01 |
2 |
The example above shows that if the query has a context by
clause, the limit
clause restricts the number of rows within each group and is executed before the order by
clause.
select * from t order by id, x desc limit 3;
id |
time |
x |
---|---|---|
1 |
09:30:04 |
5 |
1 |
09:30:05 |
4 |
1 |
09:30:02 |
3 |
The example above shows that if the query does not contain a context by
clause, the limit
clause restricts the number of rows in the final result and is executed after the order by
clause.
select *, deltas(x) from t where x>=3 context by id;
id |
time |
x |
deltas_x |
---|---|---|---|
1 |
09:30:02 |
3 |
|
1 |
09:30:05 |
4 |
1 |
1 |
09:30:04 |
5 |
1 |
2 |
09:30:05 |
6 |
|
2 |
09:30:02 |
5 |
-1 |
2 |
09:30:04 |
4 |
-1 |
2 |
09:30:03 |
3 |
-1 |
The example above shows that the where
conditions are executed before the select
clause. The calculation of deltas(x) is conducted after executing “where x>=3”. Therefore, column “delta_x” is empty for the first row of each group.
select *, deltas(x) from t where x>=3 context by id having sum(x)<=12;
id |
time |
x |
deltas_x |
---|---|---|---|
1 |
09:30:02 |
3 |
|
1 |
09:30:05 |
4 |
1 |
1 |
09:30:04 |
5 |
1 |
The example above shows that the having
conditions are executed after the where
conditions.