pivot by

pivot by is a unique feature in DolphinDB and an extension to the standard SQL. It rearranges a column (or multiple columns) of a table (with or without a data transformation function) on two dimensions. It produces a table with select statement and a matrix with exec statement. Please also ses pivot and unpivot .

The following versions contain enhancements for the pivot by statement:

Version

Enhancements

1.20.4

1. If the pivot by clause includes more than 2 columns, the last column determines the columns in the result and all other columns determine the rows in the result.
2. When used in conjunction with the pivot by clause, the select clause can specify multiple columns.
1.30.21


Query performance has been optimized for pivot by statements which satisfy both of the following conditions:
1. None of the following functions are applied to the queried columns: aggregate functions (e.g., sum), order-sensitive functions (e.g., last), row-based functions (e.g., rowMin), filling functions (e.g., ffill).
2. The last column specified in the pivot by clause is a partitioning column.

Examples

Example 1

sym = `C`MS`MS`MS`IBM`IBM`C`C`C
price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
timestamp = [09:34:07,09:35:42,09:36:51,09:36:59,09:35:47,09:36:26,09:34:16,09:35:26,09:36:12]
t2 = table(timestamp, sym, qty, price);

t2;

timestamp

sym

qty

price

09:34:07

C

2200

49.6

09:35:42

MS

1900

29.46

09:36:51

MS

2100

29.52

09:36:59

MS

3200

30.02

09:35:47

IBM

6800

174.97

09:36:26

IBM

5400

175.23

09:34:16

C

1300

50.76

09:35:26

C

2500

50.32

09:36:12

C

8800

51.29

select price from t2 pivot by timestamp, sym;

timestamp

C

IBM

MS

09:34:07

49.6

09:34:16

50.76

09:35:26

50.32

09:35:42

29.46

09:35:47

174.97

09:36:12

51.29

09:36:26

175.23

09:36:51

29.52

09:36:59

30.02

a=select last(price) from t2 pivot by timestamp.minute(), sym;

a;

minute_timestamp

C

IBM

MS

09:34m

50.76

09:35m

50.32

174.97

29.46

09:36m

51.29

175.23

30.02

$ typestr a;
TABLE

$ b=exec count(price) from t2 pivot by timestamp.minute(), sym;
$ b;

label

C

IBM

MS

09:34m

2

09:35m

1

1

1

09:36m

1

1

2

$ typestr b;
FAST DOUBLE MATRIX

Example 2. Calculate the basket value of an ETF’s constituents. For simplicity, assume an ETF has 2 constituents AAPL and FB.

symbol=take(`AAPL, 6) join take(`FB, 5)
time=2019.02.27T09:45:01.000000000+[146, 278, 412, 445, 496, 789, 212, 556, 598, 712, 989]
price=173.27 173.26 173.24 173.25 173.26 173.27 161.51 161.50 161.49 161.50 161.51
quotes=table(symbol, time, price)
weights=dict(`AAPL`FB, 0.6 0.4)
ETF = select symbol, time, price*weights[symbol] as price from quotes;

select price from ETF pivot by time, symbol;

Time

AAPL

FB

2019.02.27T09:45:01.000000146

103.962

2019.02.27T09:45:01.000000212

64.604

2019.02.27T09:45:01.000000278

103.956

2019.02.27T09:45:01.000000412

103.944

2019.02.27T09:45:01.000000445

103.95

2019.02.27T09:45:01.000000496

103.956

2019.02.27T09:45:01.000000556

64.6

2019.02.27T09:45:01.000000598

64.596

2019.02.27T09:45:01.000000712

64.6

2019.02.27T09:45:01.000000789

103.962

2019.02.27T09:45:01.000000989

64.604

To calculate the basket value of this ETF’s constituents at each timestamp that any of its constituents had a trade, we need to forward fill each stock’s price in the pivot table, and then sum each row’s weighted price. They can be completed in just one SQL statement with “pivot by”.

select rowSum(ffill(price)) from ETF pivot by time, symbol;

Time

rowSum

2019.02.27T09:45:01.000000146

103.962

2019.02.27T09:45:01.000000212

168.566

2019.02.27T09:45:01.000000278

168.56

2019.02.27T09:45:01.000000412

168.548

2019.02.27T09:45:01.000000445

168.554

2019.02.27T09:45:01.000000496

168.56

2019.02.27T09:45:01.000000556

168.556

2019.02.27T09:45:01.000000598

168.552

2019.02.27T09:45:01.000000712

168.556

2019.02.27T09:45:01.000000789

168.562

2019.02.27T09:45:01.000000989

168.566

Example 3. The names of the newly created columns from a pivot by clause will be automatically adjusted.

Before version 1.30.14 / 2.00.2, column names in DolphinDB can only use letters, digits or underscores (_), and must start with letters.

Since version 1.30.14 / 2.00.2, column names generated by pivot by, addColumn can contain special characters or start with digits.

For details, please refer to section Creating tables in Table.

$ date = take(2021.08.01  2021.08.02 2021.08.03, 12)
$ sym = take(["IBM N", "_MSFTN", "3_GOOGS", ""], 12).sort()
$ value = 1..12
$ t=table(date, sym, value)
$ re = select value from t pivot by date, sym

date

NULL

3_GOOGS

IBM N

_MSFTN

2021.08.01

1

4

7

10

2021.08.02

2

5

8

11

2021.08.03

3

6

9

12

$ select  _"NULL" from re

NULL

1

2

3

$ select  _"3_GOOGS" from re

3_GOOGS

4

5

6

In the following example, set removeSpecialCharInColumnName = true. The stock symbols “600300” and “600600” are adjusted to “c600300” and “c600600” as column names.

symbol=take(`600300, 6) join take(`600600, 5)
time=2019.02.27T09:45:01.000000000+[146, 278, 412, 445, 496, 789, 212, 556, 598, 712, 989]
price=173.27 173.26 173.24 173.25 173.26 173.27 161.51 161.50 161.49 161.50 161.51
quotes=table(symbol, time, price)
weights=dict(`600300`600600, 0.6 0.4)
ETF = select symbol, time, price*weights[symbol] as price from quotes;

select last(price) from ETF pivot by time, symbol;

time

C600300

C600600

2019.02.27T09:45:01.000000146

103.962

2019.02.27T09:45:01.000000212

64.604

2019.02.27T09:45:01.000000278

103.956

2019.02.27T09:45:01.000000412

103.944

2019.02.27T09:45:01.000000445

103.95

2019.02.27T09:45:01.000000496

103.956

2019.02.27T09:45:01.000000556

64.6

2019.02.27T09:45:01.000000598

64.596

2019.02.27T09:45:01.000000712

64.6

2019.02.27T09:45:01.000000789

103.962

2019.02.27T09:45:01.000000989

64.604

Example 4. The pivot by clause includes more than 2 columns.

date = 2020.09.21 + 0 0 0 0 1 1 1 1
sym = `MS`MS`GS`GS`MS`MS`GS`GS$SYMBOL
factorNum = 1 2 1 2 1 2 1 2
factorValue = 1.2 -3.4 -2.5 6.3 1.1 -3.2 -2.1 5.6
t = table(date, sym, factorNum, factorValue);

t;

date

sym

factorNum

factorValue

2020.09.21

MS

1

1.2

2020.09.21

MS

2

-3.4

2020.09.21

GS

1

-2.5

2020.09.21

GS

2

6.3

2020.09.22

MS

1

1.1

2020.09.22

MS

2

-3.2

2020.09.22

GS

1

-2.1

2020.09.22

GS

2

5.6

select factorValue from t pivot by date, sym, factorNum;

date

sym

C1

C2

2020.09.21

GS

-2.5

6.3

2020.09.21

MS

1.2

-3.4

2020.09.22

GS

-2.1

5.6

2020.09.22

MS

1.1

-3.2

Example 5. The select clause includes multiple columns.

select factorValue, factorValue>0 as factorSign from t pivot by date, sym, factorNum;

date

sym

factorValue_1

factorValue_2

factorSign_1

factorSign_2

2020.09.21

GS

-2.5

6.3

0

1

2020.09.21

MS

1.2

-3.4

1

0

2020.09.22

GS

-2.1

5.6

0

1

2020.09.22

MS

1.1

-3.2

1

0

Example 6. For each combination of pivot by columns, if duplicate entries exist, only the latest entry will be kept.

$ t = table((`C+string(1..10)) join (`C+string(1..10)) as sym, (2020.10.01+11..20) join (2020.10.01+11..20) as time, 21..40 as val)
$ select val from t pivot by time, sym;

time

C1

C10

C2

C3

C4

C5

C6

C7

C8

C9

2020.10.12

31

2020.10.13

32

2020.10.14

33

2020.10.15

34

2020.10.16

35

2020.10.17

36

2020.10.18

37

2020.10.19

38

2020.10.20

39

2020.10.21

40