sample

Syntax

sample(partitionCol, size)

Arguments

partitionCol is a partitioning column.

size is a positive floating-point number or a positive integer.

Details

Take a random sample of partitions from a partitioned table. sample must be used in a where clause.

Suppose the table has N partitions. If 0<size<1, then int(N*size) partitions are taken. If size is a positive integer, then size partitions are taken.
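The counting rule above can be illustrated with a short Python sketch. It is not DolphinDB code and not the server's implementation. The helper names partitions_to_sample and sample_partitions are hypothetical, and the handling of a size larger than N, or of a non-integer size above 1, is an assumption that the text above does not specify.

```python
import random


def partitions_to_sample(n_partitions, size):
    """Return how many partitions the rule described above would take."""
    if size <= 0:
        raise ValueError("size must be positive")
    if size < 1:
        # Fractional size: int(N*size), i.e. the product is truncated.
        return int(n_partitions * size)
    if size != int(size):
        # Assumption: a non-integer size above 1 is rejected here,
        # because the rule above does not cover that case.
        raise ValueError("size must be a fraction in (0, 1) or an integer")
    # Integer size: take that many partitions.
    # Assumption: the count is capped at the number of partitions.
    return min(int(size), n_partitions)


def sample_partitions(partition_ids, size, seed=None):
    """Pick a random subset of partition ids (illustration only)."""
    rng = random.Random(seed)
    k = partitions_to_sample(len(partition_ids), size)
    return sorted(rng.sample(partition_ids, k))


if __name__ == "__main__":
    # Five partitions: size=0.4 and size=2 both select two of them.
    print(partitions_to_sample(5, 0.4))
    print(partitions_to_sample(5, 2))
    print(sample_partitions([0, 1, 2, 3, 4], 0.4, seed=1))
```

Because the product is truncated rather than rounded, a small fraction can select zero partitions under this reading: with N=5, a size of 0.1 gives int(0.5), which is 0.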

Examples

$ n=1000000
$ ID=rand(50, n)
$ x=rand(1.0, n)
$ t=table(ID, x)
$ db=database("dfs://rangedb1", RANGE, 0 10 20 30 40 50)
$ pt = db.createPartitionedTable(t, `pt, `ID)
$ pt.append!(t)
$ pt=loadTable(db,`pt);

Table pt has 5 partitions. To take a random sample of 2 partitions, we can use either of the following queries:

$ x = select * from pt where sample(ID, 0.4);

$ x = select * from pt where sample(ID, 2);