randomForestRegressor(ds, yColName, xColNames, [maxFeatures=0], [numTrees=10], [numBins=32], [maxDepth=32], [minImpurityDecrease=0.0], [numJobs=-1], [randomSeed])
ds contains the data sources used for training. It can be generated with function sqlDS.
yColName is a string indicating the dependent variable column.
xColNames is a string scalar/vector indicating the names of the feature columns.
maxFeatures is an integer or a floating-point number indicating the number of features to consider when looking for the best split. The default value is 0.
If maxFeatures is a positive integer, maxFeatures features are considered at each split.
If maxFeatures is 0, sqrt(the number of feature columns) features are considered at each split.
If maxFeatures is a floating-point number between 0 and 1, int(maxFeatures * the number of feature columns) features are considered at each split.
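The Python sketch below restates the three maxFeatures rules as a function, only to make the arithmetic concrete. The helper name `features_per_split` is hypothetical, and truncating sqrt(...) to an integer in the 0 case is an assumption, since the text does not say how a non-integer square root is rounded.

```python
import math


def features_per_split(max_features, n_features):
    """Hypothetical helper: number of features examined at each split,
    following the maxFeatures rules described for randomForestRegressor."""
    if max_features == 0:
        # Default: sqrt(number of feature columns); truncation is an assumption
        return int(math.sqrt(n_features))
    if isinstance(max_features, float) and 0 < max_features < 1:
        # Fraction of the feature columns, truncated with int()
        return int(max_features * n_features)
    if isinstance(max_features, int) and max_features > 0:
        # Positive integer: use exactly that many features
        return max_features
    raise ValueError("maxFeatures must be 0, a positive integer, or a float in (0, 1)")


print(features_per_split(0, 16))    # 4
print(features_per_split(0.25, 8))  # 2
print(features_per_split(3, 8))     # 3
```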
numTrees is a positive integer indicating the number of trees in the random forest. The default value is 10.
numBins is a positive integer indicating the number of bins used when discretizing continuous features. The default value is 32. Increasing numBins allows the algorithm to consider more split candidates and make fine-grained split decisions. However, it also increases computation and communication time.
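To see why numBins bounds the number of split candidates, the Python sketch below derives cut points for one continuous feature. It assumes quantile-style binning, where the k-th cut point is taken at the k/numBins quantile; this scheme is an illustrative assumption, as the text does not specify how randomForestRegressor discretizes features. With numBins bins there are at most numBins - 1 candidate thresholds, and fewer when values repeat.

```python
def candidate_thresholds(values, num_bins):
    """Hypothetical sketch: cut points from quantile-style binning.

    At most num_bins - 1 distinct thresholds are returned; duplicate
    cut points (caused by repeated values) are merged.
    """
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)
    cuts = set()
    for k in range(1, num_bins):
        # Index of the k/num_bins quantile in the sorted data
        idx = min(n - 1, (k * n) // num_bins)
        cuts.add(ordered[idx])
    return sorted(cuts)


data = list(range(100))
print(candidate_thresholds(data, 4))        # [25, 50, 75]
print(len(candidate_thresholds(data, 32)))  # 31
```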
maxDepth is a positive integer indicating the maximum depth of a tree. The default value is 32.
minImpurityDecrease is a floating-point number. A node is split only if the split induces a decrease in impurity greater than or equal to this value. The default value is 0.0.
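For regression trees, impurity is commonly measured as the variance of the target within a node. Assuming that convention (the text does not name the impurity measure), the decrease for a candidate split is the parent's variance minus the size-weighted variances of the two children. The Python sketch below applies the minImpurityDecrease test to that quantity; all function names are hypothetical.

```python
def variance(ys):
    """Population variance of a non-empty list of target values."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def impurity_decrease(left, right):
    """Parent variance minus size-weighted child variances."""
    parent = left + right
    n = len(parent)
    return (variance(parent)
            - len(left) / n * variance(left)
            - len(right) / n * variance(right))


def should_split(left, right, min_impurity_decrease=0.0):
    # Accept the split when the decrease reaches the threshold
    return impurity_decrease(left, right) >= min_impurity_decrease


left, right = [1.0, 1.0], [3.0, 3.0]
print(impurity_decrease(left, right))  # 1.0
print(should_split(left, right, 1.5))  # False
```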
numJobs is an integer. If it is positive, it indicates the maximum number of concurrently running jobs. If it is -1, all CPU threads are used. If it is any other negative integer, (the number of CPU threads + numJobs + 1) threads are used. The default value is -1.
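The numJobs rules reduce to simple arithmetic, restated in the Python sketch below. The helper name is hypothetical; numJobs = 0 is not described in the text, so the sketch rejects it rather than guess.

```python
def concurrency_limit(num_jobs, cpu_threads):
    """Hypothetical helper restating the numJobs rules."""
    if num_jobs > 0:
        # Upper bound on concurrently running jobs
        return num_jobs
    if num_jobs < 0:
        # -1 gives all threads, -2 leaves one thread free, and so on
        return cpu_threads + num_jobs + 1
    raise ValueError("numJobs = 0 is not described")


for nj in (-1, -2, 4):
    print(nj, concurrency_limit(nj, 16))
# -1 16
# -2 15
# 4 4
```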
randomSeed is the seed used by the random number generator.
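A seed matters because each tree in a random forest is normally trained on a random sample of the rows. The Python sketch below assumes bootstrap sampling with replacement, the classic random-forest scheme; whether randomForestRegressor samples exactly this way is not stated here. It shows that fixing the seed reproduces the same per-tree samples, which is what makes training repeatable.

```python
import random


def bootstrap_samples(n_rows, num_trees, seed=None):
    """Hypothetical sketch: one bootstrap sample of row indices per tree."""
    rng = random.Random(seed)  # independent generator, seeded like randomSeed
    return [[rng.randrange(n_rows) for _ in range(n_rows)]
            for _ in range(num_trees)]


a = bootstrap_samples(100, 10, seed=42)
b = bootstrap_samples(100, 10, seed=42)
print(a == b)  # True: same seed, same samples
```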
Fit a random forest regression model. The result is a dictionary with the following keys: minImpurityDecrease, maxDepth, numBins, numTrees, maxFeatures, model, modelName and xColNames. model is a tuple containing the trained trees; modelName is “Random Forest Regressor”.
The fitted model can be used as input to function predict.
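In a random forest regressor, the prediction for a row is conventionally the mean of the individual trees' predictions. The Python sketch below shows that aggregation with two hand-written stumps standing in for trained trees; it illustrates the standard technique, not how predict is implemented internally, and the names are hypothetical.

```python
from statistics import fmean


def forest_predict(trees, row):
    """Average the predictions of all trees for one row."""
    return fmean(tree(row) for tree in trees)


# Toy "trees": each maps a row (a dict of features) to a number
trees = [
    lambda r: 10.0 if r["x1"] < 50 else 30.0,
    lambda r: 20.0 if r["x2"] < 50 else 0.0,
]
print(forest_predict(trees, {"x1": 70.0, "x2": 20.0}))  # 25.0
```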
Fit a random forest regression model with simulated data:
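// simulate 100 observations of y = 6 + x1 - 2*x2 plus normal noise (mean 0, std 10)
x1 = rand(100.0, 100)
x2 = rand(100.0, 100)
b0 = 6
b1 = 1
b2 = -2
err = norm(0, 10, 100)
y = b0 + b1 * x1 + b2 * x2 + err
t = table(x1, x2, y)
// train with default parameters on a data source built from table t
model = randomForestRegressor(sqlDS(<select * from t>), `y, `x1`x2)
// predict on the training data and plot actual vs. predicted values
yhat = predict(model, t);
plot(y, yhat, , SCATTER);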
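The scatter plot gives a visual check of fit; a numeric summary such as root-mean-square error (RMSE) can complement it. The Python sketch below computes RMSE for two equal-length sequences, and the same formula applies to y and yhat from the example. Because the example predicts on its own training data, this measures in-sample fit, which is typically optimistic.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    sq = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sq / len(actual))


print(rmse([3.0, 4.0], [0.0, 0.0]))  # about 3.536
```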
Save the trained model to disk:
$ saveModel(model, "C:/DolphinDB/Data/regressionModel.txt");
Load a saved model: