adaBoostRegressor

Syntax

adaBoostRegressor(ds, yColName, xColNames, [maxFeatures=0], [numTrees=10], [numBins=32], [maxDepth=10], [minImpurityDecrease=0.0], [learningRate=0.1], [loss='linear'], [randomSeed])

Arguments

ds is the data source to be trained. It can be generated with function sqlDS.

yColName is a string indicating the name of the dependent variable column in the data source.

xColNames is a string scalar/vector indicating the names of the feature columns in the data source.

maxFeatures is an integer or a floating-point number indicating the number of features to consider when looking for the best split. The default value is 0.

If maxFeatures is a positive integer, maxFeatures features are considered at each split.
If maxFeatures is 0, sqrt(the number of feature columns) features are considered at each split.
If maxFeatures is a floating-point number between 0 and 1, int(maxFeatures * the number of feature columns) features are considered at each split.
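
To make the three cases concrete, here is an illustrative sketch. It assumes a hypothetical data source ds9 whose table has a label column y and nine feature columns f1 to f9; these names are not part of the manual and are used only for illustration:

$ cols = `f1`f2`f3`f4`f5`f6`f7`f8`f9
$ m1 = adaBoostRegressor(ds9, `y, cols, 3)    // positive integer: 3 features considered at each split
$ m2 = adaBoostRegressor(ds9, `y, cols)       // default 0: sqrt(9) = 3 features considered at each split
$ m3 = adaBoostRegressor(ds9, `y, cols, 0.5)  // fraction: int(0.5 * 9) = 4 features considered at each split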

numTrees is a positive integer indicating the number of trees. The default value is 10.

numBins is a positive integer indicating the number of bins used when discretizing continuous features. The default value is 32. Increasing numBins allows the algorithm to consider more split candidates and make fine-grained split decisions. However, it also increases computation and communication time.

maxDepth is a positive integer indicating the maximum depth of a tree. The default value is 10.

minImpurityDecrease is a floating-point number: a node will be split only if the split induces a decrease of impurity greater than or equal to this value. The default value is 0.0.

learningRate is a positive floating-point number indicating the learning rate of each boosting iteration. The default value is 0.1.

loss is a string indicating the loss function to use when updating the weights after each boosting iteration. It can be "linear", "square" or "exponential". The default value is "linear".

randomSeed is the seed used by the random number generator.
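
Because randomSeed fixes the random number generator, two training runs with the same seed should produce the same model. A minimal sketch, passing every optional argument positionally with its default value and an arbitrary seed of 42 (ds, y, x1 and x2 as described above):

$ m1 = adaBoostRegressor(ds, `y, `x1`x2, 0, 10, 32, 10, 0.0, 0.1, "linear", 42)
$ m2 = adaBoostRegressor(ds, `y, `x1`x2, 0, 10, 32, 10, 0.0, 0.1, "linear", 42)  // same seed, so same trees as m1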

Details

Fit an AdaBoost regression model. The result is a dictionary with the following keys: minImpurityDecrease, maxDepth, numBins, numTrees, maxFeatures, model, modelName, xColNames, learningRate and loss. model is a tuple containing the trained trees; modelName is "AdaBoost Regressor".

The fitted model can be used as an input for function predict.

Examples

Fit an AdaBoost regression model with simulated data:

$ n=10
$ x1 = rand(1.0, n)
$ x2 = rand(1.0, n)
$ b0 = 1
$ b1 = 1
$ b2 = -2
$ err = norm(0, 0.2, n)
$ y = b0 + b1 * x1 + b2 * x2 + err
$ t = table(y, x1, x2)
$ model = adaBoostRegressor(sqlDS(<select * from t>), `y, `x1`x2);
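
The returned value is a dictionary, so its entries can be inspected with ordinary dictionary access. For example (key names as listed under Details; the output shown in comments is illustrative):

$ model.keys();
$ model[`modelName];   // "AdaBoost Regressor"
$ model[`numTrees];    // 10 with the default settings used above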

Use the fitted model in forecasting:

$ t1 = table(0 0.4 0.7 1 as x1, 0.9 0.2 0.1 0 as x2)
$ predict(model, t1);

Save the trained model to disk:

$ saveModel(model, "C:/DolphinDB/data/regressionModel.bin")

Load a saved model:

$ loadModel("C:/DolphinDB/data/regressionModel.bin");
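
A loaded model can be used with predict in the same way as the newly trained one. A minimal sketch reusing the table t1 from the forecasting example above:

$ model2 = loadModel("C:/DolphinDB/data/regressionModel.bin")
$ predict(model2, t1);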

Related functions: adaBoostClassifier, randomForestClassifier, randomForestRegressor