logisticRegression

Syntax

logisticRegression(ds, yColName, xColNames, [intercept=true], [initTheta], [tolerance=1e-3], [maxIter=500], [regularizationCoeff=1.0])

Arguments

ds is the data source containing the training data. It can be generated with function sqlDS.

yColName is a string indicating the name of the column that holds the category (class) labels.

xColNames is a string scalar/vector indicating the names of independent variables.

intercept is a Boolean scalar indicating whether the regression uses an intercept. The default value is true, which means that a column of 1s is added to the independent variables.

initTheta is a vector indicating the initial values of the parameters when the iterations begin. The default value is a vector of zeros of length xColNames.size()+intercept, i.e. one element per independent variable plus one for the intercept if it is used.

tolerance is a numeric scalar. The iterations stop when the difference between the values of the log likelihood function in two successive iterations is smaller than tolerance. The default value is 0.001.

maxIter is a positive integer indicating the maximum number of iterations. The iterations will stop if the number of iterations reaches maxIter. The default value is 500.

regularizationCoeff is a positive number indicating the coefficient of the regularization term. The default value is 1.0.

intercept, initTheta, tolerance, maxIter and regularizationCoeff are optional.

Details

Fit a logistic regression model. The result is a dictionary with the following keys: iterations, modelName, coefficients, tolerance, logLikelihood, xColNames and intercept. iterations is the number of iterations performed; modelName is “Logistic Regression”; coefficients is a vector of the parameter estimates; logLikelihood is the final value of the log likelihood function.

The fitted model can be used as an input for function predict.

Examples

Fit a logistic regression model with simulated data:

$ t = table(100:0, `cls`x0`x1, [INT,DOUBLE,DOUBLE])
$ cls = take(0, 50)
$ x0 = norm(-1.0, 1.0, 50)
$ x1 = norm(-1.0, 1.0, 50)
$ insert into t values (cls, x0, x1)
$ cls = take(1, 50)
$ x0 = norm(1.0, 1.0, 50)
$ x1 = norm(1.0, 1.0, 50)
$ insert into t values (cls, x0, x1)

$ model = logisticRegression(sqlDS(<select * from t>), `cls, `x0`x1);
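
Inspect the fitted model. Because the result is a dictionary, its entries can be read by key. This is a minimal sketch that assumes model is the dictionary returned by the logisticRegression call on the simulated table t; the values differ from run to run because the data are randomly generated:

$ model.keys()               // list the keys of the result dictionary
$ model[`coefficients]       // vector of parameter estimates
$ model[`logLikelihood]      // final value of the log likelihood function
$ model[`iterations]         // number of iterations performed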

Use the fitted model for prediction:

$ predict(model, t);

Save the fitted model to disk:

$ saveModel(model, "C:/DolphinDB/data/logisticModel.txt");

Load a saved model:

$ loadModel("C:/DolphinDB/data/logisticModel.txt");
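
Fit the model with every optional argument given explicitly, in the positional order shown under Syntax. This is only a sketch: the values chosen for tolerance, maxIter and regularizationCoeff are illustrative assumptions rather than recommendations, and model2 is just an example variable name. initTheta has 3 elements because there are 2 independent variables plus the intercept:

$ initTheta = [0.0, 0.0, 0.0]   // one starting value per parameter: intercept, x0, x1
$ model2 = logisticRegression(sqlDS(<select * from t>), `cls, `x0`x1, true, initTheta, 1e-4, 1000, 0.5);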