glm

Syntax

glm(ds, yColName, xColNames, [family], [link], [tolerance=1e-6], [maxIter=100])

Arguments

ds is the data source used to train the model. It can be generated with the function sqlDS.

yColName is a string indicating the dependent variable column.

xColNames is a string scalar/vector indicating the names of the independent variable columns.

family is a string scalar indicating the type of distribution. It can be gaussian, poisson, gamma, inverseGaussian or binomial. The default value is gaussian.

link is a string scalar indicating the type of the link function. The default value for each family is shown in the table below.

tolerance is a numeric scalar. The iteration stops when the difference between the log-likelihood values of two successive iterations is smaller than tolerance. The default value is 0.000001.

maxIter is a positive integer indicating the maximum number of iterations. The default value is 100.

Possible values of link and the dependent variable for each family:

family          link                                    default link    dependent variable
--------------- --------------------------------------- --------------- --------------------
gaussian        identity, inverse, log                  identity        floating
poisson         log, sqrt, identity                     log             non-negative integer
gamma           inverse, identity, log                  inverse         y>=0
inverseGaussian inverseOfSquare, inverse, identity, log inverseOfSquare y>=0
binomial        logit, probit                           logit           y=0,1

Details

Fit a generalized linear model. The result is a dictionary with the following keys: coefficients, link, tolerance, family, xColNames, modelName, residualDeviance, iterations and dispersion. coefficients is a table with the estimate (beta), standard error (stdError), t value (tstat) and p value (pvalue) of each coefficient; modelName is “Generalized Linear Model”; iterations is the number of iterations performed; dispersion is the dispersion coefficient of the model.

Examples

Fit a generalized linear model with simulated data:

$ x1 = rand(100.0, 100)
$ x2 = rand(100.0, 100)
$ b0 = 6
$ b1 = 1
$ b2 = -2
$ err = norm(0, 10, 100)
$ y = b0 + b1 * x1 + b2 * x2 + err
$ t = table(x1, x2, y)
$ model = glm(sqlDS(<select * from t>), `y, `x1`x2, `gaussian, `identity);
$ model;

coefficients->

beta     stdError tstat      pvalue
-------- -------- ---------- --------
1.027483 0.032631 31.487543  0
-1.99913 0.03517  -56.842186 0
5.260677 2.513633 2.092858   0.038972

link->identity
tolerance->1.0E-6
family->gaussian
xColNames->["x1","x2"]
modelName->Generalized Linear Model
residualDeviance->8873.158697
iterations->5
dispersion->91.475863
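
Individual entries of the returned dictionary can be read by key. This is a minimal sketch that assumes the usual dictionary indexing syntax. Judging from the simulated parameters (b1 = 1, b2 = -2, b0 = 6), the rows of coefficients in the output above appear to follow the order of xColNames, with the intercept last:

$ model[`coefficients];
$ model[`residualDeviance];
$ model[`iterations];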

Use the fitted model in forecasting:

$ predict(model, t);
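
Assuming predict returns one fitted value per row of the input table, the result can be compared with the observed y. The following sketch computes an in-sample root mean squared error; yhat is a name introduced here for illustration only:

$ yhat = predict(model, t)
$ sqrt(avg(square(t.y - yhat)));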

Save the fitted model to disk:

$ saveModel(model, "C:/DolphinDB/Data/GLMModel.txt");

Load a saved model:

$ loadModel("C:/DolphinDB/Data/GLMModel.txt");
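
The same call pattern should apply to the other families. The following is a sketch only: it simulates 0/1 data and fits a logistic regression, passing tolerance and maxIter explicitly in the positions given in the syntax above. The names xb, p, yb, t2 and model2, and the coefficients 0.5 and 2, are made up for illustration:

$ xb = rand(1.0, 1000)
$ p = 1.0 / (1.0 + exp(-(0.5 + 2 * xb)))
$ yb = iif(rand(1.0, 1000) < p, 1, 0)
$ t2 = table(xb, yb)
$ model2 = glm(sqlDS(<select * from t2>), `yb, `xb, `binomial, `logit, 1e-8, 50);
$ model2[`coefficients];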