ridge

Syntax

ridge(ds, yColName, xColNames, [alpha=1.0], [intercept=true], [normalize=false], [maxIter=1000], [tolerance=0.0001], [solver='svd'], [swColName])

Arguments

ds is an in-memory table, a data source, or a list of data sources.

yColName is a string indicating the column name of the dependent variable in ds.

xColNames is a string scalar or vector indicating the column names of the independent variables in ds.

alpha is a floating-point number representing the constant that multiplies the L2 penalty term (the squared L2-norm of the coefficients). The default value is 1.0.

intercept is a Boolean value indicating whether to include the intercept in the regression. The default value is true.

normalize is a Boolean value. If true, the regressors will be normalized before regression by subtracting the mean and dividing by the L2-norm. If intercept=false, this parameter will be ignored. The default value is false.

maxIter is a positive integer indicating the maximum number of iterations. The default value is 1000.

tolerance is a floating-point number. The iterations stop when the improvement in the objective function value is smaller than tolerance. The default value is 0.0001.

solver is a string indicating the solver to use in the computation. It can be either 'svd' or 'cholesky'. If ds is a list of data sources, solver must be 'cholesky'. The default value is 'svd'.

swColName is a string indicating a column name of ds. The specified column is used as the sample weight. If it is not specified, every sample is given a weight of 1.
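The preprocessing described for normalize (subtract the mean, then divide by the L2-norm) can be sketched in plain Python. This is only an illustration and not the function's internal code: normalize_column is a hypothetical helper, and the sketch assumes the L2-norm is taken over the centered column, which the description above does not state explicitly.

```python
import math

def normalize_column(col):
    """Center a column, then scale it to unit L2-norm (hypothetical helper)."""
    mean = sum(col) / len(col)
    centered = [v - mean for v in col]
    # Assumption: the norm is computed after centering.
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        # A constant column has nothing to scale; return the centered zeros.
        return centered
    return [v / norm for v in centered]

# Regressor x0 from the Examples section.
x0 = [2.240893, -0.854096, 0.400157, 1.454274, -0.977278,
      -0.205158, 0.121675, -0.151357, 0.333674, 0.410599]
z0 = normalize_column(x0)
```

Under that assumption, the normalized column has zero mean and unit L2-norm, so regressors on different scales are penalized comparably.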

Details

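Fit a linear least-squares regression with L2 regularization (ridge regression).

The function minimizes the following objective function, where y is the dependent variable, X is the matrix of independent variables, and w is the vector of coefficients: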

\(\lVert y - Xw \rVert_2^2 + \text{alpha} \cdot \lVert w \rVert_2^2\)
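To make the objective concrete, the following Python sketch evaluates it and solves the no-intercept case for two regressors through the normal equations \((X^T X + \text{alpha} \cdot I) w = X^T y\). It is a minimal illustration under stated assumptions, not the solver that ridge uses: ridge_objective and ridge_closed_form_2d are hypothetical helper names, the intercept, normalization and sample weights are ignored, and the result is not expected to match the output of ridge with its default intercept=true.

```python
def ridge_objective(X, y, w, alpha):
    """Squared residual norm plus alpha times the squared L2-norm of w."""
    residuals = [yi - sum(xij * wj for xij, wj in zip(xi, w))
                 for xi, yi in zip(X, y)]
    return sum(r * r for r in residuals) + alpha * sum(wj * wj for wj in w)

def ridge_closed_form_2d(X, y, alpha):
    """Solve (X^T X + alpha*I) w = X^T y for two regressors, no intercept."""
    a = sum(x[0] * x[0] for x in X) + alpha   # top-left entry
    b = sum(x[0] * x[1] for x in X)           # off-diagonal entry
    d = sum(x[1] * x[1] for x in X) + alpha   # bottom-right entry
    r0 = sum(x[0] * yi for x, yi in zip(X, y))
    r1 = sum(x[1] * yi for x, yi in zip(X, y))
    det = a * d - b * b                       # positive whenever alpha > 0
    # Cramer's rule for the 2x2 system.
    return [(d * r0 - b * r1) / det, (a * r1 - b * r0) / det]

# Data from the Examples section.
y = [225.720746, -76.195841, 63.089878, 139.44561, -65.548346,
     2.037451, 22.403987, -0.678415, 37.884102, 37.308288]
x0 = [2.240893, -0.854096, 0.400157, 1.454274, -0.977278,
      -0.205158, 0.121675, -0.151357, 0.333674, 0.410599]
x1 = [0.978738, 0.313068, 1.764052, 0.144044, 1.867558,
      1.494079, 0.761038, 0.950088, 0.443863, -0.103219]
X = list(zip(x0, x1))

w = ridge_closed_form_2d(X, y, 1.0)
print(w, ridge_objective(X, y, w, 1.0))
```

Increasing alpha in this sketch shrinks the coefficients toward zero, which is the effect the penalty term is meant to have.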

Examples

$ y = [225.720746,-76.195841,63.089878,139.44561,-65.548346,2.037451,22.403987,-0.678415,37.884102,37.308288]
$ x0 = [2.240893,-0.854096,0.400157,1.454274,-0.977278,-0.205158,0.121675,-0.151357,0.333674,0.410599]
$ x1 = [0.978738,0.313068,1.764052,0.144044,1.867558,1.494079,0.761038,0.950088,0.443863,-0.103219]
$ t = table(y, x0, x1);

$ ridge(t, `y, `x0`x1);

If t is a DFS table, then the input should be a data source:

$ ridge(sqlDS(<select * from t>), `y, `x0`x1);