lasso

Syntax

lasso(ds, yColName, xColNames, [alpha=1.0], [intercept=true], [normalize=false], [maxIter=1000], [tolerance=0.0001], [positive=false], [swColName], [checkInput=true])

Arguments

ds is an in-memory table or a data source usually generated by the sqlDS function.

yColName is a string indicating the column name of the dependent variable in ds.

xColNames is a string scalar/vector indicating the column names of the independent variables in ds.

alpha is a floating-point number representing the constant that multiplies the L1-norm penalty. The default value is 1.0.

intercept is a Boolean value indicating whether to include the intercept in the regression. The default value is true.

normalize is a Boolean value. If true, the regressors are normalized before regression by subtracting the mean and dividing by the L2-norm. If intercept = false, this parameter is ignored. The default value is false.
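The normalization described for normalize can be sketched as follows. This is a standalone Python illustration, not DolphinDB script and not the function's internal code. The helper name normalize_column is hypothetical, and the sketch assumes the L2-norm is taken over the column after the mean has been subtracted.

```python
import math


def normalize_column(values):
    """Center a column on its mean, then scale it to unit L2-norm.

    Assumption: the norm is computed on the centered values.
    """
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    l2_norm = math.sqrt(sum(c * c for c in centered))
    if l2_norm == 0.0:
        # A constant column has nothing to scale; return it centered.
        return centered
    return [c / l2_norm for c in centered]


# Toy column, for illustration only.
x0_normalized = normalize_column([2.0, 4.0, 6.0])
```

After this step each regressor has zero mean and unit L2-norm, so a single alpha penalizes all coefficients on a comparable scale.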

maxIter is a positive integer indicating the maximum number of iterations. The default value is 1000.

tolerance is a floating-point number. The iterations stop when the improvement in the objective function value is smaller than tolerance. The default value is 0.0001.
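To show how maxIter and tolerance interact, here is a minimal Python sketch of an iterative solver with that stopping rule. It assumes cyclic coordinate descent with soft-thresholding, which is a common way to fit a Lasso; the source does not state which solver lasso uses. It also omits the intercept, normalization, and sample weights. The names lasso_cd, soft_threshold, and objective are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def objective(X, y, w, alpha):
    """Half the mean squared residual plus alpha times the L1-norm of w."""
    n = len(y)
    rss = sum(
        (y[i] - sum(X[i][j] * w[j] for j in range(len(w)))) ** 2
        for i in range(n)
    )
    return rss / (2 * n) + alpha * sum(abs(c) for c in w)


def lasso_cd(X, y, alpha=1.0, max_iter=1000, tolerance=1e-4):
    """Cyclic coordinate descent (an assumed solver), no intercept."""
    n, p = len(y), len(X[0])
    w = [0.0] * p
    previous = objective(X, y, w, alpha)
    for iteration in range(1, max_iter + 1):
        for j in range(p):
            # Correlation of column j with the residual that excludes column j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
        current = objective(X, y, w, alpha)
        # Stop once a full pass improves the objective by less than tolerance.
        if previous - current < tolerance:
            return w, iteration
        previous = current
    return w, max_iter


# Toy data, for illustration only.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
weights, iterations = lasso_cd(X, y, alpha=0.5)
```

In this sketch a smaller tolerance allows more passes before stopping, while maxIter caps the number of passes regardless of progress.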

positive is a Boolean value indicating whether to force the coefficient estimates to be positive. The default value is false.

swColName is a STRING indicating a column name of ds. The specified column is used as the sample weight. If it is not specified, the sample weight is treated as 1.

checkInput is a BOOLEAN value. It determines whether to validate the parameters yColName, xColNames, and swColName.

  • If checkInput = true (default), the parameters are checked for invalid values, and an error is thrown if a NULL value exists.

  • If checkInput = false, invalid values are not checked.

It is recommended to specify checkInput = true. If it is false, you must ensure that the input parameters contain no invalid values and that no invalid values are generated during intermediate calculations; otherwise, the returned model may be inaccurate.

Details

Estimate a Lasso regression that performs L1 regularization.

The regression minimizes the following objective function:

\(\dfrac{1}{2 \cdot n_{\text{samples}}} \cdot \bigl\lVert y - Xw \bigr\rVert_2^2 + \text{alpha} \cdot \bigl\lVert w \bigr\rVert_1\)
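
Written out element by element, the same objective reads as follows. The symbol p, denoting the number of columns in xColNames, and the indices i and j are introduced here for illustration and do not appear in the original formula.

\(\dfrac{1}{2 \cdot n_{\text{samples}}} \sum_{i=1}^{n_{\text{samples}}} \Bigl( y_i - \sum_{j=1}^{p} x_{ij} w_j \Bigr)^2 + \text{alpha} \cdot \sum_{j=1}^{p} \lvert w_j \rvert\)

The first term is half the mean squared residual. The second term is the L1 penalty, so a larger alpha shrinks the coefficients more strongly and can set some of them exactly to zero.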

Examples

$ y = [225.720746,-76.195841,63.089878,139.44561,-65.548346,2.037451,22.403987,-0.678415,37.884102,37.308288];
$ x0 = [2.240893,-0.854096,0.400157,1.454274,-0.977278,-0.205158,0.121675,-0.151357,0.333674,0.410599];
$ x1 = [0.978738,0.313068,1.764052,0.144044,1.867558,1.494079,0.761038,0.950088,0.443863,-0.103219];
$ t = table(y, x0, x1);

$ lasso(t, `y, `x0`x1);

If t is a DFS table, then the input should be a data source:

$ lasso(sqlDS(<select * from t>), `y, `x0`x1);