


Copied from the Statistics Toolbox, with a progress display line added (see the disp call in lassoFit below).
LASSOGLM Perform lasso or elastic net regularization for a generalized linear model.
[B,STATS] = LASSOGLM(X,Y,DISTR,...) performs L1-penalized maximum
likelihood fits (lasso) relating the predictors in X to the responses
in Y, or fits subject to joint L1 and L2 constraints (elastic net).
The default is a lasso-style fit, that is, a maximum likelihood fit
subject to a constraint on the L1-norm of the coefficients B.
LASSOGLM accepts all the command line parameters of the LASSO
function and, with the following exceptions, the command line
parameters of the GLMFIT function. LASSOGLM accepts the GLMFIT
arguments 'Link' and 'Offset' (lower case 'link' and 'offset' also
work). It does not accept the GLMFIT arguments 'estdisp' or
'constant', and unlike GLMFIT it does not calculate standard errors
or covariances among the coefficients.
LASSOGLM always calculates an Intercept term.
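For orientation, a minimal call looks like the sketch below. The data
here are synthetic placeholders, not part of the original documentation:

    % Hypothetical example: 100 observations of 5 predictors, Poisson counts.
    X = randn(100,5);
    y = poissrnd(exp(0.5 + X(:,1)));       % counts driven by the first predictor
    [B,STATS] = lassoglm(X,y,'poisson');   % canonical log link, default lambda sequence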
Positional parameters:
X A numeric matrix (dimension, say, NxP)
Y A numeric vector of length N. If DISTR is
'binomial', Y may be a binary vector indicating
success/failure, and the total number of trials is
taken to be 1 for all observations. If DISTR is
'binomial', Y may also be a two-column matrix, the
first column containing the number of successes for
each observation, and the second column containing
the total number of trials.
DISTR The distributional family for the non-systematic
variation in the responses. Acceptable values for DISTR are
'normal', 'binomial', 'poisson', 'gamma', and 'inverse gaussian'.
By default, the distribution is fit using the canonical link
corresponding to DISTR. Other link functions may be
specified using the optional parameter 'link'.
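For instance, the two binomial response forms described above would be
supplied as in the following sketch (y01, successes, and trials are
illustrative variable names):

    % Bernoulli form: y01 is a binary vector, one trial per observation.
    [B,STATS] = lassoglm(X,y01,'binomial');
    % Count form: successes in column 1, total trials in column 2.
    [B,STATS] = lassoglm(X,[successes trials],'binomial');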
Optional input parameters:
'Weights' Observation weights. Must be a vector of non-negative
values of length N, matching the number of rows of X. At
least two values must be positive. (default ones(N,1) or,
equivalently, (1/N)*ones(N,1)).
'Alpha' Elastic net mixing value, or the relative balance
between L2 and L1 penalty (default 1, range (0,1]).
Alpha=1 ==> lasso, otherwise elastic net.
Alpha near zero ==> nearly ridge regression.
'NumLambda' The number of lambda values to use, if the parameter
'Lambda' is not supplied (default 100). Ignored
if 'Lambda' is supplied. LASSOGLM may return fewer
fits than specified by 'NumLambda' if the deviance
of the fits drops below a threshold percentage of
the null deviance (deviance of the fit without
any predictors X).
'LambdaRatio' Ratio between the minimum value and maximum value of
lambda to generate, if the parameter 'Lambda' is not
supplied. Legal range is [0,1). Default is 0.0001.
If 'LambdaRatio' is zero, LASSOGLM will generate its
default sequence of lambda values but replace the
smallest value in this sequence with the value zero.
'LambdaRatio' is ignored if 'Lambda' is supplied.
'Lambda' Lambda values. Will be returned in return argument
STATS in ascending order. The default is to have
LASSOGLM generate a sequence of lambda values, based
on 'NumLambda' and 'LambdaRatio'. LASSOGLM will generate
a sequence, based on the values in X and Y, such that
the largest Lambda value is estimated to be just
sufficient to produce all zero coefficients B.
You may supply a vector of real, non-negative values
of lambda for LASSOGLM to use, in place of its default
sequence. If you supply a value for 'Lambda',
'NumLambda' and 'LambdaRatio' are ignored.
'DFmax' Maximum number of non-zero coefficients in the model.
Can be useful with large numbers of predictors.
Results are returned only for lambda values that
satisfy this degree of sparsity. Default is
to not limit the number of non-zero coefficients.
'Standardize' Whether to scale X prior to fitting the model
sequence. This affects whether the regularization is
applied to the coefficients on the standardized
scale or the original scale. The results are always
presented on the original data scale. Default is
TRUE, do scale X.
'RelTol' Convergence threshold for coordinate descent algorithm.
The coordinate descent iterations will terminate
when the relative change in the size of the
estimated coefficients B drops below this threshold.
Default: 1e-4. Legal range is (0,1).
'CV' If present, indicates the method used to compute Deviance.
When 'CV' is a positive integer K, LASSOGLM uses K-fold
cross-validation. Set 'CV' to a cross-validation
partition, created using CVPARTITION, to use other
forms of cross-validation. You cannot use a
'Leaveout' partition with LASSOGLM.
When 'CV' is 'resubstitution', LASSOGLM uses X and Y
both to fit the model and to estimate the deviance
of the fitted model, without cross-validation.
The default is 'resubstitution'.
'MCReps' A positive integer indicating the number of Monte-Carlo
repetitions for cross-validation. The default value is 1.
If 'CV' is 'resubstitution' or a cvpartition of type
'resubstitution', 'MCReps' must be 1. If 'CV' is a
cvpartition of type 'holdout', then 'MCReps' must be
greater than one.
'PredictorNames' A cell array of names for the predictor variables,
in the order in which they appear in X.
Default: {}
'Options' A structure that contains options specifying whether to
conduct cross-validation evaluations in parallel, and
options specifying how to use random numbers when computing
cross-validation partitions. This argument can be created
by a call to STATSET. CROSSVAL uses the following fields:
'UseParallel'
'UseSubstreams'
'Streams'
For information on these fields see PARALLELSTATS.
NOTE: If supplied, 'Streams' must be of length one.
'Link' The link function to use in place of the canonical link.
The link function defines the relationship f(mu) = x*b
between the mean response mu and the linear combination of
predictors x*b. Specify the link parameter value as one of
- the text strings 'identity', 'log', 'logit', 'probit',
'comploglog', 'reciprocal', 'loglog', or
- an exponent P defining the power link, mu = (x*b)^P
for x*b > 0, or
- a cell array of the form {FL FD FI}, containing three
function handles, created using @, that define the link (FL),
the derivative of the link (FD), and the inverse link (FI).
'Offset' A vector to use as an additional predictor variable, but
with a coefficient value fixed at 1.0. Default is
not to utilize an offset variable.
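As a reminder, the objective being minimized mirrors the one stated in
the LASSO documentation: informally, Deviance/(2*N) plus Lambda times
the sum over coefficients of (1-Alpha)/2*B(j)^2 + Alpha*abs(B(j)).
A fit that exercises several of the options above might look like this
sketch (the data, the ten folds, and Alpha = 0.5 are illustrative
choices):

    % Elastic net fit with 10-fold cross-validation and a probit link
    % in place of the canonical logit.
    [B,STATS] = lassoglm(X,y,'binomial', 'Alpha',0.5, 'CV',10, ...
                         'NumLambda',50, 'Link','probit');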
Return values:
B The fitted coefficients for each model.
B will have dimension PxL, where
P = size(X,2) is the number of predictors, and
L = length(STATS.Lambda).
STATS STATS is a struct that contains information about the
sequence of model fits corresponding to the columns
of B. STATS contains the following fields:
'Intercept' The intercept term for each model. Dimension 1xL.
'Lambda' The sequence of lambda penalties used, in ascending order.
Dimension 1xL.
'Alpha' The elastic net mixing value that was used.
'DF' The number of nonzero coefficients in B for each
value of lambda. Dimension 1xL.
'Deviance' The deviance of the fitted model for each value of
lambda. If cross-validation was performed, the values
for 'Deviance' represent the estimated expected
deviance of the model applied to new data, as
calculated by cross-validation. Otherwise,
'Deviance' is the deviance of the fitted model
applied to the data used to perform the fit.
Dimension 1xL.
If cross-validation was performed, STATS also includes the following
fields:
'SE' The standard error of 'Deviance' for each lambda, as
calculated during cross-validation. Dimension 1xL.
'LambdaMinDeviance' The lambda value with minimum expected deviance, as
calculated during cross-validation. Scalar.
'Lambda1SE' The largest lambda such that 'Deviance' is within
one standard error of the minimum. Scalar.
'IndexMinDeviance' The index of Lambda with value LambdaMinDeviance. Scalar.
'Index1SE' The index of Lambda with value Lambda1SE. Scalar.
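For example, the cross-validation fields support selecting and applying
the most regularized model within one standard error of the minimum. A
sketch, assuming a cross-validated binomial fit; Xnew is a hypothetical
matrix of new observations:

    [B,STATS] = lassoglm(X,y,'binomial','CV',10);
    idx  = STATS.Index1SE;                     % index of the 1-SE model
    coef = [STATS.Intercept(idx); B(:,idx)];   % intercept plus coefficients
    yhat = glmval(coef, Xnew, 'logit');        % predicted probabilities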
See also lassoPlot, lasso, ridge, parallelstats, glmfit.


function [B,stats] = lassoglm(x,y,distr,varargin)
% copied from statistics toolbox - added display line
%LASSOGLM Perform lasso or elastic net regularization for a generalized linear model.
%   See the help text above for the full description of inputs and outputs.
%
% References:
%   [1] Tibshirani, R. (1996) Regression shrinkage and selection
%       via the lasso. Journal of the Royal Statistical Society,
%       Series B, Vol. 58, No. 1, pp. 267-288.
%   [2] Zou, H. and T. Hastie. (2005) Regularization and variable
%       selection via the elastic net. Journal of the Royal Statistical
%       Society, Series B, Vol. 67, No. 2, pp. 301-320.
%   [3] Friedman, J., R. Tibshirani, and T. Hastie. (2010) Regularization
%       paths for generalized linear models via coordinate descent.
%       Journal of Statistical Software, Vol. 33, No. 1,
%       http://www.jstatsoft.org/v33/i01.
%   [4] Hastie, T., R. Tibshirani, and J. Friedman. (2008) The Elements
%       of Statistical Learning, 2nd edition, Springer, New York.
%   [5] Dobson, A.J. (2002) An Introduction to Generalized Linear
%       Models, 2nd edition, Chapman&Hall/CRC Press.
%   [6] McCullagh, P., and J.A. Nelder (1989) Generalized Linear
%       Models, 2nd edition, Chapman&Hall/CRC Press.
%   [7] Collett, D. (2003) Modelling Binary Data, 2nd edition,
%       Chapman&Hall/CRC Press.

% Copyright 2011-2014 The MathWorks, Inc.


if nargin < 2
    error(message('stats:lassoGlm:TooFewInputs'));
end

if nargin < 3 || isempty(distr), distr = 'normal'; end

paramNames = {     'link' 'offset' 'weights'};
paramDflts = {'canonical'       []        []};
[link,offset,pwts,~,varargin] = ...
    internal.stats.parseArgs(paramNames, paramDflts, varargin{:});

% Read in the optional parameters pertinent to regularization (eg, lasso)
LRdefault = 1e-4;
pnames = { 'alpha' 'numlambda' 'lambdaratio' 'lambda' ...
    'dfmax' 'standardize' 'reltol' 'cv' 'mcreps' ...
    'predictornames' 'options' };
dflts  = {  1       100        LRdefault     [] ...
    []      true          1e-4     'resubstitution'  1 ...
    {}               []};
[alpha, nLambda, lambdaRatio, lambda, ...
    dfmax, standardize, reltol, cvp, mcreps, predictorNames, parallelOptions] ...
    = internal.stats.parseArgs(pnames, dflts, varargin{:});

if ~isempty(lambda)
    userSuppliedLambda = true;
else
    userSuppliedLambda = false;
end

% X a real 2D matrix
if ~ismatrix(x) || length(size(x)) ~= 2 || ~isreal(x)
    error(message('stats:lassoGlm:XnotaReal2DMatrix'));
end

% We need at least two observations.
if isempty(x) || size(x,1) < 2
    error(message('stats:lassoGlm:TooFewObservations'));
end

% Categorical responses 'binomial'
if isa(y,'categorical')
    [y, classname] = grp2idx(y);
    nc = length(classname);
    if nc > 2
        error(message('stats:glmfit:TwoLevelCategory'));
    end
    y(y==1) = 0;
    y(y==2) = 1;
end

% Number of Predictors
P = size(x,2);

% Head off potential cruft in the command window.
wsIllConditioned2 = warning('off','stats:glmfit:IllConditioned');
cleanupIllConditioned2 = onCleanup(@() warning(wsIllConditioned2));

% Sanity checking on predictors, responses, weights and offset parameter,
% and removal of NaNs and Infs from same. Also, conversion of the
% two-column form of binomial response to a proportion.
[X, Y, offset, pwts, dataClass, nTrials, binomialTwoColumn] = ...
    glmProcessData(x, y, distr, 'off', offset, pwts);

[~,sqrtvarFun,devFun,linkFun,dlinkFun,ilinkFun,link,mu,eta,muLims,isCanonical,dlinkFunCanonical] = ...
    glmProcessDistrAndLink(Y,distr,link,'off',nTrials,dataClass);

[X,Y,pwts,nLambda,lambda,dfmax,cvp,mcreps,predictorNames,ever_active] = ...
    processLassoParameters(X,Y,pwts,alpha,nLambda,lambdaRatio,lambda,dfmax, ...
    standardize,reltol,cvp,mcreps,predictorNames);

% Compute the amount of penalty at which all coefficients shrink to zero.
[lambdaMax, nullDev, nullIntercept] = computeLambdaMax(X, Y, pwts, alpha, standardize, ...
    distr, link, dlinkFun, offset, isCanonical, dlinkFunCanonical, devFun);

% If the command line did not provide a sequence of penalty terms,
% generate a sequence.
if isempty(lambda)
    lambda = computeLambdaSequence(lambdaMax, nLambda, lambdaRatio, LRdefault);
end

% Also, to help control saturated fits with binomial outcomes,
% also alter the glmfit values for muLims
if strcmp(distr,'binomial')
    muLims = [1.0e-5 1.0-1.0e-5];
end

% Will convert from lambda penalty matched to average log-likelihood
% per observation (which is what lasso uses) to an equivalent
% penalty for total log-likelihood (which is what glmIRLS uses.)
if isempty(pwts) && isscalar(nTrials)
    totalWeight = size(X,1);
elseif ~isempty(pwts) && isscalar(nTrials)
    totalWeight = sum(pwts);
elseif isempty(pwts) && ~isscalar(nTrials)
    totalWeight = sum(nTrials);
else
    totalWeight = sum(pwts .* nTrials);
end

% Convert lambda penalty to match total log-likelihood
lambda = lambda * totalWeight;

penalizedFitPartition = @(x,y,offset,pwts,n,wlsfit,b,active,mu,eta,sqrtvarFun) ...
    glmIRLSwrapper(x,y,distr,offset,pwts,dataClass,n, ...
    sqrtvarFun,linkFun,dlinkFun,ilinkFun,devFun,b,active,mu,muLims,wlsfit,nullDev,reltol);

penalizedFit = @(x,y,wlsfit,b,active,mu,eta) ...
    penalizedFitPartition(x,y,offset,pwts',nTrials,wlsfit,b,active,mu,eta,sqrtvarFun);

[B,Intercept,lambda,deviance] = ...
    lassoFit(X,Y,pwts,lambda,alpha,dfmax,standardize,reltol,lambdaMax*totalWeight,ever_active, ...
    penalizedFit,mu,eta,dataClass,userSuppliedLambda,nullDev,nullIntercept);

% Store the number of non-zero coefficients for each lambda.
df = sum(B~=0,1);

% The struct 'stats' will comprise the second return argument.
% Put place holders for ever-present fields to avoid undefined references
% and to secure the order we want in the struct.
stats = struct();
stats.Intercept      = [];
stats.Lambda         = [];
stats.Alpha          = alpha;
stats.DF             = [];
stats.Deviance       = [];
stats.PredictorNames = predictorNames;

% ---------------------------------------------------------
% If requested, use cross-validation to estimate the
% expected deviance for each lambda.
% ---------------------------------------------------------

if ~isequal(cvp,'resubstitution')
    % Replace dfmax with P, the number of predictors supplied at the
    % command line. dfmax might cause one fold to return empty values,
    % because no lambda satisfies the dfmax criteria, while other folds
    % return a numeric value. The lambda sequence has already been
    % pruned by dfmax, if appropriate, in the call to lassoFit above.
    cvfun = @(Xtrain,Ytrain,Xtest,Ytest) lassoFitAndPredict( ...
        Xtrain,Ytrain,Xtest,Ytest, ...
        lambda,alpha,P,standardize,reltol,ever_active, ...
        penalizedFitPartition,distr,link,linkFun,dlinkFun,sqrtvarFun, ...
        isCanonical, dlinkFunCanonical,devFun,dataClass);
    weights = pwts;
    if isempty(weights)
        weights = nan(size(X,1),1);
    end
    if isempty(offset) || isequal(offset,0)
        offset = nan(size(X,1),1);
    end
    if binomialTwoColumn
        response = [nTrials Y];
    else
        response = Y;
    end
    cvDeviance = crossval(cvfun,[weights(:) offset(:) X],response, ...
        'Partition',cvp,'Mcreps',mcreps,'Options',parallelOptions);
    % Scale single-fold deviances up to estimate whole data set deviance
    cvDeviance = bsxfun(@times,cvDeviance,repmat((size(X,1) ./ cvp.TestSize)', mcreps, 1));
    deviance = mean(cvDeviance);
    se = std(cvDeviance) / sqrt(size(cvDeviance,1));
    minDeviance = min(deviance);
    minIx = find(deviance == minDeviance,1);
    lambdaMin = lambda(minIx);
    minplus1 = deviance(minIx) + se(minIx);
    seIx = find((deviance(1:minIx) <= minplus1),1,'first');
    if isempty(seIx)
        lambdaSE = [];
    else
        lambdaSE = lambda(seIx);
    end

    % Deposit cross-validation results in struct for return value.
    stats.SE                = se;
    stats.LambdaMinDeviance = lambdaMin;
    stats.Lambda1SE         = lambdaSE;
    stats.IndexMinDeviance  = minIx;
    stats.Index1SE          = seIx;

    % Convert key lambda values determined by cross-validation
    % back to match average log-likelihood per observation.
    stats.LambdaMinDeviance = stats.LambdaMinDeviance / totalWeight;
    stats.Lambda1SE         = stats.Lambda1SE / totalWeight;
end

% ------------------------------------------
% Order results by ascending lambda
% ------------------------------------------

nLambda = length(lambda);
reverseIndices = nLambda:-1:1;
lambda = lambda(reverseIndices);
lambda = reshape(lambda,1,nLambda);
B = B(:,reverseIndices);
Intercept = Intercept(reverseIndices);
df = df(reverseIndices);
deviance = deviance(reverseIndices);
if ~isequal(cvp,'resubstitution')
    stats.SE = stats.SE(reverseIndices);
    stats.IndexMinDeviance = nLambda - stats.IndexMinDeviance + 1;
    stats.Index1SE = nLambda - stats.Index1SE + 1;
end

stats.Intercept = Intercept;
stats.Lambda = lambda;
stats.DF = df;
stats.Deviance = deviance;

% Convert lambda penalty back to match average deviance per observation.
stats.Lambda = stats.Lambda / totalWeight;

end %-main block

% ------------------------------------------
% SUBFUNCTIONS
% ------------------------------------------

% ===================================================
% startingVals()
% ===================================================

function mu = startingVals(distr,y,N)
% Find a starting value for the mean, avoiding boundary values
switch distr
    case 'poisson'
        mu = y + 0.25;
    case 'binomial'
        mu = (N .* y + 0.5) ./ (N + 1);
    case {'gamma' 'inverse gaussian'}
        mu = max(y, eps(class(y))); % somewhat arbitrary
    otherwise
        mu = y;
end
end %-startingVals

% ===================================================
% diagnoseSeparation()
% ===================================================

function diagnoseSeparation(eta,y,N)
% Compute sample proportions, sorted by increasing fitted value
[x,idx] = sort(eta);
if ~isscalar(N)
    N = N(idx);
end
p = y(idx);
if all(p==p(1))    % all sample proportions are the same
    return
end
if x(1)==x(end)    % all fitted probabilities are the same
    return
end

noFront = 0<p(1) && p(1)<1;     % no "front" section as defined below
noEnd = 0<p(end) && p(end)<1;   % no "end" section as defined below
if p(1)==p(end) || (noFront && noEnd)
    % No potential for perfect separation if the ends match or neither
    % end is perfect
    return
end

% There is at least one observation potentially taking probability 0 or
% 1 at one end or the other with the data sorted by eta. We want to see
% if the data, sorted by eta (x) value, have this form:
%    x(1)<=...<=x(A) < x(A+1)=...=x(B-1) < x(B)<=...<=x(n)
%    with p(1)=...=p(A)=0      p(B)=...=p(n)=1
%    or   p(1)=...=p(A)=1      p(B)=...=p(n)=0
%
% This includes the possibilities:
%    A+1=B    - no middle section
%    A=0      - no perfect fit at the front
%    B=n+1    - no perfect fit at the end
dx = 100*max(eps(x(1)),eps(x(end)));
n = length(p);
if noFront
    A = 0;
else
    A = find(p~=p(1),1,'first')-1;
    cutoff = x(A+1)-dx;
    A = sum(x(1:A)<cutoff);
end

if noEnd
    B = n+1;
else
    B = find(p~=p(end),1,'last')+1;
    cutoff = x(B-1)+dx;
    B = (n+1) - sum(x(B:end)>cutoff);
end

if A+1<B-1
    % There is a middle region with >1 point, see if x varies there
    if x(B-1)-x(A+1)>dx
        return
    end
end

% We have perfect separation that can be defined by some middle point
if A+1==B
    xmid = x(A) + 0.5*(x(B)-x(A));
else
    xmid = x(A+1);
    if isscalar(N)
        pmid = mean(p(A+1:B-1));
    else
        pmid = sum(p(A+1:B-1).*N(A+1:B-1)) / sum(N(A+1:B-1));
    end
end

% Create explanation part for the lower region, if any
if A>=1
    explanation = sprintf('\n   XB<%g: P=%g',xmid,p(1));
else
    explanation = '';
end

% Add explanation part for the middle region, if any
if A+1<B
    explanation = sprintf('%s\n   XB=%g: P=%g',explanation,xmid,pmid);
end

% Add explanation part for the upper region, if any
if B<=n
    explanation = sprintf('%s\n   XB>%g: P=%g',explanation,xmid,p(end));
end

warning(message('stats:lassoGlm:PerfectSeparation', explanation));
end %-diagnoseSeparation()

% ===============================================
% glmProcessData()
% ===============================================

function [x, y, offset, pwts, dataClass, N, binomialTwoColumn] = ...
    glmProcessData(x, y, distr, const, offset, pwts)

N = [];   % needed only for binomial
binomialTwoColumn = false;

% Convert the two-column form of 'y', if supplied ('binomial' only).
if strcmp(distr,'binomial')
    if size(y,2) == 1
        % N will get set to 1 below
        if any(y < 0 | y > 1)
            error(message('stats:lassoGlm:BadDataBinomialFormat'));
        end
    elseif size(y,2) == 2
        binomialTwoColumn = true;
        y(y(:,2)==0,2) = NaN;
        N = y(:,2);
        y = y(:,1) ./ N;
        if any(y < 0 | y > 1)
            error(message('stats:lassoGlm:BadDataBinomialRange'));
        end
    else
        error(message('stats:lassoGlm:MatrixOrBernoulliRequired'));
    end
end

[anybad,~,y,x,offset,pwts,N] = dfswitchyard('statremovenan',y,x,offset,pwts,N);
if anybad > 0
    switch anybad
        case 2
            error(message('stats:lassoGlm:InputSizeMismatchX'))
        case 3
            error(message('stats:lassoGlm:InputSizeMismatchOffset'))
        case 4
            error(message('stats:lassoGlm:InputSizeMismatchPWTS'))
    end
end

% Extra screening for lassoglm (Infs and zero weights)
okrows = all(isfinite(x),2) & all(isfinite(y),2) & all(isfinite(offset));

if ~isempty(pwts)
    % This screen works on weights prior to stripping NaNs and Infs.
    if ~isvector(pwts) || ~isreal(pwts) || size(x,1) ~= length(pwts) || ...
            ~all(isfinite(pwts)) || any(pwts<0)
        error(message('stats:lassoGlm:InvalidObservationWeights'));
    end
    okrows = okrows & pwts(:)>0;
    pwts = pwts(okrows);
end

% We need at least two observations after stripping NaNs and Infs and zero weights.
if sum(okrows)<2
    error(message('stats:lassoGlm:TooFewObservationsAfterNaNs'));
end

% Remove observations with Infs in the predictor or response
% or with zero observation weight. NaNs were already gone.
x = x(okrows,:);
y = y(okrows);
if ~isempty(N) && ~isscalar(N)
    N = N(okrows);
end
if ~isempty(offset)
    offset = offset(okrows);
end

if isequal(const,'on')
    x = [ones(size(x,1),1) x];
end
dataClass = superiorfloat(x,y);
x = cast(x,dataClass);
y = cast(y,dataClass);

if isempty(offset), offset = 0; end
if isempty(N), N = 1; end

end %-glmProcessData()

% ===================================================
% processLassoParameters()
% ===================================================

function [X,Y,weights,nLambda,lambda,dfmax,cvp,mcreps,predictorNames,ever_active] = ...
    processLassoParameters(X,Y,weights, alpha, nLambda, lambdaRatio, lambda, dfmax, ...
    standardize, reltol, cvp, mcreps, predictorNames)

% === 'Weights' parameter ===
if ~isempty(weights)

    % Computations expect that weights is a row vector.
    weights = weights(:)';

end

[~,P] = size(X);

% If X has any constant columns, we want to exclude them from the
% coordinate descent calculations. The corresponding coefficients
% will be returned as zero.
constantPredictors = (range(X)==0);
ever_active = ~constantPredictors;

% === 'Alpha' parameter ===

% Require 0 < alpha <= 1.
% "0" would correspond to ridge, "1" is lasso.
if ~isscalar(alpha) || ~isreal(alpha) || ~isfinite(alpha) || ...
        alpha <= 0 || alpha > 1
    error(message('stats:lassoGlm:InvalidAlpha'))
end

% === 'Standardize' option ===

% Require a logical value.
if ~isscalar(standardize) || (~islogical(standardize) && standardize~=0 && standardize~=1)
    error(message('stats:lassoGlm:InvalidStandardize'))
end

% === 'Lambda' sequence or 'NumLambda' and 'lambdaRatio' ===

if ~isempty(lambda)

    % Sanity check on user-supplied lambda sequence. Should be non-neg real.
    if ~isreal(lambda) || any(lambda < 0)
        error(message('stats:lassoGlm:NegativeLambda'));
    end

    lambda = sort(lambda(:),1,'descend');

else

    % Sanity-check of 'NumLambda', should be positive integer.
    if ~isreal(nLambda) || ~isfinite(nLambda) || nLambda < 1
        error(message('stats:lassoGlm:InvalidNumLambda'));
    else
        nLambda = floor(nLambda);
    end

    % Sanity-checking of LambdaRatio, should be in [0,1).
    if ~isreal(lambdaRatio) || lambdaRatio <0 || lambdaRatio >= 1
        error(message('stats:lassoGlm:InvalidLambdaRatio'));
    end
end

% === 'RelTol' parameter ===
%
if ~isscalar(reltol) || ~isreal(reltol) || ~isfinite(reltol) || reltol <= 0 || reltol >= 1
    error(message('stats:lassoGlm:InvalidRelTol'));
end

% === 'DFmax' parameter ===
%
% DFmax is #non-zero coefficients
% DFmax should map to an integer in [1,P] but we truncate if .gt. P
%
if isempty(dfmax)
    dfmax = P;
else
    if ~isscalar(dfmax)
        error(message('stats:lassoGlm:DFmaxBadType'));
    end
    try
        dfmax = uint32(dfmax);
    catch ME
        mm = message('stats:lassoGlm:DFmaxBadType');
        throwAsCaller(MException(mm.Identifier,'%s',getString(mm)));
    end
    if dfmax < 1
        error(message('stats:lassoGlm:DFmaxNotAnIndex'));
    else
        dfmax = min(dfmax,P);
    end
end

% === 'Mcreps' parameter ===
%
if ~isscalar(mcreps) || ~isreal(mcreps) || ~isfinite(mcreps) || mcreps < 1
    error(message('stats:lassoGlm:MCRepsBadType'));
end
mcreps = fix(mcreps);

% === 'CV' parameter ===
%

if isnumeric(cvp) && isscalar(cvp) && (cvp==round(cvp)) && (0<cvp)
    % cvp is a kfold value. It will be passed as such to crossval.
    if (cvp>size(X,1))
        error(message('stats:lassoGlm:InvalidCVforX'));
    end
    cvp = cvpartition(size(X,1),'Kfold',cvp);
elseif isa(cvp,'cvpartition')
    if strcmpi(cvp.Type,'resubstitution')
        cvp = 'resubstitution';
    elseif strcmpi(cvp.Type,'leaveout')
        error(message('stats:lassoGlm:InvalidCVtype'));
    elseif strcmpi(cvp.Type,'holdout') && mcreps<=1
        error(message('stats:lassoGlm:InvalidMCReps'));
    end
elseif strncmpi(cvp,'resubstitution',length(cvp))
    % This may have been set as the default, or may have been
    % provided at the command line. In case it's the latter, we
    % expand abbreviations.
    cvp = 'resubstitution';
else
    error(message('stats:lassoGlm:InvalidCVtype'));
end
if strcmp(cvp,'resubstitution') && mcreps ~= 1
    error(message('stats:lassoGlm:InvalidMCReps'));
end

if isa(cvp,'cvpartition')
    if (cvp.N ~= size(X,1)) || (min(cvp.TrainSize) < 2)
        % We need partitions that match the total number of observations
        % (after stripping NaNs and Infs and zero observation weights), and
        % we need training sets with at least 2 usable observations.
        error(message('stats:lassoGlm:TooFewObservationsForCrossval'));
    end
end

% === 'PredictorNames' parameter ===
%
% If PredictorNames is not supplied or is supplied as empty, we just
% leave it that way. Otherwise, confirm that it is a cell array of strings.
%
if ~isempty(predictorNames)
    if ~iscellstr(predictorNames) || length(predictorNames(:)) ~= size(X,2)
        error(message('stats:lassoGlm:InvalidPredictorNames'));
    else
        predictorNames = predictorNames(:)';
    end
end

end %-processLassoParameters()

% ===================================================
% glmProcessDistrAndLink()
% ===================================================

function [estdisp,sqrtvarFun,devFun,linkFun,dlinkFun,ilinkFun,link,mu,eta,muLims,...
    isCanonical,dlinkFunCanonical] = ...
    glmProcessDistrAndLink(y,distr,link,estdisp,N,dataClass)

switch distr
    case 'normal'
        canonicalLink = 'identity';
    case 'binomial'
        canonicalLink = 'logit';
    case 'poisson'
        canonicalLink = 'log';
    case 'gamma'
        canonicalLink = 'reciprocal';
    case 'inverse gaussian'
        canonicalLink = -2;
end

if isequal(link, 'canonical'), link = canonicalLink; end

switch distr
    case 'normal'
        sqrtvarFun = @(mu) ones(size(mu));
        devFun = @(mu,y) (y - mu).^2;
        estdisp = 'on';
    case 'binomial'
        sqrtN = sqrt(N);
        sqrtvarFun = @(mu) sqrt(mu).*sqrt(1-mu) ./ sqrtN;
        devFun = @(mu,y) 2*N.*(y.*log((y+(y==0))./mu) + (1-y).*log((1-y+(y==1))./(1-mu)));
    case 'poisson'
        if any(y < 0)
            error(message('stats:lassoGlm:BadDataPoisson'));
        end
        sqrtvarFun = @(mu) sqrt(mu);
        devFun = @(mu,y) 2*(y .* (log((y+(y==0)) ./ mu)) - (y - mu));
    case 'gamma'
        if any(y <= 0)
            error(message('stats:lassoGlm:BadDataGamma'));
        end
        sqrtvarFun = @(mu) mu;
        devFun = @(mu,y) 2*(-log(y ./ mu) + (y - mu) ./ mu);
        estdisp = 'on';
    case 'inverse gaussian'
        if any(y <= 0)
            error(message('stats:lassoGlm:BadDataInvGamma'));
        end
        sqrtvarFun = @(mu) mu.^(3/2);
        devFun = @(mu,y) ((y - mu)./mu).^2 ./ y;
        estdisp = 'on';
    otherwise
        error(message('stats:lassoGlm:BadDistribution'));
end

% Instantiate functions for one of the canned links, or validate a
% user-defined link specification.
[linkFun,dlinkFun,ilinkFun] = dfswitchyard('stattestlink',link,dataClass);

% Initialize mu and eta from y.
mu = startingVals(distr,y,N);
eta = linkFun(mu);

% Enforce limits on mu to guard against an inverse link that doesn't map into
% the support of the distribution.
switch distr
    case 'binomial'
        % mu is a probability, so order one is the natural scale, and eps is a
        % reasonable lower limit on that scale (plus it's symmetric).
        muLims = [eps(dataClass) 1-eps(dataClass)];
    case {'poisson' 'gamma' 'inverse gaussian'}
        % Here we don't know the natural scale for mu, so make the lower limit
        % small. This choice keeps mu^4 from underflowing. No upper limit.
        muLims = realmin(dataClass).^.25;
    otherwise
        muLims = [];
end

% These two quantities (isCanonical, dlinkFunCanonical) are not used by
% glmfit but they are needed for calculation of lambdaMax in lassoglm.

isCanonical = isequal(link, canonicalLink);
[~, dlinkFunCanonical] = dfswitchyard('stattestlink', canonicalLink, dataClass);

end %-glmProcessDistrAndLink()

% ===================================================
% glmIRLS()
% ===================================================

function [b,mu,eta,varargout] = glmIRLS(x,y,distr,offset,pwts,dataClass,N, ...
    sqrtvarFun,linkFun,dlinkFun,ilinkFun,b,active,mu,muLims, ...
    wlsfit,nullDev,devFun,reltol)

wsIterationLimit = warning('off','stats:lassoGlm:IterationLimit');
wsPerfectSeparation = warning('off','stats:lassoGlm:PerfectSeparation');
wsBadScaling = warning('off','stats:lassoGlm:BadScaling');
cleanupIterationLimit = onCleanup(@() warning(wsIterationLimit));
cleanupPerfectSeparation = onCleanup(@() warning(wsPerfectSeparation));
cleanupBadScaling = onCleanup(@() warning(wsBadScaling));

if isempty(pwts)
    pwts = 1;
end

% Set up for iterations
iter = 0;
iterLim = 100;
warned = false;
seps = sqrt(eps);

% Match the convergence tolerance for the IRLS algorithm to the
% command line parameter 'RelTol'.
convcrit = max(1e-6,2*reltol);

eta = linkFun(mu);

while iter <= iterLim
    iter = iter+1;

    % Compute adjusted dependent variable for least squares fit
    deta = dlinkFun(mu);
    z = eta + (y - mu) .* deta;

    % Compute IRLS weights using the inverse of the variance function
    sqrtw = sqrt(pwts) ./ (abs(deta) .* sqrtvarFun(mu));

    % If the weights have an enormous range, we won't be able to do IRLS very
    % well. The prior weights may be bad, or the fitted mu's may have too
    % wide a range, which is probably because the data do as well, or because
    % the link function is trying to go outside the distribution's support.
    wtol = max(sqrtw)*eps(dataClass)^(2/3);
    t = (sqrtw < wtol);
    if any(t)
        t = t & (sqrtw ~= 0);
        if any(t)
            sqrtw(t) = wtol;
            if ~warned
                warning(message('stats:lassoGlm:BadScaling'));
            end
            warned = true;
        end
    end

    b_old = b;
    [b,active] = wlsfit(z - offset, x, sqrtw.^2, b, active);

    % Form current linear predictor, including offset
    eta = offset + x * b;

    % Compute predicted mean using inverse link function
    mu = ilinkFun(eta);

    % Force mean in bounds, in case the link function is a wacky choice
    switch distr
        case 'binomial'
            if any(mu < muLims(1) | muLims(2) < mu)
                mu = max(min(mu,muLims(2)),muLims(1));
            end
        case {'poisson' 'gamma' 'inverse gaussian'}
            if any(mu < muLims(1))
                mu = max(mu,muLims(1));
            end
    end

    % Check stopping conditions
    % Convergence of the coefficients
    if (~any(abs(b-b_old) > convcrit * max(seps, abs(b_old))))
        break;
    end
    % Deviance has dropped below a small fraction of the null deviance
    if sum(devFun(mu,y)) < (1.0e-3 * nullDev)
        break;
    end

end

if iter > iterLim
    warning(message('stats:lassoGlm:IterationLimit'));
end

if iter>iterLim && isequal(distr,'binomial')
    diagnoseSeparation(eta,y,N);
end

varargout{1} = active;

end %-glmIRLS

% ===================================================
% glmIRLSwrapper()
% ===================================================

function [B,active,varargout] = glmIRLSwrapper(X,Y,distr,offset,pwts,dataClass,N, ...
    sqrtvarFun,linkFun,dlinkFun,ilinkFun,devFun,b,active,mu,muLims, ...
    wlsfit,nullDev,reltol)
% glmIRLSwrapper is a utility function for regularized GLM. It is called by
% the regularization framework (lassoFit()), and adapts to the conventions
% and expectations of the IRLS implementation for GLM.
%
% Lasso always returns an intercept, but lassoFit passes in the predictor
% matrix 'X' without a column of ones.
% Unlike OLS, GLM cannot calve off
% estimation of the predictor variable coefficients and calculate the
% intercept externally, post-fit. This is because each IRLS step
% computes a local linear predictor (eta), and this requires an
% intercept contribution. Therefore, prepend a column of ones before
% passing to glmIRLS. The variable B returned by glmIRLS will include
% the intercept term as the first element of B.
X = [ones(size(X,1),1) X];

% glmIRLS assumes pwts=1 if no observation weights were supplied,
% rather than empty, which is what lassoFit will feed in.
if isempty(pwts), pwts=1; end

[B,mu,eta,active] = glmIRLS(X,Y,distr,offset,pwts,dataClass,N, ...
    sqrtvarFun,linkFun,dlinkFun,ilinkFun,b,active,mu,muLims, ...
    wlsfit,nullDev,devFun,reltol);

deviance = sum(pwts.* devFun(mu,Y));

% Pull the intercept estimate out of B and return separately.
Intercept = B(1);
B = B(2:end);

extras.Intercept = Intercept;
extras.Deviance  = deviance;
varargout{1} = extras;
varargout{2} = mu;
varargout{3} = eta;

end %-glmIRLSwrapper

% ===================================================
% lassoFitAndPredict()
% ===================================================

function dev = lassoFitAndPredict(Xtrain,Ytrain,Xtest,Ytest, ...
    lambda,alpha,dfmax,standardize,reltol,ever_active, ...
    penalizedFitPartition,distr,link,linkFun,dlinkFun,sqrtvarFun, ...
    isCanonical, dlinkFunCanonical,devFun,dataClass)
%
% This function sets up the conditions for lasso fit and prediction from
% within crossvalidation. It defines quantities as necessary for the
% current fold.

trainWeights = Xtrain(:,1);
% To conform with crossval syntax, weights are prepended to the predictor
% matrix. Empty weights are temporarily converted to a vector of NaNs.
% This clause restores them to empty. Same occurs below for test weights.
if any(isnan(trainWeights))
    trainWeights = [];
end
trainOffset = Xtrain(:,2);
if any(isnan(trainOffset))
    trainOffset = 0;
end

Xtrain = Xtrain(:,3:end);
if size(Ytrain,2) == 2
    trainN = Ytrain(:,1);
    Ytrain = Ytrain(:,2);
else
    trainN = 1;
end

% These quantities depend on the particular set of responses used for the fit.
% Within crossval, we work with a subset of the original data. Therefore,
% these quantities must be redefined on each invocation.
mu = startingVals(distr,Ytrain,trainN);
eta = linkFun(mu);
if isequal(distr,'binomial')
    sqrtvarFun = @(mu) sqrt(mu).*sqrt(1-mu) ./ sqrt(trainN);
    devFun = @(mu,y) 2*trainN.*(y.*log((y+(y==0))./mu) + (1-y).*log((1-y+(y==1))./(1-mu)));
end

penalizedFit = @(x,y,wlsfit,b,active,mu,eta) penalizedFitPartition(x,y, ...
    trainOffset,trainWeights,trainN,wlsfit,b,active,mu,eta,sqrtvarFun);

[lambdaMax, nullDev, nullIntercept] = computeLambdaMax(Xtrain, Ytrain, trainWeights, ...
    alpha, standardize, distr, link, dlinkFun, trainOffset, isCanonical, dlinkFunCanonical, devFun);

% Will convert from lambda penalty matched to average log-likelihood
% per observation (which is what lasso uses) to an equivalent
% penalty for total log-likelihood (which is what glmIRLS uses.)
if isempty(trainWeights) && isscalar(trainN)
    totalWeight = size(Xtrain,1);
elseif ~isempty(trainWeights) && isscalar(trainN)
    totalWeight = sum(trainWeights);
elseif isempty(trainWeights) && ~isscalar(trainN)
    totalWeight = sum(trainN);
else
    totalWeight = sum(trainWeights .* trainN);
end

lambdaMax = lambdaMax * totalWeight;

[B,Intercept] = lassoFit(Xtrain,Ytrain, ...
    trainWeights,lambda,alpha,dfmax,standardize,reltol, ...
    lambdaMax,ever_active,penalizedFit,mu,eta,dataClass,true,nullDev,nullIntercept);
Bplus = [Intercept; B];

testWeights = Xtest(:,1);
if any(isnan(testWeights))
    testWeights = ones(size(Xtest,1),1);
end
testOffset = Xtest(:,2);
if any(isnan(testOffset))
    testOffset = 0;
end
Xtest = Xtest(:,3:end);
if size(Ytest,2) == 2
    testN = Ytest(:,1);
    Ytest = Ytest(:,2);
else
    testN = 1;
end

% For binomial with two-column form of response, the deviance function
% depends on the number of trials for each observation, which is provided
% in testN. Within crossval, the test set is a subset of the original data.
% Therefore, the deviance function needs to be redefined for each test set.
if isequal(distr,'binomial')
    devFun = @(mu,y) 2*testN.*(y.*log((y+(y==0))./mu) + (1-y).*log((1-y+(y==1))./(1-mu)));
end

numFits = size(Bplus,2);
dev = zeros(1,numFits);
for i=1:numFits
    if ~isequal(testOffset,0)
        mu = glmval(Bplus(:,i), Xtest, link, 'Offset',testOffset);
    else
        mu = glmval(Bplus(:,i), Xtest, link);
    end
    di = devFun(mu,Ytest);
    dev(i) = sum(testWeights' * di);
end

end %-lassoFitAndPredict

% ===================================================
% lassoFit()
% ===================================================

function [B,Intercept,lambda,varargout] = ...
    lassoFit(X,Y,weights,lambda,alpha,dfmax,standardize,reltol, ...
    lambdaMax,ever_active,penalizedFit,mu,eta,dataClass,userSuppliedLambda,nullDev,nullIntercept)
%
% ------------------------------------------------------
% Perform model fit for each lambda and the given alpha
% ------------------------------------------------------

regressionType = 'GLM';

[~,P] = size(X);
nLambda = length(lambda);

% If X has any constant columns, we want to exclude them from the
% coordinate descent calculations. The corresponding coefficients
% will be returned as zero.
constantPredictors = (range(X)==0);
ever_active = ever_active & ~constantPredictors;

% === weights and standardization ===
%
observationWeights = ~isempty(weights);
if ~isempty(weights)
    observationWeights = true;
    weights = weights(:)';
    % Normalize weights up front.
    weights = weights / sum(weights);
end

if standardize
    if ~observationWeights
        % Center and scale
        [X0,muX,sigmaX] = zscore(X,1);
        % Avoid divide by zero with constant predictors
        sigmaX(constantPredictors) = 1;
    else
        % Weighted center and scale
        muX = weights*X;
        X0 = bsxfun(@minus,X,muX);
        sigmaX = sqrt( weights*(X0.^2) );
        % Avoid divide by zero with constant predictors
        sigmaX(constantPredictors) = 1;
        X0 = bsxfun(@rdivide, X0, sigmaX);
    end
else
    switch regressionType
        case 'OLS'
            if ~observationWeights
                % Center
                muX = mean(X,1);
                X0 = bsxfun(@minus,X,muX);
                sigmaX = 1;
            else
                % Weighted center
                muX = weights*X;
                X0 = bsxfun(@minus,X,muX);
                sigmaX = 1;
            end
        case 'GLM'
            X0 = X;
            % For GLM don't center until we get inside the IRLS
            sigmaX = 1;
            muX = zeros(1,size(X,2));
    end
end

% For OLS, Y is centered, for GLM it is not.
switch regressionType
    case 'OLS'
        if ~observationWeights
            muY = mean(Y);
        else
            muY = weights*Y;
        end
        Y0 = bsxfun(@minus,Y,muY);
    case 'GLM'
        Y0 = Y;
end

% Preallocate the returned matrix of coefficients, B,
% and other variables sized to nLambda.

B = zeros(P,nLambda);

b = zeros(P,1,dataClass);

if nLambda > 0
    Extras(nLambda) = struct('Intercept', nullIntercept, 'Deviance', nullDev);
    for i=1:nLambda-1, Extras(i) = Extras(nLambda); end
    intercept = nullIntercept;
end

active = false(1,P);

for i = 1:nLambda

    lam = lambda(i);
    disp(['iteration ', num2str(i), ' out of ', num2str(nLambda)]);
    if lam >= lambdaMax
        continue;
    end

    % This anonymous function adapts the conventions of the IRLS algorithm
    % for glm fit to the coordinate descent algorithm for penalized WLS,
    % which is used within the IRLS algorithm.
    wlsfit = @(x,y,weights,b,active) glmPenalizedWlsWrapper(y,x,b,active,weights,lam, ...
        alpha,reltol,ever_active);

    [b,active,extras,mu,eta] = penalizedFit(X0,Y0,wlsfit,[intercept;b],active,mu,eta);

    B(:,i) = b;

    Extras(i) = extras;

    % Halt if maximum model size ('DFmax') has been met or exceeded.
    if sum(active) > dfmax
        % truncate B and lambda output arguments
        lambda = lambda(1:(i-1));
        B = B(:,1:(i-1));
        Extras = Extras(:,1:(i-1));
        break
    end

    % Halt if we have exceeded a threshold on the percent of
    % null deviance left unexplained.
    if ~(userSuppliedLambda || isempty(nullDev))
        if extras.Deviance < 1.0e-3 * nullDev
            lambda = lambda(1:i);
            B = B(:,1:i);
            Extras = Extras(:,1:i);
            break
        end
    end

end % of lambda sequence

% ------------------------------------------
% Unwind the centering and scaling (if any)
% ------------------------------------------

B = bsxfun(@rdivide, B, sigmaX');
B(~ever_active,:) = 0;

switch regressionType
    case 'OLS'
        Intercept = muY-muX*B;
    case 'GLM'
        Intercept = zeros(1,length(lambda));
        for i=1:length(lambda)
            Intercept(i) = Extras(i).Intercept;
        end
        if isempty(lambda)
            Intercept = [];
        else
            Intercept = Intercept - muX*B;
        end
end

% ------------------------------------------
% Calculate Mean Prediction Squared Error (or other GoF)
% ------------------------------------------

switch regressionType
    case 'OLS'
        Intercept = muY-muX*B;
        BwithI = [Intercept; B];
        fits = [ones(size(X,1),1) X]*BwithI;
        residuals = bsxfun(@minus, Y, fits);
        if ~observationWeights
            mspe = mean(residuals.^2);
        else
            % This line relies on the weights having been normalized.
            mspe = weights * (residuals.^2);
        end
        varargout{1} = mspe;
    case 'GLM'
        deviance = zeros(1,length(lambda));
        for i=1:length(lambda)
            deviance(i) = Extras(i).Deviance;
        end
        if isempty(lambda)
            deviance = [];
        end
        varargout{1} = deviance;
end

end %-lassoFit

% ===================================================
% thresholdScreen()
% ===================================================

function potentially_active = thresholdScreen(X0, wX0, Y0, ...
    b, active, threshold)
r = Y0 - X0(:,active)*b(active,:);
% We don't need the (b.*wX2)' term that one might expect, because it
% is zero for the inactive predictors.
potentially_active = abs(r' *wX0) > threshold;
end %-thresholdScreen

% ===================================================
% cdescentCycleNewCandidates()
% ===================================================

function [b,active,wX2,wX2calculated,shrinkFactor] = ...
    cdescentCycleNewCandidates(X0, weights, wX0, wX2, wX2calculated, Y0, ...
    b, active, shrinkFactor, threshold, candidates)
%
r = Y0 - X0(:,active)*b(active,:);
bold = b;

for j=find(candidates)
    % Regress j-th partial residuals on j-th predictor
    bj = wX0(:,j)' * r;

    margin = abs(bj) - threshold;

    % Soft thresholding
    if margin > 0
        if ~wX2calculated(j)
            wX2(j) = weights * X0(:,j).^2;
            wX2calculated(j) = true;
            shrinkFactor(j) = wX2(j) + shrinkFactor(j);
        end

        b(j) = sign(bj) .* margin ./ shrinkFactor(j);

        active(j) = true;
    end

    r = r - X0(:,j)*(b(j)-bold(j));
end

end %-cdescentCycleNewCandidates

% ===================================================
% cdescentCycleNoRecalc()
% ===================================================

function [b,active] = ...
    cdescentCycleNoRecalc(X0, wX0, wX2, Y0, b, active, shrinkFactor, threshold)
%
r = Y0 - X0(:,active)*b(active,:);
bwX2 = b.*wX2;
bold = b;

for j=find(active)
    % Regress j-th partial residuals on j-th predictor
    bj = wX0(:,j)' * r + bwX2(j);

    margin = abs(bj) - threshold;

    % Soft thresholding
    if margin > 0
        b(j) = sign(bj) .* margin ./ shrinkFactor(j);
    else
        b(j) = 0;
        active(j) = false;
    end

    r = r - X0(:,j)*(b(j)-bold(j));
end

end %-cdescentCycleNoRecalc

% ===================================================
% penalizedWls()
% ===================================================

function [b,varargout] = ...
    penalizedWls(X,Y,b,active,weights,lambda,alpha,reltol)

weights = weights(:)';

[~,P] = size(X);

wX = bsxfun(@times,X,weights');

wX2 = zeros(P,1);
wX2(active) = (weights * X(:,active).^2)';
wX2calculated = active;

threshold = lambda * alpha;

shrinkFactor = wX2 + lambda * (1-alpha);

% Iterative coordinate descent until converged
while true

    bold = b;
    old_active = active;

    [b,active] = cdescentCycleNoRecalc(X,wX,wX2,Y, b,active,shrinkFactor,threshold);

    if ~any( abs(b(old_active) - bold(old_active)) > reltol * max(1.0,abs(bold(old_active))) )
        % Cycling over the active set converged.
        % Do one full pass through the predictors.
        % If there is no predictor added to the active set, we're done.
        % Otherwise, resume the coordinate descent iterations.
        bold = b;
        potentially_active = thresholdScreen(X,wX,Y,b,active,threshold);
        new_candidates = potentially_active & ~active;
        if any(new_candidates)
            [b,new_active,wX2,wX2calculated,shrinkFactor] = ...
                cdescentCycleNewCandidates(X,weights,wX,wX2,wX2calculated,Y, ...
                b,active,shrinkFactor,threshold,new_candidates);
        else
            new_active = active;
        end

        if isequal(new_active, active)
            break
        else
            super_active = active | new_active;
            if ~any( abs(b(super_active) - bold(super_active)) > reltol * max(1.0,abs(bold(super_active))) )
                % We didn't change the coefficients enough by the full pass
                % through the predictors to warrant continuing the iterations.
                % The active coefficients after this pass and the
                % active coefficients prior to this pass differ.
                % This implies that a coefficient changed between zero
                % and a small numeric value. There is no "fit-wise" reason
                % to prefer the before and after coefficients, so we
                % choose the more parsimonious alternative.
                if sum(new_active) > sum(active)
                    b = bold;
                else
                    active = new_active;
                end
                break
            else
                active = new_active;
            end
        end
    end

end

varargout{1} = active;

end %-penalizedWls()

% ===================================================
% glmPenalizedWlsWrapper()
% ===================================================

function [b,varargout] = glmPenalizedWlsWrapper(X,Y,b,active,weights, ...
    lambda,alpha,reltol,ever_active)

% The coordinate descent in penalizedWls() assumes centered X and Y,
% and it assumes X does NOT have a column of ones.

X0 = X(:,2:end);

weights = weights(:)';

normedWeights = weights / sum(weights);

% Center X.
muX = normedWeights * X0;
X0 = bsxfun(@minus,X0,muX);

% Center Y.
muY = normedWeights * Y;
Y = Y - muY;

[bPredictors,varargout{1}] = penalizedWls(X0, Y, b(2:end), ...
    active,weights,lambda,alpha,reltol);

bPredictors(~ever_active,:) = 0;

% Since X and Y were centered for the WLS, we can calculate the
% intercept after the fact.
Intercept = muY - muX*bPredictors;
b = [Intercept; bPredictors];

end %-glmPenalizedWlsWrapper()

% ===================================================
% computeLambdaMax()
% ===================================================

function [lambdaMax, nullDev, nullIntercept] = computeLambdaMax(X, Y, weights, alpha, standardize, ...
    distr, link, dlinkFun, offset, isCanonical, dlinkFunCanonical, devFun)

% lambdaMax is the penalty term (lambda) beyond which the coefficients
% are guaranteed to be all zero: there is no benefit to calculating
% penalized fits with lambda > lambdaMax.
%
% nullDev is the deviance of the fit using just a constant term.
%
% The input parameter 'devFun' is used only as a sanity check to see if
% glmfit gets a plausible fit with an intercept term only.

% Head off potential cruft in the command window.
wsIllConditioned2 = warning('off','stats:glmfit:IllConditioned');
wsIterationLimit = warning('off','stats:glmfit:IterationLimit');
wsPerfectSeparation = warning('off','stats:glmfit:PerfectSeparation');
wsBadScaling = warning('off','stats:glmfit:BadScaling');
cleanupIllConditioned2 = onCleanup(@() warning(wsIllConditioned2));
cleanupIterationLimit = onCleanup(@() warning(wsIterationLimit));
cleanupPerfectSeparation = onCleanup(@() warning(wsPerfectSeparation));
cleanupBadScaling = onCleanup(@() warning(wsBadScaling));

if ~isempty(weights)
    observationWeights = true;
    weights = weights(:)';
    % Normalized weights are used for standardization and for
    % calculating lambdaMax.
    normalizedweights = weights / sum(weights);
else
    observationWeights = false;
end

[N,~] = size(X);

% If we were asked to standardize the predictors, do so here, because
% the calculation of lambdaMax needs the predictors in the same form
% in which we will use them to perform fits.

if standardize
    % If X has any constant columns, we want to protect against
    % divide-by-zero in normalizing variances.
    constantPredictors = (range(X)==0);

    if ~observationWeights
        % Center and scale
        [X0,~,~] = zscore(X,1);
    else
        % Weighted center and scale
        muX = normalizedweights * X;
        X0 = bsxfun(@minus,X,muX);
        sigmaX = sqrt( normalizedweights * (X0.^2) );
        % Avoid divide-by-zero with constant predictors
        sigmaX(constantPredictors) = 1;
        X0 = bsxfun(@rdivide, X0, sigmaX);
    end
else
    if ~observationWeights
        % Center
        muX = mean(X,1);
        X0 = bsxfun(@minus,X,muX);
    else
        % Weighted center
        muX = normalizedweights(:)' * X;
        X0 = bsxfun(@minus,X,muX);
    end
end
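
% Added commentary (not from the original file): the null model is fit
% next by passing an explicit column of ones with 'constant','off', so
% that an 'offset', when present, can be honored. From the resulting
% predicted means, lambdaMax below follows the standard coordinate
% descent bound: coefficient j is guaranteed to stay at zero whenever
%
%   abs(X0(:,j)' * (Y - predictedMu)) / N  <=  lambda*alpha
%
% (with normalized observation weights replacing the 1/N factor when
% weights are supplied), so lambdaMax is the largest such gradient
% magnitude divided by alpha.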

constantTerm = ones(length(Y),1);
if isscalar(offset)
    [coeffs,nullDev] = glmfit(constantTerm,Y,distr,'constant','off', ...
        'link',link,'weights',weights);
    predictedMu = glmval(coeffs,constantTerm,link,'constant','off');
else
    [coeffs,nullDev] = glmfit(constantTerm,Y,distr,'constant','off', ...
        'link',link,'weights',weights,'offset',offset);
    predictedMu = glmval(coeffs,constantTerm,link,'constant','off','offset',offset);
end

nullIntercept = coeffs;

% Sanity check. With a badly matched link / distr / data combination,
% glmfit may not have been able to converge to a reasonable estimate.
% If so, the inputs to lassoglm may constitute a difficult problem
% formulation with an unsatisfactory maximum likelihood solution.
% Poor formulations have been observed with mismatched links (e.g., the
% 'reciprocal' link with the Poisson distribution, in place of the
% canonical 'log'). As a screen for this contingency, calculate the
% deviance we would get using the scalar unmodeled mean of the response
% data. Call this quantity "muDev". The value of muDev should be no
% better than the nullDev calculated by glmfit above (though it may be
% equal or nearly equal). If it is better, warn that the computations
% are of uncertain validity.

if observationWeights
    muDev = weights * devFun(mean(Y)*ones(length(Y),1), Y);
else
    muDev = sum(devFun(mean(Y)*ones(length(Y),1), Y));
end
if (muDev - nullDev) / max([1.0 muDev nullDev]) < -1.0e-4
    [~, lastid] = lastwarn;
    if strcmp(lastid,'stats:glmfit:BadScaling')
        % This reassignment of predicted values is not guaranteed to
        % improve matters, but it is an attempt to generate a workable
        % sequence of lambda values. Note: since the value is constant
        % across observations, observation weights have no effect here.
        predictedMu = mean(Y)*ones(length(Y),1);
        warning(message('stats:lassoGlm:DifficultLikelihood'));
    end
end

if ~isCanonical
    X0 = bsxfun( @times, X0, dlinkFunCanonical(predictedMu) ./ dlinkFun(predictedMu) );
end

if ~observationWeights
    dotp = abs(X0' * (Y - predictedMu));
    lambdaMax = max(dotp) / (N*alpha);
else
    wX0 = bsxfun(@times, X0, normalizedweights');
    dotp = abs(sum(bsxfun(@times, wX0, (Y - predictedMu))));
    lambdaMax = max(dotp) / alpha;
end

end %-computeLambdaMax()

% ===================================================
% computeLambdaSequence()
% ===================================================

function lambda = computeLambdaSequence(lambdaMax, nLambda, lambdaRatio, LRdefault)

% Fill in the log-spaced sequence of lambda values.

if nLambda==1
    lambda = lambdaMax;
else
    % Fill in a number "nLambda" of smaller values, on a log scale.
    if lambdaRatio==0
        lambdaRatio = LRdefault;
        addZeroLambda = true;
    else
        addZeroLambda = false;
    end
    lambdaMin = lambdaMax * lambdaRatio;
    loghi = log(lambdaMax);
    loglo = log(lambdaMin);
    logrange = loghi - loglo;
    interval = -logrange/(nLambda-1);
    lambda = exp(loghi:interval:loglo)';
    if addZeroLambda
        lambda(end) = 0;
    else
        lambda(end) = lambdaMin;
    end
end

end %-computeLambdaSequence
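
% Added usage note (illustrative values, not from the original file):
% the sequence above is log-spaced from lambdaMax down to
% lambdaMax*lambdaRatio, i.e.
%
%   lambda(k) = lambdaMax * lambdaRatio^((k-1)/(nLambda-1))
%
% For example,
%
%   lambda = computeLambdaSequence(2, 5, 1e-2, 1e-4)
%
% returns (approximately) the descending column vector
%   [2; 0.6325; 0.2; 0.0632; 0.02]
% with the last element set to lambdaMin exactly. When lambdaRatio is
% zero, the spacing is generated with LRdefault instead, and the
% smallest value is then replaced by exactly zero.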