


Copied from the Statistics Toolbox, with a progress display line added (see the disp call in lassoFit below).
LASSOGLM Perform lasso or elastic net regularization for a generalized linear model.
[B,STATS] = LASSOGLM(X,Y,DISTR,...) performs L1-penalized maximum
likelihood fits (lasso) relating the predictors in X to the responses
in Y, or fits subject to joint L1 and L2 constraints (elastic net).
The default is a lasso-style fit, that is, a maximum likelihood fit
subject to a constraint on the L1-norm of the coefficients B.
LASSOGLM accepts all the command line parameters of the LASSO
function and, with the following exceptions, the command line
parameters of the GLMFIT function. LASSOGLM accepts the GLMFIT
arguments 'Link' and 'Offset' (lower case 'link' and 'offset' also
work). It does not accept the GLMFIT arguments 'estdisp' or
'constant', and unlike GLMFIT it does not calculate standard errors
or covariances among the coefficients.
LASSOGLM always calculates an Intercept term.
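For orientation, a minimal call looks like the sketch below. The data
here are synthetic placeholders, not part of the original documentation:

    % Hypothetical example: 100 observations of 5 predictors, Poisson counts.
    X = randn(100,5);
    y = poissrnd(exp(0.5 + X(:,1)));       % counts driven by the first predictor
    [B,STATS] = lassoglm(X,y,'poisson');   % canonical log link, default lambda sequence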
Positional parameters:
X A numeric matrix (dimension, say, NxP)
Y A numeric vector of length N. If DISTR is
'binomial', Y may be a binary vector indicating
success/failure, and the total number of trials is
taken to be 1 for all observations. If DISTR is
'binomial', Y may also be a two-column matrix, the
first column containing the number of successes for
each observation, and the second column containing
the total number of trials.
DISTR The distributional family for the non-systematic
variation in the responses. Acceptable values for DISTR are
'normal', 'binomial', 'poisson', 'gamma', and 'inverse gaussian'.
By default, the distribution is fit using the canonical link
corresponding to DISTR. Other link functions may be
specified using the optional parameter 'link'.
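For instance, the two binomial response forms described above would be
supplied as in the following sketch (y01, successes, and trials are
illustrative variable names):

    % Bernoulli form: y01 is a binary vector, one trial per observation.
    [B,STATS] = lassoglm(X,y01,'binomial');
    % Count form: successes in column 1, total trials in column 2.
    [B,STATS] = lassoglm(X,[successes trials],'binomial');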
Optional input parameters:
'Weights' Observation weights. Must be a vector of non-negative
values of length N, matching the number of rows of X. At
least two values must be positive. (default ones(N,1) or,
equivalently, (1/N)*ones(N,1)).
'Alpha' Elastic net mixing value, or the relative balance
between L2 and L1 penalty (default 1, range (0,1]).
Alpha=1 ==> lasso, otherwise elastic net.
Alpha near zero ==> nearly ridge regression.
'NumLambda' The number of lambda values to use, if the parameter
'Lambda' is not supplied (default 100). Ignored
if 'Lambda' is supplied. LASSOGLM may return fewer
fits than specified by 'NumLambda' if the deviance
of the fits drops below a threshold percentage of
the null deviance (deviance of the fit without
any predictors X).
'LambdaRatio' Ratio between the minimum value and maximum value of
lambda to generate, if the parameter 'Lambda' is not
supplied. Legal range is [0,1). Default is 0.0001.
If 'LambdaRatio' is zero, LASSOGLM will generate its
default sequence of lambda values but replace the
smallest value in this sequence with the value zero.
'LambdaRatio' is ignored if 'Lambda' is supplied.
'Lambda' Lambda values. Will be returned in return argument
STATS in ascending order. The default is to have
LASSOGLM generate a sequence of lambda values, based
on 'NumLambda' and 'LambdaRatio'. LASSOGLM will generate
a sequence, based on the values in X and Y, such that
the largest Lambda value is estimated to be just
sufficient to produce all zero coefficients B.
You may supply a vector of real, non-negative values
of lambda for LASSOGLM to use, in place of its default
sequence. If you supply a value for 'Lambda',
'NumLambda' and 'LambdaRatio' are ignored.
'DFmax' Maximum number of non-zero coefficients in the model.
Can be useful with large numbers of predictors.
Results are returned only for lambda values that
satisfy this degree of sparsity. Default is
to not limit the number of non-zero coefficients.
'Standardize' Whether to scale X prior to fitting the model
sequence. This affects whether the regularization is
applied to the coefficients on the standardized
scale or the original scale. The results are always
presented on the original data scale. Default is
TRUE, do scale X.
'RelTol' Convergence threshold for coordinate descent algorithm.
The coordinate descent iterations will terminate
when the relative change in the size of the
estimated coefficients B drops below this threshold.
Default: 1e-4. Legal range is (0,1).
'CV' If present, indicates the method used to compute Deviance.
When 'CV' is a positive integer K, LASSOGLM uses K-fold
cross-validation. Set 'CV' to a cross-validation
partition, created using CVPARTITION, to use other
forms of cross-validation. You cannot use a
'Leaveout' partition with LASSOGLM.
When 'CV' is 'resubstitution', LASSOGLM uses X and Y
both to fit the model and to estimate the deviance
of the fitted model, without cross-validation.
The default is 'resubstitution'.
'MCReps' A positive integer indicating the number of Monte-Carlo
repetitions for cross-validation. The default value is 1.
If 'CV' is 'resubstitution' or a cvpartition of type
'resubstitution', 'MCReps' must be 1. If 'CV' is a
cvpartition of type 'holdout', then 'MCReps' must be
greater than one.
'PredictorNames' A cell array of names for the predictor variables,
in the order in which they appear in X.
Default: {}
'Options' A structure that contains options specifying whether to
conduct cross-validation evaluations in parallel, and
options specifying how to use random numbers when computing
cross-validation partitions. This argument can be created
by a call to STATSET. CROSSVAL uses the following fields:
'UseParallel'
'UseSubstreams'
'Streams'
For information on these fields see PARALLELSTATS.
NOTE: If supplied, 'Streams' must be of length one.
'Link' The link function to use in place of the canonical link.
The link function defines the relationship f(mu) = x*b
between the mean response mu and the linear combination of
predictors x*b. Specify the link parameter value as one of
- the text strings 'identity', 'log', 'logit', 'probit',
'comploglog', 'reciprocal', 'loglog', or
- an exponent P defining the power link, mu = (x*b)^P
for x*b > 0, or
- a cell array of the form {FL FD FI}, containing three
function handles, created using @, that define the link (FL),
the derivative of the link (FD), and the inverse link (FI).
'Offset' A vector to use as an additional predictor variable, but
with a coefficient value fixed at 1.0. Default is
not to utilize an offset variable.
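As a reminder, the objective being minimized mirrors the one stated in
the LASSO documentation: informally, Deviance/(2*N) plus Lambda times
the sum over coefficients of (1-Alpha)/2*B(j)^2 + Alpha*abs(B(j)).
A fit that exercises several of the options above might look like this
sketch (the data, the ten folds, and Alpha = 0.5 are illustrative
choices):

    % Elastic net fit with 10-fold cross-validation and a probit link
    % in place of the canonical logit.
    [B,STATS] = lassoglm(X,y,'binomial', 'Alpha',0.5, 'CV',10, ...
                         'NumLambda',50, 'Link','probit');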
Return values:
B The fitted coefficients for each model.
B will have dimension PxL, where
P = size(X,2) is the number of predictors, and
L = length(STATS.Lambda).
STATS STATS is a struct that contains information about the
sequence of model fits corresponding to the columns
of B. STATS contains the following fields:
'Intercept' The intercept term for each model. Dimension 1xL.
'Lambda' The sequence of lambda penalties used, in ascending order.
Dimension 1xL.
'Alpha' The elastic net mixing value that was used.
'DF' The number of nonzero coefficients in B for each
value of lambda. Dimension 1xL.
'Deviance' The deviance of the fitted model for each value of
lambda. If cross-validation was performed, the values
for 'Deviance' represent the estimated expected
deviance of the model applied to new data, as
calculated by cross-validation. Otherwise,
'Deviance' is the deviance of the fitted model
applied to the data used to perform the fit.
Dimension 1xL.
If cross-validation was performed, STATS also includes the following
fields:
'SE' The standard error of 'Deviance' for each lambda, as
calculated during cross-validation. Dimension 1xL.
'LambdaMinDeviance' The lambda value with minimum expected deviance, as
calculated during cross-validation. Scalar.
'Lambda1SE' The largest lambda such that 'Deviance' is within
one standard error of the minimum. Scalar.
'IndexMinDeviance' The index of Lambda with value LambdaMinDeviance. Scalar.
'Index1SE' The index of Lambda with value Lambda1SE. Scalar.
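For example, the cross-validation fields support selecting and applying
the most regularized model within one standard error of the minimum. A
sketch, assuming a cross-validated binomial fit; Xnew is a hypothetical
matrix of new observations:

    [B,STATS] = lassoglm(X,y,'binomial','CV',10);
    idx  = STATS.Index1SE;                     % index of the 1-SE model
    coef = [STATS.Intercept(idx); B(:,idx)];   % intercept plus coefficients
    yhat = glmval(coef, Xnew, 'logit');        % predicted probabilities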
See also lassoPlot, lasso, ridge, parallelstats, glmfit.


function [B,stats] = lassoglm(x,y,distr,varargin)
% copied from statistics toolbox - added display line
%LASSOGLM Perform lasso or elastic net regularization for a generalized linear model.
%   See the help text above for the full description of inputs and outputs.
%
% References:
%   [1] Tibshirani, R. (1996) Regression shrinkage and selection
%       via the lasso. Journal of the Royal Statistical Society,
%       Series B, Vol. 58, No. 1, pp. 267-288.
%   [2] Zou, H. and T. Hastie. (2005) Regularization and variable
%       selection via the elastic net. Journal of the Royal Statistical
%       Society, Series B, Vol. 67, No. 2, pp. 301-320.
%   [3] Friedman, J., R. Tibshirani, and T. Hastie. (2010) Regularization
%       paths for generalized linear models via coordinate descent.
%       Journal of Statistical Software, Vol. 33, No. 1,
%       http://www.jstatsoft.org/v33/i01.
%   [4] Hastie, T., R. Tibshirani, and J. Friedman. (2008) The Elements
%       of Statistical Learning, 2nd edition, Springer, New York.
%   [5] Dobson, A.J. (2002) An Introduction to Generalized Linear
%       Models, 2nd edition, Chapman&Hall/CRC Press.
%   [6] McCullagh, P., and J.A. Nelder (1989) Generalized Linear
%       Models, 2nd edition, Chapman&Hall/CRC Press.
%   [7] Collett, D. (2003) Modelling Binary Data, 2nd edition,
%       Chapman&Hall/CRC Press.

% Copyright 2011-2014 The MathWorks, Inc.


if nargin < 2
    error(message('stats:lassoGlm:TooFewInputs'));
end

if nargin < 3 || isempty(distr), distr = 'normal'; end

paramNames = {     'link' 'offset' 'weights'};
paramDflts = {'canonical'       []        []};
[link,offset,pwts,~,varargin] = ...
    internal.stats.parseArgs(paramNames, paramDflts, varargin{:});

% Read in the optional parameters pertinent to regularization (eg, lasso)
LRdefault = 1e-4;
pnames = { 'alpha' 'numlambda' 'lambdaratio' 'lambda' ...
    'dfmax' 'standardize' 'reltol' 'cv' 'mcreps' ...
    'predictornames' 'options' };
dflts  = {  1       100        LRdefault     [] ...
    []      true          1e-4     'resubstitution'  1 ...
    {}               []};
[alpha, nLambda, lambdaRatio, lambda, ...
    dfmax, standardize, reltol, cvp, mcreps, predictorNames, parallelOptions] ...
    = internal.stats.parseArgs(pnames, dflts, varargin{:});

if ~isempty(lambda)
    userSuppliedLambda = true;
else
    userSuppliedLambda = false;
end

% X a real 2D matrix
if ~ismatrix(x) || length(size(x)) ~= 2 || ~isreal(x)
    error(message('stats:lassoGlm:XnotaReal2DMatrix'));
end

% We need at least two observations.
if isempty(x) || size(x,1) < 2
    error(message('stats:lassoGlm:TooFewObservations'));
end

% Categorical responses 'binomial'
if isa(y,'categorical')
    [y, classname] = grp2idx(y);
    nc = length(classname);
    if nc > 2
        error(message('stats:glmfit:TwoLevelCategory'));
    end
    y(y==1) = 0;
    y(y==2) = 1;
end

% Number of Predictors
P = size(x,2);

% Head off potential cruft in the command window.
wsIllConditioned2 = warning('off','stats:glmfit:IllConditioned');
cleanupIllConditioned2 = onCleanup(@() warning(wsIllConditioned2));

% Sanity checking on predictors, responses, weights and offset parameter,
% and removal of NaNs and Infs from same. Also, conversion of the
% two-column form of binomial response to a proportion.
[X, Y, offset, pwts, dataClass, nTrials, binomialTwoColumn] = ...
    glmProcessData(x, y, distr, 'off', offset, pwts);

[~,sqrtvarFun,devFun,linkFun,dlinkFun,ilinkFun,link,mu,eta,muLims,isCanonical,dlinkFunCanonical] = ...
    glmProcessDistrAndLink(Y,distr,link,'off',nTrials,dataClass);

[X,Y,pwts,nLambda,lambda,dfmax,cvp,mcreps,predictorNames,ever_active] = ...
    processLassoParameters(X,Y,pwts,alpha,nLambda,lambdaRatio,lambda,dfmax, ...
    standardize,reltol,cvp,mcreps,predictorNames);

% Compute the amount of penalty at which all coefficients shrink to zero.
[lambdaMax, nullDev, nullIntercept] = computeLambdaMax(X, Y, pwts, alpha, standardize, ...
    distr, link, dlinkFun, offset, isCanonical, dlinkFunCanonical, devFun);

% If the command line did not provide a sequence of penalty terms,
% generate a sequence.
if isempty(lambda)
    lambda = computeLambdaSequence(lambdaMax, nLambda, lambdaRatio, LRdefault);
end

% Also, to help control saturated fits with binomial outcomes,
% also alter the glmfit values for muLims
if strcmp(distr,'binomial')
    muLims = [1.0e-5 1.0-1.0e-5];
end

% Will convert from lambda penalty matched to average log-likelihood
% per observation (which is what lasso uses) to an equivalent
% penalty for total log-likelihood (which is what glmIRLS uses.)
if isempty(pwts) && isscalar(nTrials)
    totalWeight = size(X,1);
elseif ~isempty(pwts) && isscalar(nTrials)
    totalWeight = sum(pwts);
elseif isempty(pwts) && ~isscalar(nTrials)
    totalWeight = sum(nTrials);
else
    totalWeight = sum(pwts .* nTrials);
end

% Convert lambda penalty to match total log-likelihood
lambda = lambda * totalWeight;

penalizedFitPartition = @(x,y,offset,pwts,n,wlsfit,b,active,mu,eta,sqrtvarFun) ...
    glmIRLSwrapper(x,y,distr,offset,pwts,dataClass,n, ...
    sqrtvarFun,linkFun,dlinkFun,ilinkFun,devFun,b,active,mu,muLims,wlsfit,nullDev,reltol);

penalizedFit = @(x,y,wlsfit,b,active,mu,eta) ...
    penalizedFitPartition(x,y,offset,pwts',nTrials,wlsfit,b,active,mu,eta,sqrtvarFun);

[B,Intercept,lambda,deviance] = ...
    lassoFit(X,Y,pwts,lambda,alpha,dfmax,standardize,reltol,lambdaMax*totalWeight,ever_active, ...
    penalizedFit,mu,eta,dataClass,userSuppliedLambda,nullDev,nullIntercept);

% Store the number of non-zero coefficients for each lambda.
df = sum(B~=0,1);

% The struct 'stats' will comprise the second return argument.
% Put place holders for ever-present fields to avoid undefined references
% and to secure the order we want in the struct.
stats = struct();
stats.Intercept      = [];
stats.Lambda         = [];
stats.Alpha          = alpha;
stats.DF             = [];
stats.Deviance       = [];
stats.PredictorNames = predictorNames;

% ---------------------------------------------------------
% If requested, use cross-validation to estimate the
% expected deviance for each lambda.
% ---------------------------------------------------------

if ~isequal(cvp,'resubstitution')
    % Replace dfmax with P, the number of predictors supplied at the
    % command line. dfmax might cause one fold to return empty values,
    % because no lambda satisfies the dfmax criteria, while other folds
    % return a numeric value. The lambda sequence has already been
    % pruned by dfmax, if appropriate, in the call to lassoFit above.
    cvfun = @(Xtrain,Ytrain,Xtest,Ytest) lassoFitAndPredict( ...
        Xtrain,Ytrain,Xtest,Ytest, ...
        lambda,alpha,P,standardize,reltol,ever_active, ...
        penalizedFitPartition,distr,link,linkFun,dlinkFun,sqrtvarFun, ...
        isCanonical, dlinkFunCanonical,devFun,dataClass);
    weights = pwts;
    if isempty(weights)
        weights = nan(size(X,1),1);
    end
    if isempty(offset) || isequal(offset,0)
        offset = nan(size(X,1),1);
    end
    if binomialTwoColumn
        response = [nTrials Y];
    else
        response = Y;
    end
    cvDeviance = crossval(cvfun,[weights(:) offset(:) X],response, ...
        'Partition',cvp,'Mcreps',mcreps,'Options',parallelOptions);
    % Scale single-fold deviances up to estimate whole data set deviance
    cvDeviance = bsxfun(@times,cvDeviance,repmat((size(X,1) ./ cvp.TestSize)', mcreps, 1));
    deviance = mean(cvDeviance);
    se = std(cvDeviance) / sqrt(size(cvDeviance,1));
    minDeviance = min(deviance);
    minIx = find(deviance == minDeviance,1);
    lambdaMin = lambda(minIx);
    minplus1 = deviance(minIx) + se(minIx);
    seIx = find((deviance(1:minIx) <= minplus1),1,'first');
    if isempty(seIx)
        lambdaSE = [];
    else
        lambdaSE = lambda(seIx);
    end

    % Deposit cross-validation results in struct for return value.
    stats.SE                = se;
    stats.LambdaMinDeviance = lambdaMin;
    stats.Lambda1SE         = lambdaSE;
    stats.IndexMinDeviance  = minIx;
    stats.Index1SE          = seIx;

    % Convert key lambda values determined by cross-validation
    % back to match average log-likelihood per observation.
    stats.LambdaMinDeviance = stats.LambdaMinDeviance / totalWeight;
    stats.Lambda1SE         = stats.Lambda1SE / totalWeight;
end

% ------------------------------------------
% Order results by ascending lambda
% ------------------------------------------

nLambda = length(lambda);
reverseIndices = nLambda:-1:1;
lambda = lambda(reverseIndices);
lambda = reshape(lambda,1,nLambda);
B = B(:,reverseIndices);
Intercept = Intercept(reverseIndices);
df = df(reverseIndices);
deviance = deviance(reverseIndices);
if ~isequal(cvp,'resubstitution')
    stats.SE = stats.SE(reverseIndices);
    stats.IndexMinDeviance = nLambda - stats.IndexMinDeviance + 1;
    stats.Index1SE = nLambda - stats.Index1SE + 1;
end

stats.Intercept = Intercept;
stats.Lambda = lambda;
stats.DF = df;
stats.Deviance = deviance;

% Convert lambda penalty back to match average deviance per observation.
stats.Lambda = stats.Lambda / totalWeight;

end %-main block

% ------------------------------------------
% SUBFUNCTIONS
% ------------------------------------------

% ===================================================
% startingVals()
% ===================================================

function mu = startingVals(distr,y,N)
% Find a starting value for the mean, avoiding boundary values
switch distr
    case 'poisson'
        mu = y + 0.25;
    case 'binomial'
        mu = (N .* y + 0.5) ./ (N + 1);
    case {'gamma' 'inverse gaussian'}
        mu = max(y, eps(class(y))); % somewhat arbitrary
    otherwise
        mu = y;
end
end %-startingVals

% ===================================================
% diagnoseSeparation()
% ===================================================

function diagnoseSeparation(eta,y,N)
% Compute sample proportions, sorted by increasing fitted value
[x,idx] = sort(eta);
if ~isscalar(N)
    N = N(idx);
end
p = y(idx);
if all(p==p(1))    % all sample proportions are the same
    return
end
if x(1)==x(end)    % all fitted probabilities are the same
    return
end

noFront = 0<p(1) && p(1)<1;     % no "front" section as defined below
noEnd = 0<p(end) && p(end)<1;   % no "end" section as defined below
if p(1)==p(end) || (noFront && noEnd)
    % No potential for perfect separation if the ends match or neither
    % end is perfect
    return
end

% There is at least one observation potentially taking probability 0 or
% 1 at one end or the other with the data sorted by eta. We want to see
% if the data, sorted by eta (x) value, have this form:
%    x(1)<=...<=x(A) < x(A+1)=...=x(B-1) < x(B)<=...<=x(n)
%    with p(1)=...=p(A)=0      p(B)=...=p(n)=1
%    or   p(1)=...=p(A)=1      p(B)=...=p(n)=0
%
% This includes the possibilities:
%    A+1=B    - no middle section
%    A=0      - no perfect fit at the front
%    B=n+1    - no perfect fit at the end
dx = 100*max(eps(x(1)),eps(x(end)));
n = length(p);
if noFront
    A = 0;
else
    A = find(p~=p(1),1,'first')-1;
    cutoff = x(A+1)-dx;
    A = sum(x(1:A)<cutoff);
end

if noEnd
    B = n+1;
else
    B = find(p~=p(end),1,'last')+1;
    cutoff = x(B-1)+dx;
    B = (n+1) - sum(x(B:end)>cutoff);
end

if A+1<B-1
    % There is a middle region with >1 point, see if x varies there
    if x(B-1)-x(A+1)>dx
        return
    end
end

% We have perfect separation that can be defined by some middle point
if A+1==B
    xmid = x(A) + 0.5*(x(B)-x(A));
else
    xmid = x(A+1);
    if isscalar(N)
        pmid = mean(p(A+1:B-1));
    else
        pmid = sum(p(A+1:B-1).*N(A+1:B-1)) / sum(N(A+1:B-1));
    end
end

% Create explanation part for the lower region, if any
if A>=1
    explanation = sprintf('\n   XB<%g: P=%g',xmid,p(1));
else
    explanation = '';
end

% Add explanation part for the middle region, if any
if A+1<B
    explanation = sprintf('%s\n   XB=%g: P=%g',explanation,xmid,pmid);
end

% Add explanation part for the upper region, if any
if B<=n
    explanation = sprintf('%s\n   XB>%g: P=%g',explanation,xmid,p(end));
end

warning(message('stats:lassoGlm:PerfectSeparation', explanation));
end %-diagnoseSeparation()

% ===============================================
% glmProcessData()
% ===============================================

function [x, y, offset, pwts, dataClass, N, binomialTwoColumn] = ...
    glmProcessData(x, y, distr, const, offset, pwts)

N = [];   % needed only for binomial
binomialTwoColumn = false;

% Convert the two-column form of 'y', if supplied ('binomial' only).
if strcmp(distr,'binomial')
    if size(y,2) == 1
        % N will get set to 1 below
        if any(y < 0 | y > 1)
            error(message('stats:lassoGlm:BadDataBinomialFormat'));
        end
    elseif size(y,2) == 2
        binomialTwoColumn = true;
        y(y(:,2)==0,2) = NaN;
        N = y(:,2);
        y = y(:,1) ./ N;
        if any(y < 0 | y > 1)
            error(message('stats:lassoGlm:BadDataBinomialRange'));
        end
    else
        error(message('stats:lassoGlm:MatrixOrBernoulliRequired'));
    end
end

[anybad,~,y,x,offset,pwts,N] = dfswitchyard('statremovenan',y,x,offset,pwts,N);
if anybad > 0
    switch anybad
        case 2
            error(message('stats:lassoGlm:InputSizeMismatchX'))
        case 3
            error(message('stats:lassoGlm:InputSizeMismatchOffset'))
        case 4
            error(message('stats:lassoGlm:InputSizeMismatchPWTS'))
    end
end

% Extra screening for lassoglm (Infs and zero weights)
okrows = all(isfinite(x),2) & all(isfinite(y),2) & all(isfinite(offset));

if ~isempty(pwts)
    % This screen works on weights prior to stripping NaNs and Infs.
    if ~isvector(pwts) || ~isreal(pwts) || size(x,1) ~= length(pwts) || ...
            ~all(isfinite(pwts)) || any(pwts<0)
        error(message('stats:lassoGlm:InvalidObservationWeights'));
    end
    okrows = okrows & pwts(:)>0;
    pwts = pwts(okrows);
end

% We need at least two observations after stripping NaNs and Infs and zero weights.
if sum(okrows)<2
    error(message('stats:lassoGlm:TooFewObservationsAfterNaNs'));
end

% Remove observations with Infs in the predictor or response
% or with zero observation weight. NaNs were already gone.
x = x(okrows,:);
y = y(okrows);
if ~isempty(N) && ~isscalar(N)
    N = N(okrows);
end
if ~isempty(offset)
    offset = offset(okrows);
end

if isequal(const,'on')
    x = [ones(size(x,1),1) x];
end
dataClass = superiorfloat(x,y);
x = cast(x,dataClass);
y = cast(y,dataClass);

if isempty(offset), offset = 0; end
if isempty(N), N = 1; end

end %-glmProcessData()

% ===================================================
% processLassoParameters()
% ===================================================

function [X,Y,weights,nLambda,lambda,dfmax,cvp,mcreps,predictorNames,ever_active] = ...
    processLassoParameters(X,Y,weights, alpha, nLambda, lambdaRatio, lambda, dfmax, ...
    standardize, reltol, cvp, mcreps, predictorNames)

% === 'Weights' parameter ===
if ~isempty(weights)

    % Computations expect that weights is a row vector.
    weights = weights(:)';

end

[~,P] = size(X);

% If X has any constant columns, we want to exclude them from the
% coordinate descent calculations. The corresponding coefficients
% will be returned as zero.
constantPredictors = (range(X)==0);
ever_active = ~constantPredictors;

% === 'Alpha' parameter ===

% Require 0 < alpha <= 1.
% "0" would correspond to ridge, "1" is lasso.
if ~isscalar(alpha) || ~isreal(alpha) || ~isfinite(alpha) || ...
        alpha <= 0 || alpha > 1
    error(message('stats:lassoGlm:InvalidAlpha'))
end

% === 'Standardize' option ===

% Require a logical value.
if ~isscalar(standardize) || (~islogical(standardize) && standardize~=0 && standardize~=1)
    error(message('stats:lassoGlm:InvalidStandardize'))
end

% === 'Lambda' sequence or 'NumLambda' and 'lambdaRatio' ===

if ~isempty(lambda)

    % Sanity check on user-supplied lambda sequence. Should be non-neg real.
    if ~isreal(lambda) || any(lambda < 0)
        error(message('stats:lassoGlm:NegativeLambda'));
    end

    lambda = sort(lambda(:),1,'descend');

else

    % Sanity-check of 'NumLambda', should be positive integer.
    if ~isreal(nLambda) || ~isfinite(nLambda) || nLambda < 1
        error(message('stats:lassoGlm:InvalidNumLambda'));
    else
        nLambda = floor(nLambda);
    end

    % Sanity-checking of LambdaRatio, should be in [0,1).
    if ~isreal(lambdaRatio) || lambdaRatio <0 || lambdaRatio >= 1
        error(message('stats:lassoGlm:InvalidLambdaRatio'));
    end
end

% === 'RelTol' parameter ===
%
if ~isscalar(reltol) || ~isreal(reltol) || ~isfinite(reltol) || reltol <= 0 || reltol >= 1
    error(message('stats:lassoGlm:InvalidRelTol'));
end

% === 'DFmax' parameter ===
%
% DFmax is #non-zero coefficients
% DFmax should map to an integer in [1,P] but we truncate if .gt. P
%
if isempty(dfmax)
    dfmax = P;
else
    if ~isscalar(dfmax)
        error(message('stats:lassoGlm:DFmaxBadType'));
    end
    try
        dfmax = uint32(dfmax);
    catch ME
        mm = message('stats:lassoGlm:DFmaxBadType');
        throwAsCaller(MException(mm.Identifier,'%s',getString(mm)));
    end
    if dfmax < 1
        error(message('stats:lassoGlm:DFmaxNotAnIndex'));
    else
        dfmax = min(dfmax,P);
    end
end

% === 'Mcreps' parameter ===
%
if ~isscalar(mcreps) || ~isreal(mcreps) || ~isfinite(mcreps) || mcreps < 1
    error(message('stats:lassoGlm:MCRepsBadType'));
end
mcreps = fix(mcreps);

% === 'CV' parameter ===
%

if isnumeric(cvp) && isscalar(cvp) && (cvp==round(cvp)) && (0<cvp)
    % cvp is a kfold value. It will be passed as such to crossval.
    if (cvp>size(X,1))
        error(message('stats:lassoGlm:InvalidCVforX'));
    end
    cvp = cvpartition(size(X,1),'Kfold',cvp);
elseif isa(cvp,'cvpartition')
    if strcmpi(cvp.Type,'resubstitution')
        cvp = 'resubstitution';
    elseif strcmpi(cvp.Type,'leaveout')
        error(message('stats:lassoGlm:InvalidCVtype'));
    elseif strcmpi(cvp.Type,'holdout') && mcreps<=1
        error(message('stats:lassoGlm:InvalidMCReps'));
    end
elseif strncmpi(cvp,'resubstitution',length(cvp))
    % This may have been set as the default, or may have been
    % provided at the command line. In case it's the latter, we
    % expand abbreviations.
    cvp = 'resubstitution';
else
    error(message('stats:lassoGlm:InvalidCVtype'));
end
if strcmp(cvp,'resubstitution') && mcreps ~= 1
    error(message('stats:lassoGlm:InvalidMCReps'));
end

if isa(cvp,'cvpartition')
    if (cvp.N ~= size(X,1)) || (min(cvp.TrainSize) < 2)
        % We need partitions that match the total number of observations
        % (after stripping NaNs and Infs and zero observation weights), and
        % we need training sets with at least 2 usable observations.
        error(message('stats:lassoGlm:TooFewObservationsForCrossval'));
    end
end

% === 'PredictorNames' parameter ===
%
% If PredictorNames is not supplied or is supplied as empty, we just
% leave it that way. Otherwise, confirm that it is a cell array of strings.
%
if ~isempty(predictorNames)
    if ~iscellstr(predictorNames) || length(predictorNames(:)) ~= size(X,2)
        error(message('stats:lassoGlm:InvalidPredictorNames'));
    else
        predictorNames = predictorNames(:)';
    end
end

end %-processLassoParameters()

% ===================================================
% glmProcessDistrAndLink()
% ===================================================

function [estdisp,sqrtvarFun,devFun,linkFun,dlinkFun,ilinkFun,link,mu,eta,muLims,...
    isCanonical,dlinkFunCanonical] = ...
    glmProcessDistrAndLink(y,distr,link,estdisp,N,dataClass)

switch distr
    case 'normal'
        canonicalLink = 'identity';
    case 'binomial'
        canonicalLink = 'logit';
    case 'poisson'
        canonicalLink = 'log';
    case 'gamma'
        canonicalLink = 'reciprocal';
    case 'inverse gaussian'
        canonicalLink = -2;
end

if isequal(link, 'canonical'), link = canonicalLink; end

switch distr
    case 'normal'
        sqrtvarFun = @(mu) ones(size(mu));
        devFun = @(mu,y) (y - mu).^2;
        estdisp = 'on';
    case 'binomial'
        sqrtN = sqrt(N);
        sqrtvarFun = @(mu) sqrt(mu).*sqrt(1-mu) ./ sqrtN;
        devFun = @(mu,y) 2*N.*(y.*log((y+(y==0))./mu) + (1-y).*log((1-y+(y==1))./(1-mu)));
    case 'poisson'
        if any(y < 0)
            error(message('stats:lassoGlm:BadDataPoisson'));
        end
        sqrtvarFun = @(mu) sqrt(mu);
        devFun = @(mu,y) 2*(y .* (log((y+(y==0)) ./ mu)) - (y - mu));
    case 'gamma'
        if any(y <= 0)
            error(message('stats:lassoGlm:BadDataGamma'));
        end
        sqrtvarFun = @(mu) mu;
        devFun = @(mu,y) 2*(-log(y ./ mu) + (y - mu) ./ mu);
        estdisp = 'on';
    case 'inverse gaussian'
        if any(y <= 0)
            error(message('stats:lassoGlm:BadDataInvGamma'));
        end
        sqrtvarFun = @(mu) mu.^(3/2);
        devFun = @(mu,y) ((y - mu)./mu).^2 ./ y;
        estdisp = 'on';
    otherwise
        error(message('stats:lassoGlm:BadDistribution'));
end

% Instantiate functions for one of the canned links, or validate a
% user-defined link specification.
[linkFun,dlinkFun,ilinkFun] = dfswitchyard('stattestlink',link,dataClass);

% Initialize mu and eta from y.
mu = startingVals(distr,y,N);
eta = linkFun(mu);

% Enforce limits on mu to guard against an inverse link that doesn't map into
% the support of the distribution.
switch distr
    case 'binomial'
        % mu is a probability, so order one is the natural scale, and eps is a
        % reasonable lower limit on that scale (plus it's symmetric).
        muLims = [eps(dataClass) 1-eps(dataClass)];
    case {'poisson' 'gamma' 'inverse gaussian'}
        % Here we don't know the natural scale for mu, so make the lower limit
        % small. This choice keeps mu^4 from underflowing. No upper limit.
        muLims = realmin(dataClass).^.25;
    otherwise
        muLims = [];
end

% These two quantities (isCanonical, dlinkFunCanonical) are not used by
% glmfit but they are needed for calculation of lambdaMax in lassoglm.

isCanonical = isequal(link, canonicalLink);
[~, dlinkFunCanonical] = dfswitchyard('stattestlink', canonicalLink, dataClass);

end %-glmProcessDistrAndLink()

% ===================================================
% glmIRLS()
% ===================================================

function [b,mu,eta,varargout] = glmIRLS(x,y,distr,offset,pwts,dataClass,N, ...
    sqrtvarFun,linkFun,dlinkFun,ilinkFun,b,active,mu,muLims, ...
    wlsfit,nullDev,devFun,reltol)

wsIterationLimit = warning('off','stats:lassoGlm:IterationLimit');
wsPerfectSeparation = warning('off','stats:lassoGlm:PerfectSeparation');
wsBadScaling = warning('off','stats:lassoGlm:BadScaling');
cleanupIterationLimit = onCleanup(@() warning(wsIterationLimit));
cleanupPerfectSeparation = onCleanup(@() warning(wsPerfectSeparation));
cleanupBadScaling = onCleanup(@() warning(wsBadScaling));

if isempty(pwts)
    pwts = 1;
end

% Set up for iterations
iter = 0;
iterLim = 100;
warned = false;
seps = sqrt(eps);

% Match the convergence tolerance for the IRLS algorithm to the
% command line parameter 'RelTol'.
convcrit = max(1e-6,2*reltol);

eta = linkFun(mu);

while iter <= iterLim
    iter = iter+1;

    % Compute adjusted dependent variable for least squares fit
    deta = dlinkFun(mu);
    z = eta + (y - mu) .* deta;

    % Compute IRLS weights using the inverse of the variance function
    sqrtw = sqrt(pwts) ./ (abs(deta) .* sqrtvarFun(mu));

    % If the weights have an enormous range, we won't be able to do IRLS very
    % well. The prior weights may be bad, or the fitted mu's may have too
    % wide a range, which is probably because the data do as well, or because
    % the link function is trying to go outside the distribution's support.
    wtol = max(sqrtw)*eps(dataClass)^(2/3);
    t = (sqrtw < wtol);
    if any(t)
        t = t & (sqrtw ~= 0);
        if any(t)
            sqrtw(t) = wtol;
            if ~warned
                warning(message('stats:lassoGlm:BadScaling'));
            end
            warned = true;
        end
    end

    b_old = b;
    [b,active] = wlsfit(z - offset, x, sqrtw.^2, b, active);

    % Form current linear predictor, including offset
    eta = offset + x * b;

    % Compute predicted mean using inverse link function
    mu = ilinkFun(eta);

    % Force mean in bounds, in case the link function is a wacky choice
    switch distr
        case 'binomial'
            if any(mu < muLims(1) | muLims(2) < mu)
                mu = max(min(mu,muLims(2)),muLims(1));
            end
        case {'poisson' 'gamma' 'inverse gaussian'}
            if any(mu < muLims(1))
                mu = max(mu,muLims(1));
            end
    end

    % Check stopping conditions
    % Convergence of the coefficients
    if (~any(abs(b-b_old) > convcrit * max(seps, abs(b_old))))
        break;
    end
    % Deviance has dropped below a small fraction of the null deviance
    if sum(devFun(mu,y)) < (1.0e-3 * nullDev)
        break;
    end

end

if iter > iterLim
    warning(message('stats:lassoGlm:IterationLimit'));
end

if iter>iterLim && isequal(distr,'binomial')
    diagnoseSeparation(eta,y,N);
end

varargout{1} = active;

end %-glmIRLS

% ===================================================
% glmIRLSwrapper()
% ===================================================

function [B,active,varargout] = glmIRLSwrapper(X,Y,distr,offset,pwts,dataClass,N, ...
    sqrtvarFun,linkFun,dlinkFun,ilinkFun,devFun,b,active,mu,muLims, ...
    wlsfit,nullDev,reltol)
% glmIRLSwrapper is a utility function for regularized GLM. It is called by
% the regularization framework (lassoFit()), and adapts to the conventions
% and expectations of the IRLS implementation for GLM.
%
% Lasso always returns an intercept, but lassoFit passes in the predictor
% matrix 'X' without a column of ones.
% Unlike OLS, GLM cannot calve off
% estimation of the predictor variable coefficients and calculate the
% intercept externally, post-fit. This is because each IRLS step
% computes a local linear predictor (eta), and this requires an
% intercept contribution. Therefore, prepend a column of ones before
% passing to glmIRLS. The variable B returned by glmIRLS will include
% the intercept term as the first element of B.
X = [ones(size(X,1),1) X];

% glmIRLS assumes pwts=1 if no observation weights were supplied,
% rather than empty, which is what lassoFit will feed in.
if isempty(pwts), pwts=1; end

[B,mu,eta,active] = glmIRLS(X,Y,distr,offset,pwts,dataClass,N, ...
    sqrtvarFun,linkFun,dlinkFun,ilinkFun,b,active,mu,muLims, ...
    wlsfit,nullDev,devFun,reltol);

deviance = sum(pwts.* devFun(mu,Y));

% Pull the intercept estimate out of B and return separately.
Intercept = B(1);
B = B(2:end);

extras.Intercept = Intercept;
extras.Deviance  = deviance;
varargout{1} = extras;
varargout{2} = mu;
varargout{3} = eta;

end %-glmIRLSwrapper

% ===================================================
% lassoFitAndPredict()
% ===================================================

function dev = lassoFitAndPredict(Xtrain,Ytrain,Xtest,Ytest, ...
    lambda,alpha,dfmax,standardize,reltol,ever_active, ...
    penalizedFitPartition,distr,link,linkFun,dlinkFun,sqrtvarFun, ...
    isCanonical, dlinkFunCanonical,devFun,dataClass)
%
% This function sets up the conditions for lasso fit and prediction from
% within crossvalidation. It defines quantities as necessary for the
% current fold.

trainWeights = Xtrain(:,1);
% To conform with crossval syntax, weights are prepended to the predictor
% matrix. Empty weights are temporarily converted to a vector of NaNs.
% This clause restores them to empty. Same occurs below for test weights.
if any(isnan(trainWeights))
    trainWeights = [];
end
trainOffset = Xtrain(:,2);
if any(isnan(trainOffset))
    trainOffset = 0;
end

Xtrain = Xtrain(:,3:end);
if size(Ytrain,2) == 2
    trainN = Ytrain(:,1);
    Ytrain = Ytrain(:,2);
else
    trainN = 1;
end

% These quantities depend on the particular set of responses used for the fit.
% Within crossval, we work with a subset of the original data. Therefore,
% these quantities must be redefined on each invocation.
mu = startingVals(distr,Ytrain,trainN);
eta = linkFun(mu);
if isequal(distr,'binomial')
    sqrtvarFun = @(mu) sqrt(mu).*sqrt(1-mu) ./ sqrt(trainN);
    devFun = @(mu,y) 2*trainN.*(y.*log((y+(y==0))./mu) + (1-y).*log((1-y+(y==1))./(1-mu)));
end

penalizedFit = @(x,y,wlsfit,b,active,mu,eta) penalizedFitPartition(x,y, ...
    trainOffset,trainWeights,trainN,wlsfit,b,active,mu,eta,sqrtvarFun);

[lambdaMax, nullDev, nullIntercept] = computeLambdaMax(Xtrain, Ytrain, trainWeights, ...
    alpha, standardize, distr, link, dlinkFun, trainOffset, isCanonical, dlinkFunCanonical, devFun);

% Will convert from lambda penalty matched to average log-likelihood
% per observation (which is what lasso uses) to an equivalent
% penalty for total log-likelihood (which is what glmIRLS uses.)
if isempty(trainWeights) && isscalar(trainN)
    totalWeight = size(Xtrain,1);
elseif ~isempty(trainWeights) && isscalar(trainN)
    totalWeight = sum(trainWeights);
elseif isempty(trainWeights) && ~isscalar(trainN)
    totalWeight = sum(trainN);
else
    totalWeight = sum(trainWeights .* trainN);
end

lambdaMax = lambdaMax * totalWeight;

[B,Intercept] = lassoFit(Xtrain,Ytrain, ...
    trainWeights,lambda,alpha,dfmax,standardize,reltol, ...
    lambdaMax,ever_active,penalizedFit,mu,eta,dataClass,true,nullDev,nullIntercept);
Bplus = [Intercept; B];

testWeights = Xtest(:,1);
if any(isnan(testWeights))
    testWeights = ones(size(Xtest,1),1);
end
testOffset = Xtest(:,2);
if any(isnan(testOffset))
    testOffset = 0;
end
Xtest = Xtest(:,3:end);
if size(Ytest,2) == 2
    testN = Ytest(:,1);
    Ytest = Ytest(:,2);
else
    testN = 1;
end

% For binomial with two-column form of response, the deviance function
% depends on the number of trials for each observation, which is provided
% in testN. Within crossval, the test set is a subset of the original data.
% Therefore, the deviance function needs to be redefined for each test set.
if isequal(distr,'binomial')
    devFun = @(mu,y) 2*testN.*(y.*log((y+(y==0))./mu) + (1-y).*log((1-y+(y==1))./(1-mu)));
end

numFits = size(Bplus,2);
dev = zeros(1,numFits);
for i=1:numFits
    if ~isequal(testOffset,0)
        mu = glmval(Bplus(:,i), Xtest, link, 'Offset',testOffset);
    else
        mu = glmval(Bplus(:,i), Xtest, link);
    end
    di = devFun(mu,Ytest);
    dev(i) = sum(testWeights' * di);
end

end %-lassoFitAndPredict

% ===================================================
% lassoFit()
% ===================================================

function [B,Intercept,lambda,varargout] = ...
    lassoFit(X,Y,weights,lambda,alpha,dfmax,standardize,reltol, ...
    lambdaMax,ever_active,penalizedFit,mu,eta,dataClass,userSuppliedLambda,nullDev,nullIntercept)
%
% ------------------------------------------------------
% Perform model fit for each lambda and the given alpha
% ------------------------------------------------------

regressionType = 'GLM';

[~,P] = size(X);
nLambda = length(lambda);

% If X has any constant columns, we want to exclude them from the
% coordinate descent calculations. The corresponding coefficients
% will be returned as zero.
constantPredictors = (range(X)==0);
ever_active = ever_active & ~constantPredictors;

% === weights and standardization ===
%
observationWeights = ~isempty(weights);
if ~isempty(weights)
    observationWeights = true;
    weights = weights(:)';
    % Normalize weights up front.
    weights = weights / sum(weights);
end

if standardize
    if ~observationWeights
        % Center and scale
        [X0,muX,sigmaX] = zscore(X,1);
        % Avoid divide by zero with constant predictors
        sigmaX(constantPredictors) = 1;
    else
        % Weighted center and scale
        muX = weights*X;
        X0 = bsxfun(@minus,X,muX);
        sigmaX = sqrt( weights*(X0.^2) );
        % Avoid divide by zero with constant predictors
        sigmaX(constantPredictors) = 1;
        X0 = bsxfun(@rdivide, X0, sigmaX);
    end
else
    switch regressionType
        case 'OLS'
            if ~observationWeights
                % Center
                muX = mean(X,1);
                X0 = bsxfun(@minus,X,muX);
                sigmaX = 1;
            else
                % Weighted center
                muX = weights*X;
                X0 = bsxfun(@minus,X,muX);
                sigmaX = 1;
            end
        case 'GLM'
            X0 = X;
            % For GLM don't center until we get inside the IRLS
            sigmaX = 1;
            muX = zeros(1,size(X,2));
    end
end

% For OLS, Y is centered, for GLM it is not.
switch regressionType
    case 'OLS'
        if ~observationWeights
            muY = mean(Y);
        else
            muY = weights*Y;
        end
        Y0 = bsxfun(@minus,Y,muY);
    case 'GLM'
        Y0 = Y;
end

% Preallocate the returned matrix of coefficients, B,
% and other variables sized to nLambda.

B = zeros(P,nLambda);

b = zeros(P,1,dataClass);

if nLambda > 0
    Extras(nLambda) = struct('Intercept', nullIntercept, 'Deviance', nullDev);
    for i=1:nLambda-1, Extras(i) = Extras(nLambda); end
    intercept = nullIntercept;
end

active = false(1,P);

for i = 1:nLambda

    lam = lambda(i);
    disp(['iteration ', num2str(i), ' out of ', num2str(nLambda)]);
    if lam >= lambdaMax
        continue;
    end

    % This anonymous function adapts the conventions of the IRLS algorithm
    % for glm fit to the coordinate descent algorithm for penalized WLS,
    % which is used within the IRLS algorithm.
    wlsfit = @(x,y,weights,b,active) glmPenalizedWlsWrapper(y,x,b,active,weights,lam, ...
        alpha,reltol,ever_active);

    [b,active,extras,mu,eta] = penalizedFit(X0,Y0,wlsfit,[intercept;b],active,mu,eta);

    B(:,i) = b;

    Extras(i) = extras;

    % Halt if maximum model size ('DFmax') has been met or exceeded.
    if sum(active) > dfmax
        % truncate B and lambda output arguments
        lambda = lambda(1:(i-1));
        B = B(:,1:(i-1));
        Extras = Extras(:,1:(i-1));
        break
    end

    % Halt if we have exceeded a threshold on the percent of
    % null deviance left unexplained.
    if ~(userSuppliedLambda || isempty(nullDev))
        if extras.Deviance < 1.0e-3 * nullDev
            lambda = lambda(1:i);
            B = B(:,1:i);
            Extras = Extras(:,1:i);
            break
        end
    end

end % of lambda sequence

% ------------------------------------------
% Unwind the centering and scaling (if any)
% ------------------------------------------

B = bsxfun(@rdivide, B, sigmaX');
B(~ever_active,:) = 0;

switch regressionType
    case 'OLS'
        Intercept = muY-muX*B;
    case 'GLM'
        Intercept = zeros(1,length(lambda));
        for i=1:length(lambda)
            Intercept(i) = Extras(i).Intercept;
        end
        if isempty(lambda)
            Intercept = [];
        else
            Intercept = Intercept - muX*B;
        end
end

% ------------------------------------------
% Calculate Mean Prediction Squared Error (or other GoF)
% ------------------------------------------

switch regressionType
    case 'OLS'
        Intercept = muY-muX*B;
        BwithI = [Intercept; B];
        fits = [ones(size(X,1),1) X]*BwithI;
        residuals = bsxfun(@minus, Y, fits);
        if ~observationWeights
            mspe = mean(residuals.^2);
        else
            % This line relies on the weights having been normalized.
            mspe = weights * (residuals.^2);
        end
        varargout{1} = mspe;
    case 'GLM'
        deviance = zeros(1,length(lambda));
        for i=1:length(lambda)
            deviance(i) = Extras(i).Deviance;
        end
        if isempty(lambda)
            deviance = [];
        end
        varargout{1} = deviance;
end

end %-lassoFit

% ===================================================
% thresholdScreen()
% ===================================================

function potentially_active = thresholdScreen(X0, wX0, Y0, ...
    b, active, threshold)
r = Y0 - X0(:,active)*b(active,:);
% We don't need the (b.*wX2)' term that one might expect, because it
% is zero for the inactive predictors.
potentially_active = abs(r' *wX0) > threshold;
end %-thresholdScreen

% ===================================================
% cdescentCycleNewCandidates()
% ===================================================

function [b,active,wX2,wX2calculated,shrinkFactor] = ...
    cdescentCycleNewCandidates(X0, weights, wX0, wX2, wX2calculated, Y0, ...
    b, active, shrinkFactor, threshold, candidates)
%
r = Y0 - X0(:,active)*b(active,:);
bold = b;

for j=find(candidates)
    % Regress j-th partial residuals on j-th predictor
    bj = wX0(:,j)' * r;

    margin = abs(bj) - threshold;

    % Soft thresholding
    if margin > 0
        if ~wX2calculated(j)
            wX2(j) = weights * X0(:,j).^2;
            wX2calculated(j) = true;
            shrinkFactor(j) = wX2(j) + shrinkFactor(j);
        end

        b(j) = sign(bj) .* margin ./ shrinkFactor(j);

        active(j) = true;
    end

    r = r - X0(:,j)*(b(j)-bold(j));
end

end %-cdescentCycleNewCandidates

% ===================================================
% cdescentCycleNoRecalc()
% ===================================================

function [b,active] = ...
    cdescentCycleNoRecalc(X0, wX0, wX2, Y0, b, active, shrinkFactor, threshold)
%
r = Y0 - X0(:,active)*b(active,:);
bwX2 = b.*wX2;
bold = b;

for j=find(active)
    % Regress j-th partial residuals on j-th predictor
    bj = wX0(:,j)' * r + bwX2(j);

    margin = abs(bj) - threshold;

    % Soft thresholding
    if margin > 0
        b(j) = sign(bj) .* margin ./ shrinkFactor(j);
    else
        b(j) = 0;
        active(j) = false;
    end

    r = r - X0(:,j)*(b(j)-bold(j));
end

end %-cdescentCycleNoRecalc

% ===================================================
% penalizedWls()
% ===================================================

function [b,varargout] = ...
    penalizedWls(X,Y,b,active,weights,lambda,alpha,reltol)

weights = weights(:)';

[~,P] = size(X);

wX = bsxfun(@times,X,weights');

wX2 = zeros(P,1);
wX2(active) = (weights * X(:,active).^2)';
wX2calculated = active;

threshold = lambda * alpha;

shrinkFactor = wX2 + lambda * (1-alpha);

% Iterative coordinate descent until converged
while true

    bold = b;
    old_active = active;

    [b,active] = cdescentCycleNoRecalc(X,wX,wX2,Y, b,active,shrinkFactor,threshold);

    if ~any( abs(b(old_active) - bold(old_active)) > reltol * max(1.0,abs(bold(old_active))) )
        % Cycling over the active set converged.
        % Do one full pass through the predictors.
        % If there is no predictor added to the active set, we're done.
        % Otherwise, resume the coordinate descent iterations.
        bold = b;
        potentially_active = thresholdScreen(X,wX,Y,b,active,threshold);
        new_candidates = potentially_active & ~active;
        if any(new_candidates)
            [b,new_active,wX2,wX2calculated,shrinkFactor] = ...
                cdescentCycleNewCandidates(X,weights,wX,wX2,wX2calculated,Y, ...
                b,active,shrinkFactor,threshold,new_candidates);
        else
            new_active = active;
        end

        if isequal(new_active, active)
            break
        else
            super_active = active | new_active;
            if ~any( abs(b(super_active) - bold(super_active)) > reltol * max(1.0,abs(bold(super_active))) )
                % We didn't change the coefficients enough by the full pass
                % through the predictors to warrant continuing the iterations.
                % The active coefficients after this pass and the
                % active coefficients prior to this pass differ.
                % This implies that a coefficient changed between zero
                % and a small numeric value. There is no "fit-wise" reason
                % to prefer the before and after coefficients, so we
                % choose the more parsimonious alternative.
                if sum(new_active) > sum(active)
                    b = bold;
                else
                    active = new_active;
                end
                break
            else
                active = new_active;
            end
        end
    end

end

varargout{1} = active;

end %-penalizedWls()

% ===================================================
% glmPenalizedWlsWrapper()
% ===================================================

function [b,varargout] = glmPenalizedWlsWrapper(X,Y,b,active,weights, ...
    lambda,alpha,reltol,ever_active)

% The coordinate descent in penalizedWls() assumes centered X and Y,
% and it assumes X does NOT have a column of ones.

X0 = X(:,2:end);

weights = weights(:)';

normedWeights = weights / sum(weights);

% Center X.
muX = normedWeights * X0;
X0 = bsxfun(@minus,X0,muX);

% Center Y.
muY = normedWeights * Y;
Y = Y - muY;

[bPredictors,varargout{1}] = penalizedWls(X0, Y, b(2:end), ...
    active,weights,lambda,alpha,reltol);

bPredictors(~ever_active,:) = 0;

% Since X and Y were centered for the WLS, we can calculate the
% intercept after the fact.
Intercept = muY - muX*bPredictors;
b = [Intercept; bPredictors];

end %-glmPenalizedWlsWrapper()

% ===================================================
% computeLambdaMax()
% ===================================================

function [lambdaMax, nullDev, nullIntercept] = computeLambdaMax(X, Y, weights, alpha, standardize, ...
    distr, link, dlinkFun, offset, isCanonical, dlinkFunCanonical, devFun)

% lambdaMax is the penalty term (lambda) beyond which the coefficients
% are guaranteed to be all zero: there is no benefit to calculating
% penalized fits with lambda > lambdaMax.
%
% nullDev is the deviance of the fit using just a constant term.
%
% The input parameter 'devFun' is used only as a sanity check to see if
% glmfit gets a plausible fit with an intercept term only.

% Head off potential cruft in the command window.
wsIllConditioned2 = warning('off','stats:glmfit:IllConditioned');
wsIterationLimit = warning('off','stats:glmfit:IterationLimit');
wsPerfectSeparation = warning('off','stats:glmfit:PerfectSeparation');
wsBadScaling = warning('off','stats:glmfit:BadScaling');
cleanupIllConditioned2 = onCleanup(@() warning(wsIllConditioned2));
cleanupIterationLimit = onCleanup(@() warning(wsIterationLimit));
cleanupPerfectSeparation = onCleanup(@() warning(wsPerfectSeparation));
cleanupBadScaling = onCleanup(@() warning(wsBadScaling));

if ~isempty(weights)
    observationWeights = true;
    weights = weights(:)';
    % Normalized weights are used for standardization and for
    % calculating lambdaMax.
    normalizedweights = weights / sum(weights);
else
    observationWeights = false;
end

[N,~] = size(X);

% If we were asked to standardize the predictors, do so here, because
% the calculation of lambdaMax needs the predictors in the same form
% in which we will use them to perform fits.

if standardize
    % If X has any constant columns, we want to protect against
    % divide-by-zero in normalizing variances.
    constantPredictors = (range(X)==0);

    if ~observationWeights
        % Center and scale
        [X0,~,~] = zscore(X,1);
    else
        % Weighted center and scale
        muX = normalizedweights * X;
        X0 = bsxfun(@minus,X,muX);
        sigmaX = sqrt( normalizedweights * (X0.^2) );
        % Avoid divide-by-zero with constant predictors
        sigmaX(constantPredictors) = 1;
        X0 = bsxfun(@rdivide, X0, sigmaX);
    end
else
    if ~observationWeights
        % Center
        muX = mean(X,1);
        X0 = bsxfun(@minus,X,muX);
    else
        % Weighted center
        muX = normalizedweights(:)' * X;
        X0 = bsxfun(@minus,X,muX);
    end
end
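
% Added commentary (not from the original file): the null model is fit
% next by passing an explicit column of ones with 'constant','off', so
% that an 'offset', when present, can be honored. From the resulting
% predicted means, lambdaMax below follows the standard coordinate
% descent bound: coefficient j is guaranteed to stay at zero whenever
%
%   abs(X0(:,j)' * (Y - predictedMu)) / N  <=  lambda*alpha
%
% (with normalized observation weights replacing the 1/N factor when
% weights are supplied), so lambdaMax is the largest such gradient
% magnitude divided by alpha.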

constantTerm = ones(length(Y),1);
if isscalar(offset)
    [coeffs,nullDev] = glmfit(constantTerm,Y,distr,'constant','off', ...
        'link',link,'weights',weights);
    predictedMu = glmval(coeffs,constantTerm,link,'constant','off');
else
    [coeffs,nullDev] = glmfit(constantTerm,Y,distr,'constant','off', ...
        'link',link,'weights',weights,'offset',offset);
    predictedMu = glmval(coeffs,constantTerm,link,'constant','off','offset',offset);
end

nullIntercept = coeffs;

% Sanity check. With a badly matched link / distr / data combination,
% glmfit may not have been able to converge to a reasonable estimate.
% If so, the inputs to lassoglm may constitute a difficult problem
% formulation with an unsatisfactory maximum likelihood solution.
% Poor formulations have been observed with mismatched links (e.g., the
% 'reciprocal' link with the Poisson distribution, in place of the
% canonical 'log'). As a screen for this contingency, calculate the
% deviance we would get using the scalar unmodeled mean of the response
% data. Call this quantity "muDev". The value of muDev should be no
% better than the nullDev calculated by glmfit above (though it may be
% equal or nearly equal). If it is better, warn that the computations
% are of uncertain validity.

if observationWeights
    muDev = weights * devFun(mean(Y)*ones(length(Y),1), Y);
else
    muDev = sum(devFun(mean(Y)*ones(length(Y),1), Y));
end
if (muDev - nullDev) / max([1.0 muDev nullDev]) < -1.0e-4
    [~, lastid] = lastwarn;
    if strcmp(lastid,'stats:glmfit:BadScaling')
        % This reassignment of predicted values is not guaranteed to
        % improve matters, but it is an attempt to generate a workable
        % sequence of lambda values. Note: since the value is constant
        % across observations, observation weights have no effect here.
        predictedMu = mean(Y)*ones(length(Y),1);
        warning(message('stats:lassoGlm:DifficultLikelihood'));
    end
end

if ~isCanonical
    X0 = bsxfun( @times, X0, dlinkFunCanonical(predictedMu) ./ dlinkFun(predictedMu) );
end

if ~observationWeights
    dotp = abs(X0' * (Y - predictedMu));
    lambdaMax = max(dotp) / (N*alpha);
else
    wX0 = bsxfun(@times, X0, normalizedweights');
    dotp = abs(sum(bsxfun(@times, wX0, (Y - predictedMu))));
    lambdaMax = max(dotp) / alpha;
end

end %-computeLambdaMax()

% ===================================================
% computeLambdaSequence()
% ===================================================

function lambda = computeLambdaSequence(lambdaMax, nLambda, lambdaRatio, LRdefault)

% Fill in the log-spaced sequence of lambda values.

if nLambda==1
    lambda = lambdaMax;
else
    % Fill in a number "nLambda" of smaller values, on a log scale.
    if lambdaRatio==0
        lambdaRatio = LRdefault;
        addZeroLambda = true;
    else
        addZeroLambda = false;
    end
    lambdaMin = lambdaMax * lambdaRatio;
    loghi = log(lambdaMax);
    loglo = log(lambdaMin);
    logrange = loghi - loglo;
    interval = -logrange/(nLambda-1);
    lambda = exp(loghi:interval:loglo)';
    if addZeroLambda
        lambda(end) = 0;
    else
        lambda(end) = lambdaMin;
    end
end

end %-computeLambdaSequence
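
% Added usage note (illustrative values, not from the original file):
% the sequence above is log-spaced from lambdaMax down to
% lambdaMax*lambdaRatio, i.e.
%
%   lambda(k) = lambdaMax * lambdaRatio^((k-1)/(nLambda-1))
%
% For example,
%
%   lambda = computeLambdaSequence(2, 5, 1e-2, 1e-4)
%
% returns (approximately) the descending column vector
%   [2; 0.6325; 0.2; 0.0632; 0.02]
% with the last element set to lambdaMin exactly. When lambdaRatio is
% zero, the spacing is generated with LRdefault instead, and the
% smallest value is then replaced by exactly zero.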