Home > SuperSegger > trainingConstants > lassoglm.m

lassoglm

PURPOSE ^

copied from statistics toolbox - added display line

SYNOPSIS ^

function [B,stats] = lassoglm(x,y,distr,varargin)

DESCRIPTION ^

 copied from statistics toolbox - added display line
LASSOGLM Perform lasso or elastic net regularization for a generalized linear model.
   [B,STATS] = LASSOGLM(X,Y,DISTR,...) Performs L1-penalized maximum likelihood 
   fits (lasso) relating the predictors in X to the responses in Y, or 
   fits subject to joint L1- and L2 constraints (elastic net).
   The default is a lasso-style fit, that is, a maximum likelihood fit
   subject to a constraint on the L1-norm of the coefficients B.

   LASSOGLM accepts all the command line parameters of the LASSO function 
   and it accepts command line parameters of the GLMFIT function, with 
   the following exceptions. LASSOGLM also accepts the arguments 'Link'
   and 'Offset' of GLMFIT (they can also be lower case 'link' or 'offset').  
   LASSOGLM It does not accept the argument 'estdisp' of GLMFIT, or the 
   argument 'constant'.  LASSOGLM does not calculate standard errors
   or covariances among the coefficients, as GLMFIT does.
   LASSOGLM always calculates an Intercept term. 
   
   Positional parameters:

     X                A numeric matrix (dimension, say, NxP)
     Y                A numeric vector of length N.  If DISTR is
                      'binomial', Y may be a binary vector indicating
                      success/failure, and the total number of trials is
                      taken to be 1 for all observations.  If DISTR is
                      'binomial', Y may also be a two column matrix, the 
                      first column containing the number of successes for
                      each observation, and the second column containing
                      the total number of trials.
     DISTR            The distributional family for the non-systematic
                      variation in the responses.  Acceptable values for DISTR are
                      'normal', 'binomial', 'poisson', 'gamma', and 'inverse gaussian'.  
                      By default, the distribution is fit using the canonical link 
                      corresponding to DISTR.  Other link functions may be
                      specified using the optional parameter 'link'.
   
   Optional input parameters:  

     'Weights'        Observation weights.  Must be a vector of non-negative
                      values, of the same length as columns of X.  At least
                      two values must be positive. (default ones(N,1) or 
                      equivalently (1/N)*ones(N,1)).
     'Alpha'          Elastic net mixing value, or the relative balance
                      between L2 and L1 penalty (default 1, range (0,1]).
                      Alpha=1 ==> lasso, otherwise elastic net.
                      Alpha near zero ==> nearly ridge regression.
     'NumLambda'      The number of lambda values to use, if the parameter
                      'Lambda' is not supplied (default 100). Ignored
                      if 'Lambda' is supplied. LASSOGLM may return fewer
                      fits than specified by 'NumLambda' if the deviance
                      of the fits drops below a threshold percentage of
                      the null deviance (deviance of the fit without
                      any predictors X).
     'LambdaRatio'    Ratio between the minimum value and maximum value of
                      lambda to generate, if the  parameter "Lambda" is not 
                      supplied.  Legal range is [0,1). Default is 0.0001.
                      If 'LambdaRatio' is zero, LASSOGLM will generate its
                      default sequence of lambda values but replace the
                      smallest value in this sequence with the value zero.
                      'LambdaRatio' is ignored if 'Lambda' is supplied.
     'Lambda'         Lambda values. Will be returned in return argument
                      STATS in ascending order. The default is to have
                      LASSOGLM generate a sequence of lambda values, based 
                      on 'NumLambda' and 'LambdaRatio'. LASSOGLM will generate 
                      a sequence, based on the values in X and Y, such that 
                      the largest Lambda value is estimated to be just 
                      sufficient to produce all zero coefficients B. 
                      You may supply a vector of real, non-negative values 
                      of lambda for LASSOGLM to use, in place of its default
                      sequence.  If you supply a value for 'Lambda', 
                      'NumLambda' and 'LambdaRatio' are ignored.
     'DFmax'          Maximum number of non-zero coefficients in the model.
                      Can be useful with large numbers of predictors.
                      Results only for lambda values that satisfy this
                      degree of sparseness will be returned.  Default is
                      to not limit the number of non-zero coefficients.
     'Standardize'    Whether to scale X prior to fitting the model
                      sequence. This affects whether the regularization is
                      applied to the coefficients on the standardized
                      scale or the original scale. The results are always
                      presented on the original data scale. Default is
                      TRUE, do scale X.
     'RelTol'         Convergence threshold for coordinate descent algorithm.
                      The coordinate descent iterations will terminate
                      when the relative change in the size of the
                      estimated coefficients B drops below this threshold.
                      Default: 1e-4. Legal range is (0,1).
     'CV'             If present, indicates the method used to compute Deviance.
                      When 'CV' is a positive integer K, LASSOGLM uses K-fold
                      cross-validation.  Set 'CV' to a cross-validation 
                      partition, created using CVPARTITION, to use other
                      forms of cross-validation. You cannot use a
                      'Leaveout' partition with LASSOGLM.                
                      When 'CV' is 'resubstitution', LASSOGLM uses X and Y 
                      both to fit the model and to estimate the deviance 
                      of the fitted model, without cross-validation.  
                      The default is 'resubstitution'.
     'MCReps'         A positive integer indicating the number of Monte-Carlo
                      repetitions for cross-validation.  The default value is 1.
                      If 'CV' is 'resubstitution' or a cvpartition of type
                      'resubstitution', 'MCReps' must be 1.  If 'CV' is a
                      cvpartition of type 'holdout', then 'MCReps' must be
                      greater than one.
     'PredictorNames' A cell array of names for the predictor variables,
                      in the order in which they appear in X. 
                      Default: {}
     'Options'        A structure that contains options specifying whether to
                      conduct cross-validation evaluations in parallel, and
                      options specifying how to use random numbers when computing
                      cross validation partitions. This argument can be created
                      by a call to STATSET. CROSSVAL uses the following fields:
                        'UseParallel'
                        'UseSubstreams'
                        'Streams'
                      For information on these fields see PARALLELSTATS.
                      NOTE: If supplied, 'Streams' must be of length one.
     'Link'           The link function to use in place of the canonical link.
                      The link function defines the relationship f(mu) = x*b
                      between the mean response mu and the linear combination of
                      predictors x*b.  Specify the link parameter value as one of
                      - the text strings 'identity', 'log', 'logit', 'probit',
                        'comploglog', 'reciprocal', 'loglog', or
                      - an exponent P defining the power link, mu = (x*b)^P 
                        for x*b > 0, or
                      - a cell array of the form {FL FD FI}, containing three
                        function handles, created using @, that define the link (FL),
                        the derivative of the link (FD), and the inverse link (FI).  
     'Offset'         A vector to use as an additional predictor variable, but
                      with a coefficient value fixed at 1.0.  Default is
                      not to utilize an offset variable.
 
   Return values:
     B                The fitted coefficients for each model. 
                      B will have dimension PxL, where 
                      P = size(X,2) is the number of predictors, and
                      L = length(STATS.Lambda).
     STATS            STATS is a struct that contains information about the
                      sequence of model fits corresponding to the columns
                      of B. STATS contains the following fields:

       'Intercept'    The intercept term for each model. Dimension 1xL.
       'Lambda'       The sequence of lambda penalties used, in ascending order. 
                      Dimension 1xL.
       'Alpha'        The elastic net mixing value that was used.
       'DF'           The number of nonzero coefficients in B for each
                      value of lambda. Dimension 1xL.
       'Deviance'     The deviance of the fitted model for each value of
                      lambda. If cross-validation was performed, the values 
                      for 'Deviance' represent the estimated expected 
                      deviance of the model applied to new data, as 
                      calculated by cross-validation. Otherwise, 
                      'Deviance' is the deviance of the fitted model 
                      applied to the data used to perform the fit. 
                      Dimension 1xL.

     If cross-validation was performed, STATS also includes the following
     fields:

       'SE'                The standard error of 'Deviance' for each lambda, as
                           calculated during cross-validation. Dimension 1xL.
       'LambdaMinDeviance' The lambda value with minimum expected deviance, as 
                           calculated during cross-validation. Scalar.
       'Lambda1SE'         The largest lambda such that 'Deviance' is within 
                           one standard error of the minimum. Scalar.
       'IndexMinDeviance'  The index of Lambda with value LambdaMinMSE. Scalar.
       'Index1SE'          The index of Lambda with value Lambda1SE. Scalar.

   See also lassoPlot, lasso, ridge, parallelstats, glmfit.

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function [B,stats] = lassoglm(x,y,distr,varargin)
0002 % copied from statistics toolbox - added display line
0003 %LASSOGLM Perform lasso or elastic net regularization for a generalized linear model.
0004 %   [B,STATS] = LASSOGLM(X,Y,DISTR,...) Performs L1-penalized maximum likelihood
0005 %   fits (lasso) relating the predictors in X to the responses in Y, or
0006 %   fits subject to joint L1- and L2 constraints (elastic net).
0007 %   The default is a lasso-style fit, that is, a maximum likelihood fit
0008 %   subject to a constraint on the L1-norm of the coefficients B.
0009 %
0010 %   LASSOGLM accepts all the command line parameters of the LASSO function
0011 %   and it accepts command line parameters of the GLMFIT function, with
0012 %   the following exceptions. LASSOGLM also accepts the arguments 'Link'
0013 %   and 'Offset' of GLMFIT (they can also be lower case 'link' or 'offset').
0014 %   LASSOGLM It does not accept the argument 'estdisp' of GLMFIT, or the
0015 %   argument 'constant'.  LASSOGLM does not calculate standard errors
0016 %   or covariances among the coefficients, as GLMFIT does.
0017 %   LASSOGLM always calculates an Intercept term.
0018 %
0019 %   Positional parameters:
0020 %
0021 %     X                A numeric matrix (dimension, say, NxP)
0022 %     Y                A numeric vector of length N.  If DISTR is
0023 %                      'binomial', Y may be a binary vector indicating
0024 %                      success/failure, and the total number of trials is
0025 %                      taken to be 1 for all observations.  If DISTR is
0026 %                      'binomial', Y may also be a two column matrix, the
0027 %                      first column containing the number of successes for
0028 %                      each observation, and the second column containing
0029 %                      the total number of trials.
0030 %     DISTR            The distributional family for the non-systematic
0031 %                      variation in the responses.  Acceptable values for DISTR are
0032 %                      'normal', 'binomial', 'poisson', 'gamma', and 'inverse gaussian'.
0033 %                      By default, the distribution is fit using the canonical link
0034 %                      corresponding to DISTR.  Other link functions may be
0035 %                      specified using the optional parameter 'link'.
0036 %
0037 %   Optional input parameters:
0038 %
0039 %     'Weights'        Observation weights.  Must be a vector of non-negative
0040 %                      values, of the same length as columns of X.  At least
0041 %                      two values must be positive. (default ones(N,1) or
0042 %                      equivalently (1/N)*ones(N,1)).
0043 %     'Alpha'          Elastic net mixing value, or the relative balance
0044 %                      between L2 and L1 penalty (default 1, range (0,1]).
0045 %                      Alpha=1 ==> lasso, otherwise elastic net.
0046 %                      Alpha near zero ==> nearly ridge regression.
0047 %     'NumLambda'      The number of lambda values to use, if the parameter
0048 %                      'Lambda' is not supplied (default 100). Ignored
0049 %                      if 'Lambda' is supplied. LASSOGLM may return fewer
0050 %                      fits than specified by 'NumLambda' if the deviance
0051 %                      of the fits drops below a threshold percentage of
0052 %                      the null deviance (deviance of the fit without
0053 %                      any predictors X).
0054 %     'LambdaRatio'    Ratio between the minimum value and maximum value of
0055 %                      lambda to generate, if the  parameter "Lambda" is not
0056 %                      supplied.  Legal range is [0,1). Default is 0.0001.
0057 %                      If 'LambdaRatio' is zero, LASSOGLM will generate its
0058 %                      default sequence of lambda values but replace the
0059 %                      smallest value in this sequence with the value zero.
0060 %                      'LambdaRatio' is ignored if 'Lambda' is supplied.
0061 %     'Lambda'         Lambda values. Will be returned in return argument
0062 %                      STATS in ascending order. The default is to have
0063 %                      LASSOGLM generate a sequence of lambda values, based
0064 %                      on 'NumLambda' and 'LambdaRatio'. LASSOGLM will generate
0065 %                      a sequence, based on the values in X and Y, such that
0066 %                      the largest Lambda value is estimated to be just
0067 %                      sufficient to produce all zero coefficients B.
0068 %                      You may supply a vector of real, non-negative values
0069 %                      of lambda for LASSOGLM to use, in place of its default
0070 %                      sequence.  If you supply a value for 'Lambda',
0071 %                      'NumLambda' and 'LambdaRatio' are ignored.
0072 %     'DFmax'          Maximum number of non-zero coefficients in the model.
0073 %                      Can be useful with large numbers of predictors.
0074 %                      Results only for lambda values that satisfy this
0075 %                      degree of sparseness will be returned.  Default is
0076 %                      to not limit the number of non-zero coefficients.
0077 %     'Standardize'    Whether to scale X prior to fitting the model
0078 %                      sequence. This affects whether the regularization is
0079 %                      applied to the coefficients on the standardized
0080 %                      scale or the original scale. The results are always
0081 %                      presented on the original data scale. Default is
0082 %                      TRUE, do scale X.
0083 %     'RelTol'         Convergence threshold for coordinate descent algorithm.
0084 %                      The coordinate descent iterations will terminate
0085 %                      when the relative change in the size of the
0086 %                      estimated coefficients B drops below this threshold.
0087 %                      Default: 1e-4. Legal range is (0,1).
0088 %     'CV'             If present, indicates the method used to compute Deviance.
0089 %                      When 'CV' is a positive integer K, LASSOGLM uses K-fold
0090 %                      cross-validation.  Set 'CV' to a cross-validation
0091 %                      partition, created using CVPARTITION, to use other
0092 %                      forms of cross-validation. You cannot use a
0093 %                      'Leaveout' partition with LASSOGLM.
0094 %                      When 'CV' is 'resubstitution', LASSOGLM uses X and Y
0095 %                      both to fit the model and to estimate the deviance
0096 %                      of the fitted model, without cross-validation.
0097 %                      The default is 'resubstitution'.
0098 %     'MCReps'         A positive integer indicating the number of Monte-Carlo
0099 %                      repetitions for cross-validation.  The default value is 1.
0100 %                      If 'CV' is 'resubstitution' or a cvpartition of type
0101 %                      'resubstitution', 'MCReps' must be 1.  If 'CV' is a
0102 %                      cvpartition of type 'holdout', then 'MCReps' must be
0103 %                      greater than one.
0104 %     'PredictorNames' A cell array of names for the predictor variables,
0105 %                      in the order in which they appear in X.
0106 %                      Default: {}
0107 %     'Options'        A structure that contains options specifying whether to
0108 %                      conduct cross-validation evaluations in parallel, and
0109 %                      options specifying how to use random numbers when computing
0110 %                      cross validation partitions. This argument can be created
0111 %                      by a call to STATSET. CROSSVAL uses the following fields:
0112 %                        'UseParallel'
0113 %                        'UseSubstreams'
0114 %                        'Streams'
0115 %                      For information on these fields see PARALLELSTATS.
0116 %                      NOTE: If supplied, 'Streams' must be of length one.
0117 %     'Link'           The link function to use in place of the canonical link.
0118 %                      The link function defines the relationship f(mu) = x*b
0119 %                      between the mean response mu and the linear combination of
0120 %                      predictors x*b.  Specify the link parameter value as one of
0121 %                      - the text strings 'identity', 'log', 'logit', 'probit',
0122 %                        'comploglog', 'reciprocal', 'loglog', or
0123 %                      - an exponent P defining the power link, mu = (x*b)^P
0124 %                        for x*b > 0, or
0125 %                      - a cell array of the form {FL FD FI}, containing three
0126 %                        function handles, created using @, that define the link (FL),
0127 %                        the derivative of the link (FD), and the inverse link (FI).
0128 %     'Offset'         A vector to use as an additional predictor variable, but
0129 %                      with a coefficient value fixed at 1.0.  Default is
0130 %                      not to utilize an offset variable.
0131 %
0132 %   Return values:
0133 %     B                The fitted coefficients for each model.
0134 %                      B will have dimension PxL, where
0135 %                      P = size(X,2) is the number of predictors, and
0136 %                      L = length(STATS.Lambda).
0137 %     STATS            STATS is a struct that contains information about the
0138 %                      sequence of model fits corresponding to the columns
0139 %                      of B. STATS contains the following fields:
0140 %
0141 %       'Intercept'    The intercept term for each model. Dimension 1xL.
0142 %       'Lambda'       The sequence of lambda penalties used, in ascending order.
0143 %                      Dimension 1xL.
0144 %       'Alpha'        The elastic net mixing value that was used.
0145 %       'DF'           The number of nonzero coefficients in B for each
0146 %                      value of lambda. Dimension 1xL.
0147 %       'Deviance'     The deviance of the fitted model for each value of
0148 %                      lambda. If cross-validation was performed, the values
0149 %                      for 'Deviance' represent the estimated expected
0150 %                      deviance of the model applied to new data, as
0151 %                      calculated by cross-validation. Otherwise,
0152 %                      'Deviance' is the deviance of the fitted model
0153 %                      applied to the data used to perform the fit.
0154 %                      Dimension 1xL.
0155 %
0156 %     If cross-validation was performed, STATS also includes the following
0157 %     fields:
0158 %
0159 %       'SE'                The standard error of 'Deviance' for each lambda, as
0160 %                           calculated during cross-validation. Dimension 1xL.
0161 %       'LambdaMinDeviance' The lambda value with minimum expected deviance, as
0162 %                           calculated during cross-validation. Scalar.
0163 %       'Lambda1SE'         The largest lambda such that 'Deviance' is within
0164 %                           one standard error of the minimum. Scalar.
0165 %       'IndexMinDeviance'  The index of Lambda with value LambdaMinMSE. Scalar.
0166 %       'Index1SE'          The index of Lambda with value Lambda1SE. Scalar.
0167 %
0168 %   See also lassoPlot, lasso, ridge, parallelstats, glmfit.
0169 
0170 %   References:
0171 %   [1] Tibshirani, R. (1996) Regression shrinkage and selection
0172 %       via the lasso. Journal of the Royal Statistical Society,
0173 %       Series B, Vol 58, No. 1, pp. 267-288.
0174 %   [2] Zou, H. and T. Hastie. (2005) Regularization and variable
0175 %       selection via the elastic net. Journal of the Royal Statistical
0176 %       Society, Series B, Vol. 67, No. 2, pp. 301-320.
0177 %   [3] Friedman, J., R. Tibshirani, and T. Hastie. (2010) Regularization
0178 %       paths for generalized linear models via coordinate descent.
0179 %       Journal of Statistical Software, Vol 33, No. 1,
0180 %       http://www.jstatsoft.org/v33/i01.
0181 %   [4] Hastie, T., R. Tibshirani, and J. Friedman. (2008) The Elements
0182 %       of Statistical Learning, 2nd edition, Springer, New York.
0183 %   [5] Dobson, A.J. (2002) An Introduction to Generalized Linear
0184 %       Models, 2nd edition, Chapman&Hall/CRC Press.
0185 %   [6] McCullagh, P., and J.A. Nelder (1989) Generalized Linear
0186 %       Models, 2nd edition, Chapman&Hall/CRC Press.
0187 %   [7] Collett, D. (2003) Modelling Binary Data, 2nd edition,
0188 %       Chapman&Hall/CRC Press.
0189 
0190 %   Copyright 2011-2014 The MathWorks, Inc.
0191 
0192 
0193 if nargin < 2
0194     error(message('stats:lassoGlm:TooFewInputs'));
0195 end
0196 
0197 if nargin < 3 || isempty(distr), distr = 'normal'; end
0198 
0199 paramNames = {     'link' 'offset' 'weights'};
0200 paramDflts = {'canonical'  []       []};
0201 [link,offset,pwts,~,varargin] = ...
0202                     internal.stats.parseArgs(paramNames, paramDflts, varargin{:});
0203                 
0204 % Read in the optional parameters pertinent to regularization (eg, lasso)
0205 LRdefault = 1e-4;
0206 pnames = { 'alpha' 'numlambda' 'lambdaratio' 'lambda' ...
0207     'dfmax' 'standardize' 'reltol' 'cv' 'mcreps' ...
0208     'predictornames' 'options' };
0209 dflts  = {  1       100       LRdefault     []      ...
0210      []      true          1e-4    'resubstitution'  1 ...
0211      {}               []};
0212 [alpha, nLambda, lambdaRatio, lambda, ...
0213     dfmax, standardize, reltol, cvp, mcreps, predictorNames, parallelOptions] ...
0214      = internal.stats.parseArgs(pnames, dflts, varargin{:});
0215  
0216 if ~isempty(lambda)
0217     userSuppliedLambda = true;
0218 else
0219     userSuppliedLambda = false;
0220 end
0221 
0222 % X a real 2D matrix
0223 if ~ismatrix(x) || length(size(x)) ~= 2 || ~isreal(x)
0224     error(message('stats:lassoGlm:XnotaReal2DMatrix'));
0225 end
0226 
0227 % We need at least two observations.
0228 if isempty(x) || size(x,1) < 2
0229     error(message('stats:lassoGlm:TooFewObservations'));
0230 end
0231 
0232 % Categorical responses 'binomial'
0233 if isa(y,'categorical')
0234     [y, classname] = grp2idx(y); 
0235     nc = length(classname);
0236     if nc > 2
0237         error(message('stats:glmfit:TwoLevelCategory'));
0238     end
0239     y(y==1) = 0;
0240     y(y==2) = 1;
0241 end
0242 
0243 % Number of Predictors
0244 P = size(x,2);
0245 
0246 % Head off potential cruft in the command window.
0247 wsIllConditioned2 = warning('off','stats:glmfit:IllConditioned');
0248 cleanupIllConditioned2 = onCleanup(@() warning(wsIllConditioned2));
0249 
0250 % Sanity checking on predictors, responses, weights and offset parameter,
0251 % and removal of NaNs and Infs from same. Also, conversion of the
0252 % two-column form of binomial response to a proportion.
0253 [X, Y, offset, pwts, dataClass, nTrials, binomialTwoColumn] = ...
0254     glmProcessData(x, y, distr, 'off', offset, pwts);
0255 
0256 [~,sqrtvarFun,devFun,linkFun,dlinkFun,ilinkFun,link,mu,eta,muLims,isCanonical,dlinkFunCanonical] = ...
0257     glmProcessDistrAndLink(Y,distr,link,'off',nTrials,dataClass);
0258 
0259 [X,Y,pwts,nLambda,lambda,dfmax,cvp,mcreps,predictorNames,ever_active] = ...
0260     processLassoParameters(X,Y,pwts,alpha,nLambda,lambdaRatio,lambda,dfmax, ...
0261     standardize,reltol,cvp,mcreps,predictorNames);
0262 
0263 % Compute the amount of penalty at which all coefficients shrink to zero.
0264 [lambdaMax, nullDev, nullIntercept] = computeLambdaMax(X, Y, pwts, alpha, standardize, ...
0265     distr, link, dlinkFun, offset, isCanonical, dlinkFunCanonical, devFun);
0266 
0267 % If the command line did not provide a sequence of penalty terms,
0268 % generate a sequence.
0269 if isempty(lambda)
0270     lambda = computeLambdaSequence(lambdaMax, nLambda, lambdaRatio, LRdefault);
0271 end
0272 
0273 % Also, to help control saturated fits with binomial outcomes,
0274 % also alter the glmfit values for muLims
0275 if strcmp(distr,'binomial')
0276     muLims = [1.0e-5 1.0-1.0e-5];
0277 end
0278 
0279 % Will convert from lambda penalty matched to average log-likelihood
0280 % per observation (which is what lasso uses) to an equivalent
0281 % penalty for total log-likelihood (which is what glmIRLS uses.)
0282 if isempty(pwts) && isscalar(nTrials)
0283     totalWeight = size(X,1);
0284 elseif ~isempty(pwts) && isscalar(nTrials)
0285     totalWeight = sum(pwts);
0286 elseif isempty(pwts) && ~isscalar(nTrials)
0287     totalWeight = sum(nTrials);
0288 else
0289     totalWeight = sum(pwts .* nTrials);
0290 end
0291 
0292 % Convert lambda penalty to match total log-likelihood
0293 lambda = lambda * totalWeight;
0294 
0295 penalizedFitPartition = @(x,y,offset,pwts,n,wlsfit,b,active,mu,eta,sqrtvarFun) ...
0296     glmIRLSwrapper(x,y,distr,offset,pwts,dataClass,n, ...
0297     sqrtvarFun,linkFun,dlinkFun,ilinkFun,devFun,b,active,mu,muLims,wlsfit,nullDev,reltol);
0298 
0299 penalizedFit = @(x,y,wlsfit,b,active,mu,eta) ...
0300     penalizedFitPartition(x,y,offset,pwts',nTrials,wlsfit,b,active,mu,eta,sqrtvarFun);
0301 
0302 [B,Intercept,lambda,deviance] = ...
0303     lassoFit(X,Y,pwts,lambda,alpha,dfmax,standardize,reltol,lambdaMax*totalWeight,ever_active, ...
0304     penalizedFit,mu,eta,dataClass,userSuppliedLambda,nullDev,nullIntercept);
0305 
0306 % Store the number of non-zero coefficients for each lambda.
0307 df = sum(B~=0,1);
0308 
0309 % The struct 'stats' will comprise the second return argument.
0310 % Put place holders for ever-present fields to avoid undefined references
0311 % and to secure the order we want in the struct.
0312 stats = struct();
0313 stats.Intercept      = [];
0314 stats.Lambda         = [];
0315 stats.Alpha          = alpha;
0316 stats.DF             = [];
0317 stats.Deviance       = [];
0318 stats.PredictorNames = predictorNames;
0319 
0320 % ---------------------------------------------------------
0321 % If requested, use cross-validation to calculate
0322 % Prediction Mean Squared Error for each lambda.
0323 % ---------------------------------------------------------
0324 
0325 if ~isequal(cvp,'resubstitution')
0326     % Replace dfmax with P, the number of predictors supplied at the
0327     % command line. dfmax might cause one fold to return empty values,
0328     % because no lambda satisfies the dfmax criteria, while other folds
0329     % return a numeric value. The lambda sequence has already been
0330     % pruned by dfmax, if appropriate, in the call to lassoFit above.
0331     cvfun = @(Xtrain,Ytrain,Xtest,Ytest) lassoFitAndPredict( ...
0332         Xtrain,Ytrain,Xtest,Ytest, ...
0333         lambda,alpha,P,standardize,reltol,ever_active, ...
0334         penalizedFitPartition,distr,link,linkFun,dlinkFun,sqrtvarFun, ...
0335         isCanonical, dlinkFunCanonical,devFun,dataClass);
0336     weights = pwts;
0337     if isempty(weights)
0338         weights = nan(size(X,1),1);
0339     end
0340     if isempty(offset) || isequal(offset,0)
0341         offset = nan(size(X,1),1);
0342     end
0343     if binomialTwoColumn
0344         response = [nTrials Y];
0345     else
0346         response = Y;
0347     end
0348     cvDeviance = crossval(cvfun,[weights(:) offset(:) X],response, ...
0349         'Partition',cvp,'Mcreps',mcreps,'Options',parallelOptions);
0350     % Scale single-fold deviances up to estimate whole data set deviance
0351     cvDeviance = bsxfun(@times,cvDeviance,repmat((size(X,1) ./ cvp.TestSize)', mcreps, 1));
0352     deviance = mean(cvDeviance);
0353     se = std(cvDeviance) / sqrt(size(cvDeviance,1));
0354     minDeviance = min(deviance);
0355     minIx = find(deviance == minDeviance,1);
0356     lambdaMin = lambda(minIx);
0357     minplus1 = deviance(minIx) + se(minIx);
0358     seIx = find((deviance(1:minIx) <= minplus1),1,'first');
0359     if isempty(seIx)
0360         lambdaSE = [];
0361     else
0362         lambdaSE = lambda(seIx);
0363     end
0364     
0365     % Deposit cross-validation results in struct for return value.
0366     stats.SE                = se;
0367     stats.LambdaMinDeviance = lambdaMin;
0368     stats.Lambda1SE         = lambdaSE;
0369     stats.IndexMinDeviance  = minIx;
0370     stats.Index1SE          = seIx;
0371     
0372     % Convert key lambda values determined by cross-validation
0373     % back to match average log-likelihood per observation.
0374     stats.LambdaMinDeviance = stats.LambdaMinDeviance / totalWeight;
0375     stats.Lambda1SE         = stats.Lambda1SE / totalWeight;
0376 end
0377 
0378 % ------------------------------------------
0379 % Order results by ascending lambda
0380 % ------------------------------------------
0381 
0382 nLambda = length(lambda);
0383 reverseIndices = nLambda:-1:1;
0384 lambda = lambda(reverseIndices);
0385 lambda = reshape(lambda,1,nLambda);
0386 B = B(:,reverseIndices);
0387 Intercept = Intercept(reverseIndices);
0388 df = df(reverseIndices);
0389 deviance = deviance(reverseIndices);
0390 if ~isequal(cvp,'resubstitution')
0391     stats.SE               = stats.SE(reverseIndices);
0392     stats.IndexMinDeviance = nLambda - stats.IndexMinDeviance + 1;
0393     stats.Index1SE         = nLambda - stats.Index1SE + 1;
0394 end
0395 
0396 stats.Intercept = Intercept;
0397 stats.Lambda = lambda;
0398 stats.DF = df;
0399 stats.Deviance = deviance;
0400 
0401 % Convert lambda penalty back to match average deviance per observation.
0402 stats.Lambda = stats.Lambda / totalWeight;
0403 
0404 end %-main block
0405 
0406 % ------------------------------------------
0407 % SUBFUNCTIONS
0408 % ------------------------------------------
0409 
0410 % ===================================================
0411 %                  startingVals()
0412 % ===================================================
0413 
0414 function mu = startingVals(distr,y,N)
0415 % Find a starting value for the mean, avoiding boundary values
0416 switch distr
0417 case 'poisson'
0418     mu = y + 0.25;
0419 case 'binomial'
0420     mu = (N .* y + 0.5) ./ (N + 1);
0421 case {'gamma' 'inverse gaussian'}
0422     mu = max(y, eps(class(y))); % somewhat arbitrary
0423 otherwise
0424     mu = y;
0425 end
0426 end %-startingVals
0427 
0428 % ===================================================
0429 %                diagnoseSeparation()
0430 % ===================================================
0431 
0432 function diagnoseSeparation(eta,y,N)
0433 % Compute sample proportions, sorted by increasing fitted value
0434 [x,idx] = sort(eta);
0435 if ~isscalar(N)
0436     N = N(idx);
0437 end
0438 p = y(idx);
0439 if all(p==p(1))   % all sample proportions are the same
0440     return
0441 end
0442 if x(1)==x(end)   % all fitted probabilities are the same
0443     return
0444 end
0445 
0446 noFront = 0<p(1) && p(1)<1;     % no "front" section as defined below
0447 noEnd = 0<p(end) && p(end)<1;   % no "end" section as defined below
0448 if p(1)==p(end) || (noFront && noEnd)
0449     % No potential for perfect separation if the ends match or neither
0450     % end is perfect
0451     return
0452 end
0453 
0454 % There is at least one observation potentially taking probability 0 or
0455 % 1 at one end or the other with the data sorted by eta. We want to see
0456 % if the data, sorted by eta (x) value, have this form:
0457 %        x(1)<=...<=x(A)  <  x(A+1)=...=x(B-1)  <  x(B)<=...<=x(n)
0458 % with   p(1)=...=p(A)=0                           p(B)=...=p(n)=1
0459 % or     p(1)=...=p(A)=1                           p(B)=...=p(n)=0
0460 %
0461 % This includes the possibilities:
0462 %     A+1=B  - no middle section
0463 %     A=0    - no perfect fit at the front
0464 %     B=n+1  - no perfect fit at the end
0465 dx = 100*max(eps(x(1)),eps(x(end)));
0466 n = length(p);
0467 if noFront
0468     A = 0;
0469 else
0470     A = find(p~=p(1),1,'first')-1;
0471     cutoff = x(A+1)-dx;
0472     A = sum(x(1:A)<cutoff);
0473 end
0474 
0475 if noEnd
0476     B = n+1;
0477 else
0478     B = find(p~=p(end),1,'last')+1;
0479     cutoff = x(B-1)+dx;
0480     B = (n+1) - sum(x(B:end)>cutoff);
0481 end
0482 
0483 if A+1<B-1
0484     % There is a middle region with >1 point, see if x varies there
0485     if x(B-1)-x(A+1)>dx
0486         return
0487     end
0488 end
0489 
0490 % We have perfect separation that can be defined by some middle point
0491 if A+1==B
0492     xmid = x(A) + 0.5*(x(B)-x(A));
0493 else
0494     xmid = x(A+1);
0495     if isscalar(N)
0496         pmid = mean(p(A+1:B-1));
0497     else
0498         pmid = sum(p(A+1:B-1).*N(A+1:B-1)) / sum(N(A+1:B-1));
0499     end
0500 end
0501 
0502 % Create explanation part for the lower region, if any
0503 if A>=1
0504     explanation = sprintf('\n   XB<%g: P=%g',xmid,p(1));
0505 else
0506     explanation = '';
0507 end
0508 
0509 % Add explanation part for the middle region, if any
0510 if A+1<B
0511     explanation = sprintf('%s\n   XB=%g: P=%g',explanation,xmid,pmid);
0512 end
0513     
0514 % Add explanation part for the upper region, if any
0515 if B<=n
0516     explanation = sprintf('%s\n   XB>%g: P=%g',explanation,xmid,p(end));
0517 end
0518 
0519 warning(message('stats:lassoGlm:PerfectSeparation', explanation));
0520 end %-diagnoseSeparation()
0521 
0522 % ===============================================
0523 %               glmProcessData()
0524 % ===============================================
0525 
0526 function [x, y, offset, pwts, dataClass, N, binomialTwoColumn] = ...
0527     glmProcessData(x, y, distr, const, offset, pwts)
0528 
0529 N = []; % needed only for binomial
0530 binomialTwoColumn = false;
0531 
0532 % Convert the two-column form of 'y', if supplied ('binomial' only).
0533 if strcmp(distr,'binomial')
0534     if size(y,2) == 1
0535         % N will get set to 1 below
0536         if any(y < 0 | y > 1)
0537             error(message('stats:lassoGlm:BadDataBinomialFormat'));
0538         end
0539     elseif size(y,2) == 2
0540         binomialTwoColumn = true;
0541         y(y(:,2)==0,2) = NaN;
0542         N = y(:,2);
0543         y = y(:,1) ./ N;
0544         if any(y < 0 | y > 1)
0545             error(message('stats:lassoGlm:BadDataBinomialRange'));
0546         end
0547     else
0548         error(message('stats:lassoGlm:MatrixOrBernoulliRequired'));
0549     end
0550 end
0551 
0552 [anybad,~,y,x,offset,pwts,N] = dfswitchyard('statremovenan',y,x,offset,pwts,N);
0553 if anybad > 0
0554     switch anybad
0555     case 2
0556         error(message('stats:lassoGlm:InputSizeMismatchX'))
0557     case 3
0558         error(message('stats:lassoGlm:InputSizeMismatchOffset'))
0559     case 4
0560         error(message('stats:lassoGlm:InputSizeMismatchPWTS'))
0561     end
0562 end
0563 
0564 % Extra screening for lassoglm (Infs and zero weights)
0565 okrows = all(isfinite(x),2) & all(isfinite(y),2) & all(isfinite(offset));
0566 
0567 if ~isempty(pwts)
0568     % This screen works on weights prior to stripping NaNs and Infs.
0569     if ~isvector(pwts) || ~isreal(pwts) || size(x,1) ~= length(pwts) || ...
0570             ~all(isfinite(pwts)) || any(pwts<0)
0571         error(message('stats:lassoGlm:InvalidObservationWeights'));
0572     end    
0573     okrows = okrows & pwts(:)>0;
0574     pwts = pwts(okrows);
0575 end
0576 
0577 % We need at least two observations after stripping NaNs and Infs and zero weights.
0578 if sum(okrows)<2
0579     error(message('stats:lassoGlm:TooFewObservationsAfterNaNs'));
0580 end
0581 
0582 % Remove observations with Infs in the predictor or response
0583 % or with zero observation weight.  NaNs were already gone.
0584 x = x(okrows,:);
0585 y = y(okrows);
0586 if ~isempty(N) && ~isscalar(N)
0587     N = N(okrows);
0588 end
0589 if ~isempty(offset)
0590     offset = offset(okrows);
0591 end
0592 
0593 if isequal(const,'on')
0594     x = [ones(size(x,1),1) x];
0595 end
0596 dataClass = superiorfloat(x,y);
0597 x = cast(x,dataClass);
0598 y = cast(y,dataClass);
0599 
0600 if isempty(offset), offset = 0; end
0601 if isempty(N), N = 1; end
0602 
0603 end %-glmProcessData()
0604 
0605 % ===================================================
0606 %             processLassoParameters()
0607 % ===================================================
0608 
0609 function [X,Y,weights,nLambda,lambda,dfmax,cvp,mcreps,predictorNames,ever_active] = ...
0610     processLassoParameters(X,Y,weights, alpha, nLambda, lambdaRatio, lambda, dfmax, ...
0611     standardize, reltol, cvp, mcreps, predictorNames)
0612 
0613 % === 'Weights' parameter ===
0614 if ~isempty(weights)
0615         
0616     % Computations expect that weights is a row vector.
0617     weights = weights(:)';
0618     
0619 end
0620 
0621 [~,P] = size(X);
0622 
0623 % If X has any constant columns, we want to exclude them from the
0624 % coordinate descent calculations.  The corresponding coefficients
0625 % will be returned as zero.
0626 constantPredictors = (range(X)==0);
0627 ever_active = ~constantPredictors;
0628 
0629 % === 'Alpha' parameter ===
0630 
0631 % Require 0 < alpha <= 1.
0632 % "0" would correspond to ridge, "1" is lasso.
0633 if ~isscalar(alpha) || ~isreal(alpha) || ~isfinite(alpha) || ...
0634         alpha <= 0 || alpha > 1
0635     error(message('stats:lassoGlm:InvalidAlpha'))
0636 end
0637 
0638 % === 'Standardize' option ===
0639 
0640 % Require a logical value.
0641 if ~isscalar(standardize) || (~islogical(standardize) && standardize~=0 && standardize~=1)
0642     error(message('stats:lassoGlm:InvalidStandardize'))
0643 end
0644 
0645 % === 'Lambda' sequence or 'NumLambda' and 'lambdaRatio' ===
0646 
0647 if ~isempty(lambda)
0648     
0649     % Sanity check on user-supplied lambda sequence.  Should be non-neg real.
0650     if ~isreal(lambda) || any(lambda < 0)
0651         error(message('stats:lassoGlm:NegativeLambda'));
0652     end
0653 
0654     lambda = sort(lambda(:),1,'descend');
0655     
0656 else
0657     
0658     % Sanity-check of 'NumLambda', should be positive integer.
0659     if ~isreal(nLambda) || ~isfinite(nLambda) || nLambda < 1
0660         error(message('stats:lassoGlm:InvalidNumLambda'));
0661     else
0662         nLambda = floor(nLambda);
0663     end
0664     
0665     % Sanity-checking of LambdaRatio, should be in [0,1).
0666     if ~isreal(lambdaRatio) || lambdaRatio <0 || lambdaRatio >= 1
0667         error(message('stats:lassoGlm:InvalidLambdaRatio'));
0668     end
0669 end
0670 
0671 % === 'RelTol' parameter ===
0672 %
0673 if ~isscalar(reltol) || ~isreal(reltol) || ~isfinite(reltol) || reltol <= 0 || reltol >= 1
0674     error(message('stats:lassoGlm:InvalidRelTol'));
0675 end
0676 
0677 % === 'DFmax' parameter ===
0678 %
0679 % DFmax is #non-zero coefficients
0680 % DFmax should map to an integer in [1,P] but we truncate if .gt. P
0681 %
0682 if isempty(dfmax)
0683     dfmax = P;
0684 else
0685     if ~isscalar(dfmax)
0686         error(message('stats:lassoGlm:DFmaxBadType'));
0687     end
0688     try
0689         dfmax = uint32(dfmax);
0690     catch ME
0691         mm = message('stats:lassoGlm:DFmaxBadType');
0692         throwAsCaller(MException(mm.Identifier,'%s',getString(mm)));
0693     end
0694     if dfmax < 1
0695         error(message('stats:lassoGlm:DFmaxNotAnIndex'));
0696     else
0697         dfmax = min(dfmax,P);
0698     end
0699 end
0700 
0701 % === 'Mcreps' parameter ===
0702 %
0703 if ~isscalar(mcreps) || ~isreal(mcreps) || ~isfinite(mcreps) || mcreps < 1
0704     error(message('stats:lassoGlm:MCRepsBadType'));
0705 end
0706 mcreps = fix(mcreps);
0707 
0708 % === 'CV' parameter ===
0709 %
0710 
0711 if isnumeric(cvp) && isscalar(cvp) && (cvp==round(cvp)) && (0<cvp)
0712     % cvp is a kfold value. It will be passed as such to crossval.
0713     if (cvp>size(X,1))
0714         error(message('stats:lassoGlm:InvalidCVforX'));
0715     end
0716     cvp = cvpartition(size(X,1),'Kfold',cvp);
0717 elseif isa(cvp,'cvpartition')
0718     if strcmpi(cvp.Type,'resubstitution')
0719         cvp = 'resubstitution';
0720     elseif strcmpi(cvp.Type,'leaveout')
0721         error(message('stats:lassoGlm:InvalidCVtype'));
0722     elseif strcmpi(cvp.Type,'holdout') && mcreps<=1
0723         error(message('stats:lassoGlm:InvalidMCReps'));
0724     end
0725 elseif strncmpi(cvp,'resubstitution',length(cvp))
0726     % This may have been set as the default, or may have been
0727     % provided at the command line.  In case it's the latter, we
0728     % expand abbreviations.
0729     cvp = 'resubstitution';
0730 else
0731     error(message('stats:lassoGlm:InvalidCVtype'));
0732 end
0733 if strcmp(cvp,'resubstitution') && mcreps ~= 1
0734     error(message('stats:lassoGlm:InvalidMCReps'));
0735 end
0736 
0737 if isa(cvp,'cvpartition')
0738     if (cvp.N ~= size(X,1)) || (min(cvp.TrainSize) < 2)
0739         % We need partitions that match the total number of observations
0740         % (after stripping NaNs and Infs and zero observation weights), and
0741         % we need training sets with at least 2 usable observations.
0742         error(message('stats:lassoGlm:TooFewObservationsForCrossval'));
0743     end
0744 end
0745 
0746 % === 'PredictorNames' parameter ===
0747 %
0748 % If PredictorNames is not supplied or is supplied as empty, we just
0749 % leave it that way. Otherwise, confirm that it is a cell array of strings.
0750 %
0751 if ~isempty(predictorNames) 
0752     if ~iscellstr(predictorNames) || length(predictorNames(:)) ~= size(X,2)
0753         error(message('stats:lassoGlm:InvalidPredictorNames'));
0754     else
0755         predictorNames = predictorNames(:)';
0756     end
0757 end
0758 
0759 end %-processLassoParameters()
0760 
0761 % ===================================================
0762 %             glmProcessDistrAndLink()
0763 % ===================================================
0764 
0765 function [estdisp,sqrtvarFun,devFun,linkFun,dlinkFun,ilinkFun,link,mu,eta,muLims,...
0766     isCanonical,dlinkFunCanonical] = ...
0767     glmProcessDistrAndLink(y,distr,link,estdisp,N,dataClass)
0768 
0769 switch distr
0770     case 'normal'
0771         canonicalLink = 'identity';
0772     case 'binomial'
0773         canonicalLink = 'logit';
0774     case 'poisson'
0775         canonicalLink = 'log';
0776     case 'gamma'
0777         canonicalLink = 'reciprocal';
0778     case 'inverse gaussian'
0779         canonicalLink = -2;
0780 end
0781 
0782 if isequal(link, 'canonical'), link = canonicalLink; end
0783 
0784 switch distr
0785 case 'normal'
0786     sqrtvarFun = @(mu) ones(size(mu));
0787     devFun = @(mu,y) (y - mu).^2;
0788     estdisp = 'on';
0789 case 'binomial'
0790     sqrtN = sqrt(N);
0791     sqrtvarFun = @(mu) sqrt(mu).*sqrt(1-mu) ./ sqrtN;
0792     devFun = @(mu,y) 2*N.*(y.*log((y+(y==0))./mu) + (1-y).*log((1-y+(y==1))./(1-mu)));
0793 case 'poisson'
0794     if any(y < 0)
0795         error(message('stats:lassoGlm:BadDataPoisson'));
0796     end
0797     sqrtvarFun = @(mu) sqrt(mu);
0798     devFun = @(mu,y) 2*(y .* (log((y+(y==0)) ./ mu)) - (y - mu));
0799 case 'gamma'
0800     if any(y <= 0)
0801         error(message('stats:lassoGlm:BadDataGamma'));
0802     end
0803     sqrtvarFun = @(mu) mu;
0804     devFun = @(mu,y) 2*(-log(y ./ mu) + (y - mu) ./ mu);
0805     estdisp = 'on';
0806 case 'inverse gaussian'
0807     if any(y <= 0)
0808         error(message('stats:lassoGlm:BadDataInvGamma'));
0809     end
0810     sqrtvarFun = @(mu) mu.^(3/2);
0811     devFun = @(mu,y) ((y - mu)./mu).^2 ./  y;
0812     estdisp = 'on';
0813 otherwise
0814     error(message('stats:lassoGlm:BadDistribution'));
0815 end
0816 
0817 % Instantiate functions for one of the canned links, or validate a
0818 % user-defined link specification.
0819 [linkFun,dlinkFun,ilinkFun] = dfswitchyard('stattestlink',link,dataClass);
0820 
0821 % Initialize mu and eta from y.
0822 mu = startingVals(distr,y,N);
0823 eta = linkFun(mu);
0824 
0825 % Enforce limits on mu to guard against an inverse link that doesn't map into
0826 % the support of the distribution.
0827 switch distr
0828 case 'binomial'
0829     % mu is a probability, so order one is the natural scale, and eps is a
0830     % reasonable lower limit on that scale (plus it's symmetric).
0831     muLims = [eps(dataClass) 1-eps(dataClass)];
0832 case {'poisson' 'gamma' 'inverse gaussian'}
0833     % Here we don't know the natural scale for mu, so make the lower limit
0834     % small.  This choice keeps mu^4 from underflowing.  No upper limit.
0835     muLims = realmin(dataClass).^.25;
0836 otherwise
0837     muLims = []; 
0838 end
0839 
0840 % These two quantities (isCanonical, dlinkFunCanonical) are not used by
0841 % glmfit but they are needed for calculation of lambdaMax in lassoglm.
0842 
0843 isCanonical = isequal(link, canonicalLink);
0844 [~, dlinkFunCanonical] = dfswitchyard('stattestlink', canonicalLink, dataClass);
0845 
0846 end %-glmProcessDistrAndLink()
0847 
0848 % ===================================================
0849 %                      glmIRLS()
0850 % ===================================================
0851 
0852 function [b,mu,eta,varargout] = glmIRLS(x,y,distr,offset,pwts,dataClass,N, ...
0853     sqrtvarFun,linkFun,dlinkFun,ilinkFun,b,active,mu,muLims, ...
0854     wlsfit,nullDev,devFun,reltol)
0855 
0856 wsIterationLimit = warning('off','stats:lassoGlm:IterationLimit');
0857 wsPerfectSeparation = warning('off','stats:lassoGlm:PerfectSeparation');
0858 wsBadScaling = warning('off','stats:lassoGlm:BadScaling');
0859 cleanupIterationLimit = onCleanup(@() warning(wsIterationLimit));
0860 cleanupPerfectSeparation = onCleanup(@() warning(wsPerfectSeparation));
0861 cleanupBadScaling = onCleanup(@() warning(wsBadScaling));
0862 
0863 if isempty(pwts)
0864     pwts = 1;
0865 end
0866 
0867 % Set up for iterations
0868 iter = 0;
0869 iterLim = 100;
0870 warned = false;
0871 seps = sqrt(eps);
0872 
0873 % Match the convergence tolerance for the IRLS algorithm to the
0874 % command line parameter 'RelTol'.
0875 convcrit = max(1e-6,2*reltol);
0876 
0877 eta = linkFun(mu);
0878 
0879 while iter <= iterLim
0880     iter = iter+1;
0881 
0882     % Compute adjusted dependent variable for least squares fit
0883     deta = dlinkFun(mu);
0884     z = eta + (y - mu) .* deta;
0885 
0886     % Compute IRLS weights the inverse of the variance function
0887     sqrtw = sqrt(pwts) ./ (abs(deta) .* sqrtvarFun(mu));
0888     
0889     % If the weights have an enormous range, we won't be able to do IRLS very
0890     % well.  The prior weights may be bad, or the fitted mu's may have too
0891     % wide a range, which is probably because the data do as well, or because
0892     % the link function is trying to go outside the distribution's support.
0893     wtol = max(sqrtw)*eps(dataClass)^(2/3);
0894     t = (sqrtw < wtol);
0895     if any(t)
0896         t = t & (sqrtw ~= 0);
0897         if any(t)
0898             sqrtw(t) = wtol;
0899             if ~warned
0900                 warning(message('stats:lassoGlm:BadScaling'));
0901             end
0902             warned = true;
0903         end
0904     end
0905 
0906     b_old = b;
0907     [b,active] = wlsfit(z - offset, x, sqrtw.^2, b, active);
0908 
0909     % Form current linear predictor, including offset
0910     eta = offset + x * b;
0911 
0912     % Compute predicted mean using inverse link function
0913     mu = ilinkFun(eta);
0914 
0915     % Force mean in bounds, in case the link function is a wacky choice
0916     switch distr
0917     case 'binomial'
0918         if any(mu < muLims(1) | muLims(2) < mu)
0919         mu = max(min(mu,muLims(2)),muLims(1));
0920         end
0921     case {'poisson' 'gamma' 'inverse gaussian'}
0922         if any(mu < muLims(1))
0923         mu = max(mu,muLims(1));
0924         end
0925     end
0926 
0927     % Check stopping conditions
0928     % Convergence of the coefficients
0929     if (~any(abs(b-b_old) > convcrit * max(seps, abs(b_old)))) 
0930         break; 
0931     end
0932     % Proportion of explained deviance explained
0933     if sum(devFun(mu,y)) < (1.0e-3 * nullDev)
0934         break;
0935     end
0936      
0937 end 
0938 
0939 if iter > iterLim
0940     warning(message('stats:lassoGlm:IterationLimit'));
0941 end
0942 
0943 if iter>iterLim && isequal(distr,'binomial')
0944     diagnoseSeparation(eta,y,N);
0945 end
0946 
0947 varargout{1} = active;
0948 
0949 end %-glmIRLS
0950 
0951 % ===================================================
0952 %                      glmIRLSwrapper()
0953 % ===================================================
0954 
0955 function [B,active,varargout] = glmIRLSwrapper(X,Y,distr,offset,pwts,dataClass,N, ...
0956     sqrtvarFun,linkFun,dlinkFun,ilinkFun,devFun,b,active,mu,muLims, ...
0957     wlsfit,nullDev,reltol)
0958 % glmIRLSwrapper is a utility function for regularized GLM. It is called by
0959 % the regularization framework (lassoFit()), and adapts to the conventions
0960 % and expectations of the IRLS implementation for GLM.
0961 %
0962 % Lasso always returns an intercept, but lassoFit passes in the predictor
0963 % matrix 'X' without a column of ones. Unlike OLS, GLM cannot calve off
0964 % estimation of the predictor variable coefficients and calculate the
0965 % intercept externally, post-fit.  This is because each IRLS step
0966 % computes a local linear predictor (eta), and this requires an
0967 % intercept contribution. Therefore, prepend a column of ones before
0968 % passing to glmIRLS. The variable B returned by glmIRLS will include
0969 % the intercept term as the first element of B.
0970 X = [ones(size(X,1),1) X];
0971 
0972 % glmIRLS assumes pwts=1 if no observation weights were supplied,
0973 % rather than empty, which is what lassoFit will feed in.
0974 if isempty(pwts), pwts=1; end
0975 
0976 [B,mu,eta,active] = glmIRLS(X,Y,distr,offset,pwts,dataClass,N, ...
0977     sqrtvarFun,linkFun,dlinkFun,ilinkFun,b,active,mu,muLims, ...
0978     wlsfit,nullDev,devFun,reltol);
0979 
0980 deviance = sum(pwts.* devFun(mu,Y));
0981 
0982 
0983 % Pull the intercept estimate out of B and return separately.
0984 Intercept = B(1);
0985 B = B(2:end);
0986 
0987 extras.Intercept = Intercept;
0988 extras.Deviance  = deviance;
0989 varargout{1} = extras;
0990 varargout{2} = mu;
0991 varargout{3} = eta;
0992 
0993 end %-glmIRLSwrapper
0994 
0995 % ===================================================
0996 %              lassoFitAndPredict()
0997 % ===================================================
0998 
0999 function dev = lassoFitAndPredict(Xtrain,Ytrain,Xtest,Ytest, ...
1000     lambda,alpha,dfmax,standardize,reltol,ever_active, ...
1001     penalizedFitPartition,distr,link,linkFun,dlinkFun,sqrtvarFun, ...
1002     isCanonical, dlinkFunCanonical,devFun,dataClass)
1003 %
1004 % This function sets up the conditions for lasso fit and prediction from
1005 % within crossvalidation.  It defines quantities as necessary for the
1006 % current fold.
1007 
1008 trainWeights = Xtrain(:,1);
1009 % To conform with crossval syntax, weights are prepended to the predictor
1010 % matrix. Empty weights are temporarily converted to a vector of NaNs.
1011 % This clause restores them to empty.  Same occurs below for test weights.
1012 if any(isnan(trainWeights))
1013     trainWeights = [];
1014 end
1015 trainOffset = Xtrain(:,2);
1016 if any(isnan(trainOffset))
1017     trainOffset = 0;
1018 end
1019 
1020 Xtrain = Xtrain(:,3:end);
1021 if size(Ytrain,2) == 2
1022     trainN = Ytrain(:,1);
1023     Ytrain = Ytrain(:,2);
1024 else
1025     trainN = 1;
1026 end
1027 
1028 % These quantities depend on the particular set of responses used for the fit.
1029 % Within crossval, we work with a subset of the original data. Therefore,
1030 % these quantities must be redefined on each invocation.
1031 mu = startingVals(distr,Ytrain,trainN);
1032 eta = linkFun(mu);
1033 if isequal(distr,'binomial')
1034     sqrtvarFun = @(mu) sqrt(mu).*sqrt(1-mu) ./ sqrt(trainN);
1035     devFun = @(mu,y) 2*trainN.*(y.*log((y+(y==0))./mu) + (1-y).*log((1-y+(y==1))./(1-mu)));
1036 end
1037 
1038 penalizedFit = @(x,y,wlsfit,b,active,mu,eta) penalizedFitPartition(x,y, ...
1039     trainOffset,trainWeights,trainN,wlsfit,b,active,mu,eta,sqrtvarFun);
1040 
1041 [lambdaMax, nullDev, nullIntercept] = computeLambdaMax(Xtrain, Ytrain, trainWeights, ...
1042     alpha, standardize, distr, link, dlinkFun, trainOffset, isCanonical, dlinkFunCanonical, devFun);
1043 
1044 % Will convert from lambda penalty matched to average log-likelihood
1045 % per observation (which is what lasso uses) to an equivalent
1046 % penalty for total log-likelihood (which is what glmIRLS uses.)
1047 if isempty(trainWeights) && isscalar(trainN)
1048     totalWeight = size(Xtrain,1);
1049 elseif ~isempty(trainWeights) && isscalar(trainN)
1050     totalWeight = sum(trainWeights);
1051 elseif isempty(trainWeights) && ~isscalar(trainN)
1052     totalWeight = sum(trainN);
1053 else
1054     totalWeight = sum(trainWeights .* trainN);
1055 end
1056 
1057 lambdaMax = lambdaMax * totalWeight;
1058 
1059 [B,Intercept] = lassoFit(Xtrain,Ytrain, ...
1060     trainWeights,lambda,alpha,dfmax,standardize,reltol, ...
1061     lambdaMax,ever_active,penalizedFit,mu,eta,dataClass,true,nullDev,nullIntercept);
1062 Bplus = [Intercept; B];
1063 
1064 testWeights = Xtest(:,1);
1065 if any(isnan(testWeights))
1066     testWeights = ones(size(Xtest,1),1);
1067 end
1068 testOffset = Xtest(:,2);
1069 if any(isnan(testOffset))
1070     testOffset = 0;
1071 end
1072 Xtest = Xtest(:,3:end);
1073 if size(Ytest,2) == 2
1074     testN = Ytest(:,1);
1075     Ytest = Ytest(:,2);
1076 else
1077     testN = 1;
1078 end
1079 
1080 % For binomial with two-column form of response, the deviance function
1081 % depends on the number of trials for each observation, which is provided
1082 % in testN. Within crossval, the test set is a subset of the original data.
1083 % Therefore, the deviance function needs to be redefined for each test set.
1084 if isequal(distr,'binomial')
1085     devFun = @(mu,y) 2*testN.*(y.*log((y+(y==0))./mu) + (1-y).*log((1-y+(y==1))./(1-mu)));
1086 end
1087 
1088 numFits = size(Bplus,2);
1089 dev = zeros(1,numFits);
1090 for i=1:numFits
1091     if ~isequal(testOffset,0)
1092         mu = glmval(Bplus(:,i), Xtest, link, 'Offset',testOffset);
1093     else
1094         mu = glmval(Bplus(:,i), Xtest, link);
1095     end
1096     di = devFun(mu,Ytest);
1097     dev(i) = sum(testWeights' * di);
1098 end
1099 
1100 end %-lassoFitAndPredict
1101 
1102 % ===================================================
1103 %                    lassoFit()
1104 % ===================================================
1105 
1106 function [B,Intercept,lambda,varargout] = ...
1107     lassoFit(X,Y,weights,lambda,alpha,dfmax,standardize,reltol, ...
1108     lambdaMax,ever_active,penalizedFit,mu,eta,dataClass,userSuppliedLambda,nullDev,nullIntercept)
1109 %
1110 % ------------------------------------------------------
1111 % Perform model fit for each lambda and the given alpha
1112 % ------------------------------------------------------
1113 
1114 regressionType = 'GLM';
1115 
1116 [~,P] = size(X);
1117 nLambda = length(lambda);
1118 
1119 % If X has any constant columns, we want to exclude them from the
1120 % coordinate descent calculations.  The corresponding coefficients
1121 % will be returned as zero.
1122 constantPredictors = (range(X)==0);
1123 ever_active = ever_active & ~constantPredictors;
1124 
1125 % === weights and standardization ===
1126 %
1127 observationWeights = ~isempty(weights);
1128 if ~isempty(weights)
1129     observationWeights = true;
1130     weights = weights(:)';
1131     % Normalize weights up front.
1132     weights = weights / sum(weights);
1133 end
1134 
1135 if standardize
1136     if ~observationWeights
1137         % Center and scale
1138         [X0,muX,sigmaX] = zscore(X,1);
1139         % Avoid divide by zero with constant predictors
1140         sigmaX(constantPredictors) = 1;
1141     else
1142         % Weighted center and scale
1143         muX = weights*X;
1144         X0 = bsxfun(@minus,X,muX);
1145         sigmaX = sqrt( weights*(X0.^2) );
1146         % Avoid divide by zero with constant predictors
1147         sigmaX(constantPredictors) = 1;
1148         X0 = bsxfun(@rdivide, X0, sigmaX);
1149     end
1150 else
1151     switch regressionType
1152         case 'OLS'           
1153             if ~observationWeights
1154                 % Center
1155                 muX = mean(X,1);
1156                 X0 = bsxfun(@minus,X,muX);
1157                 sigmaX = 1;
1158             else
1159                 % Weighted center
1160                 muX = weights*X;
1161                 X0 = bsxfun(@minus,X,muX);
1162                 sigmaX = 1;
1163             end
1164         case 'GLM'
1165             X0 = X;
1166             % For GLM don't center until we get inside the IRLS
1167             sigmaX = 1;
1168             muX = zeros(1,size(X,2));
1169     end
1170 end
1171 
1172 % For OLS, Y is centered, for GLM it is not.
1173 switch regressionType
1174     case 'OLS'
1175         if ~observationWeights
1176             muY = mean(Y);
1177         else
1178             muY = weights*Y;
1179         end
1180         Y0 = bsxfun(@minus,Y,muY);
1181     case 'GLM'
1182         Y0 = Y;
1183 end
1184 
1185 % Preallocate the returned matrix of coefficients, B,
1186 % and other variables sized to nLambda.
1187 
1188 B = zeros(P,nLambda);
1189 
1190 b = zeros(P,1,dataClass);
1191 
1192 if nLambda > 0
1193     Extras(nLambda) = struct('Intercept', nullIntercept, 'Deviance', nullDev);
1194     for i=1:nLambda-1, Extras(i) = Extras(nLambda); end
1195     intercept = nullIntercept;
1196 end
1197 
1198 active = false(1,P);
1199 
1200 for i = 1:nLambda
1201     
1202     lam = lambda(i);
1203     disp (['iteration ', num2str(i), ' out of ' , num2str(nLambda)]);
1204     if lam >= lambdaMax
1205         continue;
1206     end
1207     
1208     % This anonymous function adapts the conventions of the IRLS algorithm
1209     % for glm fit to the coordinate descent algorithm for penalized WLS,
1210     % which is used within the IRLS algorithm.
1211     wlsfit = @(x,y,weights,b,active) glmPenalizedWlsWrapper(y,x,b,active,weights,lam, ...
1212         alpha,reltol,ever_active);
1213 
1214     [b,active,extras,mu,eta] = penalizedFit(X0,Y0,wlsfit,[intercept;b],active,mu,eta);
1215 
1216     B(:,i) = b;
1217     
1218     Extras(i) = extras;
1219     
1220     % Halt if maximum model size ('DFmax') has been met or exceeded.
1221     if sum(active) > dfmax
1222         % truncate B and lambda output arguments
1223         lambda = lambda(1:(i-1));
1224         B = B(:,1:(i-1));
1225         Extras = Extras(:,1:(i-1));
1226         break
1227     end
1228     
1229     % Halt if we have exceeded a threshold on the percent of
1230     % null deviance left unexplained.
1231     if ~(userSuppliedLambda || isempty(nullDev))
1232         if extras.Deviance < 1.0e-3 * nullDev
1233             lambda = lambda(1:i);
1234             B = B(:,1:i);
1235             Extras = Extras(:,1:i);
1236             break
1237         end
1238     end
1239     
1240 end % of lambda sequence
1241 
1242 % ------------------------------------------
1243 % Unwind the centering and scaling (if any)
1244 % ------------------------------------------
1245 
1246 B = bsxfun(@rdivide, B, sigmaX');
1247 B(~ever_active,:) = 0;
1248 
1249 switch regressionType
1250     case 'OLS'
1251         Intercept = muY-muX*B;
1252     case 'GLM'
1253         Intercept = zeros(1,length(lambda));
1254         for i=1:length(lambda)
1255             Intercept(i) = Extras(i).Intercept;
1256         end
1257         if isempty(lambda)
1258             Intercept = [];
1259         else
1260             Intercept = Intercept - muX*B;
1261         end
1262 end
1263 
1264 % ------------------------------------------
1265 % Calculate Mean Prediction Squared Error (or other GoF)
1266 % ------------------------------------------
1267 
1268 switch regressionType
1269     case 'OLS'
1270         Intercept = muY-muX*B;
1271         BwithI = [Intercept; B];
1272         fits = [ones(size(X,1),1) X]*BwithI;
1273         residuals = bsxfun(@minus, Y, fits);
1274         if ~observationWeights
1275             mspe = mean(residuals.^2);
1276         else
1277             % This line relies on the weights having been normalized.
1278             mspe = weights * (residuals.^2);
1279         end        
1280         varargout{1} = mspe;
1281     case 'GLM'
1282         deviance = zeros(1,length(lambda));
1283         for i=1:length(lambda)
1284             deviance(i) = Extras(i).Deviance;
1285         end
1286         if isempty(lambda)
1287             deviance = [];
1288         end
1289         varargout{1} = deviance;
1290 end
1291 
1292 end %-lassoFit
1293 
1294 % ===================================================
1295 %                 thresholdScreen()
1296 % ===================================================
1297 
1298 function potentially_active = thresholdScreen(X0, wX0, Y0, ...
1299     b, active, threshold)
1300 r = Y0 - X0(:,active)*b(active,:);
1301 % We don't need the (b.*wX2)' term that one might expect, because it
1302 % is zero for the inactive predictors.
1303 potentially_active = abs(r' *wX0) > threshold;
1304 end %-thresholdScreen
1305 
1306 % ===================================================
1307 %                 cdescentCycleNewCandidates()
1308 % ===================================================
1309 
1310 function [b,active,wX2,wX2calculated,shrinkFactor] = ...
1311     cdescentCycleNewCandidates(X0, weights, wX0, wX2, wX2calculated, Y0, ...
1312     b, active, shrinkFactor, threshold, candidates)
1313 %
1314 r = Y0 - X0(:,active)*b(active,:);
1315 bold = b;
1316 
1317 for j=find(candidates);
1318     % Regress j-th partial residuals on j-th predictor
1319     bj = wX0(:,j)' * r;
1320     
1321     margin = abs(bj) - threshold;
1322     
1323     % Soft thresholding
1324     if margin > 0
1325         if ~wX2calculated(j)
1326             wX2(j) = weights * X0(:,j).^2;
1327             wX2calculated(j) = true;
1328             shrinkFactor(j) = wX2(j) + shrinkFactor(j);
1329         end
1330         
1331         b(j) = sign(bj) .* margin ./ shrinkFactor(j);
1332 
1333         active(j) = true;
1334     end
1335     
1336     r = r - X0(:,j)*(b(j)-bold(j));
1337 end
1338  
1339 end %-cdescentCycleNewCandidates
1340 
1341 % ===================================================
1342 %                 cdescentCycleNoRecalc()
1343 % ===================================================
1344 
1345 function [b,active] = ...
1346     cdescentCycleNoRecalc(X0, wX0, wX2, Y0, b, active, shrinkFactor, threshold)
1347 %
1348 r = Y0 - X0(:,active)*b(active,:);
1349 bwX2 = b.*wX2;
1350 bold = b;
1351 
1352 for j=find(active);
1353     % Regress j-th partial residuals on j-th predictor
1354     bj = wX0(:,j)' * r + bwX2(j);
1355     
1356     margin = abs(bj) - threshold;
1357     
1358     % Soft thresholding
1359     if margin > 0
1360         b(j) = sign(bj) .* margin ./ shrinkFactor(j);
1361     else
1362         b(j) = 0;
1363         active(j) = false;
1364     end
1365     
1366     r = r - X0(:,j)*(b(j)-bold(j));
1367 end
1368  
1369 end %-cdescentCycleNoRecalc
1370 
1371 % ===================================================
1372 %                    penalizedWls()
1373 % ===================================================
1374 
1375 function [b,varargout] = ...
1376     penalizedWls(X,Y,b,active,weights,lambda,alpha,reltol)
1377 
1378     weights = weights(:)';
1379     
1380     [~,P] = size(X);
1381 
1382     wX = bsxfun(@times,X,weights');
1383     
1384     wX2 = zeros(P,1);
1385     wX2(active) = (weights * X(:,active).^2)';
1386     wX2calculated = active;
1387     
1388     threshold = lambda * alpha;
1389     
1390     shrinkFactor = wX2 + lambda * (1-alpha);
1391 
1392     % Iterative coordinate descent until converged
1393     while true
1394         
1395         bold = b;
1396         old_active = active;
1397 
1398         [b,active] = cdescentCycleNoRecalc(X,wX,wX2,Y, b,active,shrinkFactor,threshold);
1399 
1400         if ~any( abs(b(old_active) - bold(old_active)) > reltol * max(1.0,abs(bold(old_active))) )
1401             % Cycling over the active set converged.
1402             % Do one full pass through the predictors.
1403             % If there is no predictor added to the active set, we're done.
1404             % Otherwise, resume the coordinate descent iterations.
1405             bold = b;
1406             potentially_active = thresholdScreen(X,wX,Y,b,active,threshold);
1407             new_candidates = potentially_active & ~active;
1408             if any(new_candidates)
1409                 [b,new_active,wX2,wX2calculated,shrinkFactor] = ...
1410                     cdescentCycleNewCandidates(X,weights,wX,wX2,wX2calculated,Y, ...
1411                     b,active,shrinkFactor,threshold,new_candidates);
1412             else
1413                 new_active = active;
1414             end
1415 
1416             if isequal(new_active, active)
1417                 break
1418             else
1419                 super_active = active | new_active;
1420                 if ~any( abs(b(super_active) - bold(super_active)) > reltol * max(1.0,abs(bold(super_active))) )
1421                     % We didn't change the coefficients enough by the full pass
1422                     % through the predictors to warrant continuing the iterations.
1423                     % The active coefficients after this pass and the
1424                     % active coefficients prior to this pass differ.
1425                     % This implies that a coefficient changed between zero
1426                     % and a small numeric value.  There is no "fit-wise" reason
1427                     % to prefer the before and after coefficients, so we
1428                     % choose the more parsimonious alternative.
1429                     if sum(new_active) > sum(active)
1430                         b = bold;
1431                     else
1432                         active = new_active;
1433                     end
1434                     break
1435                 else
1436                     active = new_active;
1437                 end
1438             end
1439         end
1440         
1441     end
1442     
1443     varargout{1} = active;
1444         
1445 end %-penalizedWls()
1446  
1447 % ===================================================
1448 %                    glmPenalizedWlsWrapper()
1449 % ===================================================
1450 
1451 function [b,varargout] = glmPenalizedWlsWrapper(X,Y,b,active,weights, ...
1452     lambda,alpha,reltol,ever_active)
1453 
1454 % The coordinate descent in penalizedWls() assumes centered X and Y,
1455 % and it assumes X does NOT have a column of ones.
1456 
1457 X0 = X(:,2:end);
1458 
1459 weights = weights(:)';
1460 
1461 normedWeights = weights / sum(weights);
1462 
1463 % Center X
1464 muX = normedWeights * X0;
1465 X0 = bsxfun(@minus,X0,muX);
1466 
1467 % Center Y.
1468 muY = normedWeights * Y;
1469 Y = Y - muY;
1470 
1471 [bPredictors,varargout{1}] = penalizedWls(X0, Y, b(2:end), ...
1472     active,weights,lambda,alpha,reltol);
1473 
1474 bPredictors(~ever_active,:) = 0;
1475 
1476 % Since X and Y were centered for the WLS we can calculate the intercept
1477 % after the fact.
1478 Intercept = muY-muX*bPredictors;
1479 b = [Intercept; bPredictors];
1480 
1481 end %-glmPenalizedWlsWrapper()
1482 
1483 function [lambdaMax, nullDev, nullIntercept] = computeLambdaMax(X, Y, weights, alpha, standardize, ...
1484     distr, link, dlinkFun, offset, isCanonical, dlinkFunCanonical, devFun)
1485 
1486 % lambdaMax is the penalty term (lambda) beyond which coefficients
1487 % are guaranteed to be all zero: there is no benefit to calculating
1488 % penalized fits with lambda > lambdaMax.
1489 %
1490 % nullDev is the deviance of the fit using just a constant term.
1491 %
1492 % The input parameter 'devFun' is used only as a sanity check to see if glmfit
1493 % gets a plausible fit with intercept term only.
1494 
1495 % Head off potential cruft in the command window.
1496 wsIllConditioned2 = warning('off','stats:glmfit:IllConditioned');
1497 wsIterationLimit = warning('off','stats:glmfit:IterationLimit');
1498 wsPerfectSeparation = warning('off','stats:glmfit:PerfectSeparation');
1499 wsBadScaling = warning('off','stats:glmfit:BadScaling');
1500 cleanupIllConditioned2 = onCleanup(@() warning(wsIllConditioned2));
1501 cleanupIterationLimit = onCleanup(@() warning(wsIterationLimit));
1502 cleanupPerfectSeparation = onCleanup(@() warning(wsPerfectSeparation));
1503 cleanupBadScaling = onCleanup(@() warning(wsBadScaling));
1504 
1505 if ~isempty(weights)
1506     observationWeights = true;
1507     weights = weights(:)';        
1508     % Normalized weights are used for standardization and calculating lambdaMax.
1509     normalizedweights = weights / sum(weights);
1510 else
1511     observationWeights = false;
1512 end
1513 
1514 [N,~] = size(X);
1515 
1516 % If we were asked to standardize the predictors, do so here because
1517 % the calculation of lambdaMax needs the predictors as we will use
1518 % them to perform fits.
1519 
1520 if standardize
1521     % If X has any constant columns, we want to protect against
1522     % divide-by-zero in normalizing variances.
1523     constantPredictors = (range(X)==0);
1524 
1525     if ~observationWeights
1526         % Center and scale
1527         [X0,~,~] = zscore(X,1);
1528     else
1529         % Weighted center and scale
1530         muX = normalizedweights * X;
1531         X0 = bsxfun(@minus,X,muX);
1532         sigmaX = sqrt( normalizedweights * (X0.^2) );
1533         % Avoid divide by zero with constant predictors
1534         sigmaX(constantPredictors) = 1;
1535         X0 = bsxfun(@rdivide, X0, sigmaX);
1536     end
1537 else
1538     if ~observationWeights
1539         % Center
1540         muX = mean(X,1);
1541         X0 = bsxfun(@minus,X,muX);
1542     else
1543         % Weighted center
1544         muX = normalizedweights(:)' * X;
1545         X0 = bsxfun(@minus,X,muX);
1546     end
1547 end
1548 
1549 constantTerm = ones(length(Y),1);
1550 if isscalar(offset)
1551     [coeffs,nullDev] = glmfit(constantTerm,Y,distr,'constant','off', ...
1552         'link',link, 'weights',weights);
1553     predictedMu = glmval(coeffs,constantTerm,link,'constant','off');
1554 else
1555     [coeffs,nullDev] = glmfit(constantTerm,Y,distr,'constant','off', ...
1556         'link',link,'weights',weights,'offset',offset);
1557     predictedMu = glmval(coeffs,constantTerm,link,'constant','off','offset',offset);
1558 end
1559 
1560 nullIntercept = coeffs;
1561 
1562 % Sanity check. With badly matched link / distr / data, glmfit may not
1563 % have been able to converge to a reasonble estimate.  If so, the inputs
1564 % to lassoglm may constitute a difficult problem formulation with
1565 % unsatisfactory maximum likelihood solution.  Poor formulations
1566 % have been observed with mismatched links (ie, 'reciprocal' link with
1567 % the Poisson distribution, in place of canonical 'log').  As a screen for
1568 % this contingency, calculate the deviance we would get using the scalar
1569 % unmodeled mean for the response data. Call this quantity "muDev".
1570 % The value of muDev should be no better than the nullDev calculated
1571 % by glmfit above (albeit it might be equal or nearly equal).
1572 % If the value is better, warn that the computations are of uncertain validity.
1573 
1574 if observationWeights
1575     muDev = weights * devFun(mean(Y)*ones(length(Y),1), Y);
1576 else
1577     muDev = sum(devFun(mean(Y)*ones(length(Y),1), Y));
1578 end
1579 if (muDev - nullDev) / max([1.0 muDev nullDev]) < - 1.0e-4
1580     [~, lastid] = lastwarn;
1581     if strcmp(lastid,'stats:glmfit:BadScaling')
1582         % This reassignment of predicted values is not guaranteed to
1583         % improve matters, but is an attempt to generate a workable
1584         % sequence of lambda values. Note: Since it is a constant
1585         % value for all observations, observation weights are a wash.
1586         predictedMu = mean(Y)*ones(length(Y),1);
1587         warning(message('stats:lassoGlm:DifficultLikelihood'));
1588     end
1589 end
1590 
1591 if ~isCanonical
1592     X0 = bsxfun( @times, X0, dlinkFunCanonical(predictedMu) ./ dlinkFun(predictedMu) );
1593 end
1594 
1595 if ~observationWeights
1596     dotp = abs(X0' * (Y - predictedMu));
1597     lambdaMax = max(dotp) / (N*alpha);
1598 else
1599     wX0 = bsxfun(@times, X0, normalizedweights');
1600     dotp = abs(sum(bsxfun(@times, wX0, (Y - predictedMu))));
1601     lambdaMax = max(dotp) / alpha;
1602 end
1603 
1604 end %-computeLambdaMax()
1605 
1606 function lambda = computeLambdaSequence(lambdaMax, nLambda, lambdaRatio, LRdefault)
1607 
1608 % Fill in the log-spaced sequence of lambda values.
1609         
1610 if nLambda==1
1611     lambda = lambdaMax;
1612 else
1613     % Fill in a number "nLambda" of smaller values, on a log scale.
1614     if lambdaRatio==0
1615         lambdaRatio = LRdefault;
1616         addZeroLambda = true;
1617     else
1618         addZeroLambda = false;
1619     end
1620     lambdaMin = lambdaMax * lambdaRatio;
1621     loghi = log(lambdaMax);
1622     loglo = log(lambdaMin);
1623     logrange = loghi - loglo;
1624     interval = -logrange/(nLambda-1);
1625     lambda = exp(loghi:interval:loglo)';
1626     if addZeroLambda
1627         lambda(end) = 0;
1628     else
1629         lambda(end) = lambdaMin;
1630     end
1631 end
1632 
1633 end %-computeLambdaSequence
1634

Generated on Thu 19-Jan-2017 13:55:21 by m2html © 2005