Maximum likelihood estimates of the β’s are the values that maximize this log likelihood equation. This is accomplished by calculating the partial derivatives ∂L/∂βik and setting them equal to zero.
Because of the nonlinear nature of the parameters, there is no closed‐form solution to these equations, and they must be solved iteratively. The Newton–Raphson [4–7] method is used to solve these equations. This method makes use of the information matrix, I(β), which is formed from the matrix of second partial derivatives.
The elements of the information matrix are the negative expected values of the second partial derivatives of the log likelihood, I(β) = −E[∂²L/∂β ∂β′].
The information matrix is used because the asymptotic covariance matrix of the maximum likelihood estimates is equal to the inverse of the information matrix. That is, V(β̂) = [I(β)]⁻¹.
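As a concrete sketch of this iteration, the following Python (numpy) code fits a binary logistic model by Newton–Raphson and returns the inverse of the information matrix as the asymptotic covariance of the estimates. The function name and setup are illustrative, not the text's notation.

```python
import numpy as np

def fit_logistic_newton(X, y, tol=1e-8, max_iter=25):
    """Fit a binary logistic regression by Newton-Raphson.

    X: (n, p) design matrix (include a column of ones for the intercept).
    y: (n,) vector of 0/1 responses.
    Returns (beta_hat, cov), where cov is the inverse of the
    information matrix evaluated at beta_hat.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        score = X.T @ (y - prob)                    # first partial derivatives
        W = prob * (1.0 - prob)                     # weights p(1 - p)
        info = X.T @ (X * W[:, None])               # information matrix I(beta)
        step = np.linalg.solve(info, score)         # Newton step: I^{-1} * score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information matrix at the final estimate; its inverse
    # is the asymptotic covariance matrix of beta_hat.
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (prob * (1.0 - prob))[:, None])
    cov = np.linalg.inv(info)
    return beta, cov
```

The square roots of the diagonal of `cov` are the standard errors used later for Wald tests of the individual coefficients.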
The interpretation of the estimated regression coefficients is not straightforward. In logistic regression, not only is the relationship between X and Y nonlinear, but also, if the dependent variable has more than two unique values, there are several regression equations. Consider the usual case of a binary dependent variable, Y, and a single independent variable, X. Assume that Y is coded so it takes on the values 0 and 1. In this case, the logistic regression equation is ln(p/(1 − p)) = β0 + β1 X. Now consider the impact of a unit increase in X. The logistic regression equation becomes ln(p′/(1 − p′)) = β0 + β1(X + 1) = β0 + β1 X + β1. We can isolate the slope by taking the difference between these two equations. We have
β1 = ln(p′/(1 − p′)) − ln(p/(1 − p)) = ln[(p′/(1 − p′))/(p/(1 − p))] (2.9)
That is, β1 is the log of the ratio of the odds at X + 1 and X. Removing the logarithm by exponentiating both sides gives exp(β1) = (p′/(1 − p′))/(p/(1 − p)), the odds ratio for a unit increase in X.
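A quick numerical check of this interpretation, using hypothetical coefficients (β0 = −2.0, β1 = 0.7) chosen purely for illustration:

```python
import math

# Hypothetical coefficients for illustration only.
beta0, beta1 = -2.0, 0.7

def prob(x):
    """Probability from the model ln(p/(1-p)) = beta0 + beta1*x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def odds(x):
    """Odds p/(1-p) at a given x."""
    p = prob(x)
    return p / (1.0 - p)

x = 1.5
ratio = odds(x + 1) / odds(x)   # odds ratio for a one-unit increase in X
# ratio equals exp(beta1) no matter which x we start from
```

The same ratio results at any starting value of x, which is why exp(β1) is reported as a single, x-free summary of the effect of X.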
Inferences about individual regression coefficients, groups of regression coefficients, goodness of fit, mean responses, and predictions of group membership of new observations are all of interest. These inference procedures can be carried out using hypothesis tests and/or confidence intervals. The inference procedures in logistic regression rely on large sample sizes for accuracy. Two procedures are available for testing the significance of one or more independent variables in a logistic regression: likelihood ratio tests and Wald tests. Simulation studies usually show that the likelihood ratio test performs better than the Wald test. However, the Wald test is still used to test the significance of individual regression coefficients because of its ease of calculation.
The likelihood ratio test statistic is −2 times the difference between the log likelihoods of two models, one of which is a subset of the other. The likelihood ratio is defined as LR = −2[Lsubset − Lfull] = −2[ ln (lsubset/lfull)]. When the full model in the likelihood ratio test statistic is the saturated model, LR is referred to as the deviance. A saturated model is one that includes all possible terms (including interactions) so that the predicted values from the model equal the original data. The formula for the deviance is D = −2[LReduced − LSaturated]. The deviance may be calculated directly using the formula for the deviance residuals:
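The likelihood ratio calculation can be sketched as follows, with an intercept-only model as the subset and an intercept-plus-slope model as the full model. The helper functions and simulated data are illustrative assumptions, not the text's notation:

```python
import numpy as np

def log_likelihood(X, y, beta):
    """Bernoulli log likelihood for a logistic model."""
    eta = X @ beta
    return float(np.sum(y * eta - np.log1p(np.exp(eta))))

def newton_fit(X, y, iters=30):
    """Minimal Newton-Raphson fit (a sketch; no step-size safeguards)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    return beta

# Simulated data with a genuine slope of 0.8.
rng = np.random.default_rng(1)
x = rng.normal(size=300)
y = (rng.random(300) < 1 / (1 + np.exp(-(0.8 * x)))).astype(float)

X_full = np.column_stack([np.ones_like(x), x])   # intercept + slope
X_sub = X_full[:, :1]                            # intercept only (subset model)

L_full = log_likelihood(X_full, y, newton_fit(X_full, y))
L_sub = log_likelihood(X_sub, y, newton_fit(X_sub, y))

# LR = -2 [L_subset - L_full]; compare to a chi-square with 1 df
# (critical value 3.84 at the 0.05 level).
LR = -2 * (L_sub - L_full)
```

Because the subset model is nested in the full model, L_full ≥ L_sub and the statistic is never negative.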
D = 2 Σj Σg ygj ln(ygj/pgj) (2.10)
This expression may be used to calculate the log likelihood of the saturated model without actually fitting a saturated model. The formula is LSaturated = LReduced + D/2.
The deviance in logistic regression is analogous to the residual sum of squares in multiple regression. In fact, when the deviance is calculated in multiple regression, it is equal to the sum of the squared residuals. Deviance residuals, to be discussed later, may be squared and summed as an alternative way to calculate the deviance D.
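For ungrouped 0/1 data, the connection between the deviance residuals, the deviance, and the saturated log likelihood can be verified directly. The fitted probabilities below are hypothetical values chosen for illustration:

```python
import numpy as np

# Hypothetical fitted probabilities and 0/1 responses for illustration.
p_hat = np.array([0.2, 0.7, 0.9, 0.4, 0.6])
y = np.array([0.0, 1.0, 1.0, 1.0, 0.0])

# Log likelihood of the fitted (reduced) model.
L_reduced = float(np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat)))

# With ungrouped 0/1 data, the saturated model reproduces each y exactly,
# so its log likelihood is 0 and the deviance is D = -2 * L_reduced.
# Deviance residuals: signed square roots of each observation's
# contribution to the deviance.
d = np.sign(y - p_hat) * np.sqrt(
    -2 * (y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
)

D = float(np.sum(d ** 2))          # deviance as the sum of squared residuals
L_saturated = L_reduced + D / 2    # recovers 0 for ungrouped binary data
```

This confirms both claims at once: squaring and summing the deviance residuals reproduces D, and LSaturated = LReduced + D/2.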
The change in deviance, ΔD, due to excluding (or including) one or more variables is used in logistic regression just as the partial F test is used in multiple regression. Many texts use the letter G to represent ΔD, but we have already used G to represent the number of groups in Y. Instead of using the F