Handbook of Regression Analysis With Applications in R. Samprit Chatterjee.

Information about the work:

Author:	Samprit Chatterjee
Publisher:	John Wiley & Sons Limited
Genre:	Mathematics
ISBN:	9781119392484


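A confidence interval provides an alternative way of summarizing the degree of precision in the estimate of a regression parameter. A $100 \times (1-\alpha)\%$ confidence interval for $\beta_j$ has the form

$$\hat{\beta}_j \pm t^{n-p-1}_{\alpha/2}\,\widehat{\mathrm{s.e.}}(\hat{\beta}_j),$$

where $t^{n-p-1}_{\alpha/2}$ is the appropriate critical value at two-sided level $\alpha$ for a $t$-distribution on $n-p-1$ degrees of freedom.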
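
As an illustration (not the book's code; the book works in R, where `confint()` applied to an `lm` fit produces these intervals), here is a minimal Python sketch for the simplest case: the slope $\beta_1$ of a simple regression, so $p = 1$ and the degrees of freedom are $n - 2$. The function name `slope_confidence_interval` is hypothetical, and because Python's standard library has no $t$ quantile function, the critical value $t^{n-p-1}_{\alpha/2}$ is assumed to be supplied by the caller from a table.

```python
import math


def slope_confidence_interval(x, y, t_crit):
    """Hypothetical helper: CI for the slope beta_1 of a simple regression.

    t_crit is the two-sided critical value t^{n-p-1}_{alpha/2} with p = 1,
    supplied by the caller (e.g. read from a t table).
    """
    n = len(x)
    p = 1  # one predictor
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                     # least squares slope
    b0 = y_bar - b1 * x_bar            # least squares intercept
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - p - 1))
    se_b1 = sigma_hat / math.sqrt(sxx)  # estimated standard error of b1
    return b1 - t_crit * se_b1, b1 + t_crit * se_b1


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
# 3.182 is the tabulated two-sided 5% critical value on n - p - 1 = 3 df
print(slope_confidence_interval(x, y, t_crit=3.182))
```

The interval is centered at the estimated slope and its half-width is the critical value times the estimated standard error, exactly the form displayed above.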

1.3.4 FITTED VALUES AND PREDICTIONS

The rough prediction interval $\hat{y} \pm 2\hat{\sigma}$ discussed in Section 1.3.2 is an approximate 95% interval because it ignores the variability caused by the need to estimate $\beta$ and uses only an approximate normal-based critical value. A more accurate assessment of predictive power is provided by a prediction interval given a particular value of $x$. This interval provides guidance as to how precise $\hat{y}_0$ is as a prediction of $y$ for some particular specified value $x_0$, where $\hat{y}_0$ is determined by substituting the values $x_0$ into the estimated regression equation. Its width depends on both $\hat{\sigma}$ and the position of $x_0$ relative to the centroid of the predictors (the point located at the means of all predictors), since values farther from the centroid are harder to predict as precisely. Specifically, for a simple regression, the estimated standard error of a predicted value based on a value $x_0$ of the predicting variable is

$$\widehat{\mathrm{s.e.}}(\hat{y}_0^P) = \hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}.$$

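More generally, the variance of a predicted value is

$$\left[\widehat{\mathrm{s.e.}}(\hat{y}_0^P)\right]^2 = \hat{\sigma}^2\left[1 + x_0'(X'X)^{-1}x_0\right]. \tag{1.10}$$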
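
The matrix form (1.10) can be checked numerically. The sketch below is an illustration under stated assumptions, not the book's code: it uses plain Python lists, a hypothetical Gauss-Jordan helper `invert` in place of a linear-algebra library, and a hypothetical function `prediction_variance`. With a single predictor, the value it returns should equal the square of the simple-regression standard error $\hat{\sigma}\sqrt{1 + 1/n + (x_0-\bar{x})^2/\sum_i (x_i-\bar{x})^2}$.

```python
import math


def invert(a):
    """Hypothetical helper: invert a small square matrix by Gauss-Jordan elimination."""
    k = len(a)
    # Augment each row with the matching row of the identity matrix.
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(a)]
    for col in range(k):
        # Partial pivoting: move the row with the largest entry in this column up.
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]


def prediction_variance(rows, y, x0):
    """Estimated variance (1.10) of a predicted value at x0.

    rows and x0 hold predictor values only; the leading 1 for the
    intercept is added here, as the text prescribes.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    n, k = len(X), len(X[0])  # k = p + 1 columns
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xtx_inv = invert(xtx)
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((y[i] - sum(beta[a] * X[i][a] for a in range(k))) ** 2 for i in range(n))
    sigma2_hat = rss / (n - k)  # n - p - 1 degrees of freedom
    z = [1.0] + [float(v) for v in x0]  # x0 with a 1 in the first entry
    quad = sum(z[a] * xtx_inv[a][b] * z[b] for a in range(k) for b in range(k))
    return sigma2_hat * (1 + quad)


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
var_pred = prediction_variance([[v] for v in x], y, [4])
print(var_pred, math.sqrt(var_pred))
```

Forming $(X'X)^{-1}$ explicitly is fine for a toy example like this; production code would normally use a QR decomposition instead.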

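Here $x_0$ is taken to include a 1 in the first entry (corresponding to the intercept in the regression model). The prediction interval is then

$$\hat{y}_0 \pm t^{n-p-1}_{\alpha/2}\,\widehat{\mathrm{s.e.}}(\hat{y}_0^P),$$

where $\widehat{\mathrm{s.e.}}(\hat{y}_0^P) = \hat{\sigma}\sqrt{1 + x_0'(X'X)^{-1}x_0}$ is the square root of (1.10).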
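
To see how the position of $x_0$ relative to the centroid affects the interval, the following Python sketch (hypothetical function `simple_prediction_interval`, simple regression only) computes the prediction interval at several values of $x_0$. In R the corresponding call would be `predict()` on an `lm` fit with `interval = "prediction"`; here the critical value is supplied by hand, with 3.182 being the tabulated two-sided 5% value on 3 degrees of freedom.

```python
import math


def simple_prediction_interval(x, y, x0, t_crit):
    """Prediction interval y0_hat +/- t_crit * s.e.(y0_hat^P) for simple regression.

    t_crit is the caller-supplied critical value t^{n-2}_{alpha/2}
    (n - p - 1 = n - 2 degrees of freedom when p = 1).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sigma_hat = math.sqrt(sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y0_hat = b0 + b1 * x0
    # The (x0 - x_bar)^2 term makes the interval widen away from the centroid.
    se_pred = sigma_hat * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for x0 in (3, 5, 8):  # x_bar = 3 is the centroid; 5 and 8 lie farther out
    lo, hi = simple_prediction_interval(x, y, x0, t_crit=3.182)
    print(f"x0={x0}: ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
```

The width is smallest at $x_0 = \bar{x}$ and depends only on the distance $|x_0 - \bar{x}|$, so points equally far on either side of the centroid get intervals of the same width.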

This prediction interval should not be confused with a confidence interval for a fitted value. The prediction interval is used to provide an interval estimate for a prediction of $y$ for one member of the population with a particular value of $x_0$; the confidence interval is used to provide an interval estimate for the true expected value of $y$ for all members of the population with a particular value of $x_0$. The corresponding standard error, termed the standard error for a fitted value, is the square root of

$$\left[\widehat{\mathrm{s.e.}}(\hat{y}_0^F)\right]^2 = \hat{\sigma}^2\, x_0'(X'X)^{-1}x_0, \tag{1.11}$$

with corresponding confidence interval

$$\hat{y}_0 \pm t^{n-p-1}_{\alpha/2}\,\widehat{\mathrm{s.e.}}(\hat{y}_0^F).$$
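
A small sketch contrasting the two intervals at the same $x_0$, restricted to simple regression, where $x_0'(X'X)^{-1}x_0$ reduces to $1/n + (x_0-\bar{x})^2/\sum_i (x_i-\bar{x})^2$. The function name `fitted_and_prediction_intervals` is hypothetical and the critical value is caller-supplied; in R, `predict()` on an `lm` fit with `interval = "confidence"` gives the fitted-value interval.

```python
import math


def fitted_and_prediction_intervals(x, y, x0, t_crit):
    """Return (CI for the fitted value, prediction interval, sigma2_hat) at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sigma2_hat = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # For simple regression, x0'(X'X)^{-1}x0 equals 1/n + (x0 - x_bar)^2 / Sxx.
    h = 1 / n + (x0 - x_bar) ** 2 / sxx
    y0_hat = b0 + b1 * x0
    se_fit = math.sqrt(sigma2_hat * h)         # square root of (1.11)
    se_pred = math.sqrt(sigma2_hat * (1 + h))  # square root of (1.10)
    ci = (y0_hat - t_crit * se_fit, y0_hat + t_crit * se_fit)
    pi = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)
    return ci, pi, sigma2_hat


ci, pi, s2 = fitted_and_prediction_intervals(
    [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], x0=4, t_crit=3.182)
print("confidence interval for fitted value:", ci)
print("prediction interval:", pi)
```

Both intervals share the center $\hat{y}_0$; the squared standard errors differ by exactly $\hat{\sigma}^2$.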

A comparison of the two estimated variances (1.10) and (1.11) shows that the variance of the predicted value has an extra $\hat{\sigma}^2$ term, which corresponds to the inherent variability in the population. Thus, the confidence interval for a fitted value will always be narrower than the prediction interval, and is often much narrower (especially for large samples), since increasing the sample size will always improve estimation of the expected response value, but cannot lessen the inherent variability in the population associated with the prediction of the target for a single observation.
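
The effect of sample size can be illustrated with a small simulation. This sketch assumes a hypothetical true model $y = 1 + 2x + \varepsilon$ with standard normal errors (so $\sigma = 1$), draws $x$ uniformly on $[0, 10]$, and compares the two standard errors at $x_0 = 5$ for increasing $n$. It reports standard errors rather than intervals, so no critical value is needed; the function names are hypothetical. One would expect the fitted-value standard error to shrink toward zero while the prediction standard error levels off near $\hat{\sigma}$.

```python
import math
import random


def standard_errors(x, y, x0):
    """Return (se_fit, se_pred, sigma_hat) at x0 for a simple regression fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sigma2_hat = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    h = 1 / n + (x0 - x_bar) ** 2 / sxx
    return math.sqrt(sigma2_hat * h), math.sqrt(sigma2_hat * (1 + h)), math.sqrt(sigma2_hat)


def simulate(sample_sizes=(10, 100, 1000, 10000), seed=1):
    """Simulate y = 1 + 2x + N(0, 1) errors (a hypothetical model) for each n."""
    rng = random.Random(seed)
    rows = []
    for n in sample_sizes:
        x = [rng.uniform(0, 10) for _ in range(n)]
        y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]
        se_fit, se_pred, sigma_hat = standard_errors(x, y, x0=5.0)
        rows.append((n, se_fit, se_pred, sigma_hat))
    return rows


for n, se_fit, se_pred, sigma_hat in simulate():
    print(f"n={n:>6}  se_fit={se_fit:.4f}  se_pred={se_pred:.4f}  sigma_hat={sigma_hat:.4f}")
```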

1.3.5 CHECKING ASSUMPTIONS USING RESIDUAL PLOTS

All of these tests, intervals, predictions, and so on, are based on the belief that the assumptions of the regression model hold. Thus, it is crucially important that these assumptions be checked. Remarkably enough, a few very simple plots can provide much of the evidence needed to check the assumptions.

1 A plot of the residuals versus the fitted values. This plot should have no pattern to it; that is, no structure should be apparent. Certain kinds of structure indicate potential problems:

  • A point (or a few points) isolated at the top or bottom, or left or right. In addition, often the rest of the points have a noticeable “tilt” to them. These isolated points are unusual observations and can have a strong effect on the regression. They need to be examined carefully and possibly removed from the data set.

  • An impression of different heights of the point cloud as the plot is examined from left to right. This indicates potential heteroscedasticity (nonconstant variance).

2 Plots of the residuals versus each of the predictors. Again, a plot with no apparent structure is desired.

3 If the data set has a time structure to it, residuals should be plotted in time order. Again, there should be no apparent pattern. If there is a cyclical structure, this indicates that the errors are not uncorrelated, as they are supposed to be (that is, there is potentially autocorrelation in the errors).

4 A normal plot of the residuals. This plot assesses the apparent normality of the residuals, by plotting the observed ordered residuals on one axis and the expected positions (under normality) of those ordered residuals on the other. The plot should look like a straight line (roughly). Isolated points once again represent unusual observations, while a curved line indicates that the errors are probably not normally distributed, and tests and intervals might not be trustworthy.
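
Python's standard library cannot draw these plots, but the sketch below (hypothetical function `residual_diagnostics`, simple regression only) computes the coordinates for plot 1 and plot 4 and adds one numerical companion to plot 3, the lag-1 autocorrelation of the residuals in time order. The plotting positions $(i - 0.5)/n$ used for the expected normal quantiles are one common convention among several, chosen here as an assumption. In R, `plot()` applied to an `lm` fit draws residuals-versus-fitted and normal Q-Q plots directly.

```python
import statistics


def residual_diagnostics(x, y):
    """Return (residual-vs-fitted pairs, normal-plot pairs, lag-1 autocorrelation)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Plot 1: residuals versus fitted values.
    resid_vs_fitted = list(zip(fitted, resid))
    # Plot 4: ordered residuals against their expected positions under normality.
    nd = statistics.NormalDist()
    normal_scores = [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    qq_pairs = list(zip(normal_scores, sorted(resid)))
    # Plot 3 companion: lag-1 autocorrelation, assuming the data are in time order.
    lag1 = sum(resid[t] * resid[t - 1] for t in range(1, n)) / sum(e * e for e in resid)
    return resid_vs_fitted, qq_pairs, lag1


# Hypothetical data whose errors alternate in sign, a strongly autocorrelated pattern.
x = list(range(1, 21))
y = [2 + 3 * t + 0.5 * (-1) ** t for t in x]
rvf, qq, r1 = residual_diagnostics(x, y)
print("lag-1 autocorrelation of residuals:", round(r1, 3))
```

A lag-1 autocorrelation far from zero is the numerical counterpart of the cyclical structure described in item 3; the plots themselves remain the primary check.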

Note that all of these plots
