Introduction to Linear Regression Analysis. Douglas C. Montgomery.

About the book:

Author:	Douglas C. Montgomery
Publisher:	John Wiley & Sons Limited
Genre:	Mathematics
ISBN:	9781119578758


Both the confidence interval (2.54) and the prediction interval (2.55) widen as x₀ increases. Furthermore, the length of the CI (2.54) at x₀ = 0 is zero because the model assumes that the mean of y at x = 0 is known with certainty to be zero. This behavior is considerably different from that observed in the intercept model. The prediction interval (2.55) has nonzero length at x₀ = 0 because the random error in the future observation must be taken into account.
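The interval behavior described above can be checked numerically. The sketch below assumes the usual no-intercept forms for (2.54) and (2.55). The confidence half-width is t_{α/2,n−1} √(x₀² MS_Res / Σxᵢ²) and the prediction half-width is t_{α/2,n−1} √(MS_Res (1 + x₀²/Σxᵢ²)). The helper name `no_intercept_intervals` is hypothetical and the five data points are invented. The caller supplies the t quantile; 2.776 is t_{0.025,4} from a standard t table.

```python
import math


def no_intercept_intervals(x, y, x0, t_crit):
    """Hypothetical helper: fit y = beta1 * x + error through the origin and
    return (fitted mean at x0, CI half-width, PI half-width).

    t_crit is t_{alpha/2, n-1}, supplied by the caller; the no-intercept
    model estimates one parameter, leaving n - 1 residual degrees of freedom.
    """
    n = len(x)
    sum_xx = sum(xi * xi for xi in x)                        # uncorrected sum of squares
    beta1 = sum(xi * yi for xi, yi in zip(x, y)) / sum_xx    # least-squares slope
    ss_res = sum((yi - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    ms_res = ss_res / (n - 1)
    y_hat0 = beta1 * x0
    ci_half = t_crit * math.sqrt(x0 ** 2 * ms_res / sum_xx)        # mean response at x0
    pi_half = t_crit * math.sqrt(ms_res * (1 + x0 ** 2 / sum_xx))  # new observation at x0
    return y_hat0, ci_half, pi_half


# Invented data; 2.776 is t_{0.025,4} for n - 1 = 4 degrees of freedom.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for x0 in (0, 2, 4):
    fit, ci, pi = no_intercept_intervals(x, y, x0, t_crit=2.776)
    print(f"x0={x0}: fit={fit:.3f}  CI +/-{ci:.3f}  PI +/-{pi:.3f}")
```

At x₀ = 0 the confidence half-width is exactly zero, but the prediction half-width is not. Both half-widths grow as x₀ moves away from the origin.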

It is relatively easy to misuse the no-intercept model, particularly in situations where the data lie in a region of x space remote from the origin. For example, consider the no-intercept fit in the scatter diagram of chemical process yield (y) and operating temperature (x) in Figure 2.12a. Although over the range of the regressor variable 100°F ≤ x ≤ 200°F, yield and temperature seem to be linearly related, forcing the model to go through the origin provides a visibly poor fit. A model containing an intercept, such as illustrated in Figure 2.12b, provides a much better fit in the region of x space where the data were collected.

Frequently the relationship between y and x is quite different near the origin than it is in the region of x space containing the data. This is illustrated in Figure 2.13 for the chemical process data. Here it would seem that either a quadratic or a more complex nonlinear regression model would be required to adequately express the relationship between y and x over the entire range of x. Such a model should only be entertained if the range of x in the data is sufficiently close to the origin.

Figure 2.12 Scatter diagrams and regression lines for chemical process yield and operating temperature: (a) no-intercept model; (b) intercept model.

Figure 2.13 True relationship between yield and temperature.

The scatter diagram sometimes provides guidance in deciding whether or not to fit the no-intercept model. Alternatively, we may fit both models and choose between them based on the quality of the fit. If the hypothesis β₀ = 0 cannot be rejected in the intercept model, this is an indication that the fit may be improved by using the no-intercept model. The residual mean square is a useful way to compare the quality of fit. The model having the smaller residual mean square is the better fit, in the sense that it gives the smaller estimate of the variance of y about the regression line.
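Below is a minimal sketch of the comparison just described, assuming ordinary least squares for both fits. It returns MS_Res for each model and the t statistic for H₀: β₀ = 0 in the intercept model. The function name `compare_models` is hypothetical. The temperature and yield numbers are invented to mimic data collected far from the origin (100°F to 200°F); they are not the data behind Figure 2.12.

```python
import math


def compare_models(x, y):
    """Hypothetical helper: least-squares fits of y = b0 + b1*x and y = b1*x.

    Returns (MS_Res of intercept model, MS_Res of no-intercept model,
    t statistic for H0: beta0 = 0 in the intercept model).
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ms_res_int = ss_res / (n - 2)               # two parameters estimated
    t0 = b0 / math.sqrt(ms_res_int * (1 / n + xbar ** 2 / sxx))

    b1_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    ss_res_origin = sum((yi - b1_origin * xi) ** 2 for xi, yi in zip(x, y))
    ms_res_origin = ss_res_origin / (n - 1)     # one parameter estimated
    return ms_res_int, ms_res_origin, t0


# Invented yield/temperature values collected far from the origin.
temp = [100, 120, 140, 160, 180, 200]
process_yield = [50.5, 55.8, 62.1, 68.2, 73.9, 80.0]
ms_int, ms_origin, t0 = compare_models(temp, process_yield)
print(f"MS_Res: intercept={ms_int:.3f}, no-intercept={ms_origin:.3f}; t0={t0:.1f}")
```

With these invented values the intercept is clearly nonzero. The t statistic is large, and the line forced through the origin has a much larger residual mean square, which is the situation sketched in Figure 2.12a.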

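Generally R² is not a good comparative statistic for the two models. For the intercept model we have

$$R^2 = \frac{SS_R}{SS_T} = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$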

Note that R² indicates the proportion of variability around ȳ explained by regression. In the no-intercept case the fundamental analysis-of-variance identity (2.32) becomes

$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \hat{y}_i^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

so that the no-intercept model analogue for R² would be

$$R_0^2 = \frac{\sum_{i=1}^{n} \hat{y}_i^2}{\sum_{i=1}^{n} y_i^2}$$

The statistic R₀² indicates the proportion of variability around the origin (zero) accounted for by regression. We occasionally find that R₀² is larger than R² even though the residual mean square (which is a reasonable measure of the overall quality of the fit) for the intercept model is smaller than the residual mean square for the no-intercept model. This arises because R₀² is computed using uncorrected sums of squares.
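The effect described above is easy to reproduce. The sketch below uses a hypothetical helper `r_squared_pair` and three invented points far from the origin. The intercept model has by far the smaller MS_Res, yet R₀², which is built from uncorrected sums of squares, comes out larger than R². The `assert` inside the function checks the no-intercept identity Σyᵢ² = Σŷᵢ² + Σ(yᵢ − ŷᵢ)² numerically.

```python
import math


def r_squared_pair(x, y):
    """Hypothetical helper returning ((R^2, MS_Res) for the intercept model,
    (R0^2, MS_Res) for the model through the origin)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ss_t = sum((yi - ybar) ** 2 for yi in y)          # corrected total sum of squares
    r2 = 1 - ss_res / ss_t                            # equals SS_R / SS_T

    b1_0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    fitted0 = [b1_0 * xi for xi in x]
    ss_res0 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted0))
    sum_yy = sum(yi * yi for yi in y)                 # uncorrected total sum of squares
    # No-intercept identity: sum y^2 = sum yhat^2 + sum (y - yhat)^2
    assert math.isclose(sum_yy, sum(fi * fi for fi in fitted0) + ss_res0)
    r0_2 = sum(fi * fi for fi in fitted0) / sum_yy
    return (r2, ss_res / (n - 2)), (r0_2, ss_res0 / (n - 1))


# Invented data lying far from the origin.
(r2, ms_int), (r0_2, ms_origin) = r_squared_pair([100, 150, 200], [60, 68, 82])
print(f"R^2={r2:.4f} (MS_Res={ms_int:.2f});  R0^2={r0_2:.4f} (MS_Res={ms_origin:.2f})")
```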

There are alternative ways to define R² for the no-intercept model. One possibility is

$$R_0'^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

However, in cases where Σ(yᵢ − ŷᵢ)² is large (specifically, larger than Σ(yᵢ − ȳ)²), R₀′² can be negative. We prefer to use MS_Res as a basis of comparison between intercept and no-intercept regression models. A nice article on regression models with no intercept term is Hahn [1979].
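Here is a short check of the warning about R₀′². When the data sit far from any line through the origin, the residual sum of squares of the no-intercept fit exceeds Σ(yᵢ − ȳ)², and R₀′² drops below zero. The helper name `r0_prime_squared` and the data points are invented for illustration.

```python
def r0_prime_squared(x, y):
    """Hypothetical helper: 1 - SS_Res / sum((y - ybar)^2) for the fit through the origin."""
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    ss_res = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ybar = sum(y) / len(y)
    return 1 - ss_res / sum((yi - ybar) ** 2 for yi in y)


# Invented data: nearly flat y, so no line through the origin fits well.
print(r0_prime_squared([1, 2, 3], [5, 6, 5]))
# Invented data lying exactly on y = 2x, where the statistic reaches 1.
print(r0_prime_squared([1, 2, 3], [2, 4, 6]))
```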

Example 2.8 The Shelf-Stocking Data

The time required for a merchandiser to stock a grocery store shelf with a soft drink product, together with the number of cases of product stocked, is shown in Table 2.10. The scatter diagram shown in Figure 2.14 suggests that a straight line passing through the origin could be used to express the relationship between time and the number of cases stocked. Furthermore, since stocking x = 0 cases should take time y = 0, this model seems intuitively reasonable. Note also that the range of x is close to the origin.

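The slope in the no-intercept model is computed from Eq. (2.50) as

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2} = \frac{1841.98}{4575.00} = 0.4026$$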

Therefore, the fitted equation is

$$\hat{y} = 0.4026x$$

This regression line is shown in Figure 2.15. The residual mean square for this model is MS_Res = 0.0893 and R₀² = 0.9983. Furthermore, the t statistic for testing H₀: β₁ = 0 is t₀ = 91.13, for which the P value is 8.02 × 10⁻²¹. These summary statistics do not reveal any startling inadequacy in the no-intercept model.
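The no-intercept calculations in this example can be reproduced in a few lines. The arrays are our own transcription of Table 2.10, which this excerpt cuts off, so treat them as an assumption to check against the printed table. With them, the script recovers MS_Res = 0.0893 and t₀ = 91.13 as reported in the text, and it also prints the slope and R₀².

```python
import math

# Table 2.10 as transcribed by us (the excerpt is cut off before the table
# body); this transcription is an assumption, so check it against the book.
cases = [25, 6, 8, 17, 2, 13, 23, 30, 28, 14, 19, 4, 24, 1, 5]   # x
times = [10.15, 2.96, 3.00, 6.88, 0.28, 5.06, 9.14, 11.86,
         11.69, 6.04, 7.57, 1.74, 9.38, 0.16, 1.84]              # y

n = len(cases)
sum_xx = sum(x * x for x in cases)
sum_xy = sum(x * y for x, y in zip(cases, times))
b1 = sum_xy / sum_xx                                  # least-squares slope through the origin
ss_res = sum((y - b1 * x) ** 2 for x, y in zip(cases, times))
ms_res = ss_res / (n - 1)                             # n - 1 residual degrees of freedom
t0 = b1 / math.sqrt(ms_res / sum_xx)                  # test of H0: beta1 = 0
r0_sq = sum((b1 * x) ** 2 for x in cases) / sum(y * y for y in times)
print(f"b1={b1:.4f}  MS_Res={ms_res:.4f}  t0={t0:.2f}  R0^2={r0_sq:.4f}")
```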

We may also fit the intercept model to the data for comparative purposes. This results in

$$\hat{y} = -0.0938 + 0.4071x$$

The t statistic for testing H₀: β₀ = 0 is t₀ = −0.65, which is not significant, implying that the no-intercept model may provide a superior fit. The residual mean square for the intercept model is MS_Res = 0.0931 and R² = 0.9947. Since MS_Res for the no-intercept model is smaller than MS_Res for the intercept model, we conclude that the no-intercept model is superior. As noted previously, the R² statistics are not directly comparable.
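The intercept fit for the shelf-stocking data can be checked the same way. The arrays are our own transcription of Table 2.10, which this excerpt cuts off, and should be verified against the printed table. The script reproduces t₀ = −0.65 and MS_Res = 0.0931 from the text. The R² it prints uses the corrected total sum of squares Σ(yᵢ − ȳ)², which is why it cannot be set directly against R₀².

```python
import math

# Table 2.10 as transcribed by us (an assumption; the excerpt is cut off).
cases = [25, 6, 8, 17, 2, 13, 23, 30, 28, 14, 19, 4, 24, 1, 5]   # x
times = [10.15, 2.96, 3.00, 6.88, 0.28, 5.06, 9.14, 11.86,
         11.69, 6.04, 7.57, 1.74, 9.38, 0.16, 1.84]              # y

n = len(cases)
xbar, ybar = sum(cases) / n, sum(times) / n
sxx = sum((x - xbar) ** 2 for x in cases)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(cases, times))
b1 = sxy / sxx
b0 = ybar - b1 * xbar
ss_res = sum((y - b0 - b1 * x) ** 2 for x, y in zip(cases, times))
ms_res = ss_res / (n - 2)                                  # n - 2 residual degrees of freedom
t0 = b0 / math.sqrt(ms_res * (1 / n + xbar ** 2 / sxx))   # test of H0: beta0 = 0
r_sq = 1 - ss_res / sum((y - ybar) ** 2 for y in times)    # corrected-total R^2
print(f"b0={b0:.4f}  b1={b1:.4f}  t0={t0:.2f}  MS_Res={ms_res:.4f}  R^2={r_sq:.4f}")
```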

TABLE 2.10 Shelf-Stocking Data for Example 2.8

Times,
