A modified version of AIC that helps address this problem is the corrected AIC,

AIC_C = AIC + 2(p + 2)(p + 3)/(n − p − 3),    (2.3)
(Hurvich and Tsai, 1989). Equation (2.3) shows that (especially for small samples) models with fewer parameters will be more strongly preferred when minimizing AIC_C than when minimizing AIC, providing stronger protection against overfitting. In large samples, the two criteria are virtually identical, but in small samples, or when considering models with a large number of parameters, AIC_C is the better choice. This suggests the following model selection rule:
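A small numerical sketch can make this concrete. In the Python snippet below, the function names and the unit RSS value are illustrative, and the form of AIC used (which counts the intercept and the error variance among the estimated parameters, giving p + 2 parameters in all) is one common convention; only the correction term comes directly from equation (2.3).

```python
import math

def aic(rss, n, p):
    # One common form of AIC for a linear model with p predictors:
    # the p slopes, the intercept, and the error variance give
    # p + 2 estimated parameters in all.
    return n * math.log(rss / n) + 2 * (p + 2)

def aic_c(rss, n, p):
    # Corrected AIC of equation (2.3) (Hurvich and Tsai, 1989).
    return aic(rss, n, p) + 2 * (p + 2) * (p + 3) / (n - p - 3)

# The extra AIC_C penalty shrinks toward zero as n grows, so the two
# criteria agree in large samples but AIC_C is stricter in small ones.
for n in (20, 100, 10000):
    print(n, round(aic_c(1.0, n, 4) - aic(1.0, n, 4), 3))
```

For p = 4 the extra penalty is 84/(n − 7): over six AIC units at n = 20, under a hundredth of a unit at n = 10,000.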
1 Choose the model that minimizes AIC_C. In case of tied values, the simplest model (smallest p) would be chosen. In these data, this rule implies choosing the three-predictor model, although the AIC_C value for the four-predictor model is virtually identical to it. Note that the overall level of the AIC_C values is not meaningful, and should not be compared to AIC values or to AIC_C values for other data sets; it is only the AIC_C value for a model for a given data set relative to the AIC_C values of other models for that data set that matters.
C_p, AIC, and AIC_C have the desirable property that they are efficient model selection criteria. This means that in the (realistic) situation where the set of candidate models does not include the “true” model (that is, a good model is just viewed as a useful approximation to reality), as the sample gets larger the error obtained in making predictions using the model chosen using these criteria becomes indistinguishable from the error obtained using the best possible model among all candidate models. That is, in this large‐sample predictive sense, it is as if the best approximation was known to the data analyst. Another well‐known criterion, the Bayesian Information Criterion BIC [which substitutes log n for 2 in (2.2)], does not have this property, but is instead a consistent criterion. Such a criterion has the property that if the “true” model is in fact among the candidate models the criterion will select that model with probability approaching 1 as the sample size increases. Thus, BIC is a more natural criterion to use if the goal is to identify the “true” predictors with nonzero slopes (which of course presumes that there are such things as “true” predictors in a “true” model). BIC will generally choose simpler models than AIC because of its stronger penalty (log n > 2 for n ≥ 8), and a version of BIC that adjusts for small samples as in (2.3) leads to even simpler models. This supports the notion that from a predictive point of view including a few unnecessary predictors (overfitting) is far less damaging than is omitting necessary predictors (underfitting).
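The claim that BIC penalizes each parameter more heavily than AIC once n ≥ 8 is simply the statement that log n exceeds 2 from that point on (since e² ≈ 7.39), which is easy to verify:

```python
import math

# AIC multiplies the parameter count by 2; BIC uses log(n) instead.
# log(n) > 2 exactly when n > e^2 ~ 7.39, i.e., for all n >= 8,
# so BIC's penalty is stronger for any but the tiniest samples.
for n in (7, 8, 100, 1000):
    print(n, round(math.log(n), 3), math.log(n) > 2)
```

The gap widens with n (log 1000 ≈ 6.9), which is why BIC's preference for simple models becomes more pronounced in large samples.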
A final way of comparing models is from a directly predictive point of view. Since a rough 95% prediction interval is ŷ ± 2σ̂, a useful model from a predictive point of view is one with small σ̂, suggesting choosing a model that has small σ̂ while still being as simple as possible. That is,
2 Increase the number of predictors until σ̂ levels off. For these data (s in the output refers to σ̂), this implies choosing p = 3 or p = 4.
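This rule connects directly to prediction: since the rough interval is ŷ ± 2σ̂, the payoff from an extra predictor is the resulting reduction in interval width. Using the residual standard errors reported below for the best three- and four-predictor models:

```python
# Full width of the rough 95% prediction interval, 4 * sigma-hat,
# for the best three- and four-predictor models (residual standard
# errors 47090 and 46890 from the regression output).
for p, sigma_hat in ((3, 47090), (4, 46890)):
    print(p, 4 * sigma_hat)
```

The two widths (188,360 versus 187,560 dollars) differ by less than half a percent, so the fourth predictor buys essentially nothing in predictive terms.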
Taken together, all of these rules imply that the appropriate set of models to consider are those with two, three, or four predictors. Typically, the strongest model of each size (which will have highest R², highest adjusted R², lowest C_p, lowest AIC_C, and lowest σ̂, so there is no controversy as to which one is strongest) is examined. The output on pages 31–32 provides summaries for the top three models of each size, in case there are reasons to examine a second‐ or third‐best model (if, for example, a predictor in the best model is difficult or expensive to measure), but here we focus on the best model of each size. First, here is output for the best four‐predictor model.
Coefficients:
              Estimate  Std.Error t value Pr(>|t|)    VIF
(Intercept) -6.852e+06  3.701e+06  -1.852   0.0678        .
Bedrooms    -1.207e+04  9.212e+03  -1.310   0.1940  1.252
Bathrooms    5.303e+04  1.275e+04   4.160 7.94e-05  1.374 ***
Living.area  6.828e+01  1.460e+01   4.676 1.17e-05  1.417 ***
Year.built   3.608e+03  1.898e+03   1.901   0.0609  1.187 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 46890 on 80 degrees of freedom
Multiple R-squared: 0.5044,  Adjusted R-squared: 0.4796
F-statistic: 20.35 on 4 and 80 DF,  p-value: 1.356e-11
The t‐statistic for number of bedrooms suggests very little evidence that it adds anything useful given the other predictors in the model, so we consider now the best three‐predictor model. This happens to be the best four‐predictor model with the one statistically insignificant predictor omitted, but this does not have to be the case.
Coefficients:
              Estimate  Std.Error t value Pr(>|t|)    VIF
(Intercept) -7.653e+06  3.666e+06  -2.087 0.039988        *
Bathrooms    5.223e+04  1.279e+04   4.084 0.000103  1.371 ***
Living.area  6.097e+01  1.355e+01   4.498 2.26e-05  1.210 ***
Year.built   4.001e+03  1.883e+03   2.125 0.036632  1.158 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 47090 on 81 degrees of freedom
Multiple R-squared: 0.4937,  Adjusted R-squared: 0.475
F-statistic: 26.33 on 3 and 81 DF,  p-value: 5.489e-12
Each of the predictors is statistically significant at a .05 level, and this model recovers virtually all of the available fit (R² = .494, only slightly less than that using all six predictors), so this seems to be a reasonable