Handbook of Regression Analysis With Applications in R. Samprit Chatterjee.

Book information:

Author:	Samprit Chatterjee
Publisher:	John Wiley & Sons Limited
Genre:	Mathematics
ISBN:	9781119392484

(2.4) to account for model selection uncertainty is just one part of the more general problem that standard degrees of freedom calculations are no longer valid when multiple models are being compared to each other, as in the comparison of all models with a given number of predictors in best subsets. This affects other uses of those degrees of freedom, including the calculation of information measures like $C_p$, $AIC$, and $AIC_c$, and thus any decisions regarding model choice. This problem becomes progressively more serious as the number of potential predictors increases and is the subject of active research. This will be discussed further in Chapter 14.

2.4 Indicator Variables and Modeling Interactions

It is not unusual for the observations in a sample to fall into two distinct subgroups; for example, people are either male or female. It might be that group membership has no relationship with the target variable (given other predictors); such a pooled model ignores the grouping and pools the two groups together.

On the other hand, it is clearly possible that group membership is predictive for the target variable (for example, expected salaries differing for men and women given other control variables could indicate gender discrimination). Such effects can be explored easily using an indicator variable, which takes on the value $0$ for one group and $1$ for the other (such variables are sometimes called dummy variables or $0/1$ variables). The model takes the form

$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_{p-1} x_{p-1,i} + \beta_p I_i + \varepsilon_i,$$

where $I$ is an indicator variable with value $1$ if the observation is a member of group $G$ and $0$ otherwise. The usual interpretation of the slope still applies: $\beta_p$ is the expected change in $y$ associated with a one-unit change in $I$ holding all else fixed. Since $I$ only takes on the values $0$ or $1$, this is equivalent to saying that the expected target is $\beta_p$ higher for group members ($I = 1$) than nonmembers ($I = 0$), holding all else fixed. This has the appealing interpretation of fitting a constant shift model, where the regression relationships for group members and nonmembers are identical, other than being shifted up or down; that is,

$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_{p-1} x_{p-1,i} + \varepsilon_i$$

for nonmembers and

$$y_i = (\beta_0 + \beta_p) + \beta_1 x_{1i} + \cdots + \beta_{p-1} x_{p-1,i} + \varepsilon_i$$

for members. The $t$-test for whether $\beta_p = 0$ is thus a test of whether a constant shift model (two parallel regression lines, planes, or hyperplanes) is a significant improvement over a pooled model (one common regression line, plane, or hyperplane).
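As a rough illustration of the constant shift model, the sketch below fits an intercept, one numerical predictor, and a 0/1 indicator by least squares. It is written in plain Python rather than R and is not taken from the book: the `ols` helper, the variable names, and the noise-free data are all assumptions made up for this example, chosen so that the fitted coefficients can be read off directly.

```python
# Minimal sketch (not from the book): constant shift model on made-up data.

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical, noise-free data generated as y = 10 + 2*x + 5*member
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
member = [0, 1, 0, 1, 0, 1, 0, 1]  # indicator: 1 = group member, 0 = nonmember
y = [10 + 2 * xi + 5 * mi for xi, mi in zip(x, member)]

# Design matrix columns: intercept, x, indicator
X = [[1.0, xi, float(mi)] for xi, mi in zip(x, member)]
b0, b1, b2 = ols(X, y)

# Two parallel lines: same slope, intercepts differing by the indicator coefficient
print(f"nonmembers: y = {b0:.3f} + {b1:.3f} x")
print(f"members:    y = {b0 + b2:.3f} + {b1:.3f} x")
print(f"estimated shift for members: {b2:.3f}")
```

With real (noisy) data the coefficient on the indicator would be an estimate of the shift, and its $t$-statistic would come from a regression routine that also reports standard errors, which this sketch does not compute.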

Would two different regression relationships be better still? Say there is only one numerical predictor $x_1$; the full model that allows for two different regression lines is

$$y_i = \beta_{00} + \beta_{10} x_{1i} + \varepsilon_i$$

for nonmembers ($I = 0$), and

$$y_i = \beta_{01} + \beta_{11} x_{1i} + \varepsilon_i$$

for members ($I = 1$). The pooled model and the constant shift model can be made to be special cases of the full model, by creating a new variable that is the product of $x_1$ and $I$. A regression model that includes this variable,

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 I_i + \beta_3 x_{1i} I_i + \varepsilon_i,$$

corresponds to the two different regression lines

$$y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$$

for nonmembers (since $I = 0$), implying $\beta_{00} = \beta_0$ and $\beta_{10} = \beta_1$ above, and

$$y_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_{1i} + \varepsilon_i$$

for members (since $I = 1$), implying $\beta_{01} = \beta_0 + \beta_2$ and $\beta_{11} = \beta_1 + \beta_3$ above.

The $t$-test for the slope of the product variable ($\beta_3 = 0$) is a test of whether the full model (two different regression lines) is significantly better than the constant shift model (two parallel regression lines); that is, it is a test of parallelism. The restriction $\beta_2 = \beta_3 = 0$ defines the pooled model as a special case of the full model, so the partial $F$-statistic based on (2.1),
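The correspondence between the product-variable model and two separate regression lines can be checked numerically. The following sketch, again in plain Python and not from the book, fits the model with the indicator and the product variable to a small made-up data set, then fits a simple regression to each group on its own and compares the coefficients. The `ols` helper, the variable names, and the data values are assumptions for illustration only.

```python
# Minimal sketch (not from the book): full model with a product variable
# versus separate per-group regression lines, on made-up data.

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data: the two groups follow visibly different lines, plus noise
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
member = [0, 1, 0, 1, 0, 1, 0, 1]  # indicator: 1 = group member, 0 = nonmember
y = [4.6, 7.2, 7.4, 12.9, 10.7, 19.1, 13.3, 24.8]

# Full model: intercept, x, indicator, and the product x * indicator
X_full = [[1.0, xi, float(mi), xi * mi] for xi, mi in zip(x, member)]
b0, b1, b2, b3 = ols(X_full, y)

def fit_group(g):
    """Simple regression of y on x using only observations with member == g."""
    rows = [(xi, yi) for xi, mi, yi in zip(x, member, y) if mi == g]
    return ols([[1.0, xi] for xi, _ in rows], [yi for _, yi in rows])

b00, b10 = fit_group(0)  # nonmembers' own intercept and slope
b01, b11 = fit_group(1)  # members' own intercept and slope

print(f"nonmembers: separate fit ({b00:.4f}, {b10:.4f}); "
      f"from full model ({b0:.4f}, {b1:.4f})")
print(f"members:    separate fit ({b01:.4f}, {b11:.4f}); "
      f"from full model ({b0 + b2:.4f}, {b1 + b3:.4f})")
```

The two printed pairs on each line should agree up to floating-point rounding, which is the sense in which the single model with a product variable encodes both regression lines at once.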

on