Example 1
The statement “Sales is a function of advertising dollars with r² = 70 percent” can be interpreted as “70 percent of the total variation in sales is explained by the regression equation, that is, by changes in advertising, and the remaining 30 percent is accounted for by something other than advertising.”
The coefficient of determination is computed as
\[ r^2 = 1 - \frac{\sum (Y - Y')^2}{\sum (Y - \bar{Y})^2} \]
where Y = actual values
Y′ = estimated values
Ȳ = average (mean) value of Y
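In a simple regression, however, there is a shortcut method available:
\[ r^2 = \frac{[n\sum XY - (\sum X)(\sum Y)]^2}{[n\sum X^2 - (\sum X)^2]\,[n\sum Y^2 - (\sum Y)^2]} \]
where n = number of observations
X = value of the independent variable
Y = value of the dependent variable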
Example 2
To illustrate the computation of the various regression statistics, use the same data used in Sec. 19, Regression Analysis. All the sums required are computed and shown below. Note that the Y² column is added in Table 20.1 to be used for r².
Table 20.1: Computed Sums
From this table,
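∑X = 174 ∑Y = 225 ∑XY = 3414 ∑X² = 2792 ∑Y² = 4359
Using the shortcut method for r², with n = 12,
\[ r^2 = \frac{[(12)(3414) - (174)(225)]^2}{[(12)(2792) - (174)^2]\,[(12)(4359) - (225)^2]} = \frac{(1818)^2}{(3228)(1683)} = \frac{3{,}305{,}124}{5{,}432{,}724} = 0.6084 \]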
This means that about 60.84 percent of the total variation in sales is explained by advertising and the remaining 39.16 percent is still unexplained. A relatively low r² indicates that there is a lot of room for improvement in the forecasting equation (Y′ = $10.5836 + $0.5632X). Price, or a combination of price and advertising, might improve r².
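The shortcut computation of r² can be checked with a few lines of Python. This is a minimal sketch, not part of the original text: the function name `r_squared_shortcut` is made up for illustration, n = 12 is inferred from the 10 degrees of freedom (n − 2) quoted elsewhere in this section, and ∑Y² = 4359 is an assumption chosen because it reproduces the reported r² of 0.6084.

```python
# Sums from the worked example. n = 12 is inferred from n - 2 = 10 degrees of freedom.
n = 12
sum_x, sum_y = 174, 225
sum_xy, sum_x2 = 3414, 2792
sum_y2 = 4359  # assumption: value that reproduces the reported r^2 of 0.6084


def r_squared_shortcut(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Coefficient of determination for a simple regression, from raw sums."""
    numerator = (n * sum_xy - sum_x * sum_y) ** 2
    denominator = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    return numerator / denominator


r2 = r_squared_shortcut(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(f"r^2 = {r2:.4f}")                   # r^2 = 0.6084
print(f"unexplained share = {1 - r2:.4f}")  # unexplained share = 0.3916
```

Working from the sums rather than the raw observations mirrors the hand calculation, so each intermediate quantity can be compared with the table.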
Note: A low r² is an indication that the model is inadequate for explaining the Y variable. The general causes for this problem are:
1. Use of a wrong functional form.
2. Poor choice of an X variable as the predictor.
3. Omission of some important variable or variables from the model.
2. Standard Error of the Estimate (Se)
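The standard error of the estimate, designated Se, is defined as the standard deviation of the regression. It is computed as
\[ S_e = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}} \]
where a and b are the intercept and slope of the regression equation. This statistic can be used to gain some idea of the accuracy of our predictions.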
Since t = 3.94 > 2 (the t value is computed in Example 5), we conclude that the b coefficient is statistically significant. As was indicated previously, the table’s critical value (cut-off value) for 10 degrees of freedom is 2.228 (from Table 8 in the Appendix).
Rule of thumb: Any t value greater than +2 or less than −2 is acceptable. The higher the absolute t value, the greater the confidence we have in the coefficient as a predictor. Low t values are indications of low reliability of the predictive power of that coefficient.
Example 3
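Returning to our example data, Se is calculated as
\[ S_e = \sqrt{\frac{4359 - (10.5836)(225) - (0.5632)(3414)}{12 - 2}} = \sqrt{\frac{54.9252}{10}} = 2.3436 \]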
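As a cross-check, the same figure can be reproduced in Python. This is a sketch under stated assumptions: it uses the shortcut form Se = √[(∑Y² − a∑Y − b∑XY)/(n − 2)], the helper name `standard_error_of_estimate` is hypothetical, and ∑Y² = 4359 is assumed rather than taken from the listed sums.

```python
import math

n = 12                    # inferred from n - 2 = 10 degrees of freedom
sum_y, sum_xy = 225, 3414
sum_y2 = 4359             # assumption: consistent with the reported r^2 and Se
a, b = 10.5836, 0.5632    # intercept and slope of Y' = a + bX


def standard_error_of_estimate(n, sum_y, sum_xy, sum_y2, a, b):
    """Se from raw sums: sqrt of unexplained variation over (n - 2)."""
    unexplained = sum_y2 - a * sum_y - b * sum_xy  # equals the sum of (Y - Y')^2
    return math.sqrt(unexplained / (n - 2))


se = standard_error_of_estimate(n, sum_y, sum_xy, sum_y2, a, b)
print(f"Se = {se:.4f}")  # Se = 2.3436
```

The divisor is n − 2 rather than n because two parameters, a and b, were estimated from the data.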
Suppose you wish to make a prediction regarding an individual Y value, such as a prediction about sales when advertising expense = $10. Usually, we would like to have some objective measure of the confidence we can place in our prediction, and one such measure is a confidence (or prediction) interval constructed for Y:
\[ Y' \pm t\,S_e \]
Note: t is the critical value for the level of significance employed. For example, for a significance level of 0.025 in each tail (which is equivalent to a 95% confidence level in a two-tailed test), the critical value of t for 10 degrees of freedom is 2.228 (see Table A.2 in the Appendix). As can be seen, the confidence interval is the linear distance bounded by limits on either side of the prediction.
Example 4
If you want a 95 percent confidence interval for your prediction, given an advertising expense of $10, the range would be between $10.9941 and $21.4371, determined as follows. Note that from Example 4.2, Y′ = $16.2156 (= $10.5836 + $0.5632 × 10).
The confidence interval is therefore established as follows:
$16.2156 ± (2.228)(2.3436)
= $16.2156 ± 5.2215
which means the range for the prediction, given an advertising expense of $10, would be between $10.9941 and $21.4371. Note that $10.9941 = $16.2156 − 5.2215 and $21.4371 = $16.2156 + 5.2215.
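The interval arithmetic in Example 4 can be sketched in Python as follows. The first function mirrors the Y′ ± t·Se calculation shown above. The second is an addition that does not appear in this section: it applies the standard textbook prediction interval for an individual Y in simple regression, which widens the margin by √[1 + 1/n + (Xp − X̄)²/(∑X² − (∑X)²/n)]. Both function names are hypothetical, and n = 12 is inferred from the 10 degrees of freedom.

```python
import math

a, b = 10.5836, 0.5632   # fitted equation Y' = a + bX
se = 2.3436              # standard error of the estimate
t_crit = 2.228           # critical t for 10 degrees of freedom, 95% two-tailed
n, sum_x, sum_x2 = 12, 174, 2792
x_new = 10               # advertising expense for the prediction

y_pred = a + b * x_new   # 16.2156


def simple_interval(y_pred, t_crit, se):
    """Interval as computed in the example: Y' +/- t * Se."""
    margin = t_crit * se
    return y_pred - margin, y_pred + margin


def prediction_interval(y_pred, t_crit, se, x_new, n, sum_x, sum_x2):
    """Standard prediction interval for an individual Y (not from this section)."""
    x_bar = sum_x / n
    sxx = sum_x2 - sum_x ** 2 / n          # sum of squared deviations of X
    factor = math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    margin = t_crit * se * factor
    return y_pred - margin, y_pred + margin


low, high = simple_interval(y_pred, t_crit, se)
print(f"simple: {low:.4f} to {high:.4f}")  # simple: 10.9941 to 21.4371

low_full, high_full = prediction_interval(y_pred, t_crit, se, x_new, n, sum_x, sum_x2)
print(f"with correction factor: {low_full:.4f} to {high_full:.4f}")
```

The corrected interval is somewhat wider because it also accounts for uncertainty in the estimated a and b, and that extra width grows as the new X moves away from the mean of X.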
3. Standard Error of the Regression Coefficient (Sb) and the t Statistic
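The standard error of the regression coefficient, designated Sb, and the t statistic are closely related. Sb is calculated as:
\[ S_b = \frac{S_e}{\sqrt{\sum (X - \bar{X})^2}} \]
or, in short-cut form,
\[ S_b = \frac{S_e}{\sqrt{\sum X^2 - \bar{X}\sum X}} \]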
Sb gives an estimate of the range where the true coefficient will “actually” fall.
The t statistic (or t value) is a measure of the statistical significance of an independent variable X in explaining the dependent variable Y. It is determined by dividing the estimated regression coefficient b by its standard error Sb, that is, t = b/Sb. It is then compared with the table t value (see Table 7 in the appendix). Thus, the t statistic measures how many standard errors the coefficient is away from zero. Low t values are indicators of low reliability of that coefficient.
Example 5
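The Sb for our example is:
\[ S_b = \frac{2.3436}{\sqrt{2792 - (14.5)(174)}} = \frac{2.3436}{\sqrt{269}} = 0.143 \]
Thus, t = b/Sb = 0.5632/0.143 = 3.94.
Since t = 3.94 > 2, the conclusion is that the b coefficient is statistically significant.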
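The Sb and t calculations can be verified with a short Python sketch. It assumes the shortcut form Sb = Se/√(∑X² − X̄∑X) and n = 12 (inferred from the 10 degrees of freedom); the function name `slope_standard_error` is hypothetical.

```python
import math

n, sum_x, sum_x2 = 12, 174, 2792
se = 2.3436       # standard error of the estimate
b = 0.5632        # estimated regression coefficient
t_table = 2.228   # critical t for 10 degrees of freedom, 95% two-tailed


def slope_standard_error(se, n, sum_x, sum_x2):
    """Sb via the shortcut form: Se / sqrt(sum of X^2 minus X-bar times sum of X)."""
    x_bar = sum_x / n
    return se / math.sqrt(sum_x2 - x_bar * sum_x)


sb = slope_standard_error(se, n, sum_x, sum_x2)
t_value = b / sb
print(f"Sb = {sb:.3f}, t = {t_value:.2f}")  # Sb = 0.143, t = 3.94

# Compare against both the rule of thumb (|t| > 2) and the table value.
print(abs(t_value) > 2, abs(t_value) > t_table)  # True True
```

Comparing against the table value is the stricter of the two checks here, since 2.228 exceeds the rule-of-thumb cut-off of 2.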
How is it used and applied?
The least-squares method is used to estimate both simple and multiple regressions, although in reality managers will confront multiple regression more often than simple regression. Computer software is used to estimate b’s. A spreadsheet program such as Excel can be used to develop a model and estimate most of the regression statistics discussed thus far. Table 20.1 shows the relevant statistics.
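To give a sense of what such software does internally, here is a minimal pure-Python sketch of a simple least-squares fit that also returns r², Se, Sb, and t. It is illustrative only: the function name `fit_simple_regression` is made up, and the five observations are hypothetical values, not the Sec. 19 data, which are not reproduced in this section.

```python
import math


def fit_simple_regression(x, y):
    """Least-squares fit of Y' = a + bX, plus r^2, Se, Sb, and t for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                     # total
    se = math.sqrt(sse / (n - 2))      # standard error of the estimate
    sb = se / math.sqrt(sxx)           # standard error of the slope
    return {"a": a, "b": b, "r2": 1 - sse / sst, "se": se, "sb": sb, "t": b / sb}


# Hypothetical observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
stats = fit_simple_regression(x, y)
print({name: round(value, 4) for name, value in stats.items()})
```

Here r² is computed from its definition, 1 − ∑(Y − Y′)²/∑(Y − Ȳ)², rather than from the shortcut formula, which is convenient when the raw observations are available.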
Regression analysis is a powerful statistical technique that is widely used by businesspersons and economists. In order to obtain a good fit and to achieve a high degree of accuracy, analysts must be familiar with statistics relating to regression, such as r² and the t value, and be able to make further tests that are unique to multiple regression.
See also Sec. 19, Regression Analysis; Sec. 21, Simple Regression.
Table 20.1: Excel