This data structure is typical of the type of data to which multiple linear regression (MLR), or more generally, any modeling approach, is applied. This familiar tabular structure leads naturally to the representation and manipulation of data values as matrices.
To be more specific, a multiple linear regression model for our data can be represented as shown here:
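$$
\begin{pmatrix} 21.0 \\ 22.8 \\ 18.7 \\ 18.1 \\ 14.3 \\ 24.4 \end{pmatrix}
=
\begin{bmatrix}
1 & 6 & 110 & 2.62 & \text{Man}(0) \\
1 & 4 & 93 & 2.32 & \text{Man}(0) \\
1 & 8 & 175 & 3.44 & \text{Auto}(1) \\
1 & 6 & 105 & 3.46 & \text{Auto}(1) \\
1 & 8 & 245 & 3.57 & \text{Auto}(1) \\
1 & 4 & 62 & 3.19 & \text{Auto}(1)
\end{bmatrix}
*
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
\tag{2.1}
$$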
Here are various items to note:
1. The values of the response, MPG, are presented on the left side of the equality sign, in the form of a column vector, which is a special type of matrix that contains only a single column. In our example, this is the only column in the response matrix Y.
2. The rectangular array to the immediate right of the equality sign, delineated by square brackets, consists of five columns. There is a column of ones followed by four columns consisting of the values of our four predictors, Number of Cylinders, HP (horsepower), Weight, and Transmission. These five columns are the columns in the matrix X.
3. In parentheses, next to the entries in the last column of X, the Transmission value labels, “Man” and “Auto”, have been assigned the numerical values 0 and 1, respectively. Because matrices can contain only numeric data, the values of the variable Transmission have to be coded in a numerical form. When a nominal variable is included in a regression model, JMP automatically codes that column, and you can interpret reports without ever knowing what has happened behind the scenes. But if you are curious, select Help > Books > Fitting Linear Models, and search for “Nominal Effects” and “Nominal Factors”.
4. The column vector consisting of βs, denoted β, contains the unknown coefficients that relate the entries in X to the entries in Y. These are usually called regression parameters.
5. The column vector consisting of epsilons (εi), denoted ε, contains the unknown errors. This vector represents the variation that is unexplained when we model Y using X.
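As a minimal sketch of how Y and X in Equation (2.1) might be assembled outside JMP, the following Python fragment builds both from the six observations. Plain lists are used rather than a matrix library, and the names `rows` and `TRANSMISSION_CODE` are illustrative. The 0/1 coding simply mirrors the values shown in parentheses in Equation (2.1); it is not meant to describe how JMP codes nominal columns internally.

```python
# Six observations from Equation (2.1):
# (MPG, Number of Cylinders, HP, Weight, Transmission).
rows = [
    (21.0, 6, 110, 2.62, "Man"),
    (22.8, 4, 93, 2.32, "Man"),
    (18.7, 8, 175, 3.44, "Auto"),
    (18.1, 6, 105, 3.46, "Auto"),
    (14.3, 8, 245, 3.57, "Auto"),
    (24.4, 4, 62, 3.19, "Auto"),
]

# Assumed 0/1 coding of the nominal variable, as shown in Equation (2.1).
TRANSMISSION_CODE = {"Man": 0, "Auto": 1}

# Response vector Y: one MPG value per observation.
Y = [mpg for mpg, *_ in rows]

# Design matrix X: a leading column of ones, then the four predictors.
X = [
    [1, cyl, hp, wt, TRANSMISSION_CODE[trans]]
    for _, cyl, hp, wt, trans in rows
]

print(len(X), "rows by", len(X[0]), "columns")
```

Each inner list is one row of X, so `X[2]` holds the constant, the three continuous predictors, and the coded Transmission value for the third observation.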
The symbol “*” in Equation (2.1) denotes matrix multiplication. The expanded version of Equation (2.1) is:
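$$
\begin{pmatrix} 21.0 \\ 22.8 \\ 18.7 \\ 18.1 \\ 14.3 \\ 24.4 \end{pmatrix}
=
\begin{pmatrix}
\beta_0 + 6\beta_1 + 110\beta_2 + 2.62\beta_3 + \varepsilon_1 \\
\beta_0 + 4\beta_1 + 93\beta_2 + 2.32\beta_3 + \varepsilon_2 \\
\beta_0 + 8\beta_1 + 175\beta_2 + 3.44\beta_3 + \beta_4 + \varepsilon_3 \\
\beta_0 + 6\beta_1 + 105\beta_2 + 3.46\beta_3 + \beta_4 + \varepsilon_4 \\
\beta_0 + 8\beta_1 + 245\beta_2 + 3.57\beta_3 + \beta_4 + \varepsilon_5 \\
\beta_0 + 4\beta_1 + 62\beta_2 + 3.19\beta_3 + \beta_4 + \varepsilon_6
\end{pmatrix}
\tag{2.2}
$$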
Equation (2.2) indicates that each response is to be modeled as a linear function of the unknown βs.
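The row-by-row expansion in Equation (2.2) can be sketched in a few lines of Python. The helper `matvec` and the values in `beta` are hypothetical: the coefficients are arbitrary placeholders chosen only to make the arithmetic visible, not estimates obtained from the data, and the error terms are left out.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list of numbers)."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each row of the matrix must match the vector length")
    # Each output entry is the sum of products along one row, as in Equation (2.2).
    return [sum(x * b for x, b in zip(row, vector)) for row in matrix]


# Design matrix X from Equation (2.1), with Transmission coded 0 (Man) or 1 (Auto).
X = [
    [1, 6, 110, 2.62, 0],
    [1, 4, 93, 2.32, 0],
    [1, 8, 175, 3.44, 1],
    [1, 6, 105, 3.46, 1],
    [1, 8, 245, 3.57, 1],
    [1, 4, 62, 3.19, 1],
]

# Placeholder coefficients (beta0 ... beta4); purely illustrative.
beta = [30.0, -1.0, -0.02, -2.0, 0.5]

x_beta = matvec(X, beta)
for i, value in enumerate(x_beta, start=1):
    print(f"row {i}: {value:.2f}")
```

Because the first two rows have a Transmission code of 0, β4 drops out of their sums, which is why it is absent from the first two rows of Equation (2.2).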
We can represent Equation (2.1) more generically as:
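$$
\begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \end{pmatrix}
=
\begin{bmatrix}
X_{10} & X_{11} & X_{12} & X_{13} & X_{14} \\
X_{20} & X_{21} & X_{22} & X_{23} & X_{24} \\
X_{30} & X_{31} & X_{32} & X_{33} & X_{34} \\
X_{40} & X_{41} & X_{42} & X_{43} & X_{44} \\
X_{50} & X_{51} & X_{52} & X_{53} & X_{54} \\
X_{60} & X_{61} & X_{62} & X_{63} & X_{64}
\end{bmatrix}
*
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}
\tag{2.3}
$$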
Now we can write Equation (2.3) succinctly as:
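$$
Y = X\beta + \varepsilon \tag{2.4}
$$
Here
$$
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \\ Y_5 \\ Y_6 \end{pmatrix},
\quad
X = \begin{bmatrix}
X_{10} & X_{11} & X_{12} & X_{13} & X_{14} \\
X_{20} & X_{21} & X_{22} & X_{23} & X_{24} \\
X_{30} & X_{31} & X_{32} & X_{33} & X_{34} \\
X_{40} & X_{41} & X_{42} & X_{43} & X_{44} \\
X_{50} & X_{51} & X_{52} & X_{53} & X_{54} \\
X_{60} & X_{61} & X_{62} & X_{63} & X_{64}
\end{bmatrix},
\quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix},
\quad \text{and} \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{pmatrix}.
$$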
For a column vector like Y, we need only one index to designate the row in which an element occurs. For the 6 by 5 matrix X, we require two indices. The first designates the row and the second designates the column. Note that we have not specified the matrix multiplication operator in Equation (2.4); it is implied by the juxtaposition of any two matrices.
Equation (2.4) enables us to note the following:
1. The entries in X consist of the column of ones followed by the observed data on each of the four predictors.
2. Even though the entries in X are observational data, rather than the result of a designed experiment, the matrix X is still called the design matrix.
3. The vector ε, which contains the errors, εi, is often referred to as the noise.
4. Once we have estimated the column vector β, we are able to obtain predicted values of MPG. By comparing these predicted values to their actual values, we obtain estimates of the errors, εi. These differences, namely the actual minus the predicted values, are called residuals.
5. If the model provides a good fit, we expect the residuals to be small, in some sense. We also expect them to show a somewhat random pattern, indicating that our model adequately captures the structural relationship between X and Y. If the residuals show a structured pattern, one remedy might be to specify a more complex model by adding additional columns to X; for example, columns that define interaction terms and/or power terms (Draper and Smith 1998).
6. The values in β are the coefficients or parameters that correspond to each column or term in the design matrix X (including the first, constant term). In terms of this linear model, their interpretation is straightforward. For example, β3 is the expected change in MPG for a unit change in Weight, with the other predictors held fixed.
7. Note that the dimensions of the matrices (number of rows and columns) have to conform in Equation (2.4). In our example, Y is a 6 by 1 matrix, X is a 6 by 5 matrix, β is a 5 by 1 matrix, and ε is a 6 by 1 matrix.
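To make the definition of residuals concrete, here is a small Python sketch that forms predicted values and then subtracts them from the actual values. The vector `beta_hat` is a hypothetical stand-in for an estimate of β (how such an estimate is obtained has not yet been discussed), so the residuals it produces only illustrate the arithmetic and say nothing about how well a fitted model would perform.

```python
# Design matrix X and response Y from Equation (2.1).
X = [
    [1, 6, 110, 2.62, 0],
    [1, 4, 93, 2.32, 0],
    [1, 8, 175, 3.44, 1],
    [1, 6, 105, 3.46, 1],
    [1, 8, 245, 3.57, 1],
    [1, 4, 62, 3.19, 1],
]
Y = [21.0, 22.8, 18.7, 18.1, 14.3, 24.4]

# Hypothetical coefficient values standing in for an estimate of beta.
beta_hat = [30.0, -1.0, -0.02, -2.0, 0.5]

# Predicted MPG: each row of X multiplied into beta_hat.
predicted = [sum(x * b for x, b in zip(row, beta_hat)) for row in X]

# Residuals: actual minus predicted, one per observation.
residuals = [actual - fitted for actual, fitted in zip(Y, predicted)]

for i, (actual, fitted, resid) in enumerate(zip(Y, predicted, residuals), start=1):
    print(f"obs {i}: actual={actual:.1f} predicted={fitted:.2f} residual={resid:.2f}")
```

There is one residual per row of X, which matches the 6 by 1 dimension of ε noted in the last item above.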
Estimating the Coefficients
So how do we calculate β from the data we have collected? There are numerous approaches, partly depending on the assumptions you are prepared or required to make about the noise component, ε. It is generally assumed that the X variables are measured without noise, so that the noise is associated only with the measurement of the response, Y.
It is also generally assumed that the errors, εi, are identically and independently distributed according to a normal distribution (Draper and Smith 1998). Once a model is fit to the data, your next step should be to check whether the pattern of residuals is consistent with this assumption.
For a full introduction to MLR in JMP using the Fit Model platform, select Help > Books > Fitting Linear Models. When the PDF opens, go to the chapter entitled “Standard Least Squares Report and Options.”
More generally, returning to the point about matrix dimensions, the dimensions of the components of a regression model of the form
$$
Y = X\beta + \varepsilon \tag{2.5}
$$
can be represented as follows:
• Y is an n x 1 response matrix.
• X is an n x m design matrix.
• β is an m x 1 coefficient vector.
• ε is an n x 1 error vector.
Here n is the number of observations and m is the number of columns in X. For now, we assume that there is only one column in Y, but later on, we consider situations where Y has multiple columns.
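The dimension requirements listed for Equation (2.5) can be expressed as a short check. In this sketch every matrix, including the vectors, is stored as a list of rows so that an n x 1 vector has n rows of length 1. The function names `shape` and `check_model_shapes` are illustrative, and the zero-filled matrices are placeholders used only to exercise the check.

```python
def shape(matrix):
    """Return (rows, columns) of a rectangular list-of-rows matrix."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("matrix is not rectangular")
    return n_rows, n_cols


def check_model_shapes(Y, X, beta, eps):
    """True if Y is n x 1, beta is m x 1, and eps is n x 1 for an n x m matrix X."""
    n, m = shape(X)
    return shape(Y) == (n, 1) and shape(beta) == (m, 1) and shape(eps) == (n, 1)


# Placeholder matrices with n = 6 observations and m = 5 columns in X.
n, m = 6, 5
Y = [[0.0] for _ in range(n)]
X = [[0.0] * m for _ in range(n)]
beta = [[0.0] for _ in range(m)]
eps = [[0.0] for _ in range(n)]

print(check_model_shapes(Y, X, beta, eps))  # True
```

A coefficient vector with the wrong number of rows, for example four instead of five, would make the check return False, because the product Xβ would not be defined.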
Let’s pause for a quick linear algebra review. If A is any r x s matrix with elements αij, then the matrix A’, with elements αji, is called the transpose of A. Note that the rows of A are the columns of A’. We denote the inverse of a square q x q matrix B by