Table 1.6
b – unstandardized regression coefficients; s.e. – standard errors
a Australian Capital Territory is the reference category
* p < 0.05, ** p < 0.01, *** p < 0.001
The results in Table 1.6 suggest that region shapes how individual characteristics affect the dependent variable: there are many statistically significant interactions. To unravel what they mean, we have to interpret them alongside the main effects of the variables in the model. We could graph the main effects together with the interaction effects to show the overall pattern. We do not do that here, but we address graphing interaction effects later. The main point from these results is that there are significant interactions. Perhaps we have just solved the problem of group effects?
Unfortunately not. There are still problems with this model. While there is no shortage of published examples of this type of analysis, one major problem with the approach is the nature of the cross-level interaction term. Cross-level interaction terms are interaction terms built from variables at different levels of aggregation. In this case, we have interacted individual characteristics (Level 1) with group characteristics (Level 2). This approach is fraught with problems. Treating group-level variables as though they were properties of individuals may result in flawed parameter estimates and downwardly biased standard errors (Hox and Kreft, 1994), making us more likely to find significant results. It also creates problems in the calculation of degrees of freedom, which in turn distorts the standard errors and the significance tests. The problems associated with degrees of freedom are explained in more detail below.
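As a hypothetical sketch of what such a term looks like in practice (the variable names here are invented for illustration, not taken from the example data), a cross-level interaction in a single-level OLS model is simply the product of a Level 1 and a Level 2 variable:

```stata
* Hypothetical sketch of a cross-level interaction term in a single-level
* OLS model. 'female' stands in for an individual-level (Level 1) variable
* and 'region_mean_ses' for a region-level (Level 2) variable; both names
* are invented for illustration.
generate femXregses = female * region_mean_ses
regress score female region_mean_ses femXregses
```

The problem is not constructing the term, which is trivial, but that OLS then treats the region-level component as if it varied across all individuals.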
Degrees of freedom and statistical significance
Degrees of freedom are a problem when using OLS to model multilevel relationships. When we use OLS and simply add group-level variables, such as region in the example above, we create a model that assumes individual-level degrees of freedom. At this point you may well be wondering, ‘What are degrees of freedom?’ – fair enough. As the name implies, degrees of freedom refer to how many of the values in a formula are free to vary when a statistic is being calculated. Our example data contain 13,646 students in eight different regions, giving us 13,646 individual pieces of data from which to estimate statistical relationships. In general, each statistic that we estimate costs one degree of freedom, because that value is no longer free to vary. Many equations contain the mean, for example; once we calculate a group mean, it can no longer vary. Likewise, once we calculate a standard deviation, we lose another degree of freedom. In our examples above, degrees of freedom are determined from the individual data, but if we include group characteristics in this individual-level data set, OLS calculates the degrees of freedom as though they were simply characteristics of individuals. For group characteristics, the degrees of freedom should be based on the number of regions (8) rather than the number of pupils (13,646) – obviously very different numbers. Degrees of freedom are integral to tests of statistical significance, and using the wrong ones in OLS calculations increases our likelihood of rejecting the null hypothesis when we should not. In other words, we are more likely to get statistically significant results – when we shouldn’t – if we use the individual-level degrees of freedom instead of the group-level degrees of freedom.
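One quick way to see why the choice of degrees of freedom matters is to compare the critical t-values each implies. As a sketch (taking, purely for illustration, 8 − 2 = 6 group-level degrees of freedom):

```stata
* Critical t-values for a two-tailed test at the 5% level under
* individual-level versus group-level degrees of freedom. The
* group-level df of 6 (8 regions minus 2 estimated parameters)
* is an illustrative assumption, not from the example model.
display invttail(13644, 0.025)   // individual-level df: about 1.96
display invttail(6, 0.025)       // group-level df: about 2.45
```

A group-level coefficient with a t-statistic of, say, 2.1 would look significant against the individual-level threshold but not against the group-level one.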
Table 1.7
Adapted from Diez-Roux (2000: 173)
Table 1.7 summarizes the OLS ‘workarounds’ discussed above and their associated problems. The overarching problem is that when you use OLS models on data better suited to multilevel techniques you are very likely to underestimate standard errors and therefore increase the likelihood of results being statistically significant, possibly rejecting a null hypothesis when you should not. In other words, you are more likely to make a Type I error. If you correctly model your multilevel data then your results will be more accurately specified, more elegant, and more convincing, and your statistical techniques will match your conceptual model.
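A small simulation sketches the mechanism (all values here are invented for illustration): give each region a random effect and a region-level predictor that has no true effect on the outcome, then fit OLS as though the simulated pupils were independent. Because the standard error for the group-level predictor is understated, spuriously ‘significant’ coefficients turn up far more often than the nominal 5%.

```stata
* Hypothetical simulation: OLS on clustered data with a group-level
* predictor that has NO true effect on the outcome. All numbers are
* invented for illustration.
clear
set seed 12345
set obs 8                          // 8 regions
generate region = _n
generate u  = rnormal(0, 1)        // region-level random effect
generate xg = rnormal(0, 1)        // region-level predictor, unrelated to y
expand 1700                        // roughly 1,700 pupils per region
generate y = u + rnormal(0, 1)     // outcome depends on region, not on xg
regress y xg                       // SEs treat all 13,600 obs as independent
```

Run repeatedly with different seeds, a simulation along these lines would typically reject the (true) null hypothesis for xg well in excess of the nominal Type I error rate.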
Multilevel modeling, in general and specific aspects of it, has also come in for some criticism. As with many debates of this kind, there is unlikely to be a firm and final conclusion, but we do advocate that users of any technique are aware of the criticisms and current debates. So we suggest that you start with this series of papers: Gorard (2003a, 2003b, 2007) and Plewis and Fielding (2003).
Software
In the main text of this book we use Stata 13 software. We assume that the reader is familiar with Stata, as we do not see this book as an introduction to the program – see Pevalin and Robson (2009) for such a treatment covering an earlier version of Stata. At the time of writing, Stata 13 was the current version, with some changes from version 12 to the main commands used in this book – namely, the change from the xtmixed command to the mixed command. While this changed little of the output, and only a few options that are now defaults, it did affect some of the user-written commands designed for use after xtmixed. So, at times we use the Stata 12 command xtmixed, which works perfectly well in Stata 13 but is not officially supported. Stata 14 was released while this book was in production and we have checked that the do-files run in version 14.
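For example, a model that would have been fitted with xtmixed in Stata 12 is fitted with mixed in Stata 13 using otherwise very similar syntax (the variable names here follow the example command shown below; treat this as a sketch rather than a complete specification):

```stata
* Stata 12 syntax (still runs in Stata 13, but not officially supported):
xtmixed z_read cen_pos || schoolid: cen_pos, stddev covariance(unstructured)

* Stata 13 equivalent:
mixed z_read cen_pos || schoolid: cen_pos, stddev covariance(unstructured)
```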
As you may have gathered from the previous paragraph, we use this font for Stata commands in the text. We italicize variable names when we use them on their own, not as part of a Stata command, but we also use italics to emphasize some points – we hope that the difference is clear.
In the text we use the /// symbol to break a Stata command over more than one line. For example:
mixed z_read cen_pos || schoolid: cen_pos, ///
    stddev covariance(unstructured) nolog
This is only done to keep the commands on the book page. In the do-files (available on the accompanying webpage at the URL given below), the command can run on much further to the right. We have tried to keep what you see on these pages and what you see in the do-file the same, but if you come across a command without the /// in the do-file then don’t worry about it.
At a number of points we include the Stata commands that we used to perform certain tasks, including data manipulation and variable creation. As with all software, there are a number of ways to get what you want, some elegant and some a bit cumbersome. Our rationale is to start with commands that are easy to follow and then move on to some of the more integrated features in Stata. In doing so, we hope that what we are doing is more transparent. If your programming skills are more advanced than those we demonstrate, then we're sure you'll think of more elegant ways of coding in no time at all. In the do-files that accompany this book we sometimes include alternative ways of programming to illustrate the versatility of the software.
There is a webpage to accompany this book at https://study.sagepub.com/robson. On this webpage you will find do-files and data files for each of the chapters so you can run through them in Stata and amend them for your own use. You will also find the chapter commands in R, with some explanation of how to use R for these multilevel models. The webpage will be very much a ‘live’ document with links to helpful sites and other resources. In the list of references we have noted which papers are ‘open access’. Links to these papers are also on the webpage.
We have chosen to use Stata in this book for two reasons. First, we wanted to use a general-purpose statistical software package rather than a specialized multilevel package. If