Once you have finished reading the introduction, method, and results sections, you should have a pretty good idea about what was done, to whom, and what was found. In the discussion section, you will read the researcher’s interpretation of the research, comments about unexpected findings, and speculations about the importance of the work or its application.
The Discussion
The dissertation adviser of one of the authors of this book told her that he never read the discussion section of research reports. He was not interested in the authors' interpretations; he preferred to interpret the findings and their importance himself. We consider this good advice for seasoned researchers but not for students. The discussion section of a research article is where the author describes how the results fit into the literature, discussing which theories the research supports and which it does not. It is also where you will find the author's suggestions as to where the research should go in the future: what questions are left unanswered and what new questions the research raises. Indeed, the discussion section may direct you in your selection of a research project. You may wish to contact the author to see whether research is already being conducted on the questions posed in the discussion. Remember that it is important to be a critical consumer of research. Do not simply accept what is said in the discussion. Ask yourself whether the results really support the author's conclusions. Are there other possible interpretations?
In the discussion section of our example article, Knez (2001) relates the findings to his previous work and to the research of others. He discusses the lack of effect of light on mood and questions the mood measure that was used. We think that another possibility, which he does not explore, is that lighting simply may not influence mood. He also describes the effect of light on cognitive performance as new to the literature. We could speculate that this small effect might not be reliable. Certainly, the weak p values reported in the results section suggest that the results could be a fluke and that the study should be replicated before the effect is accepted. Again, as we said before, you need to be critical when reading the literature.
Basic Statistical Procedures
Tests of Significance
t Test.
The simplest experiment involves two groups, an experimental group and a control group. The researchers treat the groups differently (the IV) and measure their performance (the DV). The question, then, is “Did the treatment work?” Are the groups significantly different after receiving the treatment? If the research involves comparing means from two groups, the t test may be the appropriate test of significance. Be aware that the t test can also be used in nonexperimental studies. For example, a researcher who compares the mean performance of women with that of men might use a t test, but this is not an experiment.
Typically, a researcher will report the group means, whether the difference was statistically significant, and the t-test results. In essence, the t test is an evaluation of the difference between two means relative to the variability in the data. Simply reporting the group means is not enough, because a large difference between two means might not be statistically significant when examined relative to the large variability of the scores of each group. Alternatively, a small difference between two means may be statistically significant if there is very little variation in scores within each group. The t test is a good test when you want to compare two groups, but what if you have more than two groups?
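If you have access to statistical software, you can see how such a test behaves. Here is a minimal sketch in Python using the scipy library; all of the scores are invented for illustration.

```python
# A minimal sketch of an independent-samples t test with scipy.
# All scores below are hypothetical, invented for illustration.
from scipy import stats

treatment = [12, 15, 14, 16, 13, 15, 17, 14]  # experimental group DV scores
control = [10, 12, 11, 13, 12, 10, 11, 12]    # control group DV scores

# The t statistic scales the difference between the two group means by
# the variability of the scores within the groups.
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # conventionally significant if p < .05
```

Notice that the same difference between means would yield a smaller t (and a larger p) if the scores within each group were more spread out.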
F Test.
The F test of significance is used to compare the means of more than two groups. There are numerous experimental (and quasi-experimental) designs that are analyzed with analysis of variance (ANOVA), which relies on the F test. Indeed, when we were graduate students, we took entire courses in ANOVA. In general, the F test, like the t test, compares between-group variability with within-group variability.
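As a minimal sketch, scipy also provides a one-way ANOVA; the three groups of scores below are hypothetical.

```python
# A minimal sketch of a one-way ANOVA (F test) with scipy.
# The three groups of scores are hypothetical.
from scipy import stats

group_a = [23, 25, 21, 24, 22]
group_b = [27, 29, 26, 28, 30]
group_c = [23, 24, 22, 25, 23]

# f_oneway compares between-group variability with within-group variability.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```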
As with the t test, the researcher will report the group means and whether the differences were statistically significant. From a significant F test, the researcher knows only that at least two means differed significantly. To specify which groups differed from which others, the researcher must follow the F test with post hoc (after the fact) comparisons. For example, if there were three groups and the F test was statistically significant, a post hoc test might find that all three group means differed significantly from one another or perhaps that only one mean differed from the other two. There are a large number of post hoc tests (e.g., Scheffé, Tukey's honestly significant difference, and Bonferroni) that have slightly different applications. What is common to all these tests is that each produces a p value that is used to indicate which means differ from which.
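To illustrate, here is a sketch of one post hoc test, Tukey's HSD, using the statsmodels library with the same hypothetical groups as above.

```python
# A sketch of Tukey's HSD post hoc test with statsmodels.
# Scores and group labels are hypothetical.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = np.array([23, 25, 21, 24, 22,   # group a
                   27, 29, 26, 28, 30,   # group b
                   23, 24, 22, 25, 23])  # group c
groups = np.array(["a"] * 5 + ["b"] * 5 + ["c"] * 5)

# Prints a table of pairwise mean differences with adjusted p values,
# indicating which means differ from which.
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```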
As indicated above, many designs are analyzed with an F test, and they have names that indicate the number of IVs. You will find a one-way ANOVA used when there is one IV, a two-way ANOVA when there are two IVs, and a three-way ANOVA (you guessed it) when there are three. A null hypothesis is tested for each IV by calculating an F statistic. The advantage of the two- and three-way ANOVAs is that an interaction effect can also be tested. An interaction occurs when different combinations of the levels of the IVs have different effects on the DV. For example, if we wanted to investigate the effects of environmental noise (silent vs. noisy) and of different-colored paper (white, yellow, and pink) on reading comprehension, we could use a two-way ANOVA to evaluate the effect of each IV and also whether the color of paper interacts with the noise to influence reading comprehension. It may be that noise reduces reading comprehension for white paper but not for yellow or pink paper. The interaction effect is important because it indicates that a variable is acting as a moderating variable. In this example, the effect of environmental noise on reading comprehension is moderated by the color of the paper.
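A minimal sketch of this noise-by-color design, using statsmodels with invented data, might look like this.

```python
# A sketch of a two-way ANOVA for the hypothetical noise-by-color example.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "noise": ["silent", "noisy"] * 6,
    "color": ["white"] * 4 + ["yellow"] * 4 + ["pink"] * 4,
    "comprehension": [85, 70, 84, 68,   # white: noise hurts comprehension
                      82, 80, 83, 81,   # yellow: little effect of noise
                      84, 82, 85, 83],  # pink: little effect of noise
})

# C(...) marks categorical IVs; '*' requests both main effects and the
# noise-by-color interaction.
model = ols("comprehension ~ C(noise) * C(color)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # an F and p value for each effect
```

A significant interaction term here would tell us that the effect of noise depends on the color of the paper, just as described above.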
There is another type of ANOVA that is used to control for a possible confounding variable. This procedure also uses the F statistic and is called analysis of covariance, or ANCOVA. Using our paper color example, suppose we want to test whether the color of paper influences reading comprehension, but our participants vary considerably in age. This could pose a serious confound because reading comprehension changes with age. If we measure age, we can use ANCOVA to remove variability in reading comprehension that is due to age and then test the effect of color. The procedure removes the variance due to age from the DV before the F is calculated for the effect of color. Consequently, we are testing the effect of color after we have taken the effect of age into account.
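A minimal sketch of this ANCOVA, again with statsmodels and invented data, follows the same pattern.

```python
# A sketch of an ANCOVA: age is entered as a covariate so the effect of
# paper color is tested after age-related variance is removed.
# All ages and scores are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "color": ["white"] * 4 + ["yellow"] * 4 + ["pink"] * 4,
    "age": [19, 34, 52, 61, 22, 38, 47, 65, 20, 31, 55, 60],
    "comprehension": [88, 80, 74, 70, 85, 79, 72, 68, 87, 81, 73, 69],
})

# Including age as a covariate removes age-related variance from the DV;
# the F for color then reflects the color effect with age taken into account.
model = ols("comprehension ~ age + C(color)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```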
The statistics described here are useful for comparing group means, but you may come across research in which the variables are categories and the data are summarized by counting how many observations fall into each category. When there are frequency counts instead of scores, you may see a chi-square test.
Chi-Square Test.
Do people prefer Coke or Pepsi? Suppose we have offered both drinks and asked people to declare a preference. We count the number of people preferring each drink. These data are frequency counts rather than scores, so means cannot be calculated. If people had no preference between the two drinks, we would expect about the same number of people to pick each, and we could use a chi-square test, called the goodness-of-fit test, to test our hypothesis. In chi-square, the null hypothesis is that the observed frequencies will not differ from those we would expect by chance.
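As a minimal sketch, scipy's goodness-of-fit test makes this concrete; the counts below are invented.

```python
# A sketch of a chi-square goodness-of-fit test with scipy.
# The counts are hypothetical: 62 people chose Coke, 38 chose Pepsi.
from scipy import stats

observed = [62, 38]

# With no expected frequencies supplied, chisquare assumes the null
# hypothesis of equal expected frequencies (here, 50 and 50).
chi2, p_value = stats.chisquare(observed)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```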
In the literature, you will likely see the data summarized by reporting the frequencies for each category, either as total counts or perhaps as percentages of the total. Then you may read a statement that the frequencies differ significantly across categories, followed by a report of the chi-square statistic and p value.
Chi-square is called a nonparametric or distribution-free test because it does not assume that the population is normally distributed. Indeed, hypotheses about the shape of the population distribution are exactly what the chi-square goodness-of-fit test is designed to evaluate.