Different patients with heart disease observed in the same way may have differing average levels of step counts (physical activity levels) from each other but with similar patterns of variation about these levels. The variation in mean step count levels from patient to patient is termed between‐subject variation.
Observations on different subjects are usually regarded as independent. That is, the data values on one subject are not influenced by those obtained from another. This may not always be the case, however, particularly with subjective measures such as pain or quality of life, which depend on the subject's personal judgement: different patients may, for example, assist each other when recording their quality of life.
In the investigation of total variability it is very important to distinguish within‐subject from between‐subject variability. In a study there may be measures made on different individuals and also repeatedly on the same individual. Between‐ and within‐subject variation will always be present in any biological material, whether animals, healthy subjects, patients, or histological sections. The experimenter must be aware of the possible sources that contribute to the variation, decide which are of importance in the intended study, and design the study appropriately.
2.7 Presentation
Graphs
Certain items are important in any graph. For example, scales should be labelled clearly with appropriate dimensions added. The plotting symbols are also important; a graph is used to give an impression of pattern in the data, so bold and relatively large plotting symbols are desirable. This is particularly important if the graph is to be reduced for publication purposes or presented as a slide in a talk.
A graph should never include too much clutter, for example many overlapping groups each with a different symbol. In such a case it is usually preferable to give a series of graphs, albeit smaller, in several panels. The choice of scales for the axes will depend on the particular data set. If transformations of the axes are used, for example plotting on a log scale, it is usually better to mark the axes using the original units, as this will be more readily understood by the reader. Breaks in scales should be avoided. If breaks are unavoidable, under no circumstances must points on either side of a break be joined. If both axes have the same units, then use the same scale for each. If this cannot be done easily, it is sensible to indicate the line of equality, perhaps faintly, in the figure. False impressions of trend, or lack of it, in a time plot can sometimes be introduced by omitting the zero point of the vertical axis. This may falsely make a mild trend, for example a change from 101 to 105, into an apparently strong trend (seemingly as though from 1 to 5). There must always be a compromise between clarity of reproduction, that is, filling the space available with data points, and clarity of message. Appropriate measures of variability should also be included. One such measure is the range of values covered by two standard deviations each side of a plotted mean.
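The range of two standard deviations each side of a mean can be computed before plotting. The following is a minimal sketch in Python, assuming the sample standard deviation (n - 1 denominator) is the intended measure; the function name `two_sd_range` and the data values are hypothetical and are not taken from the text.

```python
import statistics

def two_sd_range(values):
    """Return (mean, lower, upper), where lower/upper are mean -/+ 2 SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator (assumption)
    return mean, mean - 2 * sd, mean + 2 * sd

# Hypothetical measurements, not data from the text.
measurements = [4, 6, 5, 7, 8]
mean, lower, upper = two_sd_range(measurements)
print(f"mean = {mean:.2f}, plotted range = {lower:.2f} to {upper:.2f}")
# prints: mean = 6.00, plotted range = 2.84 to 9.16
```

The lower and upper values are what would be drawn as the ends of the bar around the plotted mean.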
It is important to distinguish between a bar chart and a histogram. Bar charts display counts in mutually exclusive categories, and so the bars should have spaces between them. Histograms show the distribution of a continuous variable and so should not have spaces between the bars. It is not acceptable to use a bar chart to display a mean with standard error bars (see Chapter 6). These should be indicated with a data point surrounded by error bars, or better still a 95% confidence interval.
With currently available graphics software one can now perform extensive exploration of the data, not only to determine more carefully their structure, but also to find the best means of summary and presentation. This is usually worth considerable effort.
Tables
Although graphical presentation is very desirable it should not be overlooked that tabular methods are very important (see Table 2.3). In particular, tables can give more precise numerical information than a graph, such as the number of observations, the mean and some measure of variability of each tabular entry. They often take less space than a graph containing the same information. Standard statistical computer software can be programmed to provide basic summary statistics in tabular form on many variables.
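As an illustration of producing basic summary statistics in tabular form, the following Python sketch reports the number of observations, the mean and the standard deviation for each variable. The variable names, the data values and the helper functions `summary_rows` and `format_table` are all hypothetical; a statistical package would normally provide an equivalent facility.

```python
import statistics

def summary_rows(data):
    """Return one (name, n, mean, sd) tuple per variable."""
    rows = []
    for name, values in data.items():
        rows.append((name, len(values),
                     statistics.mean(values),
                     statistics.stdev(values)))  # sample SD (n - 1)
    return rows

def format_table(rows):
    """Lay the summary rows out as fixed-width text columns."""
    lines = [f"{'Variable':<12}{'n':>4}{'Mean':>10}{'SD':>10}"]
    for name, n, mean, sd in rows:
        lines.append(f"{name:<12}{n:>4}{mean:>10.2f}{sd:>10.2f}")
    return "\n".join(lines)

# Hypothetical data, not taken from the text.
data = {
    "age_years": [34, 45, 29, 52, 41],
    "weight_kg": [70.5, 82.0, 64.3, 91.2, 77.8],
}
print(format_table(summary_rows(data)))
```

Reporting n alongside each mean and standard deviation lets the reader judge how much weight each tabular entry can bear.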
2.8 Points When Reading the Literature
1 Is the number of subjects involved clearly stated?
2 Are appropriate measures of location and variation used in the paper? For example, if the distribution of the data is skewed, then has the median rather than the mean been quoted? Is it sensible to quote a standard deviation, or would a range or interquartile range be better? In general do not use SD for data which have skewed distributions.
3 On graphs, are appropriate axes clearly labelled and scales indicated?
4 Do the titles adequately describe the contents of the tables and graphs?
5 Do the graphs indicate the relevant variability? For example, if the main object of the study is a within‐subject comparison, has the within‐subject variability been illustrated?
6 Does the method of display convey all the relevant information in a study? Can one assess the distribution of the data from the information given?
2.9 Technical Details
Calculating the Sample Median
If the n observations in a sample are arranged in increasing or decreasing order, the median is the middle value, that is, the ½(n + 1)th ordered value. If the number of observations, n, is odd there will be a unique median – the ½(n + 1)th ordered value. If n is even, there is strictly no middle observation, but the median is defined by convention as the mean of the two middle observations – the ½nth and (½n + 1)th.
For the foot corn size data the number of observations is even (n = 16), so the median is the average of the two middle observations – the ½(16)th and ([½ × 16] + 1)th, i.e. the eighth and ninth ordered values. The median corn size is therefore (3 + 3)/2 = 3 mm.
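The rule for odd and even n can be sketched in a few lines of Python. The function name `sample_median` and the two small samples are hypothetical; they are not the foot corn size data.

```python
def sample_median(values):
    """Median by the rule above: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1)/2 th ordered value sits at 0-based index n // 2.
        return ordered[mid]
    # Even n: mean of the (n/2)th and (n/2 + 1)th ordered values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical samples, not data from the text.
odd_sample = [5, 1, 4, 2, 3]       # ordered: 1 2 3 4 5
even_sample = [2, 6, 3, 1, 4, 3]   # ordered: 1 2 3 3 4 6
print(sample_median(odd_sample))   # 3
print(sample_median(even_sample))  # 3.0
```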
Calculating the Quartiles and Interquartile Range
Arrange the n observations in increasing or decreasing order. Split the data set into four equal parts, or quartiles, using three cut‐points:
1 Lower quartile (25th centile) or the ¼(n + 1)th ordered value;
2 Median (50th centile) or the ½(n + 1)th ordered value;
3 Upper quartile (75th centile) or the ¾(n + 1)th ordered value.
The interquartile range (IQR) is the upper quartile minus the lower quartile.
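The three cut‐points and the IQR can be sketched in Python as follows. This is one possible implementation, assuming the data are sorted in increasing order and that a fractional position such as 2.25 is handled by linear interpolation between the neighbouring ordered values; other conventions can give slightly different quartiles. The function names and the data are hypothetical.

```python
def ordered_value(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    n = len(ordered)
    # Clamp so that very small samples do not index outside the data.
    position = min(max(position, 1), n)
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part, 0 <= fraction < 1
    if fraction == 0:
        return ordered[lower - 1]
    # Linear interpolation between the two neighbouring values (assumption).
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

def quartiles(values):
    """Return (lower quartile, median, upper quartile, IQR) by the (n + 1) rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of an empty sample are undefined")
    q1 = ordered_value(ordered, (n + 1) / 4)
    med = ordered_value(ordered, (n + 1) / 2)
    q3 = ordered_value(ordered, 3 * (n + 1) / 4)
    return q1, med, q3, q3 - q1

# Hypothetical data: n = 7, so the cut-points are the 2nd, 4th and 6th values.
print(quartiles([1, 2, 3, 4, 5, 6, 7]))  # (2, 4, 6, 4)
```

With n = 7 every position is a whole number, so no interpolation is needed; with n = 8 the positions are 2.25, 4.5 and 6.75 and the interpolation step comes into play.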
It should be noted that there is no single standard convention for calculating the quartiles. When the quartile lies between two observations the simplest option is to take the mean of the two observations. A second option is for the lower and upper quartiles to be the ¼(n + 1)th and ¾(n + 1)th ordered values respectively. A third option is for the lower and upper quartiles to be the (¼n + ½)th and (¾n + ½)th ordered values respectively. It is also common to round the ¼ and ¾ to the nearest integer and only use interpolation