Medical Statistics. David Machin.

Author: David Machin
Publisher: John Wiley & Sons Limited
Genre: Medicine
ISBN: 9781119423652
the same data displayed as a pie chart. One often sees pie charts in the literature. However, they are generally best avoided because they can be difficult to interpret, particularly when there are more than five categories. In addition, unless the percentages in the individual categories are displayed (as here), it can be much more difficult to estimate them from a pie chart than from a bar chart. For both chart types it is important to include the number of observations on which the chart is based, particularly when comparing more than one chart. Neither of these charts should be displayed in three dimensions (see Figure 2.3b for a three‐dimensional pie chart). Three‐dimensional charts feature in many spreadsheet packages, but are not recommended since they distort the information presented. They make it very difficult to extract the correct information from the figure; for example, in Figure 2.3b the sectors that appear nearer the reader are over‐emphasised.

Pie chart depicting where 202 patients with foot corns were treated.

      (Source: Farndon et al. 2013).

      If the sample is further classified by whether the patient was treated with corn plasters or scalpel, then it becomes impossible to present the data as a single pie or bar chart. We could present the data as two separate pie charts or bar charts side by side, but it is preferable to present the data in one graph with the same scales and axes to make visual comparisons easier.

      (Source: Farndon et al. 2013).

      If you do use the relative frequency scale, as we have, then it is good practice to report the actual total sample size for each group in the legend. In this way, given the total sample size and the relative frequency (from the height of the bars), we can work out the actual numbers treated in each centre.
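      The arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name `count_from_relative_frequency` and the numbers used are hypothetical and are not taken from the trial.

```python
def count_from_relative_frequency(relative_frequency, group_total):
    """Recover the number of patients behind one bar of a relative frequency chart.

    relative_frequency: bar height as a proportion between 0 and 1
    group_total: total sample size of the group, as reported in the legend
    """
    # Bar heights are read off a chart, so round to the nearest whole patient.
    return round(relative_frequency * group_total)


# Hypothetical values: a bar of height 25% in a group of 80 patients.
print(count_from_relative_frequency(0.25, 80))  # 20
```

      Without the group total in the legend this calculation is impossible, which is why reporting it matters.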

      A quantitative measurement contains more information than a categorical one, and so summarising these data is more complex. One chooses summary statistics to condense a large amount of information into a few intelligible numbers, the sort that could be communicated verbally. The two most important pieces of information about a quantitative measurement are ‘what is the average value?’ and ‘what is the spread of the data?’ These are categorised as measures of location (sometimes ‘central tendency’) and measures of spread or variability. A measure of location (average) and variability (spread) provides an informative but brief summary of a set of observations.

      Measures of Location – The Three ‘Ms’ – Mean, Median and Mode

       Mean or Average

      The arithmetic mean or average of n observations, $\bar{x}$ (pronounced x bar), is simply the sum of the observations divided by their number; thus

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

      In the above equation, $x_i$ represents the individual sample values and $\sum_{i=1}^{n} x_i$ their sum. The Greek letter ‘∑’ (sigma) is the Greek capital ‘S’ and stands for ‘sum’ and simply means ‘add up the n observations $x_i$ from the first to the last (nth)’.
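      The definition of the mean translates directly into code. The following is a minimal Python sketch; the function name `arithmetic_mean` and the sample values are illustrative assumptions, not data from the book.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their number."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / n


# Hypothetical measurements, not data from the trial.
sample = [2, 4, 4, 6]
print(arithmetic_mean(sample))  # 16 / 4 = 4.0
```

      Python's standard library offers the same calculation as `statistics.mean`.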

       Example – Calculation of the Mean – Corn Size Data (mm)

      In the randomised controlled trial that investigated the effectiveness of salicylic acid plasters compared with usual scalpel debridement for treatment of foot corns (Farndon et al. 2013), the baseline size of the index corn (at its widest diameter in mm) was measured by an independent podiatrist (foot specialist) who was not involved in the subsequent treatment of the patients. Consider the following 16 baseline corn sizes in mm, listed in ascending order, selected randomly from the 200 patients, with valid baseline corn size data, in the trial.

[The list of the 16 baseline corn sizes is missing from this copy.]

$$\sum_{i=1}^{16} x_i = 58 \text{ mm}$$

      Thus, the mean $\bar{x}$ = 58/16 = 3.625 mm, or 3.6 mm to one decimal place. It is usual to quote the mean to one more decimal place than the recorded data.

      The major advantage of the mean is that it uses all the data values and is, in a statistical sense, therefore efficient. The mean also characterises some important statistical distributions to be discussed in Chapter 4. The main disadvantage of the mean is that it is vulnerable to what are known as outliers. Outliers are single observations whose exclusion from the calculations noticeably changes the results. For example, if we had entered ‘100 mm’ instead of ‘10 mm’ for the 16th patient in the calculation of the mean, we would find that the mean changed from 3.6 to 9.3 mm. It does not necessarily follow, however, that outliers should be excluded from the final data summary, or that they result from an erroneous measurement.
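      The effect of the mistyped value can be checked from the totals alone. The short Python sketch below uses only figures quoted in the text (16 observations summing to 58 mm, with the 16th value equal to 10 mm); the variable names are illustrative.

```python
n = 16       # number of corn sizes in the example
total = 58   # their sum in mm, as used in the text

mean_original = total / n

# Swap the 16th value (10 mm) for the mistyped value (100 mm).
total_with_error = total - 10 + 100
mean_with_error = total_with_error / n

print(mean_original)    # 3.625
print(mean_with_error)  # 9.25
```

      A single wrong entry moves the mean from 3.625 mm to 9.25 mm, which the text quotes as 3.6 and 9.3 mm.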

      If the data are binary, that is nominal and coded 0 or 1, then $\bar{x}$ is the proportion of individuals with value 1, and this can also be expressed as a percentage. In the foot corn plaster trial, the corn had healed or resolved by the three‐month follow‐up in 52 out of 189 patients. If healing at the three‐month post‐randomisation follow‐up is coded as ‘1’ for ‘yes, healed’ and ‘0’ for ‘no, not healed’, then the mean of this variable is 52/189 = 0.275 or 27.5%.
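      That the mean of a 0/1 variable equals the proportion of 1s can be verified directly. This Python sketch rebuilds the coded variable from the counts given in the text (52 healed out of 189 patients); the variable names are illustrative.

```python
# 1 = 'yes, healed', 0 = 'no, not healed'
healed = [1] * 52 + [0] * (189 - 52)

# The mean of the coded values is the proportion of patients healed.
proportion = sum(healed) / len(healed)

print(round(proportion, 3))        # 0.275
print(round(100 * proportion, 1))  # 27.5
```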

       Median

      The median is estimated by first ordering the data from smallest to largest, and then counting upwards for half the observations. The estimate of the median is either the observation at the centre of the ordering in the case of an odd number of observations, or the simple average of the middle two observations if the total number of observations is even.
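      The two cases, odd and even numbers of observations, can be sketched in Python as follows. The function name `median` and the sample values are illustrative assumptions, not data from the trial.

```python
def median(values):
    """Return the middle observation of the ordered data.

    For an even number of observations, return the simple average
    of the middle two.
    """
    ordered = sorted(values)  # order from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the central observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of middle two


# Hypothetical values, not data from the trial.
print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # (3 + 5) / 2 = 4.0
```

      Python's standard library provides the same behaviour as `statistics.median`.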

       Example – Calculation of the Median – Corn Size Data