1.1 Introduction
Although some health care practitioners may not carry out medical research themselves, they will certainly be consumers of it. Thus, it is incumbent on them to be able to discern good studies from bad, to verify whether the conclusions of a study are valid, and to understand its limitations. The current emphasis on evidence‐based medicine (EBM), or more comprehensively evidence‐based health care (EBHC), requires health care practitioners to consider critically all evidence about whether a specific treatment works, and this requires basic statistical knowledge.
Statistics is not only a discipline in its own right but it is also a fundamental tool for investigation in all biological and medical sciences. As such, any serious investigator in these fields must have a grasp of the basic principles. With modern computer facilities there is little need for familiarity with the technical details of statistical calculations. However, a health care professional should understand when such calculations are valid, when they are not and how they should be interpreted.
The use of statistical methods pervades the medical literature. In a survey of 305 original articles published over a one‐year period in three UK journals of general practice: British Medical Journal (General Practice Section), British Journal of General Practice and Family Practice, Rigby et al. (2004) found that 66% used some form of statistical analysis. Another review by Strasak et al. (2007) of 91 original research articles published in The New England Journal of Medicine (NEJM) in 2004 (one of the most prestigious peer‐reviewed medical journals) found an even higher figure, with 95% containing inferential statistics, for example, testing hypotheses and deriving estimates. It appears, therefore, that the majority of papers published in these journals require some statistical knowledge for a complete understanding.
1.2 Why Use Statistics?
To students schooled in the ‘hard’ sciences of physics and chemistry it may be difficult to appreciate the variability of biological data. If one repeatedly puts blue litmus paper into acid solutions it turns red 100% of the time, not most (say 95%) of the time. In contrast, if one gives aspirin to a group of people with headaches, not all of them will experience relief. Penicillin was perhaps one of the few ‘miracle’ cures where the results were so dramatic that little evaluation was required. Absolute certainty in medicine is rare.
Measurements on human subjects seldom give exactly the same results from one occasion to the next. For example, O'Sullivan et al. (1999) found that the systolic blood pressure (SBP) in normal healthy children has a wide range, with 95% of children having SBPs below 130 mmHg when they were resting, rising to 160 mmHg during the school day, and falling again to below 130 mmHg at night. Furthermore, Hansen et al. (2010), in a study of over 8000 subjects, found that increasing variability in blood pressure over 24 hours was a significant and independent predictor of mortality and of cardiovascular and stroke events.
Diagnostic tests are not perfect. Simply because a test for a disease is positive does not mean that the patient necessarily has the disease. Similarly, a negative test does not mean the patient is necessarily disease free. The UK National Health Service invites all women aged 50–70 for breast screening every three years. According to the NHS Breast Screening Information Leaflet (2018, https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/840343/Breast_screening_helping_you_decide.pdf): if 100 women have breast screening, 96 will have a normal result and 4 will need more tests. Of these 4, cancer will be confirmed in 1, whilst 3 women will have no cancer detected.
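The leaflet's round numbers can be turned into a simple probability: of the women recalled for further tests, only a fraction turn out to have cancer. A minimal sketch of that arithmetic, using the per‐100 figures quoted above (the variable names are our own, for illustration only):

```python
# Per-100 cohort figures taken from the NHS leaflet quoted in the text.
screened = 100
normal_result = 96
recalled = screened - normal_result     # 4 women need more tests
cancers_confirmed = 1                   # cancer confirmed in 1 of the 4

# Chance that a recalled woman actually has cancer
# (the positive predictive value of the screening result).
ppv = cancers_confirmed / recalled
print(f"Of {recalled} women recalled, {cancers_confirmed} has cancer: "
      f"PPV = {ppv:.0%}")
```

So a recall for more tests implies only a 1 in 4 chance of cancer, which is exactly the point: a positive screening result does not mean the patient has the disease.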
One would think that pathologists, at least, would be consistent. However, a review by Elmore et al. (2017) showed that when it came to diagnosing melanotic skin lesions, in only 83% of cases where a lone pathologist made a diagnosis would the same diagnosis be confirmed by an independent panel. In 8% of cases the lone pathologist gave a worse prognosis than the panel, and in 9% of cases underestimated the severity of the disease.
This variability is also inherent in responses to biological hazards. Most people now accept that cigarette smoking causes lung cancer and heart disease, and yet nearly everyone can point to an apparently healthy 80‐year‐old who has smoked for many years without apparent ill effect. Although it is now known from the report of Doll et al. (2004) that about half of all persistent cigarette smokers are killed by their habit, it is usually forgotten that until the 1950s the cause of the rise in lung cancer deaths was a mystery, commonly attributed to general atmospheric pollution from, for example, car exhaust fumes. It was not until the carefully designed and statistically analysed case–control and cohort studies of Richard Doll, Austin Bradford Hill and others that smoking was identified as the true cause. Enstrom et al. (2003) moved the debate on to ask whether or not passive smoking causes lung cancer. This is a more difficult question to answer since the association is weaker. However, studies by Cao et al. (2015) have now shown that it is a major health problem, and scientists at the International Agency for Research on Cancer (IARC) have concluded that there is sufficient evidence that second‐hand smoke causes lung cancer (IARC 2012). Restrictions on smoking in public places have been one consequence, and in England and Wales since 1 October 2015 it has been illegal to smoke in a vehicle carrying anyone under the age of 18.
With such variability, it follows that in any comparison made in a medical context, such as people on different treatments, differences are almost bound to occur. These differences may be due to real effects, random variation or variation in some other factor that may affect an outcome. It is the job of the analyst to decide how much variation should be ascribed to chance or other factors, so that any remaining variation can be assumed to be due to a real effect. This is the art of statistics.
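The point that "differences are almost bound to occur" can be illustrated with a small simulation: draw two groups of measurements from the same distribution, so that no real effect exists, and their sample means will still differ. This is a hypothetical sketch with invented numbers (a common mean of 120 mmHg and standard deviation of 10, loosely echoing the blood pressure example), not data from any study cited here:

```python
# Two groups of simulated 'patients' drawn from the SAME distribution:
# any difference between their mean values is due to chance alone.
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def sample_mean(n, mu=120.0, sd=10.0):
    # Mean of n simulated measurements (e.g. systolic blood pressures)
    return sum(random.gauss(mu, sd) for _ in range(n)) / n

group_a = sample_mean(20)
group_b = sample_mean(20)
print(f"Group A mean: {group_a:.1f}, Group B mean: {group_b:.1f}, "
      f"difference: {group_a - group_b:.1f}")
```

The two means will not coincide even though the groups are identical in every respect; deciding how much of an observed difference can be ascribed to chance is precisely the statistical task described above.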
1.3 Statistics is About Common Sense and Good Design
A well‐designed study, poorly analysed, can be rescued by a reanalysis, but a poorly designed study is beyond the redemption of even sophisticated statistical manipulation. Many experimenters consult the medical statistician only at the end of the study, when the data have already been collected. They believe that the job of the statistician is simply to analyse the data and that, with powerful computers available, even complex studies with many variables can be easily processed. However, analysis is only part of a statistician's job, and calculation of the final ‘P‐value’ a minor one at that!
A far more important task for the medical statistician is to ensure that results are comparable and generalisable.
Example from the Literature – Drinking Coffee and Cancer (IARC 2018)
In 2016, a working group of 23 scientists from 10 countries met at IARC in Lyon, France, to review the research evidence of whether or not drinking coffee is carcinogenic and causes cancer. They reviewed the available data from more than 1000 observational and experimental studies. In rating the evidence, the working group gave the greatest weight to well‐conducted studies that controlled satisfactorily for important potential confounders, including tobacco and alcohol consumption. For bladder cancer, they found no consistent evidence of an association with drinking coffee, or of a dose–response relationship, that is drinking more coffee increased the incidence of cancer. In several studies, the relative risks of cancer for those drinking coffee compared to non‐drinkers were increased in men but women were either not affected or the risk decreased. IARC (2018) concluded from this that there was no evidence that drinking