Because researchers are free to bury any result they please, patients are exposed to harm on a staggering scale throughout the whole of medicine, from research to practice. Doctors can have no idea about the true effects of the treatments they give. Does this drug really work best, or have I simply been deprived of half the data? Nobody can tell. Is this expensive drug worth the money, or have the data simply been massaged? No one can tell. Will this drug kill patients? Is there any evidence that it’s dangerous? No one can tell.
This is a bizarre situation to arise in medicine, a discipline where everything is supposed to be based on evidence, and where everyday practice is bound up in medico-legal anxiety. In one of the most regulated corners of human conduct we’ve taken our eyes off the ball, and allowed the evidence driving practice to be polluted and distorted. It seems unimaginable. We will now see how deep this problem goes.
Why we summarise data
Missing data has been studied extensively in medicine. But before I lay out that evidence, we need to understand exactly why it matters, from a scientific perspective. And for that we need to understand systematic reviews and ‘meta-analysis’. Between them, these are two of the most powerful ideas in modern medicine. They are incredibly simple, but they were invented shockingly late.
When we want to find out if something works or not, we do a trial. This is a very simple process, and the first recorded attempt at some kind of trial was in the Bible (Daniel 1:12, if you’re interested). Firstly, you need an unanswered question: for example, ‘Does giving steroids to a woman delivering a premature baby increase the chances of that baby surviving?’ Then you find some relevant participants, in this case, mothers about to deliver a premature baby. You’ll need a reasonable number of them, let’s say two hundred for this trial. Then you divide them into two groups at random, give the mothers in one group the current best treatment (whatever that is in your town), while the mothers in the other group get current best treatment plus some steroids. Finally, when all two hundred women have gone through your trial, you count up how many babies survived in each group.
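The random allocation step described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `allocate`, the participant labels and the fixed seed are all invented for the example, and real trials use more careful allocation procedures than a single shuffle.

```python
import random


def allocate(participants, seed=None):
    """Randomly split participants into two arms of (near) equal size."""
    rng = random.Random(seed)      # a seed makes the example repeatable
    shuffled = list(participants)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical labels standing in for the two hundred mothers in the trial.
mothers = [f"mother_{i}" for i in range(200)]
usual_care, usual_care_plus_steroids = allocate(mothers, seed=1)

print(len(usual_care), len(usual_care_plus_steroids))
```

Because chance alone decides who goes into which group, any difference in survival between the two groups at the end can reasonably be put down to the steroids rather than to the way the groups were chosen.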
This is a real-world question, and lots of trials were done on this topic, from 1972 onwards: two trials showed that steroids saved lives, but five showed no significant benefit. Now, you will often hear that doctors disagree when the evidence is mixed, and this is exactly that kind of situation. A doctor with a strong pre-existing belief that steroids work – perhaps preoccupied with some theoretical molecular mechanism, by which the drug might do something useful in the body – could come along and say: ‘Look at these two positive trials! Of course we must give steroids!’ A doctor with a strong prior intuition that steroids were rubbish might point at the five negative trials and say: ‘Overall the evidence shows no benefit. Why take a risk?’
Up until very recently, this was basically how medicine progressed. People would write long, languorous review articles – essays surveying the literature – in which they would cite the trial data they’d come across in a completely unsystematic fashion, often reflecting their own prejudices and values. Then, in the 1980s, people began to do something called a ‘systematic review’. This is a clear, systematic survey of the literature, with the intention of getting all the trial data you can possibly find on one topic, without being biased towards any particular set of findings. In a systematic review, you describe exactly how you looked for data: which databases you searched, which search engines and indexes you used, even what words you searched for. You pre-specify the kinds of studies that can be included in your review, and then you present everything you’ve found, including the papers you rejected, with an explanation of why. By doing this, you ensure that your methods are fully transparent, replicable and open to criticism, providing the reader with a clear and complete picture of the evidence. It may sound like a simple idea, but systematic reviews are extremely rare outside clinical medicine, and are quietly one of the most important and transgressive ideas of the past forty years.
When you’ve got all the trial data in one place, you can conduct something called a meta-analysis, where you bring all the results together in one giant spreadsheet, pool all the data and get one single, summary figure, the most accurate summary of all the data on one clinical question. The output of this is called a ‘blobbogram’, and you can see one on the opposite page, in the logo of the Cochrane Collaboration, a global, non-profit academic organisation that has been producing gold-standard reviews of evidence on important questions in medicine since the early 1990s.
This blobbogram shows the results of all the trials done on giving steroids to help premature babies survive. Each horizontal line is a trial: if that line is further to the left, then the trial showed steroids were beneficial and saved lives. The central, vertical line is the ‘line of no effect’, and if the horizontal line of the trial touches the line of no effect, then that trial showed no statistically significant benefit. Some trials are represented by longer horizontal lines: these were smaller trials, with fewer participants, which means they are prone to more error, so the estimate of the benefit has more uncertainty, and therefore the horizontal line is longer. Finally, the diamond at the bottom shows the ‘summary effect’: this is the overall benefit of the intervention, pooling together the results of all the individual trials. The diamond is much narrower than the lines for individual trials, because the estimate is much more accurate: it is summarising the effect of the drug in many more patients. On this blobbogram you can see – because the diamond is a long way from the line of no effect – that giving steroids is hugely beneficial. In fact, it reduces the chances of a premature baby dying by almost half.
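For readers who want to see the arithmetic behind a blobbogram, here is a minimal sketch in Python. It makes several assumptions: the trial numbers are invented purely for illustration and are not the real steroid trials; the effect measure is the risk ratio; and the pooling uses the simple ‘fixed-effect, inverse-variance’ method, which is one common approach and not necessarily the one used in any particular published review. The names `TRIALS`, `log_risk_ratio`, `interval` and `pool_fixed_effect` are made up for this example.

```python
import math

# Hypothetical trials, invented for illustration only:
# (deaths_treated, n_treated, deaths_control, n_control)
TRIALS = [
    (8, 100, 14, 100),
    (5, 60, 9, 60),
    (20, 300, 38, 300),
    (3, 40, 5, 40),
]

Z_95 = 1.96  # multiplier for a 95% confidence interval


def log_risk_ratio(deaths_t, n_t, deaths_c, n_c):
    """Return the log risk ratio for one trial and its approximate variance."""
    log_rr = math.log((deaths_t / n_t) / (deaths_c / n_c))
    variance = 1 / deaths_t - 1 / n_t + 1 / deaths_c - 1 / n_c
    return log_rr, variance


def interval(log_rr, variance):
    """Convert a log risk ratio into (lower, estimate, upper) on the ratio scale."""
    half_width = Z_95 * math.sqrt(variance)
    return (
        math.exp(log_rr - half_width),
        math.exp(log_rr),
        math.exp(log_rr + half_width),
    )


def pool_fixed_effect(trials):
    """Pool trials, weighting each by the inverse of its variance."""
    total_weight = 0.0
    weighted_sum = 0.0
    for trial in trials:
        log_rr, variance = log_risk_ratio(*trial)
        weight = 1 / variance  # bigger, more precise trials count for more
        total_weight += weight
        weighted_sum += weight * log_rr
    return weighted_sum / total_weight, 1 / total_weight


# One horizontal line per trial: a risk ratio of 1 is the line of no effect.
for trial in TRIALS:
    low, rr, high = interval(*log_risk_ratio(*trial))
    verdict = "touches the line of no effect" if low <= 1 <= high else "clear of it"
    print(f"trial {trial}: {rr:.2f} ({low:.2f} to {high:.2f}), {verdict}")

# The diamond: the pooled estimate and its narrower interval.
pooled_low, pooled_rr, pooled_high = interval(*pool_fixed_effect(TRIALS))
print(f"pooled: {pooled_rr:.2f} ({pooled_low:.2f} to {pooled_high:.2f})")
```

With these invented numbers, most of the individual trials have intervals wide enough to touch the line of no effect, while the pooled interval is narrower than any single trial’s and sits clear of it. That is the same pattern the blobbogram shows: trials that look inconclusive one at a time can give a clear answer once they are combined.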
The amazing thing about this blobbogram is that it had to be invented, and this happened very late in medicine’s history. For many years we had all the information we needed to know that steroids saved lives, but nobody knew they were effective, because nobody did a systematic review until 1989. As a result, the treatment wasn’t given widely, and huge numbers of babies died unnecessarily; not because we didn’t have the information, but simply because we didn’t synthesise it properly.
In case you think this is an isolated case, it’s worth examining exactly how broken medicine was until frighteningly recent times. The diagram on the opposite page contains two blobbograms, or ‘forest plots’, showing all the trials ever conducted to see whether giving streptokinase, a clot-busting drug, improves survival in patients who have had a heart attack.[11]
Look first only at the forest plot on the left. This is a conventional forest plot, from an academic journal, so it’s a little busier than the stylised one in the Cochrane logo. The principles, however, are exactly the same. Each horizontal line is a trial, and you can see that there is a hodgepodge of results, with some trials showing a benefit (they don’t touch the vertical line of no effect, headed ‘1’) and some showing no benefit (they do cross that line). At the bottom, however, you can see the summary effect – a dot on this old-fashioned blobbogram, rather than a diamond. And you can see very clearly that overall, streptokinase saves lives.
So what’s that on the right? It’s something called a cumulative meta-analysis. If you look at the list of studies on the left of the diagram, you can see that they are arranged in order of date. The cumulative meta-analysis on the right adds in each new trial’s results, as they arrived over history, to the previous trials’ results. This gives the best possible running estimate, each year, of how the evidence would have looked at that time, if anyone had bothered to do a meta-analysis on all the data available to them. From this cumulative blobbogram you can see that the horizontal lines, the ‘summary effects’, narrow over time as more and more data is collected, and the estimate of the overall benefit of this treatment becomes more accurate. You can also see that these horizontal lines stopped