Clinical trials in epidemiology usually take the form of randomized controlled trials. The aim is to see if an intervention to alter exposure results, over time, in a change in outcomes. In epidemiology, clinical trials can involve thousands of subjects followed over many years, whereas studies of drug treatments or food interventions that look at changes in blood or urine biochemistry may involve only tens of subjects. The purpose in having such large studies in epidemiology is to be able to generalize to the population with confidence, and to take into account the many confounders (see below) that may operate in the real world.
Researchers usually strive to achieve ‘double blind’ status in their study designs, but in nutritional interventions this may not always be possible. If the intervention involves changing diet, for example, or providing nutritional advice to increase fruit and vegetable consumption, the subject may be aware of the changes and is therefore no longer ‘blind’. Similarly, the researcher involved in administering the changes may be aware of which subjects are in the treatment group and which are in the placebo group. If this is the case, it is important to ensure that the person undertaking the statistical analysis is blinded to the identity of the groups being compared. This can be done through the coding of results for computer analysis so that the comparison is simply between group A and group B. Only at the end of the analysis is the identity of the group revealed. Even here, it may not always be possible to blind the analyst. In that case, the procedures for carrying out the statistical analyses should be described in advance so as to avoid a ‘fishing expedition’, hunting for statistically significant findings.
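The coding step described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `blind_groups`, the subject IDs, and the group names are all hypothetical. Someone other than the analyst would run it and hold back the key until the analysis is complete.

```python
import random

def blind_groups(assignments, seed=None):
    """Replace real group names with the neutral codes 'A' and 'B'.

    assignments: dict mapping subject ID -> real group name (exactly two groups).
    Returns (coded, key). 'coded' maps subject ID -> 'A' or 'B' and goes to the
    analyst; 'key' maps code -> real group name and is withheld until the end.
    """
    groups = sorted(set(assignments.values()))
    if len(groups) != 2:
        raise ValueError("expected exactly two groups")
    rng = random.Random(seed)
    rng.shuffle(groups)  # which group becomes 'A' is decided at random
    key = {"A": groups[0], "B": groups[1]}
    reverse = {real: code for code, real in key.items()}
    coded = {subject: reverse[group] for subject, group in assignments.items()}
    return coded, key

# Hypothetical allocation of four subjects (illustrative only)
allocation = {101: "treatment", 102: "placebo", 103: "treatment", 104: "placebo"}
coded, key = blind_groups(allocation)
print(coded)  # the analyst sees only 'A' and 'B'
```

The analyst works with `coded` alone; `key` is opened only once the comparison between group A and group B has been completed.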
Community trials are intervention studies carried out in whole communities to see if some change in exposure is associated with a subsequent change in disease or mortality rates. Again, the study will involve a treatment community and a placebo community. The communities are matched (e.g. for the age and sex structure of the population, or the percentage of the population not in work). There may be more than one community in each group.
Community trials are pragmatic in nature. The aim is to see if community‐based interventions have sufficient penetration and impact to bring about changes in nutrition‐related outcomes (e.g. the percentage of adults over the age of 45 who are overweight or obese). The identity of the individuals in the community who make the desired changes and their individual outcomes may not be known. A questionnaire can be used to find out the proportion of the population who were aware of the intervention (for example, advice on healthy eating provided in a GP surgery by the community dietitian), and height and weight data could be taken from the routine ‘healthy man’ or ‘healthy woman’ screening programmes being run in the same groups of GP surgeries.
Community trials are much cheaper to run than clinical trials. They do not, however, provide the same wealth of detail about individual behaviours and outcomes and, as a consequence, provide less insight into causal mechanisms.
Confounding and Bias
A key feature of epidemiological studies is the attention paid to confounding factors and bias. Confounding factors are associated with both the exposure and the outcome. Suppose that we were comparing a group of cases who had recently had their first heart attack with a group of controls who had never had a heart attack. Say that we were interested in knowing if higher oily fish consumption was associated with lower rates of heart attack. We could measure oily fish consumption using a food frequency questionnaire that asked about usual diet. Suppose we found that it was lower among the cases. Is this good evidence that higher oily fish consumption is protective against having a heart attack? What if the cases, on average, were 10 years older than the controls, and younger people tended to have higher levels of oily fish in their diet? This could explain the apparent association of higher oily fish consumption with decreased risk of heart attack. In this case, age would be referred to as a confounding factor, because it is associated with both the exposure (oily fish consumption) and the outcome (heart attack) that we are interested in. To deal with this, we could match our cases with controls of the same age. Alternatively, we could use a statistical approach that took age into account in the analysis. The most common confounding factors – things like age, gender, social class, and education – need to be controlled for when comparing one group with another. Other factors such as smoking, disease status (e.g. diabetes), or body mass index (BMI) may also be taken into account, but these may be explanatory factors or factors in the causal pathway rather than true confounders.
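The oily fish example can be made concrete with a small Python sketch. All of the counts below are invented purely for illustration. The Mantel-Haenszel pooled odds ratio is used here as one common way of taking a confounder into account in the analysis; it is not necessarily the approach the text has in mind. Within each age group the odds ratio is 1 (no association), yet the crude odds ratio, which ignores age, suggests a protective effect.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata, each given as (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts; 'exposed' means high oily fish consumption.
# Younger people: few cases, and most subjects eat a lot of oily fish.
younger = (12, 8, 48, 32)
# Older people: many cases, and few subjects eat a lot of oily fish.
older = (16, 64, 4, 16)

# Crude table: add the two strata together, ignoring age.
crude = tuple(y + o for y, o in zip(younger, older))

print("Younger OR:", odds_ratio(*younger))   # 1.0
print("Older OR:  ", odds_ratio(*older))     # 1.0
print("Crude OR:  ", round(odds_ratio(*crude), 2))  # about 0.36
print("Age-adjusted (Mantel-Haenszel) OR:", mantel_haenszel_or([younger, older]))
```

In this constructed example the apparent protective effect of oily fish disappears entirely once age is taken into account, which is the pattern that defines a confounder.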
Bias is a problem associated with measuring instruments or interviewers. Systematic bias occurs when everyone is measured with an instrument that always gives an answer that is too high or too low (like an inaccurate weighing machine). Bias can be constant (every measurement is inaccurate by the same amount) or proportional (the size of the error is proportional to the size of the measurement, e.g. the more you weigh the greater the inaccuracy in the measurement). Bias is a factor that can affect any study and should be carefully controlled.
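The difference between constant and proportional bias can be shown with a short Python sketch. The weights, the 2 kg offset, and the 4% factor are assumptions chosen only to illustrate the two patterns of error.

```python
def constant_bias(true_value, offset):
    """Instrument that reads too high (or low) by a fixed amount."""
    return true_value + offset

def proportional_bias(true_value, factor):
    """Instrument whose error grows with the size of the measurement."""
    return true_value * factor

# Hypothetical true body weights in kg
true_weights_kg = [50.0, 75.0, 100.0]

# Error = measured value minus true value
constant_errors = [constant_bias(w, 2.0) - w for w in true_weights_kg]
proportional_errors = [proportional_bias(w, 1.04) - w for w in true_weights_kg]

print(constant_errors)      # the same 2 kg error at every weight
print(proportional_errors)  # roughly 2, 3, and 4 kg: larger at heavier weights
```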
Some types of bias may simply reduce our ability to detect associations between exposure and outcome. This is ‘noise in the system’. It means that there may be an association between exposure and outcome, but our data are too ‘noisy’ for us to be able to detect it. For example, we know that there is day‐to‐day and week‐to‐week variation in food and drink consumption. We need to try and collect sufficient information to be able to classify subjects according to their ‘usual’ consumption.
Other types of bias mean that the information that we obtain is influenced by the respondent’s ability to give us accurate information. Subjects who are overweight or obese, for example, or who have higher levels of dietary restraint, tend to under‐report their overall food consumption, especially things like confectionery or foods perceived as ‘fatty’. Subjects who are more health‐conscious may over‐report their fruit and vegetable consumption because they regard these foods as ‘healthy’ and want to make a good impression on the interviewer. In these instances, making comparisons between groups becomes problematic because the amount of bias is related to the type of individual which may in turn be related to their disease risk.
Issues such as confounding, residual confounding, factors in the causal pathway, and different types of bias are fully addressed in epidemiological textbooks [9, 10].
1.7 DATA, RESULTS, AND PRESENTATION
First of all, a few definitions are needed:
Statistic – a numerical observation
Statistics – numerical facts systematically collected (also the science of the analysis of data)
Data – what you collect (the word ‘data’ is plural – the singular is ‘datum’ – so we say ‘the data are…’ not ‘the data is…’)
Results – a summary of your data
1.7.1 Data Are What You Collect, Results Are What You Report
No one else is as interested in your data as you are. You must love your data, look after them carefully (think of the cuddly statistician), and cherish each observation. You must make sure that every observation collected is accurate, and that when the data are entered into a spreadsheet, they do not contain any errors. When you have entered all your data, you need to ‘clean’ your data, making sure that there are no rogue values, and that the mean and the distribution of values are roughly what you were expecting. Trapping the errors at this stage is essential. There is nothing worse than spending days or weeks undertaking detailed statistical analysis and preparing tables and figures for a report, only to discover that there are errors in your data set, meaning that you have to go back and do everything all over again.
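A first pass at cleaning can be sketched in Python as follows. This is only an illustration: the height values, the permitted range of 120 to 220 cm, and the function name `out_of_range` are all assumptions, and the appropriate range will depend on the variable and the population being studied.

```python
from statistics import mean, median

def out_of_range(values, low, high):
    """Return (position, value) pairs for values outside the range low..high."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]

# Hypothetical heights in cm. One value looks as if it was entered in metres,
# and one has a misplaced decimal point.
heights_cm = [162.0, 175.5, 1.68, 181.0, 158.2, 1710.0]

rogue = out_of_range(heights_cm, 120.0, 220.0)
print(rogue)  # [(2, 1.68), (5, 1710.0)]

# Compare summaries before and after setting the rogue values aside:
# a mean far from what you expected is a warning sign in itself.
plausible = [v for v in heights_cm if 120.0 <= v <= 220.0]
print("Mean of all values:      ", mean(heights_cm))
print("Mean of plausible values:", mean(plausible))
print("Median of plausible values:", median(plausible))
```

Flagged values should be checked against the original records and corrected where possible, not simply discarded.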
TIP
Allow adequate time in your project to clean the data properly. This means:
Check for values that are outside the range of permitted values.