Statistical assessment of risk scores involves two key factors: discrimination and calibration. Discrimination is the ability of the tool to distinguish those who will develop the disease from those who will not. This is commonly measured using the area under the curve (AUC) of a receiver operating characteristic curve, which incorporates both sensitivity and specificity. A similar measure is the concordance statistic or “c-statistic” [32]. Values range from 0.5, indicating no discrimination, to 1.0, indicating perfect discrimination. Calibration describes the agreement between the risk predicted by the tool and the observed event rate in the population. There are a few methods for assessing calibration, including the Hosmer-Lemeshow test, which compares mean predicted risk to observed outcome rates across deciles of the distribution of expected risks [32].
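The two measures described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: the function names and the example data are hypothetical, the c-statistic is computed by direct pairwise comparison for a binary outcome (ignoring censoring), and the Hosmer-Lemeshow statistic is returned without a p value (which would normally be obtained from a chi-square distribution).

```python
from itertools import product


def c_statistic(predicted, observed):
    """Share of (event, non-event) pairs in which the person with the
    event has the higher predicted risk; ties count as half."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


def hosmer_lemeshow_statistic(predicted, observed, groups=10):
    """Chi-square-type statistic comparing observed and expected event
    counts within groups of ranked predicted risk (deciles by default)."""
    # Rank by predicted risk only, so the outcome does not influence grouping.
    ranked = sorted(zip(predicted, observed), key=lambda pair: pair[0])
    n = len(ranked)
    statistic = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        events = sum(y for _, y in chunk)     # observed number of events
        mean_risk = expected / size
        if 0.0 < mean_risk < 1.0:
            statistic += (events - expected) ** 2 / (
                size * mean_risk * (1.0 - mean_risk)
            )
    return statistic


if __name__ == "__main__":
    # Hypothetical predicted risks and outcomes, for illustration only.
    risks = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.70]
    outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
    print("c-statistic:", c_statistic(risks, outcomes))
    print("Hosmer-Lemeshow statistic (2 groups):",
          hosmer_lemeshow_statistic(risks, outcomes, groups=2))
```

A larger Hosmer-Lemeshow statistic indicates a greater gap between predicted and observed event counts, which corresponds to the small p values reported for poorly calibrated scores.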
Model Impact Studies
The implementation of cardiovascular risk scores and risk-based therapeutic decisions should be evaluated in interventional trials to ascertain whether their use actually alters clinical decision making and improves patient outcomes. A 2017 Cochrane review synthesised trials investigating cardiovascular risk scores [34]. It was not specific to patients with diabetes. Forty-one randomised controlled trials were identified. The review concluded that there was uncertainty as to whether use of risk scores altered cardiovascular event rates, but there was some weak evidence that they may lead to more favourable risk factor levels and increased prescribing of preventative medications. Among patients with type 2 diabetes specifically, a systematic review found that only the Framingham Risk Score has been subjected to intervention trials [31]. Two of the three trials found some benefit with regard to the prescription of preventative medications, but no significant effect was observed on risk of cardiovascular events [35, 36]. Despite the integration of cardiovascular risk prediction models into diabetes guidelines, there is still inadequate clinical evidence to validate their role.
Risk Scores in Diabetes Guidelines
There is controversy about the use of risk scores in patients with diabetes given the cardiovascular risk conferred by diabetes itself. There is also concern that scores developed in general populations may not include diabetes-specific risk factors such as duration of disease and microalbuminuria. Thus, guidelines differ in their recommendations relating to the use of risk scores.
Generally, it is accepted that patients with diabetes at particularly high risk do not need evaluation with a risk score. Such patients include those with established cardiovascular disease, micro- or macroalbuminuria, or markedly elevated single risk factors (e.g., marked hypertension or dyslipidaemia) [37]. At present a number of organisations, such as the European Society of Cardiology, European Association for the Study of Diabetes, American Diabetes Association and Joint British Societies (including, among others, the British Cardiac Society and Diabetes UK), do not recommend the use of risk scores for patients with diabetes [27, 28, 38].
The World Health Organisation recommends the use of its risk prediction charts in patients with diabetes, unless there is overt nephropathy or other significant renal disease [39]. The International Diabetes Federation guidelines suggest assessment of absolute cardiovascular risk as an option for stratifying risk, with equations developed for people with diabetes preferred [37]. The International Diabetes Federation recommends that ultimately the choice of risk assessment should be made at a country level, taking into account local epidemiological data and the potential impact on healthcare resources.
The Framingham risk equation has been one of the most widely utilised and assessed scores. It is less widely recommended in patients with diabetes than previously, but is still recommended in some countries, including Australia, unless other factors imply high risk. These factors include age greater than 60 years, presence of microalbuminuria, moderate or severe chronic kidney disease, or markedly elevated systolic blood pressure or cholesterol [40]. The use of a score developed in a non-diabetes-specific cohort established decades ago in a predominantly white town in the USA is debated [41]. This risk model has been externally validated in a number of diabetes populations, mostly showing reasonable discrimination (AUC 0.56–0.80) but poor calibration (p values <0.05 with the Hosmer-Lemeshow test) [31]. Discrimination is based on the ability to score those who go on to have an event higher than those who do not, and therefore relates to the importance of the risk factors included in the equation. It is thus not surprising that discrimination generalises easily from one population to another. Calibration, however, measures how well the absolute values translate. A score that underestimates everyone’s risk by 50% would retain its discrimination, because the ranking of individuals is unchanged and those who have an event would still tend to score higher than those who do not. Its calibration would nevertheless be poor, as the annual predicted event rate would be 50% of the real rate. Recalibration can be used to update an old score to contemporary event rates and risk factor levels.
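The distinction between discrimination and calibration in the 50% example above can be made concrete with a short Python sketch. The data are invented purely for illustration, the function names are hypothetical, and the recalibration shown is a deliberately crude assumption (multiplying every prediction by the observed-to-expected ratio); published recalibrations typically update the baseline event rate and mean risk factor levels of the original equation instead.

```python
def c_statistic(predicted, observed):
    """Share of (event, non-event) pairs ranked correctly; ties count half."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    score = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                score += 1.0
            elif e == n:
                score += 0.5
    return score / (len(events) * len(non_events))


def observed_to_expected_ratio(predicted, observed):
    """Observed events divided by the sum of predicted risks.
    A value of 1 means predictions match the overall event rate;
    a value above 1 means the score underestimates risk."""
    return sum(observed) / sum(predicted)


def recalibrate(predicted, ratio):
    """Crude recalibration (an assumption for this sketch): scale each
    prediction by the observed-to-expected ratio, capped at 1."""
    return [min(1.0, p * ratio) for p in predicted]


# Invented example: eight people, four of whom have an event.
observed = [0, 0, 0, 1, 0, 1, 1, 1]
well_calibrated = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# A score that underestimates everyone's risk by 50%.
halved_risk = [p / 2 for p in well_calibrated]

# Ranking is untouched, so discrimination is identical for both scores.
c_original = c_statistic(well_calibrated, observed)
c_halved = c_statistic(halved_risk, observed)

# Calibration differs: the halved score expects half the observed events.
ratio_original = observed_to_expected_ratio(well_calibrated, observed)
ratio_halved = observed_to_expected_ratio(halved_risk, observed)

# Rescaling restores the overall predicted event rate.
recalibrated = recalibrate(halved_risk, ratio_halved)

if __name__ == "__main__":
    print("c-statistic, original vs halved:", c_original, c_halved)
    print("observed/expected, original vs halved:", ratio_original, ratio_halved)
    print("expected events after recalibration:", sum(recalibrated))
```

Because the rescaling does not change the ordering of predictions, recalibration of this kind leaves the c-statistic as it was and only corrects the absolute risk level.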
National Institute for Health and Care Excellence guidelines previously recommended use of the UKPDS risk engine but now suggest QRISK2 [29]. The UKPDS risk engine was developed in the UKPDS trial cohort, which enrolled patients with recently diagnosed diabetes. The QRISK2 score is not diabetes-specific, having been developed using a large primary care database utilising individual-level medical records linked to hospital admissions data. The model has recently been updated (QRISK3) using data from almost 8 million adult patients in the derivation cohort and almost 3 million in the validation cohort [42]. Three models are available for use depending on the patient data available, with the most comprehensive including 18 predictor variables. The outcome is 10-year risk of coronary heart disease, ischaemic stroke or transient ischaemic attack. The score is designed to be built into the electronic medical record and give a computer-generated result. It performed well in the validation cohort, with Harrell’s C statistics (similar to AUC) of 0.858–0.880 across the 3 models in men and women, suggesting good discrimination. On average, the score explains almost 60% of the variability in time to diagnosis of cardiovascular disease (R²). However, among patients with known type 2 diabetes, it explained only 25.2% of the variability in women and 22.9% in men. Independent external validation studies of the earlier QRISK2 score have been performed, showing similar results to the validation studies performed within the GP database [42]. There has not been an intervention trial to confirm the score’s clinical impact.