Background Clinical guidelines commonly recommend preventative treatments for people above a risk threshold. Therefore, decision-makers must have faith in risk prediction tools and model-based cost-effectiveness analyses for people at different levels of risk. Two problems that arise are inadequate handling of competing risks of death and failing to account for direct treatment disutility (i.e. the hassle of taking treatments). We explored these issues using two case studies: primary prevention of cardiovascular disease using statins and osteoporotic fracture using bisphosphonates. Objectives Externally validate three risk prediction tools [QRISK®3, QRISK®-Lifetime, QFracture-2012 (ClinRisk Ltd, Leeds, UK)]; derive and internally validate new risk prediction tools for cardiovascular disease [competing mortality risk model with Charlson Comorbidity Index (CRISK-CCI)] and fracture (CFracture), accounting for competing-cause death; quantify direct treatment disutility for statins and bisphosphonates; and examine the effect of competing risks and direct treatment disutility on the cost-effectiveness of preventative treatments. Design, participants, main outcome measures, data sources Discrimination and calibration of risk prediction models (Clinical Practice Research Datalink participants: aged 25–84 years for cardiovascular disease and aged 30–99 years for fractures); direct treatment disutility was elicited in online stated-preference surveys (people with/people without experience of statins/bisphosphonates); costs and quality-adjusted life-years were determined from decision-analytic modelling (updated models used in National Institute for Health and Care Excellence decision-making). Results CRISK-CCI has excellent discrimination, similar to that of QRISK3 (Harrell’s c = 0.864 vs. 0.865, respectively, for women; and 0.819 vs. 0.834, respectively, for men). CRISK-CCI has systematically better calibration, although both models overpredict in high-risk subgroups. 
People recommended for treatment (10-year risk of ≥ 10%) are younger when using QRISK-Lifetime than when using QRISK3, and have fewer observed events in a 10-year follow-up (4.0% vs. 11.9%, respectively, for women; and 4.3% vs. 10.8%, respectively, for men). QFracture-2012 underpredicts fractures, owing to under-ascertainment of events in its derivation. However, there is major overprediction among people aged 85–99 years and/or with multiple long-term conditions. CFracture is better calibrated, although it also overpredicts among older people. In a time trade-off exercise (n = 879), statins exhibited direct treatment disutility of 0.034; for bisphosphonates, it was greater, at 0.067. Inconvenience also influenced preferences in best–worst scaling (n = 631). Updated cost-effectiveness analysis generates more quality-adjusted life-years among people with below-average cardiovascular risk and fewer among people with above-average risk. If people experience disutility when taking statins, the cardiovascular risk threshold at which benefits outweigh harms rises with age (≥ 8% 10-year risk at 40 years of age; ≥ 38% 10-year risk at 80 years of age). Assuming that everyone experiences population-average direct treatment disutility with oral bisphosphonates, treatment is net harmful at all levels of risk. Limitations Treating data as missing at random is a strong assumption in risk prediction model derivation. Disentangling the effect of statins from secular trends in cardiovascular disease in the previous two decades is challenging. Validating lifetime risk prediction is impossible without using very historical data. Respondents to our stated-preference survey may not be representative of the population. There is no consensus on which direct treatment disutilities should be used for cost-effectiveness analyses. Not all the inputs to the cost-effectiveness models could be updated. 
Conclusions Ignoring competing mortality in risk prediction overestimates the risk of cardiovascular events and fracture, especially among older people and those with multimorbidity. Adjustment for competing risk does not meaningfully alter cost-effectiveness of these preventative interventions, but direct treatment disutility is measurable and has the potential to alter the balance of benefits and harms. We argue that this is best addressed in individual-level shared decision-making. Study registration This study is registered as PROSPERO CRD42021249959. Funding This award was funded by the National Institute for Health and Care Research (NIHR) Health and Social Care Delivery Research programme (NIHR award ref: 15/12/22) and is published in full in Health and Social Care Delivery Research; Vol. 12, No. 4. See the NIHR Funding and Awards website for further award information. Plain language summary Before offering a medicine to prevent disease, prescribers must expect it to do more good than harm. This balance depends on how likely it is that the person will develop the disease we want to prevent. But people might first die for other reasons. We call this a ‘competing risk’. In most cases, the mathematical tools we use to estimate the chance of developing a disease do not account for competing risks. Another problem is that, when weighing up the benefits and harms of medicines, we ignore the hassle they cause patients, even when they do not cause side effects. We used two examples: statins to prevent heart disease and bisphosphonates to prevent fractures. First, we assessed if existing tools get predictions wrong by not accounting for competing risks. We found that they exaggerate the chance of heart attacks and strokes. However, the exaggeration is greatest among people who would clearly benefit from preventative treatment. So it may not change treatment decisions much. 
The fracture prediction tool we studied was very inaccurate, exaggerating risk among older people, but underestimating risk among younger people. We made a new fracture risk prediction tool. It gave better predictions, but it was still inaccurate for people aged > 85 years and those with several health problems. Next, we asked people questions designed to put a number on the hassle that statins and bisphosphonates cause. Most people thought that taking either is inconvenient, but the hassle factor for bisphosphonates is bigger. Finally, we updated the mathematical models that the National Institute for Health and Care Excellence used when recommending statins and bisphosphonates. We worked out if competing risks and the hassle of taking medicines make a difference to results. Statins remain a good idea for almost everyone, unless they really hate the idea of taking them. But bisphosphonates would do more harm than good for anyone who agrees with the hassle factor we found. Scientific summary Background Clinical guidelines help define and disseminate best practice. Guidelines increasingly use risk prediction tools to help target primary preventative treatments at people at highest risk. In National Institute for Health and Care Excellence (NICE) guidelines, the choice of risk threshold is commonly informed by model-based cost-effectiveness analyses (CEAs) for different levels of baseline risk. Risk prediction modelling and model-based CEA are therefore increasingly important for developing guidelines that recommend long-term preventative medicines, including primary prevention of cardiovascular disease (CVD) using statins and prevention of osteoporotic fracture using bisphosphonates. Risk prediction and competing mortality risk Most risk prediction models do not account for competing mortality risk, which is when someone dies of another condition (e.g. lung cancer) before experiencing the event being predicted (e.g. CVD or fracture). 
This can lead to overprediction of event rates among older people and those with multimorbidity. Model-based cost-effectiveness analysis Competing mortality risk is accounted for in model-based CEA, but whole-population estimates of competing mortality will not be correct at all levels of risk of CVD and fracture. Existing models also do not account for all harms, notably direct treatment disutility (DTD), which is the disutility arising from the hassles of taking treatments. Even small levels of DTD can be enough to outweigh relatively small lifetime benefits of primary prevention medication, but, to our knowledge, DTD impact has not been systematically estimated previously. Aim and objectives The overall aim was to improve the evidence generated from risk prediction models and model-based CEAs to inform decision-making for selecting primary prevention treatments for CVD and osteoporotic fracture. The prespecified objectives were to: (1) externally validate the recommended risk prediction tools for primary prevention of CVD [QRISK®3 (ClinRisk Ltd, Leeds, UK)] and for osteoporotic fracture [QFracture-2012 (ClinRisk Ltd)]; (2) derive and internally validate new CVD and osteoporotic fracture risk prediction models accounting for competing risks of death; (3) externally validate the QRISK-Lifetime CVD risk prediction tool; (4) quantify the magnitude, variation and distribution of DTD in the general population and among people treated with statins or bisphosphonates; and (5) examine the effect of accounting for competing risks and DTD on cost-effectiveness in the context of statins and bisphosphonates for the primary prevention of CVD and osteoporotic fracture, respectively. The prediction modelling protocol was approved by the Clinical Practice Research Datalink (CPRD) Independent Scientific Advisory Committee (reference number 16_248).
The Health Research Authority approved the DTD elicitation study (Integrated Research Application System: 220,492) and granted ethics approval (Research Ethics Committee: 17/NW/0124). A systematic review for CEA model parameters was registered with PROSPERO (CRD42021249959). Methods Objective 1 methods For CVD modelling, CPRD GOLD data were used to define a cohort aged 25–84 years without CVD or prior statin prescription. The outcome was incident CVD. Multiple imputation was used to account for missing data. The performance of the published QRISK3–2017 model was evaluated in terms of discrimination (the ability of a tool to distinguish between those with and those without an event) and calibration (whether or not predicted risk is the same as observed risk) in the whole population, stratified by age and Charlson Comorbidity Index (CCI), and in subgroups with type 1 diabetes, type 2 diabetes and chronic kidney disease (CKD). Observed risk was estimated with and without accounting for competing risk (using Aalen–Johansen and Kaplan–Meier estimators, respectively). For fracture modelling, the cohort was aged 30–99 years (prior fracture or bisphosphonate treatment were allowed) with follow-up to specified fracture, death from non-fracture causes, deregistration or end of study. Two outcomes were defined: major osteoporotic fracture (MOF) and hip fracture. QFracture-2012 performance was evaluated as for QRISK3. For both cohorts, the earliest study entry date was 1 January 2004 and the end of the study was 31 March 2016. Objective 2 methods Using the same data set as objective 1, participants were randomly allocated to derivation and test data sets in a 2 : 1 ratio. For CVD, two Fine–Gray models were derived in the derivation data set and internally validated in the test data set, alongside QRISK3. 
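As an illustration of why the two estimators of observed risk named above can disagree, the following minimal Python sketch computes both 1 minus the Kaplan–Meier estimate (competing deaths treated as censored) and the Aalen–Johansen cumulative incidence on a toy data set. The function name, the event coding (0 = censored, 1 = event of interest, 2 = competing death) and the data are assumptions made for illustration only; this is not the study's analysis code.

```python
from collections import Counter


def observed_risk(times, events, horizon):
    """Return (1 - Kaplan-Meier, Aalen-Johansen) risk of event type 1 by `horizon`.

    Assumed coding: 0 = censored, 1 = event of interest, 2 = competing death.
    """
    # Count how many of each event code occur at each distinct time.
    by_time = {}
    for t, e in zip(times, events):
        by_time.setdefault(t, Counter())[e] += 1

    n_at_risk = len(times)
    km_surv = 1.0       # event-free survival, competing deaths treated as censored
    overall_surv = 1.0  # probability of being free of BOTH event types
    aj_cif = 0.0        # Aalen-Johansen cumulative incidence of event type 1

    for t in sorted(by_time):
        if t > horizon:
            break
        counts = by_time[t]
        d1, d2 = counts[1], counts[2]
        # Aalen-Johansen: an event can only happen to someone still alive and event-free.
        aj_cif += overall_surv * d1 / n_at_risk
        # Kaplan-Meier: only events of interest reduce "survival".
        km_surv *= 1 - d1 / n_at_risk
        overall_surv *= 1 - (d1 + d2) / n_at_risk
        # Everyone with an event or censoring at time t leaves the risk set.
        n_at_risk -= counts[0] + d1 + d2

    return 1 - km_surv, aj_cif


if __name__ == "__main__":
    # Toy data: four people, two competing deaths, two events, no censoring.
    km_risk, aj_risk = observed_risk([1, 2, 3, 4], [2, 1, 2, 1], horizon=4)
    print(f"1 - Kaplan-Meier: {km_risk:.3f}; Aalen-Johansen: {aj_risk:.3f}")
```

In this toy example two of the four people experience the event, so the Aalen–Johansen estimate is 0.5, whereas treating the two competing deaths as censored pushes the Kaplan–Meier-based estimate to 1.0. A model that is well calibrated against the first quantity can therefore appear to overpredict against the second.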
The competing mortality risk model (CRISK) accounted for competing mortality only, whereas the competing mortality risk model with Charlson Comorbidity Index (CRISK-CCI) also included the modified CCI as a predictor. Model performance was examined using discrimination and calibration. For fracture, separate Fine–Gray models (CFracture) were estimated for MOF and hip fracture. Objective 3 methods The same data were used as for objective 1, but with the age range restricted to 30–84 years to match QRISK®-Lifetime (ClinRisk Ltd). As lifetime risk is not observed in this data set, model performance was evaluated at 10 years, and reclassification examined the characteristics of those recommended for treatment on the basis of a QRISK3 10-year risk of > 10%, a QRISK-Lifetime 10-year risk of > 10% and the QRISK-Lifetime highest risk, with thresholds chosen to recommend the same number of people for treatment as with QRISK3 > 10%. Objective 4 methods Two groups of participants were recruited to studies to elicit DTD of preventative statins and bisphosphonates: people with direct experience of taking one of the medicines and a sample of the general population. We described the process of taking each medicine (one tablet per day for statins, one tablet per week taken on an empty stomach with a requirement to stay upright for at least 30 minutes for bisphosphonates). Elicitation used time trade-off (TTO) (primary analysis) and best–worst scaling (BWS) (exploratory analysis) surveys iteratively developed using think-aloud interviews with 19 patients, and online pilot studies. Objective 5 methods For statins for the primary prevention of CVD, we modified the cohort-level decision-analytic model used in NICE’s lipid modification guideline [NICE. Lipid Modification: Cardiovascular Risk Assessment and the Modification of Blood Lipids for the Primary and Secondary Prevention of Cardiovascular Disease. Clinical Guideline (CG181). Methods, Evidence and Recommendations. July 2014. 
URL: https://web.archive.org/web/20220201050407/https://www.nice.org.uk/guidance/cg181/evidence/lipid-modification-update-full-guideline-pdf-243786637 (accessed 12 October 2022)]. General updates included rapid reviews to identify utility values and costs associated with CVD events, new regressions to predict baseline quality of life for people without CVD (based on Health Survey for England data) and type of first CVD event (based on data from objective 1), and updating of inputs (costs, life expectancy) to present-day values. For bisphosphonates for the prevention of fracture, we used the discrete-event simulation developed for NICE’s Technology Appraisal 464 [NICE. Bisphosphonates for Treating Osteoporosis. Technology Appraisal Guidance (TA464). London: NICE; 2017]. For both models, we explored competing risk by parameterising probability of non-cause-specific death using relative survival models adjusting for predicted risk (QRISK3 or QFracture-2012). We incorporated DTD as elicited in objective 4 under three assumptions (lifelong, time limited, diminishing over time). We explored how these factors alone or in combination affect the estimated value of the preventative medicines in terms of cost per quality-adjusted life-year (QALY). Results Objectives 1 and 2: predicting cardiovascular disease Discrimination of QRISK3 in the whole external validation cohort was excellent (Harrell’s c = 0.865 for women, 0.834 for men), and comparable to the previous internal validation. However, discrimination was worse among people with more comorbidity, and was poor to moderate among older people (e.g. c = 0.611 for women and 0.585 for men aged 75–84 years). Calibration in the whole population, ignoring competing risks, was very good, with minor overprediction. There was larger overprediction among older people, which was considerable after accounting for competing risks. Among people with type 1 diabetes, discrimination was excellent (c = 0.830 for women, 0.853 for men). 
There was evidence of overprediction at higher levels of predicted risk, which was larger after accounting for competing risks, although most overprediction happened well above the NICE 10% threshold for offering treatment. Discrimination among people with CKD was only moderate (women, c = 0.705; men, c = 0.671), but calibration was reasonable at recommended treatment thresholds. The new competing risk model (CRISK-CCI) had similar discrimination to QRISK3 in the whole population (women, c = 0.864; men, c = 0.819), with the same pattern of worse discrimination among older people and those with more comorbidity. Calibration was systematically better than QRISK3, although, as with QRISK3, there was overprediction in some subgroups with high predicted risk. Objectives 1 and 2: predicting fracture Observed age-stratified incidences of both MOF and hip fracture were considerably higher in this study than in a previous external validation, which was partly explained by the use of hospital data in this study to ascertain fractures. Discrimination of QFracture-2012 in external validation was excellent among women (MOF, c = 0.813; hip fracture, c = 0.918) and good to excellent among men (MOF, c = 0.738; hip fracture, c = 0.888), similar to QFracture-2012 internal validation, but had poor to moderate discrimination among older people. Ignoring competing risks, QFracture-2012 showed serious underprediction in the whole population and in all subgroups of age and comorbidity, which was worse for hip fracture than for MOF. Accounting for competing risks reduced observed underprediction in the whole population, but there was very major overprediction among older people and at higher levels of predicted risk among people with more comorbidity. The new competing risk model (CFracture) had similar discrimination to QFracture-2012 in the internal validation cohort (women: c = 0.813 for MOF, c = 0.914 for hip fracture; men: c = 0.734 for MOF, c = 0.883 for hip fracture). 
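The Harrell’s c values reported above summarise discrimination as the proportion of usable pairs of people in which the person who has the event earlier was assigned the higher predicted risk. The following minimal Python sketch shows that calculation on toy data. The function name and the data are hypothetical, tied event times are simply ignored, and the sketch does not reproduce the software used in the study.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's c for right-censored data (minimal sketch, ties in time ignored).

    A pair (i, j) is usable when person i has an observed event and a
    strictly shorter follow-up time than person j.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored person cannot be the "earlier event" in a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk assigned to the earlier event
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


if __name__ == "__main__":
    times = [1, 2, 3, 4]
    events = [1, 1, 0, 1]           # 1 = event observed, 0 = censored
    risks = [0.9, 0.7, 0.5, 0.1]    # hypothetical predicted risks
    print(harrell_c(times, events, risks))
```

A value of 1 means that predicted risk orders every usable pair correctly and 0.5 is no better than chance, which is why values near 0.6 among people aged 75–84 years are described as poor to moderate.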
CFracture was better calibrated than QFracture-2012 but showed overprediction at higher levels of predicted risk for MOF (both sexes) and for hip fracture (among men). CFracture calibration was poor among people aged 85–99 years for both outcomes. Objective 3: predicting lifetime cardiovascular disease risk Evaluated at 10 years’ follow-up, QRISK-Lifetime had excellent discrimination (women, c = 0.844; men, c = 0.808) in the whole population, with the same pattern as QRISK3 and CRISK-CCI of worse discrimination among older people and those with high comorbidity. QRISK-Lifetime underpredicted 10-year risk among people at higher predicted risk, particularly older people, implying that estimated lifetime risk will be underpredicted. A total of 5.3% of participants were recommended for treatment by both QRISK3 and QRISK-Lifetime, and 27.4% by one or the other, but not both. Participants recommended for treatment by QRISK-Lifetime were younger than those recommended by QRISK3 (mean age: women, 50.5 vs. 71.3 years, respectively; men, 46.3 vs. 63.8 years, respectively), were much more likely to have a strong family history of CVD (women: 36.3% vs. 6.3%, respectively; men: 20.0% vs. 7.2%, respectively) and had many fewer observed events during the 10-year follow-up (women with a CVD event: 4.0% vs. 11.9%, respectively; men with a CVD event: 4.3% vs. 10.8%, respectively). Objective 4: direct treatment disutility elicitation When measured by TTO, long-term statin use was associated with mean DTD of 0.034 among people willing to take statins; the equivalent number for bisphosphonates was significantly greater, at 0.067. The findings from the BWS experiment had face validity in that inconvenience influenced preferences. However, the estimated values for DTD are implausibly large. 
Consistent with previous studies, these findings suggest three distinct preference phenotypes: some people would avoid taking the medicines at all costs, some people see no problem with them and some people are willing to trade length of life to avoid treatment. The first group are unlikely to initiate treatment and the second group do not anticipate DTD; in the third group, depending on the individual’s strength of preference to avoid treatment and the magnitude of expected QALY gains from prevention, DTD may imply that a preventative medicine’s negative characteristics outweigh its benefits. Objective 5: model-based cost-effectiveness analysis General updates to the CVD model made high-intensity statins more cost-effective for primary prevention. Introducing accurate adjustment for competing risk of non-CVD death had the expected effect: more QALYs among people with below-average CVD risk for their sex and age (who experience lower rates of other-cause mortality) and fewer QALYs among people with above-average risk (whose non-CVD life expectancy is attenuated). However, the impact on incremental cost-effectiveness is minor, and statins remain almost universally cost-effective. Incorporating DTD has a more obvious effect, especially when we assume that it applies undiminished for as long as people take statins for primary prevention. Under that circumstance, the threshold at which expected long-term benefits outweigh DTD-related harm rises with age: for a 40-year-old, a 10-year risk of ≥ 8% would be enough to make treatment net beneficial whereas, for an 80-year-old, that figure rises to 38%. The model assessing bisphosphonates for the primary prevention of osteoporotic fragility fracture shows that we overestimate value for money among people at the highest risk if we do not adjust for competing risk of non-fracture death. However, this generally affects only the magnitude of expected net benefit among people for whom some degree of benefit is expected. 
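The way in which DTD can offset expected gains from prevention can be illustrated with simple arithmetic. The following Python sketch applies the elicited mean DTD values (0.034 for statins, 0.067 for bisphosphonates) under the three assumptions explored (lifelong, time limited, diminishing over time). The function names, the 5-year limit, the annual decay factor, the 20-year treatment duration and the 0.5 QALY gain are all hypothetical values chosen for illustration, and discounting is omitted; this is not the decision-analytic model used in the study.

```python
def dtd_qaly_loss(dtd, years, scenario="lifelong", limit_years=5, annual_decay=0.8):
    """Undiscounted QALYs lost to DTD over `years` of treatment.

    `limit_years` and `annual_decay` are hypothetical parameters, not study values.
    """
    if scenario == "lifelong":
        weights = [1.0] * years                      # full DTD every year on treatment
    elif scenario == "time_limited":
        weights = [1.0 if y < limit_years else 0.0   # full DTD for the first few years only
                   for y in range(years)]
    elif scenario == "diminishing":
        weights = [annual_decay ** y                 # DTD shrinks by a fixed factor each year
                   for y in range(years)]
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return dtd * sum(weights)


def net_qalys(expected_gain, dtd, years, **scenario_kwargs):
    """Expected QALY gain from prevention minus QALYs lost to DTD."""
    return expected_gain - dtd_qaly_loss(dtd, years, **scenario_kwargs)


if __name__ == "__main__":
    hypothetical_gain = 0.5  # assumed lifetime QALY gain, for illustration only
    for name, dtd in [("statins", 0.034), ("bisphosphonates", 0.067)]:
        for scenario in ("lifelong", "time_limited", "diminishing"):
            net = net_qalys(hypothetical_gain, dtd, years=20, scenario=scenario)
            print(f"{name:16s} {scenario:13s} net QALYs = {net:+.3f}")
```

Under the lifelong assumption the loss grows in proportion to treatment duration, so a treatment with a modest expected gain can become net harmful, whereas the time-limited and diminishing assumptions cap the loss.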
Even among people at highest risk of fracture, average QALY gains associated with bisphosphonates are small and swamped by DTD of any duration. Consequently, it is impossible to identify any group of people for whom oral bisphosphonates represent an effective use of NHS resources, if we assume population-level average DTD for everyone to whom the decision applies. Conclusions Implications for healthcare Ignoring competing mortality in risk prediction overestimates the risk of CVD and fracture among older people and those with multimorbidity, which will lead to overestimation of the benefits of treatment. This affects fracture risk prediction more than CVD because CVD is a more substantial proportion of total mortality. The QFracture-2012 prediction tool simultaneously underestimates fracture risk among people without high competing mortality risk, partly because it did not include fractures recorded only in hospital data in its derivation. CVD and fracture risk prediction are improved by accounting for competing mortality risks, and transparency of the tools would be improved by fully publishing the codes used to define events and predictors. We have demonstrated an effective method of making accurate adjustment for competing risk of non-cause-specific death in decision-analytic CEAs. Although it made relatively little difference to the estimated cost-effectiveness of preventative interventions in the examples we explored, we have shown that it could potentially be important. Therefore, we recommend that modellers consider this issue when designing analyses of preventative treatments. Although we have demonstrated that DTD exists and has the potential to alter the balance of benefits and harms for preventative treatments, we do not recommend that population-level average DTD is incorporated in base-case CEAs. 
Rather, we recommend that decision-makers review scenarios with and scenarios without DTD and highlight its possible impact, enabling prescribers to engage in shared decision-making that gives appropriate weight to individual preferences. Research recommendations The excellent discrimination of QRISK3 and QFracture-2012 arises from including a very broad range of ages, but discrimination and calibration in subgroups are less good. Comparing models created in smaller age groups with whole-population models would be useful. Mortality is only one competing risk, and older people and those with multimorbidity are at risk of many different events. It is important to develop models that better account for multiple important events. Cost-effectiveness analysis of statins for the primary prevention of CVD could usefully be further modified to (1) enable stratification according to specific coexisting long-term conditions, (2) account for likely adherence to statins in practice and (3) update secondary transitions reflecting the subsequent natural history of CVD among people experiencing events. Future CEAs of bisphosphonates for the primary prevention of osteoporotic fragility fracture should explore different fracture risk prediction models, and use those based on demonstrable good ascertainment of fractures and accounting for competing mortality risk.