Background Guidelines on the management of depression recommend that practitioners use patient-reported outcome measures for the follow-up monitoring of symptoms, but there is a lack of evidence of benefit in terms of patient outcomes. Objective To test the use of the Patient Health Questionnaire-9 as a patient-reported outcome measure for monitoring depression, with practitioners trained to interpret scores and patients given feedback. Design Parallel-group, cluster-randomised superiority trial; 1 : 1 allocation to intervention and control. Setting UK primary care (141 group general practices in England and Wales). Inclusion criteria Patients aged ≥ 18 years with a new episode of depressive disorder or symptoms, recruited mainly through medical record searches, plus opportunistically in consultations. Exclusions Current depression treatment, dementia, psychosis, substance misuse and risk of suicide. Intervention Administration of the Patient Health Questionnaire-9 with patient feedback soon after diagnosis, and at follow-up 10–35 days later, compared with usual care. Primary outcome Beck Depression Inventory, 2nd edition, symptom scores at 12 weeks. Secondary outcomes Beck Depression Inventory, 2nd edition, scores at 26 weeks; antidepressant drug treatment and mental health service contacts; social functioning (Work and Social Adjustment Scale) and quality of life (EuroQol-5 Dimensions, five-level version) at 12 and 26 weeks; service use over 26 weeks to calculate NHS costs; patient satisfaction at 26 weeks (Medical Informant Satisfaction Scale); and adverse events. Sample size The original target of 676 recruited patients was reduced to 554 after a significant correlation was found between baseline and follow-up values for the primary outcome measure. Randomisation Remote computerised randomisation with minimisation by recruiting university, small/large practice and urban/rural location.
Blinding Blinding of participants was impossible given the open cluster design, but self-report outcome measures prevented observer bias. Analysis was blind to allocation. Analysis Linear mixed models were used, adjusted for baseline depression, baseline anxiety, sociodemographic factors, and clustering including practice as random effect. Quality of life and costs were analysed over 26 weeks. Qualitative interviews Practitioner and patient interviews were conducted to reflect on trial processes and use of the Patient Health Questionnaire-9 using the Normalization Process Theory framework. Results Three hundred and two patients were recruited in intervention arm practices and 227 patients were recruited in control practices. Primary outcome data were collected for 252 (83.4%) and 195 (85.9%), respectively. No significant difference in Beck Depression Inventory, 2nd edition, score was found at 12 weeks (adjusted mean difference –0.46, 95% confidence interval –2.16 to 1.26). Nor were significant differences found in Beck Depression Inventory, 2nd Edition, score at 26 weeks, social functioning, patient satisfaction or adverse events. EuroQol-5 Dimensions, five-level version, quality-of-life scores favoured the intervention arm at 26 weeks (adjusted mean difference 0.053, 95% confidence interval 0.013 to 0.093). However, quality-adjusted life-years over 26 weeks were not significantly greater (difference 0.0013, 95% confidence interval –0.0157 to 0.0182). Costs were lower in the intervention arm but, again, not significantly (–£163, 95% confidence interval –£349 to £28). Cost-effectiveness and cost–utility analyses, therefore, suggested that the intervention was dominant over usual care, but with considerable uncertainty around the point estimates. Patients valued using the Patient Health Questionnaire-9 to compare scores at baseline and follow-up, whereas practitioner views were more mixed, with some considering it too time-consuming. 
Conclusions We found no evidence of improved depression management or outcome at 12 weeks from using the Patient Health Questionnaire-9, but patients’ quality of life was better at 26 weeks, perhaps because feedback of Patient Health Questionnaire-9 scores increased their awareness of improvement in their depression and reduced their anxiety. Further research in primary care should evaluate patient-reported outcome measures including anxiety symptoms, administered remotely, with algorithms delivering clear recommendations for changes in treatment. Study registration This study is registered as IRAS250225 and ISRCTN17299295. Funding This award was funded by the National Institute for Health and Care Research (NIHR) Health Technology Assessment programme (NIHR award ref: 17/42/02) and is published in full in Health Technology Assessment; Vol. 28, No. 17. See the NIHR Funding and Awards website for further award information. Plain language summary Depression is common, can be disabling and costs the nation billions. The National Health Service recommends general practitioners who treat people with depression use symptom questionnaires to help assess whether those people are getting better over time. A symptom questionnaire is one type of patient-reported outcome measure. Patient-reported outcome measures appear to benefit people having therapy and mental health care, but this approach has not been tested thoroughly in general practice. Most people with depression are treated in general practice, so it is important to test patient-reported outcome measures there, too. In this study, we tested whether using a patient-reported outcome measure helps people with depression get better more quickly. The study was a ‘randomised controlled trial’ in general practices, split into two groups. In one group, people with depression completed the Patient Health Questionnaire, or ‘PHQ-9’, patient-reported outcome measure, which measures nine symptoms of depression. 
In the other group, people with depression were treated as usual without the Patient Health Questionnaire-9. We fed the results of the Patient Health Questionnaire-9 back to the people with depression themselves to show them how severe their depression was and asked them to discuss the results with the practitioners looking after them. We found no differences between the patient-reported outcome measure group and the control group in their level of depression; their work or social life; their satisfaction with care from their practice; or their use of medicines, therapy or specialist care for depression. However, we did find that their quality of life was improved at 6 months, and the costs of the National Health Service services they used were lower. Using the Patient Health Questionnaire-9 can improve patients’ quality of life, perhaps by making them more aware of improvement in their depression symptoms, and less anxious as a result. Future research should test using a patient-reported outcome measure that includes anxiety and processing the answers through a computer to give practitioners clearer advice on possible changes to treatment for depression. Scientific summary Some text in this chapter has been adapted from the study protocol published as: Kendrick T, Moore M, Leydon G, et al. Patient-reported outcome measures for monitoring primary care patients with depression (PROMDEP): study protocol for a randomised controlled trial. Trials 2020;21:441. https://doi.org/10.1186/s13063-020-04344-9. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium provided the original work is properly credited. 
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article unless otherwise stated. Background Depression is common and costly. It can lead to chronic disability, poor quality of life, suicide, and high service use and costs. National Institute for Health and Care Excellence guidelines recommend different treatments for more severe and less severe depression, but general practitioners, who treat the majority of people with depression in primary care, are often inaccurate in their global clinical assessments of depression severity, and treatment is not targeted to patients most likely to benefit. The National Institute for Health and Care Excellence recommends that practitioners consider using validated patient-reported outcome measures to inform treatment at diagnosis and follow-up of people with depression, but there is insufficient evidence that these measures improve depression management and outcomes for patients in primary care. Aim and objectives The aim of the study was to answer the research question: What is the effectiveness and cost-effectiveness of assessing primary care patients with depression or low mood soon after diagnosis and again at follow-up 10–35 days later, using the Patient Health Questionnaire-9 combined with patient feedback and practitioner guidance on treatment?
The objectives were to (1) carry out a cluster-randomised controlled trial to compare the intervention with usual care; (2) provide intervention arm patients with written feedback on their Patient Health Questionnaire-9 scores, indicating evidence-based treatments relevant to the level of severity of depression to discuss with practitioners; (3) train practitioners to interpret Patient Health Questionnaire-9 scores and their implications for choice of treatment, taking into account contextual factors; (4) follow up participants for 26 weeks, with research assessments at 12 and 26 weeks; (5) determine the primary outcome of depressive symptoms on the Beck Depression Inventory, 2nd edition, at 12-week follow-up; (6) examine secondary outcomes, including depressive symptoms on the Beck Depression Inventory, 2nd edition, at 26 weeks, and social functioning and quality of life at both 12- and 26-week follow-ups; (7) measure patient satisfaction, adverse events, antidepressant treatment, secondary care contacts, service use, and costs over 26 weeks’ follow-up, and perform cost-effectiveness and cost–utility analyses; and (8) carry out a qualitative process analysis to explore participants’ reflections on the use of the Patient Health Questionnaire-9 and the potential for implementing it in practice. Methods The study design was a parallel-group, cluster-randomised superiority trial with 1 : 1 allocation to intervention and control arms. The setting was UK primary care (141 group general practices in England and Wales). Inclusion criteria were age ≥ 18 years with a new episode of depressive disorder or symptoms. Patients were recruited mainly through regular medical records searches but also opportunistically at consultations for new episodes of depression. Exclusion criteria were current treatment for depression; dementia; psychosis; substance misuse; or a significant risk of suicide. 
The intervention was administration of the Patient Health Questionnaire-9 as a patient-reported outcome measure soon after diagnosis and at follow-up 10–35 days later. Patients were given written feedback on their Patient Health Questionnaire-9 scores and potential treatments to discuss with their general practitioners. Practitioners were trained in interpreting Patient Health Questionnaire-9 scores and taking them into account in treatment decisions. The primary outcome was depressive symptoms on the Beck Depression Inventory, 2nd edition, at 12 weeks. Secondary outcomes were Beck Depression Inventory, 2nd edition, scores at 26 weeks; social functioning (on the Work and Social Adjustment Scale) and quality of life (on the EuroQol-5 Dimensions, five-level version) at 12 and 26 weeks; service use including antidepressant treatment and primary and secondary care contacts over 26 weeks to calculate NHS costs; and patient satisfaction at 26 weeks (on the Medical Informant Satisfaction Scale). For our sample size calculation, we assumed a baseline mean Beck Depression Inventory, 2nd edition, score of 24.0 with a standard deviation of 10.0 (derived from a feasibility study), and mean scores at 12-week follow-up of 14.0 in the intervention arm and 17.0 in the control arm. The anticipated difference of 3.0 points (effect size of 0.3) represented the minimum clinically important difference on the Beck Depression Inventory, 2nd edition. To have 90% power to detect that difference at the 5% level of significance, we calculated that we needed 235 patients analysed per arm. We aimed to recruit a mean of six patients per practice and assumed an intracluster correlation coefficient of 0.03 (from the feasibility study), which gave a cluster design effect of 1.15 (1 + [6 − 1] × 0.03) and meant that we needed 270 patients analysed per arm.
We assumed a 20% loss to follow-up at 12 weeks, so the total sample size needed was 270 × 2/0.8 and our original target sample size was a total of 676 patients recruited, from 113 practices, by three recruitment centres (the University of Southampton, the University of Liverpool and University College London). We subsequently revised the target sample size on finding a significant correlation coefficient of > 0.5 between baseline and follow-up values for the primary outcome, which meant that we needed only 222 patients analysed per arm and, therefore, a target sample size of 554 patients recruited (revised 10 June 2021). Cluster randomisation of practices to intervention and control arms was carried out remotely by a Clinical Trials Unit statistician using computerised sequence generation, with minimisation by recruiting centre, size of practice and urban or rural location. Blinding of participating practitioners and patients to allocation was impossible given the nature of the intervention and the cluster-randomised design, but self-report outcome measures were used to prevent researcher rating bias, and statistical analysis was blind to allocation. Differences between intervention and control arms in the outcomes of depressive symptoms, social functioning and quality of life measured at 12- and 26-week follow-up were analysed using linear mixed models, adjusting for baseline depression; duration of depression; history of depression; baseline anxiety; sociodemographic factors (gender, age, socioeconomic position, housing, education, marital status and dependants), and clustering including a random effect for practice. Patient satisfaction, quality of life (quality-adjusted life-years) and costs were compared between the arms over the 26 weeks’ study follow-up period. 
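The sample size arithmetic described above can be retraced in a few lines. The following is a minimal sketch, not the trial team's own calculation: the function names are hypothetical, and the first step uses the standard normal-approximation formula for comparing two means, which is an assumption because the text does not say which method or software was used. That approximation gives roughly 234 per arm, slightly below the 235 reported, so the later steps start from the reported figure.

```python
import math
from statistics import NormalDist


def n_per_arm_two_means(effect_size, alpha=0.05, power=0.90):
    """Normal-approximation sample size per arm for a two-sided
    comparison of two means (an assumed method, see lead-in)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2


def design_effect(mean_cluster_size, icc):
    """Inflation factor for cluster randomisation: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


# Effect size = 3.0-point difference / standard deviation of 10.0 = 0.3
approx_n = n_per_arm_two_means(3.0 / 10.0)

# Later steps use the figures stated in the text.
reported_n = 235                                   # analysed per arm, as reported
deff = design_effect(mean_cluster_size=6, icc=0.03)
n_clustered = round(reported_n * deff)             # 270.25, reported as 270
recruited_per_arm = math.ceil(n_clustered / (1 - 0.20))  # allow 20% loss to follow-up
total_target = 2 * recruited_per_arm

print(f"approximate n per arm: {approx_n:.1f}")
print(f"design effect: {deff:.2f}")
print(f"analysed per arm after clustering: {n_clustered}")
print(f"total recruitment target: {total_target}")
```

Rounding the per-arm recruitment figure up before doubling reproduces the original target of 676. The revised target of 554 is not retraced here because the text does not give the exact adjustment applied for the baseline to follow-up correlation.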
Differences between the arms in the process of care for depression were also analysed, including patients’ self-reported use of antidepressants at the 12- and 26-week follow-up points, and medication and contacts with mental health services (community mental health nurses, counsellors, psychologists, psychiatrists, other therapists and social workers) recorded in practice medical records over the 26 weeks’ follow-up. A health economic evaluation was undertaken from an NHS and Personal Social Services perspective. The outcomes were expressed as incremental cost per point improvement in the Beck Depression Inventory, 2nd edition, clinical outcome (cost-effectiveness analysis), and incremental cost per quality-adjusted life-year gained (cost–utility analysis). The primary analysis at 26 weeks used a generalised linear mixed model to estimate the differences in costs and quality-adjusted life-years (using the EuroQol-5 Dimensions, five-level to calculate patient utilities), adjusted for baseline quality of life; baseline anxiety; sociodemographic factors; and practice as a random effect. Incremental cost-effectiveness ratios and a cost-effectiveness acceptability curve were generated using non-parametric bootstrapping. Qualitative interviews with participating practitioners and patients in both arms were conducted to reflect on their involvement in the trial and analysed using reflexive thematic analysis. Intervention arm participants were asked about barriers, facilitators, benefits and problems related to using the Patient Health Questionnaire-9, including questions derived from the normalisation process theory framework. 
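To illustrate how non-parametric bootstrapping yields a cost-effectiveness acceptability curve, the sketch below resamples patients within each arm and, for each willingness-to-pay threshold, reports the share of replicates with a positive net monetary benefit. It is a simplified illustration under stated assumptions, not the trial's analysis: the data are synthetic, the function names and thresholds are hypothetical, and there is no covariate adjustment or clustering by practice (a cluster trial would more properly resample practices).

```python
import random


def bootstrap_increments(control, intervention, n_boot=1000, seed=0):
    """Resample each arm with replacement and return a list of
    (incremental_cost, incremental_qaly) pairs, intervention minus control.
    Each arm is a list of (cost, qaly) tuples, one per patient."""
    rng = random.Random(seed)

    def resampled_means(arm):
        sample = rng.choices(arm, k=len(arm))
        n = len(sample)
        mean_cost = sum(cost for cost, _ in sample) / n
        mean_qaly = sum(qaly for _, qaly in sample) / n
        return mean_cost, mean_qaly

    increments = []
    for _ in range(n_boot):
        c_cost, c_qaly = resampled_means(control)
        i_cost, i_qaly = resampled_means(intervention)
        increments.append((i_cost - c_cost, i_qaly - c_qaly))
    return increments


def ceac(increments, thresholds):
    """For each threshold, the proportion of replicates in which
    net monetary benefit (threshold * dQALY - dCost) is positive."""
    return {
        wtp: sum(1 for d_cost, d_qaly in increments
                 if wtp * d_qaly - d_cost > 0) / len(increments)
        for wtp in thresholds
    }


# Synthetic patients only; these are NOT trial data.
data_rng = random.Random(42)
control = [(data_rng.gauss(1000, 400), data_rng.gauss(0.30, 0.10))
           for _ in range(200)]
intervention = [(data_rng.gauss(900, 400), data_rng.gauss(0.31, 0.10))
                for _ in range(200)]

increments = bootstrap_increments(control, intervention)
curve = ceac(increments, thresholds=[0, 10000, 20000, 30000])
for wtp, prob in curve.items():
    print(f"threshold {wtp}: probability cost-effective = {prob:.2f}")
```

Working with net monetary benefit rather than the ratio itself avoids the difficulty that an incremental cost-effectiveness ratio is hard to interpret when one option is both cheaper and more effective.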
Results Practices and patients As the number of patients recruited per practice was smaller than anticipated, we recruited considerably more than our target of 113 practices, eventually reaching a total of 189, but 48 practices subsequently withdrew (24 in each arm), so the final number of active practices was 141: 72 intervention and 69 control (28 above our original target). Practice characteristics were well balanced by arm. Of 11,468 patients approached in consultations or through mailed invitations, 1058 (9.2%) returned reply slips about the study: 574 (10.6% of those approached) in the intervention arm and 484 (8.0% of those approached) in the control arm. After the exclusion of patients declining to participate, ineligible at screening or uncontactable, 529 patients were assessed at baseline: 302 (5.5% of those approached) in the intervention arm and 227 (3.8% of those approached) in the control arm. The ratio of intervention to control arm patients recruited was, therefore, 1.3 to 1, which may have reflected lower motivation to take part among control arm practices. Of 529 patients recruited, 453 (85.6%) were followed up at 12 weeks: 254 intervention arm (84.1%) and 199 control arm (87.7%) patients. At the 26-week point, 414 patients (78.3%) were followed up: 230 intervention arm (76.2%) and 184 control arm (81.1%) patients. Medical records data were collected for 259 intervention arm patients (85.8%) and 201 control arm patients (88.5%). The mean Beck Depression Inventory, 2nd edition, score for depressive symptoms at baseline was higher in the intervention arm, at 24.1 (standard deviation 8.89), than in the control arm, at 22.4 (standard deviation 9.52). Baseline anxiety and quality-of-life scores were also worse in the intervention arm. Control arm patients were more likely to have had two or more previous depressive episodes. Demographic characteristics were relatively well balanced.
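The recruitment and retention percentages above follow directly from the reported counts. As a small cross-check, the sketch below recomputes them; the variable and function names are hypothetical, and only counts stated in the text are used.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts as reported in the text.
approached = 11468
replied = {"intervention": 574, "control": 484}
recruited = {"intervention": 302, "control": 227}
followed_12wk = {"intervention": 254, "control": 199}
followed_26wk = {"intervention": 230, "control": 184}

total_recruited = sum(recruited.values())
reply_rate = pct(sum(replied.values()), approached)
arm_ratio = round(recruited["intervention"] / recruited["control"], 1)

retention_12wk = {arm: pct(followed_12wk[arm], recruited[arm]) for arm in recruited}
retention_26wk = {arm: pct(followed_26wk[arm], recruited[arm]) for arm in recruited}
overall_12wk = pct(sum(followed_12wk.values()), total_recruited)
overall_26wk = pct(sum(followed_26wk.values()), total_recruited)

print(f"recruited: {total_recruited}, reply rate: {reply_rate}%")
print(f"intervention to control ratio: {arm_ratio} to 1")
print(f"12-week retention: {retention_12wk}, overall {overall_12wk}%")
print(f"26-week retention: {retention_26wk}, overall {overall_26wk}%")
```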
Clinical outcomes At the 12-week follow-up, the mean Beck Depression Inventory, 2nd edition, score was 18.5 (standard deviation 10.2) in the intervention arm and 16.9 (standard deviation 10.3) in the control arm. After adjustment, the mean score was slightly lower in the intervention arm, but the difference was not statistically significant (mean adjusted difference –0.46, 95% confidence interval –2.16 to 1.26; p = 0.60). At 26 weeks, the mean Beck Depression Inventory, 2nd edition, scores were 15.1 (standard deviation 10.8) in the intervention arm and 14.7 (standard deviation 10.6) in the control arm (mean adjusted difference –1.63, 95% confidence interval –3.48 to 0.21; p = 0.08). Social functioning scores on the Work and Social Adjustment Scale and satisfaction with care scores on the Medical Informant Satisfaction Scale favoured the intervention, but the differences found were not statistically significant. A post hoc analysis at 26 weeks showed similar proportions improving by ≥ 50% on the Beck Depression Inventory, 2nd edition, in the intervention and control arms (45.1% vs. 37.3%), but the proportion remitting to a score of