The need to improve the quality of care is well recognised. Yet accomplishing this is complicated, messy, and uncertain, requiring that researchers tackle technical (science) and adaptive (emotional, social, cultural, and political) challenges.1 Tension exists between those who say “just do something” to improve quality and those who say “science should be the guide.”2 The two linked studies (doi:10.1136/bmj.d195; doi:10.1136/bmj.d199) suggest that more science is needed.

Benning and colleagues evaluated a large patient safety programme (the Safer Patients Initiative; SPI) in the United Kingdom, led by the Institute for Healthcare Improvement.3 4 The Health Foundation initiated and supported the initiative and, laudably, an independent evaluation, grounded in theory and conducted by experts in epidemiology, biostatistics, medical sociology, health services research, and clinical medicine. They performed a quantitative and qualitative evaluation at organisational and patient levels. The evaluation included five substudies that looked at whether the interventions worked and why. In addition to using a rigorous research design, the authors conducted a state of the art analysis, which included using different approaches to evaluate changes over time in treatment and comparison hospitals. This evaluation will serve as a model for the field. It required, however, an interdisciplinary team of experts and appropriate research funding, both of which are rare.

The study’s findings are partly encouraging and partly worrying. On the encouraging side, the study provides convincing evidence that safety and quality improved in NHS hospitals in England over the study period (about 18 months). This should provide comfort to UK citizens, the NHS, and parliament. Patients are less likely to be harmed by the care they receive. The NHS should try to understand why these improvements occurred and how they can be strengthened and replicated broadly across the UK.

For those who hoped SPI would transform care, the findings are disconcerting. The authors found that the initiative had no discernible additional effect on patient safety; care improved to the same extent in both treatment and comparison hospitals, highlighting the need for robust evaluation with concurrent controls. It is, of course, difficult to measure the impact of patient safety interventions, especially diffuse interventions like SPI. The initiative may have provided benefits that were not measured or that will emerge only over time. It is also difficult in these types of large scale evaluations to find an appropriate comparison group. And in areas where intervention hospitals were already performing well at baseline, it would be difficult to show further improvement.

The study should be a wake-up call to those implementing patient safety programmes. Too many patients in the UK, and the rest of the world, continue to experience preventable harm. The quality improvement field needs to embrace science, favour evidence over anecdote, and move beyond using only one generic framework for improvement (the plan, do, study, act cycle).5 Different types of patient safety challenges exist, such as translating evidence into practice, improving teamwork and organisational culture, identifying and mitigating hazards, and reducing diagnostic errors. Each type of problem should be informed by specific theories, methods, and measures.6
Although the SPI was well intentioned, it is not surprising that it had less of an impact than the investigators anticipated. It was not grounded in a theory of organisational change.7 It asked hospitals to implement 43 interventions, when most hospitals would find it difficult to implement three. Clinicians thought that many of the interventions were supported by weak evidence and that some measures were not valid. The initiative was largely top down, with limited input from local clinicians. Moreover, it did not target areas where teams performed poorly (in many of the areas, teams were performing nearly flawlessly before the initiative). The interventions and measures were not sufficiently pilot tested, and quality control over the quality improvement data collected by the local teams was virtually non-existent.

Clinicians who push back against patient safety interventions are often viewed as “knaves.”8 This study suggests that some of that resistance may be warranted. Some interventions focused on areas that were not problematic and used evidence and measures that doctors did not always perceive as valid, potentially souring clinicians’ attitudes towards efforts to improve patient safety.

We need clinicians to lead patient safety efforts. For this to happen, they must believe that interventions and measures are based on science and that their patients will benefit. Yet when interventions deal with both technical and adaptive challenges, broad scale improvement in patient safety is possible. Several patient safety interventions have achieved significant improvements in patient outcomes by having clinicians and researchers collaborate in developing and pilot testing the programme: researchers centralise the collection of performance measures and evidence summaries, while clinicians innovate locally in how the programme is implemented.9 10 11 12

These studies provide three important lessons. Firstly, patient safety studies require robust design and evaluation.13 Funding agencies need to support the development and implementation of patient safety programmes that include rigorous evaluation. These programmes should be grounded in change theory and should include evidence based interventions, valid measures, and data quality control. Although theory and interventions evolve over time, patient safety programmes should be developed in collaboration with clinicians and pilot tested, and measures should be validated before broad implementation. Secondly, care in the UK is improving; we should understand how and why. Thirdly, quality improvement efforts must themselves improve, embracing rather than running from science. The science of quality improvement differs from basic and clinical research. It requires input from clinicians, health services researchers, social scientists, and human factors and systems engineers; in addition, it uses change theory, mixed methods, and robust evaluation. These studies provide a model.