Actionable real-world evidence (RWE) requires accurate estimation of causal treatment effects. RWE is sometimes calibrated against randomized controlled trials (RCTs) to demonstrate that it can support the same causal conclusions as RCTs. Disagreements can occur because the studies in each pair asked different questions in different populations, or because of residual bias. Distinguishing among these reasons for disagreement will affect the level of confidence placed in RWE.

Several projects, such as the RCT DUPLICATE initiative funded by the US FDA, NIH, and others, have been launched in recent years with the aim of assessing whether nonrandomized database studies can, in some circumstances, produce conclusions on the effectiveness of medications similar to those provided by RCTs.1,2 While comparison of randomized and nonrandomized findings is not new, heightened current interest is in part spurred by new initiatives at several regulatory agencies focused on assessing the role RWE can play in regulatory decision-making.3,4 The FDA defines RWE as evidence on the benefits and risks of medications derived from routinely collected healthcare data, although other data sources, such as patient registries, may also be used. A key challenge for any project attempting to calibrate RWE findings against RCT findings is that differences between treatment effect estimates from the two study types can be driven by bias due to the lack of randomization in RWE and/or by other differences in design, such as inclusion/exclusion criteria, outcome measurement, or patients' motivations to adhere to study medications. Even when RWE studies are designed to match the corresponding RCT as closely as possible, emulation of all study components is typically impossible.

Emulating target trials

In clinical epidemiology, it has been recommended for several decades to consider how a randomized trial would be designed to answer a specific question before designing the nonrandomized counterpart to study the same question. Hernán and Robins call the former a hypothetical “target trial,” which is then emulated by a nonrandomized study.5 This process has proven very useful in clarifying design choices for nonrandomized research on medications. It is also highly flexible, as specification of the target trial is often iterative: realities of data collection, patient access, and other practical considerations impose constraints on the nonrandomized emulation, and the hypothetical target trial can be adjusted so that a design that is feasible in a given healthcare database still accords with the design of the target trial.

In calibrating RWE studies against existing RCTs, the process of emulating a target RCT in nonrandomized data is similar, except that the target trial is already underway or completed. Adjusting the design of the target RCT to improve the feasibility of the design in existing data is therefore not possible, making exact emulation more difficult. Instead, investigators must adapt what design elements they can from the trial, given the constraints of the database, and note the trial specifications that cannot be completely emulated. This process will highlight unavoidable emulation differences between a completed or in-progress RCT and its RWE replication.
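The bookkeeping this process implies can be made explicit. The following is a minimal, hypothetical sketch in Python (it is not part of any RCT DUPLICATE tooling): it records a handful of protocol components in the spirit of the target-trial framework and mechanically lists those on which a database emulation departs from the RCT. Every field name and example value here is an assumption for illustration.

```python
from dataclasses import dataclass, fields

# Hypothetical protocol components, loosely following the target-trial
# framework (eligibility, treatment strategies, outcome, follow-up, etc.).
@dataclass
class TrialProtocol:
    eligibility: str
    treatment_strategy: str
    comparator: str
    outcome_definition: str
    follow_up: str
    run_in_period: str  # e.g., an adherence run-in before randomization

def emulation_differences(rct: TrialProtocol, rwe: TrialProtocol) -> list[str]:
    """Return the protocol components where the RWE emulation departs from the RCT."""
    return [f.name for f in fields(TrialProtocol)
            if getattr(rct, f.name) != getattr(rwe, f.name)]

# Invented example values for a single RCT-RWE pair.
rct = TrialProtocol(
    eligibility="adults 40-75 years, HbA1c 7-10%",
    treatment_strategy="drug A 10 mg daily",
    comparator="placebo",
    outcome_definition="adjudicated MACE",
    follow_up="randomization to 5 years",
    run_in_period="4-week run-in; non-adherent patients excluded",
)
rwe = TrialProtocol(
    eligibility="adults 40-75 years, HbA1c 7-10%",
    treatment_strategy="drug A 10 mg daily",
    comparator="active comparator (drug B)",   # placebo cannot be emulated in claims data
    outcome_definition="MACE proxy from diagnosis codes",
    follow_up="first dispensing to 5 years",
    run_in_period="none",                      # run-in periods have no database counterpart
)

print(emulation_differences(rct, rwe))
# ['comparator', 'outcome_definition', 'follow_up', 'run_in_period']
```

Recording each pair this way yields, as a by-product, the enumerated list of unavoidable emulation differences that the calibration exercise needs downstream.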
When assembling a series of RWE replications of RCTs, as in the RCT DUPLICATE initiative, some RCTs can be emulated closely by the RWE specifications and others cannot. Examples of the latter are run-in periods during which non-adherent or drug-intolerant patients are excluded before randomization, or run-in periods that place all patients on a common medication before randomization in order to homogenize the study population. Enumerating, and quantifying to the extent possible, such emulation differences can therefore provide insight into whether and to what extent differences in treatment effect estimates between RWE studies and their corresponding RCTs are due to bias from the lack of randomization versus other design differences between the two study types. Specifically, correlating measures of emulation difference, such as those suggested in Table 1, with the magnitude of differences in treatment effect estimates may reveal which emulation differences contribute most to the “efficacy-effectiveness gap” between RWE and RCT findings, as long as bias due to confounding is not also correlated with the emulation difference of interest (a minimal numerical sketch of such an analysis appears below). Once better understood, such metrics could be condensed into a simple three-point emulation scale.

Table 1. Challenges in calibrating RWE against RCTs and measures of differences between the study types.
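As a rough illustration of the correlation analysis suggested above, the sketch below pairs an assumed emulation-difference score with the discrepancy between paired RWE and RCT effect estimates. All values are invented placeholders, not data from RCT DUPLICATE or from Table 1; the score, the hazard ratios, and the number of pairs are assumptions.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical data for illustration only: one entry per RCT-RWE pair.
# 'emulation_score' is an assumed 0-1 measure of emulation difference
# (0 = close emulation, 1 = poor emulation); hazard ratios are invented.
emulation_score = np.array([0.1, 0.2, 0.4, 0.5, 0.7, 0.9])
hr_rct = np.array([0.80, 0.75, 0.90, 0.85, 0.70, 0.95])
hr_rwe = np.array([0.82, 0.80, 0.78, 0.70, 0.95, 0.60])

# Discrepancy between paired estimates, taken on the log scale, where
# ratio measures such as hazard ratios are approximately symmetric.
discrepancy = np.abs(np.log(hr_rwe) - np.log(hr_rct))

# Rank correlation between emulation difference and discrepancy; a strong
# positive correlation would suggest that design differences, rather than
# confounding alone, contribute to the efficacy-effectiveness gap.
rho, p_value = spearmanr(emulation_score, discrepancy)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.2f}")
```

A rank correlation is used here because an emulation-difference score (or a simple three-point emulation scale) is ordinal rather than interval-scaled; with the small number of trial pairs typical of such calibration exercises, any such estimate would carry wide uncertainty.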