Author: "Balzer, Laura B." / Publication Type: Reports - Searchworks@Jio Institute Digital Library Search Results

16 results on '"Balzer, Laura B."'

1. Large Language Models as Co-Pilots for Causal Inference in Medical Studies

Author: Alaa, Ahmed, Phillips, Rachael V., Kıcıman, Emre, Balzer, Laura B., van der Laan, Mark, and Petersen, Maya
Subjects: Computer Science - Artificial Intelligence
Abstract: The validity of medical studies based on real-world clinical data, such as observational studies, depends on critical assumptions necessary for drawing causal conclusions about medical interventions. Many published studies are flawed because they violate these assumptions and entail biases such as residual confounding, selection bias, and misalignment between treatment and measurement times. Although researchers are aware of these pitfalls, they continue to occur because anticipating and addressing them in the context of a specific study can be challenging without a large, often unwieldy, interdisciplinary team with extensive expertise. To address this expertise gap, we explore the use of large language models (LLMs) as co-pilot tools to assist researchers in identifying study design flaws that undermine the validity of causal inferences. We propose a conceptual framework for LLMs as causal co-pilots that encode domain knowledge across various fields, engaging with researchers in natural language interactions to provide contextualized assistance in study design. We provide illustrative examples of how LLMs can function as causal co-pilots, propose a structured framework for their grounding in existing causal inference frameworks, and highlight the unique challenges and opportunities in adapting LLMs for reliable use in epidemiological research.
Published: 2024

2. Causal Inference in Randomized Trials with Partial Clustering and Imbalanced Dependence Structures

Author: Nugent, Joshua R., Kakande, Elijah, Chamie, Gabriel, Kabami, Jane, Owaraganise, Asiphas, Havlir, Diane V., Kamya, Moses, and Balzer, Laura B.
Subjects: Statistics - Methodology
Abstract: In many randomized trials, participants are grouped into clusters, such as neighborhoods or schools, and these clusters are assumed to be the independent unit. This assumption, however, might not reflect the underlying dependence structure, with serious consequences to statistical power. First, consider a cluster randomized trial where participants are artificially grouped together for the purposes of randomization. For intervention participants the groups are the basis for intervention delivery, but for control participants the groups are dissolved. Second, consider an individually randomized group treatment trial where participants are randomized and then post-randomization, intervention participants are grouped together for intervention delivery, while the control participants continue with the standard of care. In both trial designs, outcomes among intervention participants will be dependent within each cluster, while outcomes for control participants will be effectively independent. We use causal models to non-parametrically describe the data generating process for each trial design and formalize the conditional independence in the observed data distribution. For estimation and inference, we propose a novel implementation of targeted minimum loss-based estimation (TMLE) accounting for partial clustering and the imbalanced dependence structure. TMLE is a model-robust approach, leverages covariate adjustment and machine learning to improve precision, and facilitates estimation of a large set of causal effects. In finite sample simulations, TMLE achieved comparable or markedly higher statistical power than common alternatives. Finally, application of TMLE to real data from the SEARCH-IPT trial resulted in 20-57% efficiency gains, demonstrating the real-world consequences of our proposed approach.
Published: 2024

3. When exposure affects subgroup membership: Framing relevant causal questions in perinatal epidemiology and beyond

Author: Gupta, Shalika, Balzer, Laura B., Kamya, Moses R., Havlir, Diane V., and Petersen, Maya L.
Subjects: Statistics - Methodology
Abstract: Perinatal epidemiology often aims to evaluate exposures on infant outcomes. When the exposure affects the composition of people who give birth to live infants (e.g., by affecting fertility, behavior, or birth outcomes), this "live birth process" mediates the exposure effect on infant outcomes. Causal estimands previously proposed for this setting include the total exposure effect on composite birth and infant outcomes, controlled direct effects (e.g., enforcing birth), and principal stratum direct effects. Using perinatal HIV transmission in the SEARCH Study as a motivating example, we present two alternative causal estimands: 1) conditional total effects; and 2) conditional stochastic direct effects, formulated under a hypothetical intervention to draw mediator values from some distribution (possibly conditional on covariates). The proposed conditional total effect includes impacts of an intervention that operate by changing the types of people who have a live birth and the timing of births. The proposed conditional stochastic direct effects isolate the effect of an exposure on infant outcomes excluding any impacts through this live birth process. In SEARCH, this approach quantifies the impact of a universal testing and treatment intervention on infant HIV-free survival absent any effect of the intervention on the live birth process, within a clearly defined target population of women of reproductive age with HIV at study baseline. Our approach has implications for the evaluation of intervention effects in perinatal epidemiology broadly, and whenever causal effects within a subgroup are of interest and exposure affects membership in the subgroup.
Published: 2024

4. The Causal Roadmap and Simulations to Improve the Rigor and Reproducibility of Real-Data Applications

Author: Nance, Nerissa, Petersen, Maya L., van der Laan, Mark, and Balzer, Laura B.
Subjects: Statistics - Methodology
Abstract: The Causal Roadmap outlines a systematic approach to asking and answering questions of cause-and-effect: define the quantity of interest, evaluate needed assumptions, conduct statistical estimation, and carefully interpret results. To protect research integrity, it is essential that the algorithm for statistical estimation and inference be pre-specified prior to conducting any effectiveness analyses. However, it is often unclear which algorithm will perform optimally for the real-data application. Instead, there is a temptation to simply implement one's favorite algorithm -- recycling prior code or relying on the default settings of a computing package. Here, we call for the use of simulations that realistically reflect the application, including key characteristics such as strong confounding and dependent or missing outcomes, to objectively compare candidate estimators and facilitate full specification of the Statistical Analysis Plan. Such simulations are informed by the Causal Roadmap and conducted after data collection but prior to effect estimation. We illustrate with two worked examples. First, in an observational longitudinal study, outcome-blind simulations are used to inform nuisance parameter estimation and variance estimation for longitudinal targeted minimum loss-based estimation (TMLE). Second, in a cluster randomized trial with missing outcomes, treatment-blind simulations are used to examine Type-I error control in Two-Stage TMLE. In both examples, realistic simulations empower us to pre-specify an estimation approach that is expected to have strong finite sample performance and also yield quality-controlled computing code for the actual analysis. Together, this process helps to improve the rigor and reproducibility of our research.
Published: 2023

5. Statistical Analysis Plan for Primary and Selected Secondary Health Endpoints of the SEARCH-Youth Study

Author: Balzer, Laura B., Ruel, Theodore, Havlir, Diane V., and Team, the SEARCH-Youth Study
Subjects: Statistics - Applications
Abstract: This document provides the statistical analytic plan (SAP) for evaluating health outcomes in the SEARCH-Youth study, a cluster randomized trial designed to evaluate the effect of a combination intervention on HIV viral suppression among adolescents and young adults with HIV in rural Uganda and Kenya (Clinicaltrials.gov: NCT03848728). The SAP was locked prior to unblinding and effect estimation. This SAP was embargoed until November 04, 2022 when it was submitted to arXiv.
Comment: 14 pages, 1 figure
Published: 2022

6. Adaptive Selection of the Optimal Strategy to Improve Precision and Power in Randomized Trials

Author: Balzer, Laura B., Cai, Erica, Garraza, Lucas Godoy, and Amaranath, Pracheta
Subjects: Statistics - Methodology, Statistics - Machine Learning
Abstract: Benkeser et al. demonstrate how adjustment for baseline covariates in randomized trials can meaningfully improve precision for a variety of outcome types. Their findings build on a long history, starting in 1932 with R.A. Fisher and including more recent endorsements by the U.S. Food and Drug Administration and the European Medicines Agency. Here, we address an important practical consideration: *how* to select the adjustment approach -- which variables and in which form -- to maximize precision, while maintaining Type-I error control. Balzer et al. previously proposed *Adaptive Prespecification* within TMLE to flexibly and automatically select, from a prespecified set, the approach that maximizes empirical efficiency in small trials (N<40). To avoid overfitting with few randomized units, selection was previously limited to working generalized linear models, adjusting for a single covariate. Now, we tailor Adaptive Prespecification to trials with many randomized units. Using V-fold cross-validation and the estimated influence curve-squared as the loss function, we select from an expanded set of candidates, including modern machine learning methods adjusting for multiple covariates. As assessed in simulations exploring a variety of data generating processes, our approach maintains Type-I error control (under the null) and offers substantial gains in precision -- equivalent to 20-43% reductions in sample size for the same statistical power. When applied to real data from ACTG Study 175, we also see meaningful efficiency improvements overall and within subgroups.
Comment: 27 pages (double spaced); 2 figures; 9 tables
Published: 2022

7. Blurring cluster randomized trials and observational studies using Two-Stage TMLE to address sub-sampling, missingness, and minimal independent units

Author: Nugent, Joshua R., Marquez, Carina, Charlebois, Edwin D., Abbott, Rachel, and Balzer, Laura B.
Subjects: Statistics - Methodology, Statistics - Applications
Abstract: Cluster randomized trials (CRTs) often enroll large numbers of participants, but due to logistical and fiscal challenges, only a subset of participants may be selected for measurement of certain outcomes, and those sampled may, purposely or not, be unrepresentative of all participants. Missing data also present a challenge: if sampled individuals with measured outcomes are dissimilar from those with missing outcomes, unadjusted estimates of arm-specific outcomes and the intervention effect may be biased. Further, CRTs often enroll and randomize few clusters by necessity, limiting statistical power and raising concerns about finite sample performance. Motivated by a sub-study of the SEARCH community randomized trial on the incidence of TB infection, we demonstrate interlocking methods to handle these challenges. First, we extend Two-Stage targeted minimum loss-based estimation (TMLE) to account for three sources of missingness: (1) sampling for the sub-study; (2) measurement of baseline status among those sampled, and (3) measurement of final status among those in the incidence cohort (i.e., persons known to be at risk at baseline). Second, we critically evaluate the assumptions under which sub-units of the cluster can be considered the conditionally independent unit, improving precision and statistical power but also causing the CRT to behave more like an observational study. Our application to the SEARCH highlights the impact of different assumptions on measurement and dependence as well as the real-life gains of our approach for bias reduction and efficiency improvement.
Published: 2022

8. Statistical Analysis Plan for Health Outcomes in Phase 1 of the SEARCH-IPT Study

Author: Balzer, Laura B., Nugent, Joshua, Havlir, Diane V., and Chamie, Gabriel
Subjects: Statistics - Applications
Abstract: This document provides the statistical analytic plan (SAP) for evaluating health outcomes in Phase 1 of the SEARCH-IPT Study, a cluster randomized trial to evaluate whether a multicomponent intervention increases uptake of isoniazid (INH) preventive therapy (IPT) and reduces the incidence of tuberculosis (TB) in Uganda (Clinicaltrials.gov: NCT03315962). The SAP was locked prior to unblinding and effect estimation. This SAP was embargoed until November 19, 2021 when it was submitted to arXiv.
Comment: 13 pages and 1 figure
Published: 2021

9. Evaluating shifts in mobility and COVID-19 case rates in U.S. counties: A demonstration of modified treatment policies for causal inference with continuous exposures

Author: Nugent, Joshua R. and Balzer, Laura B.
Subjects: Statistics - Applications, Statistics - Machine Learning
Abstract: Previous research has shown mixed evidence on the associations between mobility data and COVID-19 case rates, analysis of which is complicated by differences between places on factors influencing both behavior and health outcomes. We aimed to evaluate the county-level impact of shifting the distribution of mobility on the growth in COVID-19 case rates from June 1 - November 14, 2020. We utilized a modified treatment policy (MTP) approach, which considers the impact of shifting an exposure away from its observed value. The MTP approach facilitates studying the effects of continuous exposures while minimizing parametric modeling assumptions. Ten mobility indices were selected to capture several aspects of behavior expected to influence and be influenced by COVID-19 case rates. The outcome was defined as the number of new cases per 100,000 residents two weeks ahead of each mobility measure. Primary analyses used targeted minimum loss-based estimation (TMLE) with a Super Learner ensemble of machine learning algorithms, considering over 20 potential confounders capturing counties' recent case rates as well as social, economic, health, and demographic variables. For comparison, we also implemented unadjusted analyses. For most weeks considered, unadjusted analyses suggested strong associations between mobility indices and subsequent growth in case rates. However, after confounder adjustment, none of the indices showed consistent associations after hypothetical shifts to reduce mobility. While identifiability concerns limit our ability to make causal claims in this analysis, MTPs are a powerful and underutilized tool for studying the effects of continuous exposures.
Published: 2021

10. Defining and Estimating Effects in Cluster Randomized Trials: A Methods Comparison

Author: Benitez, Alejandra, Petersen, Maya L., van der Laan, Mark J., Santos, Nicole, Butrick, Elizabeth, Walker, Dilys, Ghosh, Rakesh, Otieno, Phelgona, Waiswa, Peter, and Balzer, Laura B.
Subjects: Statistics - Methodology
Abstract: Across research disciplines, cluster randomized trials (CRTs) are commonly implemented to evaluate interventions delivered to groups of participants, such as communities and clinics. Despite advances in the design and analysis of CRTs, several challenges remain. First, there are many possible ways to specify the causal effect of interest (e.g., at the individual-level or at the cluster-level). Second, the theoretical and practical performance of common methods for CRT analysis remains poorly understood. Here, we present a general framework to formally define an array of causal effects in terms of summary measures of counterfactual outcomes. Next, we provide a comprehensive overview of CRT estimators, including the t-test, generalized estimating equations (GEE), augmented-GEE, and targeted maximum likelihood estimation (TMLE). Using finite sample simulations, we illustrate the practical performance of these estimators for different causal effects and when, as commonly occurs, there are limited numbers of clusters of different sizes. Finally, our application to data from the Preterm Birth Initiative (PTBi) study demonstrates the real-world impact of varying cluster sizes and targeting effects at the cluster-level or at the individual-level. Specifically, the relative effect of the PTBi intervention was 0.81 at the cluster-level, corresponding to a 19% reduction in outcome incidence, and was 0.66 at the individual-level, corresponding to a 34% reduction in outcome risk. Given its flexibility to estimate a variety of user-specified effects and ability to adaptively adjust for covariates for precision gains while maintaining Type-I error control, we conclude TMLE is a promising tool for CRT analysis.
Published: 2021

11. Evaluating the Impact of State-Level Public Masking Mandates on New COVID-19 Cases and Deaths in the United States: A Demonstration of the Causal Roadmap

Author: Wong, Angus K. and Balzer, Laura B.
Subjects: Statistics - Applications, Statistics - Methodology
Abstract: At a national-level, we sought to investigate the effect of public masking mandates on COVID-19 in Fall 2020. Specifically, we aimed to evaluate how the relative growth of COVID-19 cases and deaths would have differed if all states had issued a mandate to mask in public by September 1, 2020 versus if all states had delayed issuing such a mandate. To do so, we applied the Causal Roadmap, a formal framework for causal and statistical inference. The outcome was defined as the state-specific relative increase in cumulative cases and in cumulative deaths {21, 30, 45, 60}-days after September 1. Despite the natural experiment in state-level masking policies, the causal effect of interest was not identifiable. Nonetheless, we specified the target statistical parameter as the adjusted rate ratio (aRR): the expected outcome with early implementation divided by the expected outcome with delayed implementation, after adjusting for state-level confounders. To minimize strong estimation assumptions, primary analyses used targeted maximum likelihood estimation (TMLE) with Super Learner. After 60-days and at a national-level, early implementation was associated with a 9% reduction in new COVID-19 cases (aRR: 0.91; 95%CI: 0.88-0.95) and a 16% reduction in new COVID-19 deaths (aRR: 0.84; 95%CI: 0.76-0.93). Although lack of identifiability prohibited causal interpretations, application of the Causal Roadmap facilitated estimation and inference of statistical associations, providing timely answers to pressing questions in the COVID-19 response.
Comment: 34 total pages (including supp materials)
Published: 2021

12. Two-Stage TMLE to Reduce Bias and Improve Efficiency in Cluster Randomized Trials

Author: Balzer, Laura B., van der Laan, Mark, Ayieko, James, Kamya, Moses, Chamie, Gabriel, Schwab, Joshua, Havlir, Diane V., and Petersen, Maya L.
Subjects: Statistics - Methodology, Statistics - Applications, Statistics - Machine Learning
Abstract: Cluster randomized trials (CRTs) randomly assign an intervention to groups of individuals (e.g., clinics or communities) and measure outcomes on individuals in those groups. While offering many advantages, this experimental design introduces challenges that are only partially addressed by existing analytic approaches. First, outcomes are often missing for some individuals within clusters. Failing to appropriately adjust for differential outcome measurement can result in biased estimates and inference. Second, CRTs often randomize limited numbers of clusters, resulting in chance imbalances on baseline outcome predictors between arms. Failing to adaptively adjust for these imbalances and other predictive covariates can result in efficiency losses. To address these methodological gaps, we propose and evaluate a novel two-stage targeted minimum loss-based estimator (TMLE) to adjust for baseline covariates in a manner that optimizes precision, after controlling for baseline and post-baseline causes of missing outcomes. Finite sample simulations illustrate that our approach can nearly eliminate bias due to differential outcome measurement, while existing CRT estimators yield misleading results and inferences. Application to real data from the SEARCH community randomized trial demonstrates the gains in efficiency afforded through adaptive adjustment for baseline covariates, after controlling for missingness on individual-level outcomes.
Comment: 37 pages total; main text is 17 pgs with 2 figures and 3 tables; supp material is 14 pgs with 1 figure and 5 tables
Published: 2021

13. The covariate-adjusted residual estimator and its use in both randomized trials and observational settings

Author: Lauer, Stephen A., Reich, Nicholas G., and Balzer, Laura B.
Subjects: Statistics - Methodology, Statistics - Applications
Abstract: We often seek to estimate the causal effect of an exposure on a particular outcome in both randomized and observational settings. One such estimation method is the covariate-adjusted residuals estimator, which was designed for individually or cluster randomized trials. In this manuscript, we study the properties of this estimator and develop a new estimator that utilizes both covariate adjustment and inverse probability weighting. We support our theoretical results with a simulation study and an application in an infectious disease setting. The covariate-adjusted residuals estimator is an efficient and unbiased estimator of the average treatment effect in randomized trials; however, it is not guaranteed to be unbiased in observational studies. Our novel estimator, the covariate-adjusted residuals estimator with inverse probability weighting, is unbiased in randomized and observational settings, under a reasonable set of assumptions. Furthermore, when these assumptions hold, it provides efficiency gains over inverse probability weighting in observational studies. The covariate-adjusted residuals estimator is valid for use in randomized trials, but should not be used in observational studies. The covariate-adjusted residuals estimator with inverse probability weighting provides an efficient alternative for use in randomized and observational settings.
Published: 2019

14. A Primer on Causality in Data Science

Author: Saddiki, Hachem and Balzer, Laura B.
Subjects: Statistics - Applications, Statistics - Methodology, Statistics - Machine Learning
Abstract: Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure, randomized or not, on an outcome of interest. Even studies that are seemingly non-causal, such as those with the goal of prediction or prevalence estimation, have causal elements, including differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in those data. In this work, we review the 'Causal Roadmap' of Petersen and van der Laan (2014) to provide an introduction to some key concepts in causal inference. Similar to other causal frameworks, the steps of the Roadmap include clearly stating the scientific question, defining the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to express the causal parameter as a statistical estimand, implementing statistical estimators including parametric and semi-parametric methods, and interpreting our findings. We believe that using such a framework in Data Science will help to ensure that our statistical analyses are guided by the scientific question driving our research, while avoiding over-interpreting our results. We focus on the effect of an exposure occurring at a single time point and highlight the use of targeted maximum likelihood estimation (TMLE) with Super Learner.
Comment: 26 pages (with references); 4 figures
Published: 2018

15. Statistical Analysis Plan for SEARCH Phase I: Health Outcomes among Adults

Author: Balzer, Laura B., Havlir, Diane V., Schwab, Joshua, Van Der Laan, Mark J., and Petersen, Maya L.
Subjects: Statistics - Applications
Abstract: This document provides the analytic plan for evaluating adult HIV incidence, health, and implementation outcomes for the first phase of the SEARCH Study. Locked: November 27, 2017. Embargoed until July 25, 2018.
Comment: 40 pgs
Published: 2018

16. A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

Author: Balzer, Laura B., Zheng, Wenjing, van der Laan, Mark J., and Petersen, Maya L.
Subjects: Statistics - Methodology
Abstract: We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment.
Published: 2017
