
Informed strategies for multivariate missing data

Authors :: Cai, Mingyang
Methodology and statistics for the behavioural and social sciences
Leerstoel van Buuren
van Buuren, Stef
Vink, Gerko
University Utrecht
Publication Year :: 2022
Publisher :: Utrecht University, 2022.
Abstract: Joint modelling (JM) and fully conditional specification (FCS) are two widely used strategies for imputing multivariate missing data. JM involves specifying a multivariate distribution for the missing data and drawing imputations from their conditional distributions. The FCS approach specifies the distribution for each partially observed variable conditional on all other variables. The main advantage of FCS over JM is that FCS allows for tremendous flexibility in multivariate model design. However, there are often extra structures in the missing data that FCS cannot model properly in practice. Moreover, it is challenging to preserve the relations among multiple variables when performing the imputation on a variable-by-variable basis. This thesis aims to develop hybrid imputation, a strategy for specifying hybrids of JM and FCS. To achieve this goal, I propose different solutions to missing data problems for which applying FCS is not optimal. In chapter 2, I first discuss some general methods to impute squares. I improve the polynomial combination method and compare it with the substantive model compatible fully conditional specification method. Finally, I summarise the properties of both approaches. In chapter 3, I develop multivariate predictive mean matching, which allows simultaneous imputation of multiple missing variables. I combine the methodology of univariate predictive mean matching and canonical regression analysis. The advantage of this imputation method is the preservation of relations among a set of missing variables. Finally, I show the potential scenarios where multivariate predictive mean matching could be used and discuss the limitations. In chapter 4, I develop the hybrid imputation method to estimate individual treatment effects. The idea is that by imputing unobserved outcomes, we could calculate the differences between potential outcomes under different treatment conditions. However, the data contain no information about the correlation between potential outcomes. The proposed hybrid imputation method specifies the partial correlation and performs a sensitivity analysis to overcome this problem. Finally, I demonstrate the validity of the proposed hybrid imputation method and show how to apply it in practice. In chapter 5, I investigate the compatibility of FCS when the priors for the conditional models are informative. Many authors have illustrated the compatibility property of FCS when the priors for the conditional models are non-informative. However, the compatibility property in the case of informative priors has not received much attention. I demonstrate that FCS under the normal linear model with an informative inverse-gamma prior is compatible with a joint distribution and provide the corresponding normal inverse-Wishart prior distribution for the joint distribution. In chapter 6, I develop a novel strategy to diagnose multiple imputation models based on posterior predictive checking. The general idea is that if the imputation model is congenial to the substantive model, the expected value of the observed data is in the centre of the corresponding posterior predictive distributions. By applying the proposed diagnostic method, the researcher could compare the 'over-imputed' data with the observed data and evaluate the fit of the imputation model.