Back to Search Start Over

Conformal Prediction with Missing Values

Authors :
Zaffran, Margaux
Dieuleveut, Aymeric
Josse, Julie
Romano, Yaniv
EDF R&D (EDF R&D)
EDF (EDF)
Inria Sophia Antipolis - Méditerranée (CRISAM)
Institut National de Recherche en Informatique et en Automatique (Inria)
Centre de Mathématiques Appliquées - Ecole Polytechnique (CMAP)
École polytechnique (X)-Centre National de la Recherche Scientifique (CNRS)
Médecine de précision par intégration de données et inférence causale (PREMEDICAL)
Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut Desbrest de santé publique (IDESP)
Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Montpellier (UM)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Montpellier (UM)
Technion - Israel Institute of Technology [Haifa]
Scholarship for Mathematics granted by the Séphora Berrebi Foundation
Hi! Paris
ISRAEL SCIENCE FOUNDATION (grant No. 729/21)
ANR-11-LABX-0056,LMH,LabEx Mathématique Hadamard(2011)
ANR-19-CHIA-0002,SCAI,Inférence statistique, méthodes numériques et Intelligence Artificielle(2019)
ANR-16-IDEX-0006,MUSE,MUSE(2016)
Publication Year :
2023

Abstract

Conformal prediction is a theoretically grounded framework for constructing predictive intervals. We study conformal prediction with missing values in the covariates -- a setting that brings new challenges to uncertainty quantification. We first show that the marginal coverage guarantee of conformal prediction holds on imputed data for any missingness distribution and almost all imputation functions. However, we emphasize that the average coverage varies depending on the pattern of missing values: conformal methods tend to construct prediction intervals that under-cover the response conditionally to some missing patterns. This motivates our novel generalized conformalized quantile regression framework, missing data augmentation, which yields prediction intervals that are valid conditionally to the patterns of missing values, despite their exponential number. We then show that a universally consistent quantile regression algorithm trained on the imputed data is Bayes optimal for the pinball risk, thus achieving valid coverage conditionally to any given data point. Moreover, we examine the case of a linear model, which demonstrates the importance of our proposal in overcoming the heteroskedasticity induced by missing values. Using synthetic and data from critical care, we corroborate our theory and report improved performance of our methods.<br />Code for our experiments can be found at https://github.com/mzaffran/ConformalPredictionMissingValues . To be published in the proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....06de5c33ade6b35b301cbd4521ed3f7c