1. Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions
- Author
- Klarner, Leo; Rudner, Tim G. J.; Reutlinger, Michael; Schindler, Torsten; Morris, Garrett M.; Deane, Charlotte; Teh, Yee Whye
- Subjects
- Quantitative Biology - Biomolecules, Computer Science - Machine Learning, Statistics - Machine Learning
- Abstract
- Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift, a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences. Building on a novel, gold-standard bioactivity dataset that facilitates a meaningful comparison of models in an extrapolative regime, we explore different approaches to induce data shift and construct a challenging evaluation setup. We then demonstrate that using Q-SAVI to integrate contextualized prior knowledge of drug-like chemical space into the modeling process affords substantial gains in predictive accuracy and calibration, outperforming a broad range of state-of-the-art self-supervised pre-training and domain adaptation techniques.
- Comment
- Published in the Proceedings of the 40th International Conference on Machine Learning (ICML 2023)
- Published
- 2023