1. A biologically-inspired multi-modal evaluation of molecular generative machine learning
- Author
Vinogradova, Elizaveta; Artykbayev, Abay; Amanatay, Alisher; Karatayev, Mukhamejan; Mametkulov, Maxim; Li, Albina; Suleimenov, Anuar; Salimzhanov, Abylay; Pats, Karina; Zhumagambetov, Rustam; Molnár, Ferdinand; Peshkov, Vsevolod; Fazli, Siamac
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, J.2, J.3, I.2.5, Computer Science - Artificial Intelligence, Biomolecules (q-bio.BM), I.2.0, Quantitative Biology - Quantitative Methods, Machine Learning (cs.LG), Artificial Intelligence (cs.AI), Quantitative Biology - Biomolecules, FOS: Biological sciences, Quantitative Methods (q-bio.QM)
- Abstract
While generative models have recently become ubiquitous in many scientific areas, less attention has been paid to their evaluation. For molecular generative models, state-of-the-art evaluation examines their output in isolation or in relation to its input. However, the biological and functional properties of the generated molecules, such as ligand-target interaction, are not addressed. In this study, a novel biologically-inspired benchmark for the evaluation of molecular generative models is proposed. Specifically, three diverse reference datasets are designed and a set of metrics is introduced that is directly relevant to the drug discovery process. In particular, we propose a recreation metric and apply drug-target affinity prediction and molecular docking as complementary techniques for the evaluation of generative outputs. While all three metrics show consistent results across the tested generative models, a more detailed comparison of drug-target affinity binding and molecular docking scores revealed that unimodal predictors can lead to erroneous conclusions about target binding on a molecular level, and a multi-modal approach is thus preferable. The key advantage of this framework is that it incorporates prior physico-chemical domain knowledge into the benchmarking process by focusing explicitly on ligand-target interactions, thus creating a highly efficient tool not only for evaluating molecular generative outputs in particular, but also for enriching the drug discovery process in general.
59 pages, 26 figures. Project repository: https://gitlab.com/cheml.io/abraham
- Published
- 2022