1. Predicting kinase inhibitors using bioactivity matrix derived informer sets
- Author
- Michael A. Newton, Ching-pei Lee, Anthony Gitter, Peng Yu, Stephen J. Wright, Gene E. Ananiev, Nathan Wlodarchak, Huikun Zhang, Scott A. Wildman, Spencer S. Ericksen, F. Michael Hoffmann, and Julie C. Mitchell
- Subjects
0301 basic medicine; Computer science; Databases, Pharmaceutical; Kinase Inhibitors; Drug Evaluation, Preclinical; Protozoan Proteins; computer.software_genre; 01 natural sciences; Biochemistry; Tyrosine Kinases; User-Computer Interface; 0302 clinical medicine; Mathematical and Statistical Techniques; Drug Discovery; Medicine and Health Sciences; Prospective Studies; Enzyme Inhibitors; Biology (General); 0303 health sciences; Computational model; Ecology; Drug discovery; Cheminformatics; Statistics; 3. Good health; Enzymes; Identification (information); Data Acquisition; Computational Theory and Mathematics; Modeling and Simulation; Physical Sciences; Data mining; Research Article; Prioritization; Computer and Information Sciences; Drug Research and Development; QH301-705.5; High-throughput screening; Library Screening; Protein Serine-Threonine Kinases; Machine learning; Research and Analysis Methods; Set (abstract data type); 03 medical and health sciences; Cellular and Molecular Neuroscience; Structure-Activity Relationship; Viral Proteins; Genetics; Humans; Computer Simulation; Statistical Methods; Molecular Biology Techniques; Molecular Biology; Protein Kinase Inhibitors; Ecology, Evolution, Behavior and Systematics; 030304 developmental biology; Pharmacology; Virtual screening; Molecular Biology Assays and Analysis Techniques; business.industry; Supervised learning; Experimental data; Biology and Life Sciences; Proteins; Computational Biology; High Throughput Screening; 0104 chemical sciences; Chemical screening; High-Throughput Screening Assays; 010404 medicinal & biomolecular chemistry; 030104 developmental biology; Biological target; Enzymology; Artificial intelligence; business; computer; Protein Kinases; 030217 neurology & neurosurgery; Mathematics; Databases, Chemical; Forecasting
- Abstract
Prediction of compounds that are active against a desired biological target is a common step in drug discovery efforts. Virtual screening methods seek some active-enriched fraction of a library for experimental testing. Where data are too scarce to train supervised learning models for compound prioritization, initial screening must provide the necessary data. Commonly, such an initial library is selected on the basis of chemical diversity by some pseudo-random process (for example, the first few plates of a larger library) or by selecting an entire smaller library. These approaches may not produce a sufficient number or diversity of actives. An alternative approach is to select an informer set of screening compounds on the basis of chemogenomic information from previous testing of compounds against a large number of targets. We compare different ways of using chemogenomic data to choose a small informer set of compounds based on previously measured bioactivity data. We develop this Informer-Based-Ranking (IBR) approach using the Published Kinase Inhibitor Sets (PKIS) as the chemogenomic data to select the informer sets. We test the informer compounds on a target that is not part of the chemogenomic data, then predict the activity of the remaining compounds based on the experimental informer data and the chemogenomic data. Through new chemical screening experiments, we demonstrate the utility of IBR strategies in a prospective test on three kinase targets not included in the PKIS.

Author summary: In the early stages of drug discovery efforts, computational models are used to predict activity and prioritize compounds for experimental testing. New targets commonly lack the data necessary to build effective models, and the screening needed to generate that experimental data can be costly. We seek to improve the efficiency of the initial screening phase, and of the process of prioritizing compounds for subsequent screening.
We choose a small informer set of compounds based on publicly available prior screening data on distinct targets. We then collect experimental data on these informer compounds and use that data to predict the activity of other compounds in the set for the target of interest. Computational and statistical tools are needed to identify informer compounds and to prioritize other compounds for subsequent phases of screening. We find that selection of informer compounds on the basis of bioactivity data from previous screening efforts is superior to the traditional approach of selection of a chemically diverse subset of compounds. We demonstrate the success of this approach in retrospective tests on the Published Kinase Inhibitor Sets (PKIS) chemogenomic data and in prospective experimental screens against three additional non-human kinase targets.
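
The workflow described above (choose informers from a prior bioactivity matrix, screen them on the new target, then rank the remaining compounds) can be illustrated with a minimal sketch. This is not one of the IBR methods from the paper, whose details this record does not give. It is a simple stand-in that assumes informers are the compounds with the highest activity variance across known targets, and that each known target is weighted by the correlation between its informer profile and the new target's informer results. The function names, the toy matrix, and the "measured" informer values are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def choose_informers(matrix, k):
    """Assumed heuristic: pick the k compounds (rows) whose activity
    varies most across the known targets (columns)."""
    def variance(row):
        mean = sum(row) / len(row)
        return sum((a - mean) ** 2 for a in row) / len(row)
    order = sorted(range(len(matrix)), key=lambda i: variance(matrix[i]), reverse=True)
    return sorted(order[:k])

def rank_non_informers(matrix, informers, new_target_activity):
    """Rank the remaining compounds for the new target.

    Each known target gets a weight equal to the (non-negative) correlation
    between its informer activities and those measured on the new target.
    A compound's score is the weighted mean of its known activities.
    """
    informer_set = set(informers)
    n_targets = len(matrix[0])
    weights = []
    for t in range(n_targets):
        profile = [matrix[i][t] for i in informers]
        weights.append(max(0.0, pearson(profile, new_target_activity)))
    total = sum(weights)
    if total == 0:  # no similar target found: fall back to equal weights
        weights, total = [1.0] * n_targets, float(n_targets)
    scores = {
        i: sum(w * a for w, a in zip(weights, row)) / total
        for i, row in enumerate(matrix)
        if i not in informer_set
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (made up): 6 compounds x 3 known targets, activities in [0, 1].
toy_matrix = [
    [0.9, 0.1, 0.8],
    [0.1, 0.9, 0.2],
    [0.8, 0.2, 0.7],
    [0.2, 0.8, 0.1],
    [0.5, 0.5, 0.5],
    [0.7, 0.3, 0.9],
]
informers = choose_informers(toy_matrix, k=2)
# Hypothetical screening results for the informers on the new target.
measured = [0.85, 0.15]
ranking = rank_non_informers(toy_matrix, informers, measured)
print(informers, ranking)
```

In this toy case the new target behaves like known targets 0 and 2 on the informers, so compounds that were active against those targets rise to the top of the ranking. A chemically diverse baseline would instead pick the initial compounds from structure alone, without using the bioactivity matrix.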
- Published
- 2019