An effective self-supervised framework for learning expressive molecular global representations to drug discovery
- Source :
- Briefings in Bioinformatics. 22(6)
- Publication Year :
- 2021
Abstract
- Producing expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural networks (GNNs) have emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and poor generalization capability. Here, we propose a novel graph-based deep learning framework for molecular pre-training, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we propose a powerful GNN for modeling molecular graphs, named MolGNet, and design an effective self-supervised strategy for pre-training the model at both the node and graph levels. After pre-training on 11 million unlabeled molecules, we show that MolGNet can capture valuable chemical insights to produce interpretable representations. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular property prediction, drug-drug interaction and drug-target interaction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.
- Subjects :
- 0301 basic medicine
Models, Molecular
Computer science
Machine learning
computer.software_genre
01 natural sciences
03 medical and health sciences
chemistry.chemical_compound
Drug Delivery Systems
Drug Discovery
Molecular graph
Representation (mathematics)
Molecular Biology
010405 organic chemistry
business.industry
Drug discovery
Deep learning
Node (networking)
Pipeline (software)
0104 chemical sciences
030104 developmental biology
chemistry
Benchmark (computing)
Graph (abstract data type)
Artificial intelligence
Neural Networks, Computer
business
computer
Databases, Chemical
Information Systems
Details
- ISSN :
- 1477-4054
- Volume :
- 22
- Issue :
- 6
- Database :
- OpenAIRE
- Journal :
- Briefings in Bioinformatics
- Accession number :
- edsair.doi.dedup.....4f00ddad6595ead7aadedc6cafbeb0ef