Insights into deep learning framework for molecular property prediction based on different tokenization algorithms.
- Source :
- Chemical Engineering Science. Mar 2024, Vol. 285, pN.PAG-N.PAG. 1p.
- Publication Year :
- 2024
Abstract
- • In-depth investigation of pre-training models using different tokenization algorithms.
• A BERT model is developed to capture advanced molecular features.
• A novel deep learning framework is proposed for predicting complex properties.
• Fine-tuning is applied to correlate molecular structure with properties.
With the rapid development of deep learning, research on quantitative structure–property relationships based on deep learning has received widespread attention. A deep learning architecture combining Bidirectional Encoder Representations from Transformers (BERT) and a Feedforward Neural Network (FNN) is proposed to compare the performance of different tokenization algorithms. t-distributed stochastic neighbor embedding (t-SNE) reveals valuable information about the mechanism of structure–property relationships. Additionally, a deep learning framework, BERT-Convolutional Neural Network (CNN)-FNN, is developed based on the optimal tokenization algorithm to accurately predict the σ-profile and V_COSMO. The molecular structures are vectorized by the BERT model, which captures local and global features of the entire molecule. The CNN model enhances the latent representation associated with molecular properties, while the FNN model establishes the correlation. The deep learning frameworks predict the σ-profile and V_COSMO with R² greater than 0.9703, making them a promising intelligent tool for guiding solvent design and screening. [ABSTRACT FROM AUTHOR]
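As an illustration of why the choice of tokenization algorithm matters, the following minimal Python sketch splits a molecular string in two ways. It rests on assumptions not stated in the abstract: that molecules are written as SMILES strings, and that character-level and atom-level splitting are among the schemes one might compare. The function names and the regular expression are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical atom-level pattern (an assumption, not the paper's tokenizer):
# bracket atoms such as [NH4+], the two-letter halogens Br and Cl,
# two-digit ring closures such as %12, then any remaining single character.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def char_tokenize(smiles: str) -> list[str]:
    """Character-level tokenization: every character is its own token."""
    return list(smiles)


def atom_tokenize(smiles: str) -> list[str]:
    """Atom-level tokenization: multi-character atoms stay in one token."""
    tokens = ATOM_PATTERN.findall(smiles)
    # Sanity check: tokenization must not lose or reorder characters.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r}")
    return tokens


if __name__ == "__main__":
    for smiles in ["CC(Cl)Br", "[NH4+]"]:
        print(smiles, char_tokenize(smiles), atom_tokenize(smiles))
```

For `CC(Cl)Br`, the character-level split yields eight tokens and separates `Cl` into `C` and `l`, while the atom-level split yields six tokens and keeps `Cl` and `Br` intact. Differences of this kind change the vocabulary and sequence length that a BERT-style model sees during pre-training.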
- Subjects :
- *DEEP learning
*LANGUAGE models
*FEEDFORWARD neural networks
*ALGORITHMS
Details
- Language :
- English
- ISSN :
- 00092509
- Volume :
- 285
- Database :
- Academic Search Index
- Journal :
- Chemical Engineering Science
- Publication Type :
- Academic Journal
- Accession number :
- 174708801
- Full Text :
- https://doi.org/10.1016/j.ces.2023.119471