Insights into deep learning framework for molecular property prediction based on different tokenization algorithms.

Authors :
Yan, Jianlin
Zhang, Zhenyu
Meng, Miaomiao
Li, Jun
Sun, Lanyi
Source :
Chemical Engineering Science. Mar 2024, Vol. 285, pN.PAG-N.PAG. 1p.
Publication Year :
2024

Abstract

• In-depth investigation of pre-training models using different tokenization algorithms.
• A BERT model is developed to capture high-level molecular features.
• A novel deep learning framework is proposed for predicting complex properties.
• Fine-tuning is applied to correlate molecular structure with properties.

With the rapid development of deep learning, research on quantitative structure–property relationships (QSPR) based on deep learning has received widespread attention. A deep learning architecture combining Bidirectional Encoder Representations from Transformers (BERT) and feedforward neural networks (FNN) is proposed to compare the performance of different tokenization algorithms, and t-distributed stochastic neighbor embedding (t-SNE) reveals valuable information about the mechanism of structure–property relationships. Additionally, a deep learning framework, BERT-Convolutional Neural Network (CNN)-FNN, is developed based on the optimal tokenization algorithm to accurately predict the σ-profile and V_COSMO. The BERT model vectorizes the molecular structures, capturing local and global features of the entire molecule; the CNN model then enhances the latent representation associated with molecular properties, and the FNN model maps that representation to the target property. The deep learning frameworks predict σ-profile and V_COSMO with R² greater than 0.9703, making them promising intelligent tools for guiding solvent design and screening. [ABSTRACT FROM AUTHOR]
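This record does not name the tokenization algorithms that were compared. As a hedged illustration of why the choice matters, the sketch below contrasts two schemes commonly applied to SMILES strings: plain character-level splitting and a regex-based atom-level split (a pattern popularized by transformer models for chemistry). That the inputs are SMILES, the regex itself, the BERT-style special tokens, and the helper names `char_tokenize`, `atom_tokenize`, `build_vocab` and `encode` are all assumptions for illustration, not the authors' method.

```python
import re

# Assumption: molecules are given as SMILES strings. This atom-level regex is a
# widely used community pattern, NOT necessarily one of the paper's tokenizers.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)

# Hypothetical BERT-style special tokens, placed first in the vocabulary.
SPECIALS = ("[PAD]", "[CLS]", "[SEP]", "[MASK]", "[UNK]")


def char_tokenize(smiles: str) -> list[str]:
    """Character-level tokenization: every character becomes a token."""
    return list(smiles)


def atom_tokenize(smiles: str) -> list[str]:
    """Atom-level tokenization: bracket atoms and Cl/Br stay whole."""
    tokens = SMILES_REGEX.findall(smiles)
    # The tokens must reassemble into the input; otherwise some characters
    # were not matched by the regex and were silently skipped.
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def build_vocab(corpus, tokenize, specials=SPECIALS):
    """Assign an integer id to each distinct token, special tokens first."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for smi in corpus:
        for tok in tokenize(smi):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(smiles, tokenize, vocab):
    """Convert a SMILES string to ids wrapped in [CLS] ... [SEP]."""
    unk = vocab["[UNK]"]
    ids = [vocab.get(tok, unk) for tok in tokenize(smiles)]
    return [vocab["[CLS]"]] + ids + [vocab["[SEP]"]]


# Usage: compare the two token sequences for a few example molecules.
for smi in ["CCl", "c1ccccc1Br", "C[NH3+]"]:
    print(smi, char_tokenize(smi), atom_tokenize(smi))
```

With the character-level scheme, "CCl" becomes C, C, l, so the chlorine atom is split and its "l" fragment is a token with no chemical meaning; the atom-level scheme keeps "Cl" and "[NH3+]" whole. Differences like this change the vocabulary and sequence lengths the BERT encoder sees, which is one plausible reason tokenization affects downstream property prediction.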
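The accuracy quoted in the abstract is a coefficient of determination, R² = 1 − SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot the total sum of squares around the mean of the true values. A minimal stdlib sketch follows. The record does not say how R² was aggregated over the many points of a σ-profile, so `r2_flattened`, which pools every bin of every molecule into a single score, is a hypothetical choice, and the numbers in the usage line are made up.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def r2_flattened(profiles_true, profiles_pred):
    """Hypothetical aggregation: pool every bin of every sigma-profile into one R^2."""
    flat_true = [v for profile in profiles_true for v in profile]
    flat_pred = [v for profile in profiles_pred for v in profile]
    return r2_score(flat_true, flat_pred)


# Usage with made-up numbers (not data from the paper):
print(r2_score([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))  # 0.8
```

Pooling all bins rewards getting the overall shape of the profiles right; averaging a per-molecule or per-bin R² instead would weight errors differently, so the reported threshold is only comparable across studies that use the same aggregation.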

Details

Language :
English
ISSN :
00092509
Volume :
285
Database :
Academic Search Index
Journal :
Chemical Engineering Science
Publication Type :
Academic Journal
Accession number :
174708801
Full Text :
https://doi.org/10.1016/j.ces.2023.119471