Stochastic machine learning via sigma profiles to build a digital chemical space.

Authors :
Abranches DO
Maginn EJ
Colón YJ
Source :
Proceedings of the National Academy of Sciences of the United States of America [Proc Natl Acad Sci U S A] 2024 Jul 30; Vol. 121 (31), pp. e2404676121. Date of Electronic Publication: 2024 Jul 23.
Publication Year :
2024

Abstract

This work establishes a different paradigm for digital molecular spaces and their efficient navigation by exploiting sigma profiles. To do so, it demonstrates the remarkable capability of Gaussian processes (GPs), a type of stochastic machine learning model, to correlate and predict physicochemical properties from sigma profiles, outperforming previously published state-of-the-art neural networks. The amount of chemical information encoded in sigma profiles eases the learning burden of machine learning models, permitting GPs to be trained on small datasets. Owing to their negligible computational cost and ease of implementation, such GPs are ideal models to combine with optimization tools such as gradient search or Bayesian optimization (BO). Gradient search is used to efficiently navigate the sigma profile digital space, quickly converging to local extrema of target physicochemical properties. While this requires GP models pretrained on existing datasets, BO eliminates that limitation and can find global extrema within a limited number of iterations. A remarkable example is BO applied to boiling temperature optimization. Holding no knowledge of chemistry except for the sigma profile and boiling temperature of carbon monoxide (the worst possible initial guess), BO finds the global maximum of the available boiling temperature dataset (over 1,000 molecules encompassing more than 40 families of organic and inorganic compounds) in just 15 iterations (i.e., 15 property measurements). This cements sigma profiles as a powerful digital chemical space for molecular optimization and discovery, particularly when little to no experimental data are initially available.

Competing Interests: The authors declare no competing interest.
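The abstract describes pairing GP regression on sigma profiles with BO that selects, one measurement at a time, which molecule from a fixed dataset to evaluate next. The sketch below illustrates that loop only in outline and covers BO, not the gradient-search part. It is not the authors' implementation: the RBF kernel with a fixed length scale, the standardized targets, expected improvement (EI) as the acquisition function, the toy pool of random 5-bin vectors standing in for sigma profiles, and every function name (including the hypothetical `measure` stand-in for a property measurement) are assumptions made for illustration. It is written in dependency-free Python; in practice a library class such as scikit-learn's `GaussianProcessRegressor` would more likely be used.

```python
import math
import random


def rbf_kernel(a, b, length_scale=0.5):
    # Squared-exponential kernel between two sigma-profile vectors (assumed choice).
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(K):
    # Lower-triangular L with L L^T = K for a symmetric positive-definite K.
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def forward(L, b):
    # Solve L x = b by forward substitution.
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward(L, b):
    # Solve L^T x = b by back substitution.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X, y, queries, noise=1e-6):
    # Zero-mean GP on standardized targets; returns (mean, std) per query in original units.
    n = len(y)
    y_mean = sum(y) / n
    y_std = math.sqrt(sum((v - y_mean) ** 2 for v in y) / n) or 1.0
    z = [(v - y_mean) / y_std for v in y]
    K = [[rbf_kernel(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = backward(L, forward(L, z))  # alpha = K^{-1} z
    out = []
    for q in queries:
        k = [rbf_kernel(xi, q) for xi in X]
        mu = sum(ki * ai for ki, ai in zip(k, alpha))
        v = forward(L, k)
        var = max(1.0 - sum(t * t for t in v), 1e-12)  # prior variance k(q, q) = 1
        out.append((y_mean + y_std * mu, y_std * math.sqrt(var)))
    return out


def expected_improvement(mu, sigma, best, xi=0.01):
    # Expected improvement over the best observed value (maximization).
    imp = mu - best - xi
    if sigma <= 0.0:
        return max(imp, 0.0)
    z = imp / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return imp * cdf + sigma * pdf


def bayesian_optimization(pool, measure, start, n_iter):
    # Each iteration measures the unmeasured pool entry with the highest EI.
    observed = {start: measure(pool[start])}
    for _ in range(n_iter):
        cand = [i for i in range(len(pool)) if i not in observed]
        if not cand:
            break
        X = [pool[i] for i in observed]
        y = list(observed.values())
        post = gp_posterior(X, y, [pool[i] for i in cand])
        scores = [expected_improvement(m, s, max(y)) for m, s in post]
        nxt = cand[max(range(len(cand)), key=scores.__getitem__)]
        observed[nxt] = measure(pool[nxt])
    return max(observed, key=observed.get), observed


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy pool of 30 random 5-bin vectors standing in for sigma profiles.
    pool = [[rng.random() for _ in range(5)] for _ in range(30)]
    weights = [3.0, -1.0, 2.0, 0.5, -2.0]

    def measure(profile):
        # Hypothetical stand-in for one experimental property measurement.
        return sum(w * s for w, s in zip(weights, profile))

    start = min(range(len(pool)), key=lambda i: measure(pool[i]))  # worst start
    best, observed = bayesian_optimization(pool, measure, start, n_iter=10)
    print("best index:", best, "measurements:", len(observed))
```

Mirroring the carbon monoxide example, the loop starts from the worst entry in the pool and counts each call to `measure` as one property measurement. Because candidates are drawn only from a finite pool, the acquisition step is a simple argmax over unmeasured entries rather than a continuous optimization.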

Details

Language :
English
ISSN :
1091-6490
Volume :
121
Issue :
31
Database :
MEDLINE
Journal :
Proceedings of the National Academy of Sciences of the United States of America
Publication Type :
Academic Journal
Accession number :
39042681
Full Text :
https://doi.org/10.1073/pnas.2404676121