Machine Learning Yield Prediction from NiCOlit, a Small-Size Literature Data Set of Nickel Catalyzed C–O Couplings

Authors :
Jules Schleinitz
Maxime Langevin
Yanis Smail
Benjamin Wehnert
Laurence Grimaud
Rodolphe Vuilleumier
École normale supérieure - Paris (ENS-PSL)
Université Paris sciences et lettres (PSL)
Laboratoire des biomolécules (LBM UMR 7203)
Chimie Moléculaire de Paris Centre (FR 2769)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Ecole Nationale Supérieure de Chimie de Paris - Chimie ParisTech-PSL (ENSCP)
Université Paris sciences et lettres (PSL)-Ecole Superieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI Paris)
Université Paris sciences et lettres (PSL)-Institut de Chimie du CNRS (INC)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL)
Université Paris sciences et lettres (PSL)-Institut de Chimie du CNRS (INC)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Département de Chimie - ENS Paris
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Centre National de la Recherche Scientifique (CNRS)
Sanofi [Vitry-sur-Seine]
SANOFI Recherche
Processus d'Activation Sélective par Transfert d'Energie Uni-électronique ou Radiatif (UMR 8640) (PASTEUR)
Département de Chimie - ENS Paris
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS-PSL)
Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut de Chimie du CNRS (INC)-Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)
Sorbonne Université (SU)
Source :
Journal of the American Chemical Society, 2022, 144 (32), pp.14722-14730. ⟨10.1021/jacs.2c05302⟩
Publication Year :
2022
Publisher :
HAL CCSD, 2022.

Abstract

Synthetic yield prediction using machine learning is intensively studied. Previous work focused on two categories of datasets: High-Throughput Experimentation data, as an ideal case study, and datasets extracted from proprietary databases, which are known to have a strong reporting bias towards high yields. However, predicting yields using published reaction data remains elusive. To fill the gap, we built a dataset on nickel-catalyzed cross-couplings extracted from organic reaction publications, including scope and optimization information. We demonstrate the importance of including optimization data as a source of failed experiments and emphasize how publication constraints shape the exploration of the chemical space by the synthetic community. While machine learning models still fail to perform out-of-sample predictions, this work shows that adding chemical knowledge enables fair predictions in a low-data regime. Eventually, we hope that this unique public database will foster further improvements of machine learning methods for reaction yield prediction in a more realistic context.

Details

Language :
English
ISSN :
0002-7863 and 1520-5126
Database :
OpenAIRE
Journal :
Journal of the American Chemical Society, 2022, 144 (32), pp.14722-14730. ⟨10.1021/jacs.2c05302⟩
Accession number :
edsair.doi.dedup.....4d3c26b7ba60bd2f74329d2a1d631ad1