Machine Learning Yield Prediction from NiCOlit, a Small-Size Literature Data Set of Nickel Catalyzed C–O Couplings
- Source :
- Journal of the American Chemical Society, 2022, 144 (32), pp. 14722-14730. ⟨10.1021/jacs.2c05302⟩
- Publication Year :
- 2022
- Publisher :
- HAL CCSD, 2022.
Abstract
- Synthetic yield prediction using machine learning is intensively studied. Previous work focused on two categories of datasets: High-Throughput Experimentation data, as an ideal case study, and datasets extracted from proprietary databases, which are known to have a strong reporting bias towards high yields. However, predicting yields using published reaction data remains elusive. To fill the gap, we built a dataset on nickel-catalyzed cross-couplings extracted from organic reaction publications, including scope and optimization information. We demonstrate the importance of including optimization data as a source of failed experiments and emphasize how publication constraints shape the exploration of the chemical space by the synthetic community. While machine learning models still fail to perform out-of-sample predictions, this work shows that adding chemical knowledge enables fair predictions in a low-data regime. Eventually, we hope that this unique public database will foster further improvements of machine learning methods for reaction yield prediction in a more realistic context.
- Subjects :
- [CHIM.ORGA] Chemical Sciences/Organic chemistry
[CHIM.GENI] Chemical Sciences/Chemical engineering
[CHIM.CATA] Chemical Sciences/Catalysis
[CHIM.CHEM] Chemical Sciences/Cheminformatics
[INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]
General Chemistry
Biochemistry
Catalysis
Machine Learning
Colloid and Surface Chemistry
Nickel
Details
- Language :
- English
- ISSN :
- 0002-7863 and 1520-5126
- Database :
- OpenAIRE
- Journal :
- Journal of the American Chemical Society, 2022, 144 (32), pp. 14722-14730. ⟨10.1021/jacs.2c05302⟩
- Accession number :
- edsair.doi.dedup.....4d3c26b7ba60bd2f74329d2a1d631ad1