23 results on '"Vaucher AC"'
Search Results
2. Leveraging infrared spectroscopy for automated structure elucidation.
- Author
Alberts M, Laino T, and Vaucher AC
- Abstract
The application of machine learning models in chemistry has made remarkable strides in recent years. While analytical chemistry has received considerable interest from machine learning practitioners, its adoption into everyday use remains limited. Among the available analytical methods, infrared (IR) spectroscopy stands out in terms of affordability, simplicity, and accessibility. However, its use has been limited to the identification of a select few functional groups, as most peaks lie beyond human interpretation. We present a transformer model that enables chemists to leverage the complete information contained within an IR spectrum to directly predict the molecular structure. To cover a large chemical space, we pretrain the model using 634,585 simulated IR spectra and fine-tune it on 3,453 experimental spectra. Our approach achieves a top-1 accuracy of 44.4% and a top-10 accuracy of 69.8% on compounds containing 6 to 13 heavy atoms. When solely predicting scaffolds, the model correctly predicts the top-1 scaffold in 84.5% of cases and places the correct scaffold among the top-10 in 93.0% of cases. Competing Interests: Dr. Teodoro Laino is an Editorial Board Member for Communications Chemistry, but was not involved in the editorial review of, or the decision to publish, this article. All other authors declare no competing interests. (© 2024. The Author(s).)
- Published
- 2024
3. Language models and protocol standardization guidelines for accelerating synthesis planning in heterogeneous catalysis.
- Author
Suvarna M, Vaucher AC, Mitchell S, Laino T, and Pérez-Ramírez J
- Abstract
Synthesis protocol exploration is paramount in catalyst discovery, yet keeping pace with rapid literature advances is increasingly time-intensive. Automated synthesis protocol analysis is attractive for swiftly identifying opportunities and informing predictive models; however, such applications in heterogeneous catalysis remain limited. In this proof-of-concept study, we introduce a transformer model for this task, exemplified using single-atom heterogeneous catalysts (SACs), a rapidly expanding catalyst family. Our model adeptly converts SAC protocols into action sequences, and we use this output to facilitate statistical inference of their synthesis trends and applications, potentially expediting literature review and analysis. We demonstrate the model's adaptability across distinct heterogeneous catalyst families, underscoring its versatility. Finally, our study highlights a critical issue: the lack of standardization in reporting protocols hampers machine-reading capabilities. Embracing digital advances in catalysis demands a shift in data reporting norms, and to this end, we offer guidelines for writing protocols that significantly improve machine-readability. We release our model as an open-source web application, inviting a fresh approach to accelerate heterogeneous catalysis synthesis planning. (© 2023. The Author(s).)
- Published
- 2023
4. Fast Customization of Chemical Language Models to Out-of-Distribution Data Sets.
- Author
Toniato A, Vaucher AC, Lehmann MM, Luksch T, Schwaller P, Stenta M, and Laino T
- Abstract
The world is on the verge of a new industrial revolution, and language models are poised to play a pivotal role in this transformative era. Their ability to offer intelligent insights and forecasts has made them a valuable asset for businesses seeking a competitive advantage. The chemical industry, in particular, can benefit significantly from harnessing their power. Since 2016, language models have been applied to tasks such as predicting reaction outcomes or retrosynthetic routes. While such models have demonstrated impressive abilities, the lack of publicly available data sets with universal coverage is often the limiting factor for achieving even higher accuracies. This makes it imperative for organizations to incorporate proprietary data sets into their model training processes to improve their performance. So far, however, these data sets frequently remain untapped, as there are no established criteria for model customization. In this work, we report a successful methodology for retraining language models on reaction outcome prediction and single-step retrosynthesis tasks, using proprietary, nonpublic data sets. We report a considerable boost in accuracy by combining patent and proprietary data in a multidomain learning formulation. This exercise, inspired by a real-world use case, enables us to formulate guidelines that can be adopted in different corporate settings to customize chemical language models easily. Competing Interests: The authors declare no competing financial interest. (© 2023 The Authors and Syngenta. Published by American Chemical Society.)
- Published
- 2023
5. Fuelling the Digital Chemistry Revolution with Language Models.
- Author
Cardinale A, Castrogiovanni A, Gaudin T, Geluykens J, Laino T, Manica M, Probst D, Schwaller P, Sobczyk A, Toniato A, Vaucher AC, Wolf H, and Zipoli F
- Abstract
The RXN for Chemistry project, initiated by IBM Research Europe - Zurich in 2017, aimed to develop a series of digital assets using machine learning techniques to promote the use of data-driven methodologies in synthetic organic chemistry. This research adopts an innovative concept by treating chemical reaction data as language records, casting the prediction of a synthetic organic chemistry reaction as a translation task between precursor and product languages. Over the years, the IBM Research team has successfully developed language models for various applications, including forward reaction prediction, retrosynthesis, reaction classification, atom-mapping, procedure extraction from text, inference of experimental protocols and its use in programming commercial automation hardware to implement an autonomous chemical laboratory. Furthermore, the project has recently incorporated biochemical data in training models for greener and more sustainable chemical reactions. The remarkable ease of constructing prediction models and continually enhancing them through data augmentation with minimal human intervention has led to the widespread adoption of language model technologies, facilitating the digitalization of chemistry in diverse industrial sectors such as pharmaceuticals and chemical manufacturing. This manuscript provides a concise overview of the scientific components that contributed to the prestigious Sandmeyer Award in 2022. (Copyright 2023 Antonio Cardinale, Alessandro Castrogiovanni, Theophile Gaudin, Joppe Geluykens, Teodoro Laino, Matteo Manica, Daniel Probst, Philippe Schwaller, Aleksandros Sobczyk, Alessandra Toniato, Alain C. Vaucher, Heiko Wolf, Federico Zipoli. License: This work is licensed under a Creative Commons Attribution 4.0 International License.)
- Published
- 2023
6. Unbiasing Retrosynthesis Language Models with Disconnection Prompts.
- Author
Thakkar A, Vaucher AC, Byekwaso A, Schwaller P, Toniato A, and Laino T
- Abstract
Data-driven approaches to retrosynthesis are limited in user interaction, diversity of their predictions, and recommendation of unintuitive disconnection strategies. Herein, we extend the notions of prompt-based inference in natural language processing to the task of chemical language modeling. We show that by using a prompt describing the disconnection site in a molecule, we can steer the model to propose a broader set of precursors, thereby overcoming training data biases in retrosynthetic recommendations and achieving a 39% performance improvement over the baseline. For the first time, the use of a disconnection prompt empowers chemists by giving them greater control over the disconnection predictions, which results in more diverse and creative recommendations. In addition, in place of a human-in-the-loop strategy, we propose a two-stage schema consisting of automatic identification of disconnection sites, followed by prediction of reactant sets, thereby achieving a considerable improvement in class diversity compared with the baseline. The approach is effective in mitigating prediction biases derived from training data. This provides a wider variety of usable building blocks and improves the end user's digital experience. We demonstrate its application to different chemistry domains, from traditional to enzymatic reactions, in which substrate specificity is critical. Competing Interests: The authors declare no competing financial interest. (© 2023 The Authors. Published by American Chemical Society.)
- Published
- 2023
7. Quantum chemical data generation as fill-in for reliability enhancement of machine-learning reaction and retrosynthesis planning.
- Author
Toniato A, Unsleber JP, Vaucher AC, Weymuth T, Probst D, Laino T, and Reiher M
- Abstract
Data-driven synthesis planning has seen remarkable successes in recent years by virtue of modern approaches of artificial intelligence that efficiently exploit vast databases with experimental data on chemical reactions. However, this success story is intimately connected to the availability of existing experimental data. In retrosynthetic and synthesis design tasks, predictions for individual steps of a reaction cascade may well be affected by large uncertainties. In such cases, it is generally not easy to provide the missing data on demand from autonomously conducted experiments. However, first-principles calculations can, in principle, provide missing data to enhance the confidence of an individual prediction or for model retraining. Here, we demonstrate the feasibility of such an ansatz and examine resource requirements for conducting autonomous first-principles calculations on demand. Competing Interests: There are no conflicts to declare. (This journal is © The Royal Society of Chemistry.)
- Published
- 2023
8. Enhancing diversity in language based models for single-step retrosynthesis.
- Author
Toniato A, Vaucher AC, Schwaller P, and Laino T
- Abstract
Over the past four years, several research groups have demonstrated the combination of domain-specific language representations with recent NLP architectures to accelerate innovation in a wide range of scientific fields. Chemistry is a great example. Among the various chemical challenges addressed with language models, retrosynthesis demonstrates some of the most distinctive successes and limitations. Single-step retrosynthesis, the task of identifying reactions able to decompose a complex molecule into simpler structures, can be cast as a translation problem, in which a text-based representation of the target molecule is converted into a sequence of possible precursors. A common issue is a lack of diversity in the proposed disconnection strategies. The suggested precursors typically fall in the same reaction family, which limits the exploration of the chemical space. We present a retrosynthesis Transformer model that increases the diversity of the predictions by prepending a classification token to the language representation of the target molecule. At inference, the use of these prompt tokens allows us to steer the model towards different kinds of disconnection strategies. We show that the diversity of the predictions improves consistently, which enables recursive synthesis tools to circumvent dead ends and, consequently, to suggest synthesis pathways for more complex molecules. Competing Interests: There are no conflicts to declare. (This journal is © The Royal Society of Chemistry.)
- Published
- 2023
9. Inferring experimental procedures from text-based representations of chemical reactions.
- Author
Vaucher AC, Schwaller P, Geluykens J, Nair VH, Iuliano A, and Laino T
- Abstract
The experimental execution of chemical reactions is a context-dependent and time-consuming process, often solved using the experience collected over multiple decades of laboratory work or by searching for similar, already executed experimental protocols. Although data-driven schemes, such as retrosynthetic models, are becoming established technologies in synthetic organic chemistry, the conversion of proposed synthetic routes to experimental procedures remains a burden on the shoulders of domain experts. In this work, we present data-driven models for predicting the entire sequence of synthesis steps starting from a textual representation of a chemical equation, for application in batch organic chemistry. We generated a data set of 693,517 chemical equations and associated action sequences by extracting and processing experimental procedure text from patents, using state-of-the-art natural language models. We used the resulting data set to train three different models: a nearest-neighbor model based on recently introduced reaction fingerprints, and two deep-learning sequence-to-sequence models based on the Transformer and BART architectures. An analysis by a trained chemist revealed that the predicted action sequences are adequate for execution without human intervention in more than 50% of the cases.
- Published
- 2021
10. Automated extraction of chemical synthesis actions from experimental procedures.
- Author
Vaucher AC, Zipoli F, Geluykens J, Nair VH, Schwaller P, and Laino T
- Abstract
Experimental procedures for chemical synthesis are commonly reported in prose in patents or in the scientific literature. The extraction of the details necessary to reproduce and validate a synthesis in a chemical laboratory is often a tedious task requiring extensive human intervention. We present a method to convert unstructured experimental procedures written in English to structured synthetic steps (action sequences) reflecting all the operations needed to successfully conduct the corresponding chemical reactions. To achieve this, we design a set of synthesis actions with predefined properties and a deep-learning sequence-to-sequence model based on the transformer architecture to convert experimental procedures to action sequences. The model is pretrained on vast amounts of data generated automatically with a custom rule-based natural language processing approach and refined on manually annotated samples. Predictions on our test set result in a perfect (100%) match of the action sequence for 60.8% of sentences, a 90% match for 71.3% of sentences, and a 75% match for 82.4% of sentences.
- Published
- 2020
11. Training Neural Nets To Learn Reactive Potential Energy Surfaces Using Interactive Quantum Chemistry in Virtual Reality.
- Author
Amabilino S, Bratholm LA, Bennie SJ, Vaucher AC, Reiher M, and Glowacki DR
- Abstract
Not so long ago, processing power was the primary bottleneck in many computational workflows; the rise of machine learning technologies has resulted in an interesting paradigm shift, which places increasing value on issues related to data curation, that is, data size, quality, bias, format, and coverage. Increasingly, data-related issues are as important as the algorithmic methods used to process and learn from the data. Here we introduce an open-source graphics processing unit-accelerated neural network (NN) framework for learning reactive potential energy surfaces (PESs). To obtain training data for this NN framework, we investigate the use of real-time interactive ab initio molecular dynamics in virtual reality (iMD-VR) as a new data curation strategy that enables human users to rapidly sample geometries along reaction pathways. Focusing on hydrogen abstraction reactions of the CN radical with isopentane, we compare the performance of NNs trained using iMD-VR data versus NNs trained using a more traditional method, namely, molecular dynamics (MD) constrained to sample a predefined grid of points along the hydrogen abstraction reaction coordinate. Both the NN trained using iMD-VR data and the NN trained using the constrained MD data reproduce important qualitative features of the reactive PESs, such as a low and early barrier to abstraction. Quantitative analysis shows that NN learning is sensitive to the data set used for training. Our results show that user-sampled structures obtained with the quantum chemical iMD-VR machinery enable excellent sampling in the vicinity of the minimum energy path (MEP). As a result, the NN trained on the iMD-VR data performs very well at predicting energies that are close to the MEP but less well at predicting energies for "off-path" structures. The NN trained on the constrained MD data does better at predicting high-energy off-path structures, given that it included a number of such structures in its training set.
- Published
- 2019
12. GuacaMol: Benchmarking Models for de Novo Molecular Design.
- Author
Brown N, Fiscato M, Segler MHS, and Vaucher AC
- Subjects
- Drug Design, Isomerism, Models, Molecular, Molecular Structure, Monte Carlo Method, Quantitative Structure-Activity Relationship, Benchmarking methods, Deep Learning, Pharmaceutical Preparations chemistry
- Abstract
De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies to well-established algorithms have only seldom been performed. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity of the models to reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multiobjective optimization tasks. The benchmarking open-source Python code and a leaderboard can be found on https://benevolent.ai/guacamol .
- Published
- 2019
13. Exploration of Reaction Pathways and Chemical Transformation Networks.
- Author
Simm GN, Vaucher AC, and Reiher M
- Abstract
For the investigation of chemical reaction networks, the identification of all relevant intermediates and elementary reactions is mandatory. Many algorithmic approaches exist that perform explorations efficiently and in an automated fashion. These approaches differ in their application range, the level of completeness of the exploration, and the amount of heuristics and human intervention required. Here, we describe and compare the different approaches based on these criteria. Future directions leveraging the strengths of chemical heuristics, human interaction, and physical rigor are discussed.
- Published
- 2019
14. Minimum Energy Paths and Transition States by Curve Optimization.
- Author
Vaucher AC and Reiher M
- Abstract
Transition states and minimum energy paths are essential to understand and predict chemical reactivity. Double-ended methods represent a standard approach for their determination. We introduce a new double-ended method that optimizes reaction paths described by curves. Unlike other methods, our approach optimizes the curve parameters rather than distinct structures along the path. With molecular paths represented as continuous curves, the optimization can benefit from the advantages of an integral-based formulation. We call this approach ReaDuct and demonstrate its applicability for molecular paths parametrized by B-spline curves.
- Published
- 2018
15. Integrated Reaction Path Processing from Sampled Structure Sequences.
- Author
Heuer MA, Vaucher AC, Haag MP, and Reiher M
- Abstract
Sampled structure sequences obtained, for instance, from real-time reactivity explorations or first-principles molecular dynamics simulations contain valuable information about chemical reactivity. Eventually, such sequences allow for the construction of reaction networks that are required for the kinetic analysis of chemical systems. For this purpose, however, the sampled information must be processed to obtain stable chemical structures and associated transition states. The manual extraction of valuable information from such reaction paths is straightforward but infeasible for large and complex reaction networks. For real-time quantum chemistry, this implies automation of the extraction and relaxation process while maintaining immersion in the virtual chemical environment. Here, we describe an efficient path processing scheme for the on-the-fly construction of an exploration network by approximating the explored paths as continuous basis-spline curves.
- Published
- 2018
16. Steering Orbital Optimization out of Local Minima and Saddle Points Toward Lower Energy.
- Author
Vaucher AC and Reiher M
- Abstract
The general procedure underlying Hartree-Fock and Kohn-Sham density functional theory calculations consists of optimizing orbitals for a self-consistent solution of the Roothaan-Hall equations in an iterative process. It is often ignored that multiple self-consistent solutions can exist, several of which may correspond to minima of the energy functional. In addition to the occasional difficulty of converging the calculation to a self-consistent solution, one must ensure that the correct self-consistent solution was found, typically the one with the lowest electronic energy. Convergence to an unwanted solution is in general not trivial to detect and will deliver incorrect energies and molecular properties and, accordingly, a misleading description of chemical reactivity. Wrong conclusions based on incorrect self-consistent field convergence are particularly cumbersome in automated calculations encountered in high-throughput virtual screening, structure optimizations, ab initio molecular dynamics, and real-time explorations of chemical reactivity, where the vast amount of data can hardly be inspected manually. Here, we introduce a fast and automated approach to detect and cure incorrect orbital convergence, which is especially suited for electronic structure calculations on sequences of molecular structures. Our approach consists of a randomized perturbation of the converged electron density (matrix) intended to push orbital convergence to solutions that correspond to another stationary point (of potentially lower electronic energy) in the variational parameter space of an electronic wave function approximation.
- Published
- 2017
17. One Bronze Medal for Switzerland at the 48th International Chemistry Olympiad in Tbilisi, Georgia.
- Author
Vaucher AC
- Abstract
Four Swiss high school students participated in the 48th International Chemistry Olympiad (IChO), which took place from July 23 to August 1 in Tbilisi, Georgia. Dominic Egger, Nicolà Gantenbein, Simone Heimgartner and Diego Zenhäusern competed against 260 other students from 71 countries. Dominic Egger brought home a well-deserved bronze medal.
- Published
- 2016
18. Molecular Propensity as a Driver for Explorative Reactivity Studies.
- Author
Vaucher AC and Reiher M
- Subjects
- Cycloaddition Reaction, Epoxy Compounds chemistry, Ferric Compounds chemistry, Hydrogen chemistry, Molecular Conformation, Oxidation-Reduction, Photochemical Processes, Protons, Thermodynamics, Models, Molecular, Quantum Theory
- Abstract
Quantum chemical studies of reactivity involve calculations on a large number of molecular structures and the comparison of their energies. The very setup of these calculations limits the scope of the results that one will obtain, because several system-specific variables such as the charge and spin need to be set prior to the calculation. For a reliable exploration of reaction mechanisms, a considerable number of calculations with varying global parameters must be taken into account, or important facts about the reactivity of the system under consideration can remain undetected. For example, one could miss crossings of potential energy surfaces for different spin states or might not notice that a molecule is prone to oxidation. Here, we introduce the concept of molecular propensity to account for the predisposition of a molecular system to react across different electronic states in certain nuclear configurations or with other reactants present in the reaction liquor. Within our real-time quantum chemistry framework, we developed an algorithm that automatically detects and flags such a propensity of a system under consideration.
- Published
- 2016
19. Real-time feedback from iterative electronic structure calculations.
- Author
Vaucher AC, Haag MP, and Reiher M
- Abstract
Real-time feedback from iterative electronic structure calculations requires mediating between the inherently unpredictable execution times of the iterative algorithm used and the necessity to provide data in fixed and short time intervals for real-time rendering. We introduce the concept of a mediator as a component able to deal with infrequent and unpredictable reference data to generate reliable feedback. In the context of real-time quantum chemistry, the mediator takes the form of a surrogate potential that has the same local shape as the first-principles potential and can be evaluated efficiently to deliver atomic forces as real-time feedback. The surrogate potential is updated continuously by electronic structure calculations and is guaranteed to provide a reliable response to the operator for any molecular structure. To demonstrate the application of iterative electronic structure methods in real-time reactivity exploration, we implement self-consistent semiempirical methods as the data source and apply the surrogate-potential mediator to deliver reliable real-time feedback. (© 2015 Wiley Periodicals, Inc.)
- Published
- 2016
20. Accelerating Wave Function Convergence in Interactive Quantum Chemical Reactivity Studies.
- Author
Mühlbach AH, Vaucher AC, and Reiher M
- Abstract
The inherently high computational cost of iterative self-consistent field (SCF) methods proves to be a critical issue delaying visual and haptic feedback in real-time quantum chemistry. In this work, we introduce two schemes for SCF acceleration. They provide a guess for the initial density matrix of the SCF procedure generated by extrapolation techniques. SCF optimizations then converge in fewer iterations, which decreases the execution time of the SCF optimization procedure. To benchmark the proposed propagation schemes, we developed a test bed for performing quantum chemical calculations on sequences of molecular structures mimicking real-time quantum chemical explorations. Explorations of a set of six model reactions employing the semi-empirical methods PM6 and DFTB3 in this testing environment showed that the proposed propagation schemes achieved speedups of up to 30% as a consequence of a reduced number of SCF iterations.
- Published
- 2016
21. Two Bronze Medals for Switzerland at the 46th International Chemistry Olympiad in Hanoi, Vietnam.
- Author
Ludwig PE, Vaucher AC, Lê TP, and Suter Y
- Published
- 2015
23. Interactive chemical reactivity exploration.
- Author
Haag MP, Vaucher AC, Bosson M, Redon S, and Reiher M
- Abstract
Elucidating chemical reactivity in complex molecular assemblies of a few hundred atoms is, despite the remarkable progress in quantum chemistry, still a major challenge. Black-box search methods to find intermediates and transition-state structures might fail in such situations because of the high dimensionality of the potential energy surface. Here, we propose the concept of interactive chemical reactivity exploration to effectively introduce the chemist's intuition into the search process. We employ a haptic pointer device with force feedback that allows the operator to manipulate structures directly in three dimensions while simultaneously perceiving the quantum mechanical response to structure modifications as forces. We elaborate on the details of how such an interactive exploration should proceed and which technical difficulties need to be overcome. All reactivity-exploration concepts developed for this purpose have been implemented in the SAMSON programming environment. (© 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.)
- Published
- 2014
Discovery Service for Jio Institute Digital Library