1. Mapping Materials and Molecules
- Author
Noam Bernstein, Volker L. Deringer [0000-0001-6873-0278], Tamas Stenczel, Gábor Csányi, Karsten Reuter [0000-0001-8473-8659], Bonan Zhu, Simon Wengert, Ryan-Rhys Griffiths, Bingqing Cheng [0000-0002-3584-9632], Johannes T. Margraf, Christian Kunkel, and Apollo - University of Cambridge Repository
- Subjects
Structure (mathematical logic), 34 Chemical Sciences, 010405 organic chemistry, business.industry, Big data, General Medicine, General Chemistry, Construct (python library), 010402 general chemistry, 01 natural sciences, Data science, 0104 chemical sciences, Variety (cybernetics), Visualization, Data point, Scatter plot, Generic health relevance, 3404 Medicinal and Biomolecular Chemistry, Representation (mathematics), business
- Abstract
Conspectus: The visualization of data is indispensable in scientific research, from the early stages when human insight forms to the final step of communicating results. In computational physics, chemistry, and materials science, it can be as simple as making a scatter plot or as straightforward as looking through the snapshots of atomic positions manually. However, as a result of the “big data” revolution, these conventional approaches are often inadequate. The widespread adoption of high-throughput computation for materials discovery and the associated community-wide repositories have given rise to data sets that contain an enormous number of compounds and atomic configurations. A typical data set contains thousands to millions of atomic structures, along with a diverse range of properties such as formation energies, band gaps, or bioactivities. It would thus be desirable to have a data-driven and automated framework for visualizing and analyzing such structural data sets. The key idea is to construct a low-dimensional representation of the data, which facilitates navigation, reveals underlying patterns, and helps to identify data points with unusual attributes. Such data-intensive maps, often employing machine learning methods, are appearing more and more frequently in the literature. However, to the wider community, it is not always transparent how these maps are made and how they should be interpreted. Furthermore, while these maps undoubtedly serve a decorative purpose in academic publications, it is not always apparent what extra information can be garnered from reading or making them. This Account attempts to answer such questions. We start with a concise summary of the theory of representing chemical environments, followed by the introduction of a simple yet practical conceptual approach for generating structure maps in a generic and automated manner. Such analysis and mapping is made nearly effortless by employing the newly developed software tool ASAP.
To showcase the applicability to a wide variety of systems in chemistry and materials science, we provide several illustrative examples, including crystalline and amorphous materials, interfaces, and organic molecules. In these examples, the maps not only help to sift through large data sets but also reveal hidden patterns that could be easily missed using conventional analyses. The explosion in the amount of computed information in chemistry and materials science has made visualization into a science in itself. Not only have we benefited from exploiting these visualization methods in previous works, we also believe that the automated mapping of data sets will in turn stimulate further creativity and exploration, as well as ultimately feed back into future advances in the respective fields.
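The mapping idea described in the abstract (represent each structure by a descriptor vector, then project the vectors into a low-dimensional space) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `toy_descriptor` and `pca_map` helpers are hypothetical, the distance-histogram fingerprint is a crude stand-in for the chemical-environment representations the Account discusses, the data are synthetic random clusters, and plain PCA via NumPy is only one possible projection. It does not use or reproduce the ASAP tool.

```python
import numpy as np


def toy_descriptor(positions, n_bins=16, r_max=4.0):
    """Normalized histogram of interatomic distances.

    Hypothetical stand-in descriptor: invariant to atom permutation,
    rotation, and translation, but far cruder than the representations
    used in practice.
    """
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(len(positions), k=1)  # each atom pair once
    hist, _ = np.histogram(dists[upper], bins=n_bins, range=(0.0, r_max))
    return hist / max(hist.sum(), 1)


def pca_map(features, n_components=2):
    """Project descriptor vectors onto their leading principal components."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic data set (assumption): two families of 8-atom clusters,
# one compact and one expanded.
rng = np.random.default_rng(0)
compact = [rng.normal(scale=0.6, size=(8, 3)) for _ in range(20)]
expanded = [rng.normal(scale=1.2, size=(8, 3)) for _ in range(20)]
structures = compact + expanded

features = np.array([toy_descriptor(p) for p in structures])
coords = pca_map(features)  # one 2D map coordinate per structure
print(coords.shape)  # (40, 2)
```

Each row of `coords` can then be drawn as a point in a scatter plot and colored by a property of interest, which is the kind of map the abstract refers to.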
- Published
- 2020