A look inside the black box: Using graph-theoretical descriptors to interpret a Continuous-Filter Convolutional Neural Network (CF-CNN) trained on the global and local minimum energy structures of neutral water clusters.

Authors :
Bilbrey, Jenna A.
Heindel, Joseph P.
Schram, Malachi
Bandyopadhyay, Pradipta
Xantheas, Sotiris S.
Choudhury, Sutanay
Source :
Journal of Chemical Physics. 7/14/2020, Vol. 153 Issue 2, p1-15. 15p. 3 Diagrams, 1 Chart, 8 Graphs.
Publication Year :
2020

Abstract

We describe a method for the post-hoc interpretation of a neural network (NN) trained on the global and local minima of neutral water clusters. We use the structures reported in a recently published database containing over 5 × 10^6 unique water cluster networks (H2O)_N of size N = 3–30. The structural properties were first characterized using chemical descriptors derived from graph theory, identifying important trends in topology, connectivity, and polygon structure of the networks associated with the various minima. The code to generate the molecular graphs and compute the descriptors is available at https://github.com/exalearn/molecular-graph-descriptors, and the graphs are available alongside the original database at https://sites.uw.edu/wdbase/. A Continuous-Filter Convolutional Neural Network (CF-CNN) was trained on a subset of 500 000 networks to predict the potential energy, yielding a mean absolute error of 0.002 ± 0.002 kcal/mol per water molecule. Clusters of sizes not included in the training set exhibited errors of the same magnitude, indicating that the CF-CNN protocol accurately predicts energies of networks for both smaller and larger sizes than those used during training. The graph-theoretical descriptors were further employed to interpret the predictive power of the CF-CNN. Topological measures, such as the Wiener index, the average shortest path length, and the similarity index, suggested that all networks from the test set were within the same range of values as those from the training set. The graph analysis suggests that larger errors appear when the mean degree and the number of polygons in the cluster lie further from the mean of the training set. This indicates that the structural space, and not just the chemical space, is an important factor to consider when designing training sets, as predictive errors can result when the structural composition is sufficiently different from the bulk of those in the training set. To this end, the developed descriptors are quite effective in explaining the results of the CF-CNN (a.k.a. the "black box") model. [ABSTRACT FROM AUTHOR]
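
Three of the descriptors named in the abstract (the Wiener index, the average shortest path length, and the mean degree) can be illustrated with a minimal Python sketch. This is not the authors' code, which is available at the repository linked in the abstract. It assumes each cluster is already represented as a connected, undirected graph with one node per water molecule and one edge per hydrogen bond. The function names and the six-membered ring used as input are hypothetical and chosen only for illustration.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Return hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def graph_descriptors(n_nodes, edges):
    """Compute simple topological descriptors of a connected, undirected graph.

    Assumption: nodes are water molecules labelled 0..n_nodes-1 and each
    edge is one hydrogen bond; how hydrogen bonds are detected is out of
    scope here. Requires n_nodes >= 2.
    """
    adjacency = {i: set() for i in range(n_nodes)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Wiener index: sum of shortest-path lengths over all unordered node pairs.
    wiener = 0
    for source in range(n_nodes):
        dist = bfs_distances(adjacency, source)
        if len(dist) != n_nodes:
            raise ValueError("graph must be connected")
        # Count each unordered pair once by keeping only node > source.
        wiener += sum(d for node, d in dist.items() if node > source)

    n_pairs = n_nodes * (n_nodes - 1) // 2
    total_degree = sum(len(neighbors) for neighbors in adjacency.values())
    return {
        "wiener_index": wiener,
        "average_shortest_path_length": wiener / n_pairs,
        "mean_degree": total_degree / n_nodes,
    }


# Hypothetical input: six molecules bonded in a single ring.
ring_edges = [(i, (i + 1) % 6) for i in range(6)]
ring = graph_descriptors(6, ring_edges)
print(ring)
```

Comparing such values between a test cluster and the distribution over the training set is the kind of check the abstract describes. Counting polygons (cycles) requires a separate cycle-enumeration step and is not covered by this sketch.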

Details

Language :
English
ISSN :
00219606
Volume :
153
Issue :
2
Database :
Academic Search Index
Journal :
Journal of Chemical Physics
Publication Type :
Academic Journal
Accession number :
144565143
Full Text :
https://doi.org/10.1063/5.0009933