1. Probabilistic Data-Driven Sampling via Multi-Criteria Importance Analysis
- Author
- Earl Lawrence, Soumya Dutta, John Patchett, James Ahrens, Ayan Biswas, and Jon Calhoun
- Subjects
business.industry, Computer science, Probabilistic logic, Sampling (statistics), computer.software_genre, Computer Graphics and Computer-Aided Design, Visualization, Data-driven, Data modeling, Orders of magnitude (bit rate), Data visualization, Signal Processing, Post-hoc analysis, Fraction (mathematics), Computer Vision and Pattern Recognition, Data mining, business, computer, Software
- Abstract
Although supercomputers are becoming increasingly powerful, their components have not scaled proportionately. Compute power is growing enormously and is enabling finely resolved simulations that produce never-before-seen features. However, I/O capabilities lag by orders of magnitude, so only a fraction of the simulation data can be stored for post hoc analysis. Prespecified plans for saving features and quantities of interest do not work for features that have not been seen before. Data-driven intelligent sampling schemes are needed to detect and save important parts of the simulation while it is running. Here, we propose a novel sampling scheme that reduces the size of the data by orders of magnitude while still preserving important regions. The approach we develop selects points with unusual data values and high gradients. We demonstrate that our approach outperforms traditional sampling schemes on a number of tasks.
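The abstract's idea of favoring points with unusual values and high gradients can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `importance_scores` and `sample_indices`, the histogram-based rarity measure, the finite-difference gradient, and the equal-weight combination of the two criteria are all assumptions chosen for illustration, applied to a 1D field using only the Python standard library.

```python
import random


def importance_scores(values, bins=16):
    """Score each point of a 1D field by value rarity and gradient magnitude.

    Hypothetical helper: both criteria and their equal weighting are
    illustrative assumptions, not the method described in the paper.
    """
    n = len(values)
    if n < 2:
        return [1.0] * n
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero bin width on constant data

    # Criterion 1: rarity. Points in sparsely populated histogram bins score high.
    bin_of = [min(int((v - lo) / width), bins - 1) for v in values]
    counts = [0] * bins
    for b in bin_of:
        counts[b] += 1
    rarity = [1.0 - counts[b] / n for b in bin_of]

    # Criterion 2: gradient magnitude (central differences, one-sided at the ends).
    grad = []
    for i in range(n):
        left, right = max(i - 1, 0), min(i + 1, n - 1)
        grad.append(abs(values[right] - values[left]) / (right - left))
    gmax = max(grad) or 1.0
    grad = [g / gmax for g in grad]

    # Combine the criteria with equal weights (an assumption).
    return [0.5 * r + 0.5 * g for r, g in zip(rarity, grad)]


def sample_indices(scores, fraction, seed=0):
    """Keep each point with probability proportional to its score.

    Probabilities are scaled toward fraction * len(scores) expected samples
    and capped at 1, so the realized count can fall below that target.
    """
    n = len(scores)
    total = sum(scores) or 1.0
    target = fraction * n
    rng = random.Random(seed)
    return [i for i, s in enumerate(scores)
            if rng.random() < min(1.0, target * s / total)]
```

In a sketch like this, a mostly flat field with one isolated spike would retain the spike and its steep neighbors with probability 1, while flat regions are thinned heavily. A uniform random sampler at the same budget would have no such preference.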
- Published
- 2021