1. Data Augmentation for Regression Machine Learning Problems in High Dimensions
- Authors
Clara Guilhaumon, Nicolas Hascoët, Francisco Chinesta, Marc Lavarde, and Fatima Daim
- Subjects
active learning, design of experiments, regression, s-PGD, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Machine learning approaches are currently used to understand or model complex physical systems. In general, a substantial number of samples must be collected to build a model with reliable results. However, collecting large amounts of data is often time-consuming or expensive. Moreover, problems of industrial interest are becoming increasingly complex and depend on a large number of parameters. Because of the curse of dimensionality, high-dimensional problems intrinsically require large amounts of data. This is why new approaches based on smart sampling techniques, such as active learning methods, have been investigated to minimize the number of samples needed to train the model. Here, we propose a technique based on a combination of the Fisher information matrix and sparse proper generalized decomposition (s-PGD) that enables the definition of a new active learning informativeness criterion in high dimensions. We provide examples demonstrating the performance of this technique on a theoretical 5D polynomial function and on an industrial crash simulation application. The results show that the proposed strategy outperforms the usual ones.
- Published
- 2024
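The abstract builds its informativeness criterion from the Fisher information matrix combined with s-PGD. The sketch below illustrates only the generic Fisher-information ingredient, under stated assumptions: a model that is linear in its parameters, unit Gaussian noise (so the Fisher information is F = Σ φ(x)φ(x)ᵀ plus a small ridge term), and greedy D-optimal selection. It is not the authors' s-PGD-based criterion; the separable quadratic basis `features`, the ridge `reg`, and all function names are hypothetical. By the matrix determinant lemma, det(F + φφᵀ) = det(F)(1 + φᵀF⁻¹φ), so the candidate with the largest φᵀF⁻¹φ yields the largest increase in det(F).

```python
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def features(x: Sequence[float]) -> Vector:
    """Hypothetical separable quadratic basis: [1, x_1..x_d, x_1^2..x_d^2]."""
    return [1.0] + [float(xi) for xi in x] + [float(xi) * float(xi) for xi in x]


def rank_one_update(F: Matrix, phi: Vector) -> None:
    """In-place F += phi phi^T (Fisher contribution of one unit-noise observation)."""
    for i, pi in enumerate(phi):
        for j, pj in enumerate(phi):
            F[i][j] += pi * pj


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def informativeness(F: Matrix, phi: Vector) -> float:
    """phi^T F^{-1} phi; det(F + phi phi^T) = det(F) * (1 + this value)."""
    return sum(a * b for a, b in zip(phi, solve(F, phi)))


def select_batch(labeled, candidates, k: int, reg: float = 1e-3):
    """Greedily pick k candidates, each maximizing the gain in det(F) (D-optimality)."""
    p = len(features(candidates[0]))
    # Ridge term reg * I keeps F invertible when few samples are labeled.
    F = [[reg if i == j else 0.0 for j in range(p)] for i in range(p)]
    for x in labeled:
        rank_one_update(F, features(x))
    pool = list(candidates)
    chosen = []
    for _ in range(min(k, len(pool))):
        scores = [informativeness(F, features(x)) for x in pool]
        best = max(range(len(pool)), key=scores.__getitem__)
        x = pool.pop(best)
        chosen.append(x)
        rank_one_update(F, features(x))  # treat the pick as labeled for the next step
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 5  # same dimension as the theoretical example in the abstract
    labeled = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
    pool = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(200)]
    for x in select_batch(labeled, pool, k=5):
        print([round(v, 2) for v in x])
```

In a real active-learning loop, the selected points would be simulated and appended to `labeled` before the next round. The paper instead combines the Fisher information matrix with s-PGD, which this sketch does not attempt to reproduce.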