1. High-dimensional variable clustering based on sub-asymptotic maxima of a weakly dependent random process
- Author
Boulin, Alexis, Di Bernardino, Elena, Laloë, Thomas, Toulemonde, Gwladys, Littoral, Environment: MOdels and Numerics (LEMON), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut Montpelliérain Alexander Grothendieck (IMAG), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Hydrosciences Montpellier (HSM), Institut de Recherche pour le Développement (IRD)-Institut national des sciences de l'Univers (INSU - CNRS)-Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)-Institut de Recherche pour le Développement (IRD)-Institut national des sciences de l'Univers (INSU - CNRS)-Centre National de la Recherche Scientifique (CNRS), Laboratoire Jean Alexandre Dieudonné (LJAD), Université Nice Sophia Antipolis (1965 - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA), Institut Montpelliérain Alexander Grothendieck (IMAG), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Université de Montpellier (UM), Centre National de la Recherche Scientifique (CNRS), ANR-20-CE23-0011,McLaren,Apprentissage Statistique et Evaluation du Risque(2020), ANR-19-P3IA-0002,3IA@cote d'azur,3IA Côte d'Azur(2019), Université Côte d'Azur (UCA), and IMAG
- Subjects
FOS: Computer and information sciences, 60G70, 62H05, 62M99, Consistent estimation, Extreme value theory, Mathematics - Statistics Theory, Machine Learning (stat.ML), Statistics Theory (math.ST), Variable clustering, Methodology (stat.ME), Statistics - Machine Learning, High dimensional models, [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], FOS: Mathematics, MSC Codes: 60G70, 62H05, 62M99, Statistics - Methodology
- Abstract
We propose a new class of models for variable clustering called Asymptotic Independent block (AI-block) models, which defines population-level clusters based on the independence of the maxima of a multivariate stationary mixing random process among clusters. This class of models is identifiable, meaning that there exists a maximal element with respect to a partial order on partitions, allowing for statistical inference. We also present an algorithm for recovering the clusters of variables without specifying the number of clusters a priori. Our work provides some theoretical insights into the consistency of our algorithm, demonstrating that under certain conditions it can effectively identify clusters in the data with a computational complexity that is polynomial in the dimension. This implies that groups can be learned nonparametrically in settings where block maxima of a dependent process are only sub-asymptotic.
Comment: 50 pages, 6 figures
- Published
- 2023