1. Estimating minimum effect with outlier selection
- Author
Alexandra Carpentier, Sylvain Delattre, Nicolas Verzelen, Etienne Roquain
Affiliations: Otto-von-Guericke-Universität Magdeburg (OVGU); Sorbonne Université (SU); Mathématiques, Informatique et STatistique pour l'Environnement et l'Agronomie (MISTEA), Institut national d'études supérieures agronomiques de Montpellier (Montpellier SupAgro), Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut National de la Recherche Agronomique (INRA); Laboratoire de Probabilités et Modèles Aléatoires (LPMA), Centre National de la Recherche Scientifique (CNRS)-Université Paris Diderot - Paris 7 (UPD7)-Université Pierre et Marie Curie - Paris 6 (UPMC); Laboratoire de Probabilités, Statistiques et Modélisations (LPSM (UMR_8001)), Sorbonne Université (SU)-Centre National de la Recherche Scientifique (CNRS)-Université de Paris (UP); Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRAE)-Institut national d'études supérieures agronomiques de Montpellier (Montpellier SupAgro), Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)
Funding: ANR-16-CE40-0019, SansSouci, Approches post hoc pour les tests multiples à grande échelle (2016); ANR-17-CE40-0001, BASICS, Bayésien non-paramétrique, quantification de l'incertitude et structures aléatoires (2017)
- Subjects
Statistics and Probability, False discovery rate, FOS: Computer and information sciences, minimax rate, selective inference, equicorrelation, moment matching, Mathematics - Statistics Theory, Statistics Theory (math.ST), 01 natural sciences, Methodology (stat.ME), 010104 statistics & probability, Contamination, [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], 0502 economics and business, Convergence (routing), Statistics, FOS: Mathematics, [MATH] Mathematics [math], 0101 mathematics, Selection (genetic algorithm), Statistics - Methodology, 050205 econometrics, Mathematics, Hermite polynomials, 62C20, multiple testing, sparsity, 05 social sciences, Estimator, Minimax, [MATH.MATH-PR] Mathematics [math]/Probability [math.PR], Multiple comparisons problem, Outlier, Statistics, Probability and Uncertainty, Null hypothesis, post hoc, 62G10
- Abstract
We introduce one-sided versions of Huber's contamination model, in which corrupted samples tend to take larger values than uncorrupted ones. Two intertwined problems are addressed: estimation of the mean of the uncorrupted samples (the minimum effect) and selection of the corrupted samples (the outliers). For minimum effect estimation, we derive the minimax risks and introduce estimators that adapt to the unknown number of contaminations. Interestingly, the optimal convergence rate differs markedly from that in the classical Huber contamination model. Our analysis also uncovers the effect of particular structural assumptions on the distribution of the contaminated samples. As for selecting the outliers, we formulate the problem in a multiple testing framework in which the location and scaling of the null hypotheses are unknown. We rigorously prove that the null hypothesis can be estimated while maintaining a theoretical guarantee on the number of falsely selected outliers, through either false discovery rate (FDR) control or post hoc bounds. As a by-product, we address a long-standing open issue on FDR control under equi-correlation, which reinforces the interest of removing dependence when performing multiple testing.
Comment: 70 pages; 7 figures
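The setting described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the estimators or procedures of the paper: the function names, the Gaussian noise with known unit variance, the fixed positive shift for the outliers, and the use of the sample median as a plug-in for the minimum effect are all choices made here for illustration. The Benjamini-Hochberg step-up procedure is a standard FDR procedure, used here with the estimated null location.

```python
import math
import random
import statistics


def simulate_one_sided(n, k, theta, shift, seed=0):
    """Draw n unit-variance Gaussian samples with mean theta.

    The first k samples are contaminated by a positive shift, so that
    corrupted values tend to be larger (hypothetical illustration).
    """
    rng = random.Random(seed)
    outliers = [theta + shift + rng.gauss(0.0, 1.0) for _ in range(k)]
    clean = [theta + rng.gauss(0.0, 1.0) for _ in range(n - k)]
    return outliers + clean


def upper_tail_pvalue(x, location):
    """One-sided p-value of x under the null N(location, 1)."""
    return 0.5 * math.erfc((x - location) / math.sqrt(2.0))


def benjamini_hochberg(pvalues, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            n_reject = rank  # keep the largest rank meeting the bound
    return sorted(order[:n_reject])


samples = simulate_one_sided(n=1000, k=50, theta=2.0, shift=5.0, seed=1)

# Naive plug-in for the minimum effect: an assumption of this sketch,
# NOT the adaptive estimator studied in the paper.
theta_hat = statistics.median(samples)

# Test each sample against the estimated null, then select outliers.
pvals = [upper_tail_pvalue(x, theta_hat) for x in samples]
selected = benjamini_hochberg(pvals, alpha=0.1)
true_hits = sum(1 for i in selected if i < 50)
```

Because contamination is one-sided, the median is pulled slightly upward, which is one reason a plug-in of this kind carries no guarantee by itself; the paper's contribution concerns estimators and selection procedures for which such guarantees are proved.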
- Published
- 2018