125 results on '"POGGI, JEAN-MICHEL"'
2. Variable Selection
- Author
Genuer, Robin and Poggi, Jean-Michel; Series Editors: Gentleman, Robert, Hornik, Kurt, and Parmigiani, Giovanni
- Published
- 2020
3. Variable Importance
- Author
Genuer, Robin and Poggi, Jean-Michel; Series Editors: Gentleman, Robert, Hornik, Kurt, and Parmigiani, Giovanni
- Published
- 2020
4. CART
- Author
Genuer, Robin and Poggi, Jean-Michel; Series Editors: Gentleman, Robert, Hornik, Kurt, and Parmigiani, Giovanni
- Published
- 2020
5. Random Forests
- Author
Genuer, Robin and Poggi, Jean-Michel; Series Editors: Gentleman, Robert, Hornik, Kurt, and Parmigiani, Giovanni
- Published
- 2020
6. Introduction to Random Forests with R
- Author
Genuer, Robin and Poggi, Jean-Michel; Series Editors: Gentleman, Robert, Hornik, Kurt, and Parmigiani, Giovanni
- Published
- 2020
7. Spatial CART classification trees
- Author
Bar-Hen, Avner, Gey, Servane, and Poggi, Jean-Michel
- Published
- 2021
8. Arbres CART et Forêts aléatoires, Importance et sélection de variables (CART Trees and Random Forests, Variable Importance and Selection)
- Author
Genuer, Robin and Poggi, Jean-Michel
- Subjects
Statistics - Methodology, Mathematics - Statistics Theory
- Abstract
This article covers two algorithms proposed by Leo Breiman: CART (Classification And Regression Trees), introduced in the first half of the 1980s, and random forests, which emerged in the early 2000s. For each topic, the goal is to provide a presentation, a theoretical guarantee, an example, and some variants and extensions. After a preamble, the introduction recalls the objectives of classification and regression problems before retracing some predecessors of random forests. A section is then devoted to CART trees, after which random forests are presented. Next, a variable selection procedure based on permutation variable importance is proposed. Finally, the adaptation of random forests to the Big Data context is sketched. Comment: in French.
- Published
- 2016
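The permutation variable importance mentioned in this abstract measures how much a model's prediction error grows when the values of one variable are randomly shuffled, which breaks that variable's link with the response. The Python sketch below illustrates the idea under simplifying assumptions: the fitted model is any prediction function (here a hypothetical fixed linear rule standing in for a random forest), the error is the mean squared error on the same data, and the helper names are illustrative rather than taken from the article. Breiman's forest importance is computed tree by tree on out-of-bag samples, which this sketch does not reproduce.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each column of X is shuffled.

    predict: function mapping one row (list of floats) to a prediction.
    X: list of rows; y: list of responses.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with column j permuted and the other columns unchanged.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical example: the response depends only on the first variable.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(50)]
y = [3.0 * row[0] for row in X]
model = lambda row: 3.0 * row[0]  # stand-in for a fitted forest
print(permutation_importance(model, X, y))
```

A variable the model ignores gets an importance of exactly zero here, while the informative one gets a positive score; ranking variables by such scores is the starting point of importance-based selection.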
9. Random Forests for Big Data
- Author
Genuer, Robin, Poggi, Jean-Michel, Tuleau-Malot, Christine, and Villa-Vialaneix, Nathalie
- Subjects
Statistics - Machine Learning, Computer Science - Learning, Mathematics - Statistics Theory
- Abstract
Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involves massive data, but it also often includes online data and data heterogeneity. Recently, some statistical methods have been adapted to process Big Data, such as linear regression models, clustering methods, and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests were introduced by Breiman in 2001. They are a powerful nonparametric statistical method that handles, in a single and versatile framework, regression problems as well as two-class and multi-class classification problems. Focusing on classification problems, this paper proposes a selective review of available proposals for scaling random forests to Big Data problems. These proposals rely on parallel environments or on online adaptations of random forests. We also describe how related quantities, such as out-of-bag error and variable importance, are addressed in these methods. Then, we formulate various remarks on random forests in the Big Data context. Finally, we experiment with five variants on two massive datasets (15 and 120 million observations), one simulated and one from real-world data. One variant relies on subsampling, while three others are related to parallel implementations of random forests and involve either various adaptations of the bootstrap to Big Data or "divide-and-conquer" approaches. The fifth variant relies on online learning of random forests. These numerical experiments highlight the relative performance of the different variants, as well as some of their limitations.
- Published
- 2015
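Out-of-bag (OOB) error, one of the quantities this review examines, relies on the fact that each bootstrap sample leaves out roughly 37% of the observations (each one is excluded with probability (1 - 1/n)^n, close to 1/e). The Python sketch below computes the OOB misclassification rate of a bagged classifier. It assumes a deliberately trivial, hypothetical base learner (the majority class of its bootstrap sample) in place of a decision tree, so it only illustrates the OOB bookkeeping, not the paper's Big Data variants.

```python
import random
from collections import Counter

def fit_majority(X_sample, y_sample):
    """Hypothetical placeholder base learner: always predicts the most frequent label."""
    label = Counter(y_sample).most_common(1)[0][0]
    return lambda x: label

def oob_error(X, y, fit=fit_majority, n_models=25, seed=0):
    """Out-of-bag misclassification rate of a bagged ensemble.

    Each observation is predicted by majority vote over the models whose
    bootstrap sample did not contain it.
    """
    rng = random.Random(seed)
    n = len(y)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, drawn with replacement
        model = fit([X[i] for i in idx], [y[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:
                votes[i][model(X[i])] += 1
    scored = [i for i in range(n) if votes[i]]  # observations that were OOB at least once
    if not scored:
        return float("nan")
    errors = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return errors / len(scored)

# Hypothetical usage on a tiny one-feature dataset.
X = [[float(i)] for i in range(20)]
print(oob_error(X, ["a"] * 10 + ["b"] * 10))
```

Swapping `fit_majority` for a real tree learner leaves the OOB logic unchanged, which is why OOB error needs no separate test set.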
10. Random forests for global sensitivity analysis: A selective review
- Author
Antoniadis, Anestis, Lambert-Lacroix, Sophie, and Poggi, Jean-Michel
- Published
- 2021
11. Clustering electricity consumers using high-dimensional regression mixture models
- Author
Devijver, Emilie, Goude, Yannig, and Poggi, Jean-Michel
- Subjects
Statistics - Applications
- Abstract
Massive amounts of information about individual (household, small and medium enterprise) consumption are now provided by new metering technologies and the smart grid. Two major uses of these data are load profiling and forecasting at different scales on the grid. Customer segmentation based on load classification is a natural approach for these purposes. We propose here a new methodology based on mixtures of high-dimensional regression models. The novelty of our approach is that we focus on uncovering classes or clusters corresponding to different regression models. As a consequence, these classes can then be exploited for profiling as well as for forecasting within each class, or for bottom-up forecasts in a unified view. To demonstrate the feasibility of our approach, we consider a real dataset of 4,225 meters of individual Irish consumers, each with 48 half-hourly meter reads per day over one year, from 1 January 2010 to 31 December 2010.
- Published
- 2015
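The clustering idea in this abstract assigns each consumer to one of several regression models. As a rough illustration only, the Python sketch below computes the E-step of an EM algorithm for a mixture of simple Gaussian linear regressions with one covariate: the posterior probability (responsibility) that each observation was generated by each component. This is a simplification we chose for illustration; the paper works with high-dimensional regression mixtures, which this one-dimensional sketch does not reproduce, and all parameter values are hypothetical.

```python
import math

def gaussian_pdf(value, mean, sd):
    """Density of N(mean, sd^2) evaluated at value."""
    z = (value - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def e_step(xs, ys, weights, intercepts, slopes, sds):
    """Responsibilities r[i][k] = P(component k | x_i, y_i) for a mixture of
    linear regressions y = a_k + b_k * x + N(0, sd_k^2) with mixing weights pi_k."""
    resp = []
    for x, y in zip(xs, ys):
        joint = [w * gaussian_pdf(y, a + b * x, s)
                 for w, a, b, s in zip(weights, intercepts, slopes, sds)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp

# Hypothetical two-component example: a flat profile and an increasing one.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 1.0, 5.0, 7.0]
r = e_step(xs, ys, weights=[0.5, 0.5], intercepts=[1.0, 1.0],
           slopes=[0.0, 2.0], sds=[0.5, 0.5])
print(r)
```

The M-step (re-estimating the mixing weights, regression coefficients, and variances from `r`) is omitted; in a high-dimensional setting it is where a penalized regression would replace ordinary least squares.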
12. A prediction interval for a function-valued forecast model
- Author
Antoniadis, Anestis, Brossat, Xavier, Cugliari, Jairo, and Poggi, Jean-Michel
- Subjects
Statistics - Methodology, Mathematics - Statistics Theory
- Abstract
Starting from the information contained in the shape of load curves, we have proposed a flexible nonparametric function-valued forecast model called KWF (Kernel+Wavelet+Functional), well suited to handling nonstationary series. The predictor can be seen as a weighted average of the futures of past situations, where the weights increase with the similarity between the past situations and the current one. In addition, this strategy provides a simultaneous multiple-horizon prediction. These weights induce a probability distribution that can be used to produce bootstrap pseudo-predictions. Prediction intervals are constructed after obtaining the corresponding bootstrap pseudo-prediction residuals. We develop two proposals that follow directly from the KWF strategy and compare them with two alternatives coming from proposals by econometricians. These alternatives construct simultaneous prediction intervals using multiple-comparison corrections, through control of either the familywise error rate (FWE) or the false discovery rate. Alternatively, such prediction intervals can be constructed by bootstrapping joint probability regions. In this work, we propose to obtain prediction intervals for the KWF model that are simultaneously valid for the H prediction horizons of the corresponding path forecast, making a connection between functional time series and the econometricians' framework.
- Published
- 2014
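This abstract describes the KWF predictor as a weighted average of the futures of past situations, with weights that increase with similarity to the current situation, and notes that those weights also define a distribution for drawing bootstrap pseudo-predictions. The Python sketch below shows that mechanism under stated assumptions: similarity is measured with a plain Euclidean distance and a Gaussian kernel with a hypothetical bandwidth `h`, whereas KWF itself measures dissimilarity on wavelet coefficients; the function names are illustrative.

```python
import math
import random

def kernel_weights(past_segments, current, h):
    """Gaussian-kernel weights from squared Euclidean distances, normalised to sum to 1."""
    d2 = [sum((p - c) ** 2 for p, c in zip(seg, current)) for seg in past_segments]
    d_min = min(d2)  # subtracting the minimum avoids all weights underflowing to zero
    raw = [math.exp(-(d - d_min) / h) for d in d2]
    total = sum(raw)  # at least 1, since the closest segment gets exp(0)
    return [w / total for w in raw]

def kwf_like_forecast(past_segments, successors, current, h=1.0):
    """Return (forecast, weights): the weighted average of the successors ("futures")
    of the past segments, one value per forecast horizon."""
    w = kernel_weights(past_segments, current, h)
    horizon = len(successors[0])
    forecast = [sum(wi * succ[t] for wi, succ in zip(w, successors)) for t in range(horizon)]
    return forecast, w

def pseudo_predictions(successors, w, n_draws, seed=0):
    """Bootstrap pseudo-predictions: successors drawn with probabilities given by the weights."""
    rng = random.Random(seed)
    return rng.choices(successors, weights=w, k=n_draws)

# Hypothetical example: three past two-point segments and their successors.
past = [[0.0, 0.0], [5.0, 5.0], [1.0, 1.0]]
succ = [[0.0, 1.0], [5.0, 6.0], [1.0, 2.0]]
current = [1.0, 1.0]
forecast, w = kwf_like_forecast(past, succ, current, h=1.0)
print(forecast, pseudo_predictions(succ, w, n_draws=5))
```

Turning the pseudo-predictions into residuals and then into simultaneous prediction intervals, as the paper does, is not shown here.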
13. An analytic journey in an industrial classification problem: How to use models to sharpen your questions
- Author
Kenett, Ron S., Gotwalt, Chris, and Poggi, Jean‐Michel
- Subjects
RANDOM forest algorithms, ELECTRONIC systems, PARSIMONIOUS models, TEST systems, DATA analysis
- Abstract
The mathematician and bio-scientist Sam Karlin is quoted as stating that "The purpose of models is not to fit the data but to sharpen the question". In this paper, we describe a journey between questions, models, and data analysis to reach specific goals. This journey is typical in industrial, engineering, biology, and social science applications. It contrasts with regulated clinical research, where a statistical analysis plan is declared before data collection. We consider random forests, ridge regression, lasso, and elastic nets. To make our point, we use a case study of 63 sensors collected in the testing of an electronic system. The paper lists a sequence of questions and how they were tackled by statistical analysis to meet the analysis goal. Eventually, we were able to provide a robust, parsimonious, and effective model for predicting the system condition using a subset of the 63 sensors. In handling this problem, we develop and apply several innovative methods and insights that can prove useful in other contexts.
- Published
- 2024
14. Boosting diversity in regression ensembles
- Author
Bourel, Mathias, Cugliari, Jairo, Goude, Yannig, and Poggi, Jean‐Michel
- Subjects
BOOSTING algorithms, ELECTRIC power consumption, REGRESSION trees, RANDOM forest algorithms, ECONOMETRICS
- Abstract
Ensemble methods, such as Bagging, Boosting, or Random Forests, often enhance the prediction performance of single learners on both classification and regression tasks. In the context of regression, we propose a gradient boosting‐based algorithm incorporating a diversity term, with the aim of constructing different learners that enrich the ensemble while trading some individual optimality for global enhancement. By verifying the hypotheses of Biau and Cadre's theorem (2021, Advances in contemporary statistics and econometrics: Festschrift in honour of Christine Thomas‐Agnan, Springer), we present a convergence result ensuring that the associated optimization strategy reaches the global optimum. In the experiments, we consider a variety of base learners of increasing complexity: stumps, regression trees, Purely Random Forests, and Breiman's Random Forests. Finally, we consider simulated and benchmark datasets and a real‐world electricity demand dataset to show, by means of numerical experiments, the suitability of our procedure, examining the behavior not only of the final or aggregated predictor but also of the whole generated sequence.
- Published
- 2024
15. An analytic journey in an industrial classification problem: How to use models to sharpen your questions
- Author
Kenett, Ron S., Gotwalt, Chris, and Poggi, Jean‐Michel
- Published
- 2023
16. Random Forests with R
- Author
Genuer, Robin and Poggi, Jean-Michel
- Published
- 2020
17. Random forest-based approach for physiological functional variable selection for driver’s stress level classification
- Author
El Haouij, Neska, Poggi, Jean-Michel, Ghozi, Raja, Sevestre-Ghalila, Sylvie, and Jaïdane, Mériem
- Published
- 2019
18. Random Forests for Big Data
- Author
Genuer, Robin, Poggi, Jean-Michel, Tuleau-Malot, Christine, and Villa-Vialaneix, Nathalie
- Published
- 2017
19. Automatic Component Selection in Additive Modeling of French National Electricity Load Forecasting
- Author
Antoniadis, Anestis, Brossat, Xavier, Goude, Yannig, Poggi, Jean-Michel, and Thouvenot, Vincent; Editors: Cao, Ricardo, González Manteiga, Wenceslao, and Romo, Juan
- Published
- 2016
20. Sequential aggregation of heterogeneous experts for PM10 forecasting
- Author
Auder, Benjamin, Bobbia, Michel, Poggi, Jean-Michel, and Portier, Bruno
- Published
- 2016
21. A prediction interval for a function-valued forecast model: Application to load forecasting
- Author
Antoniadis, Anestis, Brossat, Xavier, Cugliari, Jairo, and Poggi, Jean-Michel
- Published
- 2016
22. Influence measures and stability for graphical models
- Author
Bar-Hen, Avner and Poggi, Jean-Michel
- Published
- 2016
23. Spatial correction of low‐cost sensors observations for fusion of air quality measurements
- Author
Bobbia, Michel, Poggi, Jean‐Michel, and Portier, Bruno
- Published
- 2022
24. Bodi: Boosting Diversity in Regression Ensembles
- Author
Goude, Yannig, Bourel, Mathias, Cugliari, Jairo, and Poggi, Jean-Michel
- Published
- 2022
25. Discussion of “Analysis of spatio-temporal mobile phone data: a case study in the metropolitan area of Milan”
- Author
Antoniadis, Anestis and Poggi, Jean-Michel
- Published
- 2015
26. Influence Measures for CART Classification Trees
- Author
Bar-Hen, Avner, Gey, Servane, and Poggi, Jean-Michel
- Published
- 2015
27. A review of electric vehicle charging session open data
- Author
Amara-Ouali, Yvenn, Massart, Pascal, Poggi, Jean-Michel, Goude, Yannig, and Yan, Hui
- Published
- 2021
28. Modelling the Intensity of Electric Vehicle Arrivals at Charging Points
- Author
Amara-Ouali, Yvenn, Goude, Yannig, and Poggi, Jean-Michel
- Abstract
With the market adoption of electric vehicles (EVs) surging in recent years, the smart grid paradigm requires accurate forecasts of EV arrivals at charging points. One efficient way to model these arrivals is to use point processes. This study introduces an additive model using both spline and wavelet effects to fit the intensity of a non-homogeneous Poisson process applied to EV arrivals at charging points. The key contribution of this work is a novel estimation procedure inspired by backfitting, illustrated by a case study on real-world EV arrivals at charging points. The results show that this approach can help better capture EV arrival peaks.
- Published
- 2023
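Fitting the intensity of a non-homogeneous Poisson process, as in this study, typically means maximising the point-process log-likelihood: the sum of log λ(t_i) over the observed arrival times minus the integral of λ over the observation window. The Python sketch below evaluates that log-likelihood for a given intensity function, using a trapezoidal approximation of the integral. It does not implement the paper's spline and wavelet additive model or its backfitting-inspired estimation; the peaked intensity and arrival times are hypothetical.

```python
import math

def nhpp_log_likelihood(intensity, arrivals, t_start, t_end, n_grid=1000):
    """Log-likelihood of arrival times under a Poisson process with rate intensity(t)
    on [t_start, t_end]: sum_i log(intensity(t_i)) minus the integral of the intensity,
    the integral being approximated with the trapezoidal rule on n_grid intervals."""
    step = (t_end - t_start) / n_grid
    values = [intensity(t_start + k * step) for k in range(n_grid + 1)]
    integral = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    return sum(math.log(intensity(t)) for t in arrivals) - integral

# Hypothetical daily profile with an arrival peak around t = 0.75 (time in days).
peaked = lambda t: 2.0 + 8.0 * math.exp(-((t - 0.75) ** 2) / 0.005)
arrivals = [0.70, 0.74, 0.76, 0.80]
print(nhpp_log_likelihood(peaked, arrivals, 0.0, 1.0))
print(nhpp_log_likelihood(lambda t: 2.0, arrivals, 0.0, 1.0))
```

For arrivals clustered near the peak, the peaked intensity scores higher than a flat one, which is the kind of comparison an estimation procedure exploits when it adjusts spline or wavelet effects.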
29. Air quality low-cost sensors and monitoring stations NO2 raw dataset in Rouen (France)
- Author
Thulliez, Emma, Portier, Bruno, Bobbia, Michel, and Poggi, Jean-Michel
- Published
- 2023
30. Boosting Diversity in Regression Ensembles
- Author
Bourel, Mathias, Cugliari, Jairo, Goude, Yannig, and Poggi, Jean-Michel
- Subjects
Diversity, Ensemble, Regression, Boosting, Trees, [STAT.ME] Statistics [stat]/Methodology [stat.ME]
- Abstract
The practical interest of ensemble methods has been highlighted in several works. Aggregating predictors very often improves on the performance of a single one. A fruitful recipe is to generate several predictors from a single one by perturbing the learning set and then, instead of selecting the best one, to aggregate them. Bagging, boosting, and random forests are examples of such strategies, useful for both classification and regression problems. A key ingredient for properly analysing the improvement in prediction performance is the diversity of the ensemble of predictors. In the regression case, the main question is how to generate individual predictors that improve quadratic prediction performance. We seek to enhance these methods by using the concept of diversity (also known as negative correlation learning). We propose an algorithm that enriches the set of original individual predictors using a gradient boosting-based method, incorporating a diversity term to guide the gradient boosting iterations. The idea is to progressively generate predictors by boosting diversity; this modification induces some suboptimality of the individual learners but improves the ensemble. Then, we establish a convergence result ensuring that the associated optimisation strategy converges to a global optimum. Finally, we show by means of numerical experiments the appropriateness of our procedure and examine not only the final predictor or the aggregated one but also the generated sequence. First, on a simulated dataset, we illustrate and study the method with respect to the family of predictors as well as the parameters to be tuned (diversity weight and gradient step). Second, real-world electricity demand datasets are considered, opening the application of such ideas to the forecasting context.
- Published
- 2020
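This abstract describes gradient boosting guided by a diversity term and tuned by a diversity weight and a gradient step. The Python sketch below illustrates one such step under a loss we assume for illustration (the paper's exact formulation may differ): for a new learner f and the current ensemble average f_bar, L = 0.5 * (y - f)^2 - 0.5 * kappa * (f - f_bar)^2, whose negative gradient with respect to f, holding f_bar fixed, is (y - f) + kappa * (f - f_bar). The base learner is a hypothetical least-squares line on a single feature, standing in for the stumps, trees, or forests used in the paper.

```python
def diversity_pseudo_residuals(y, f, f_bar, kappa):
    """Negative gradient of 0.5*(y - f)**2 - 0.5*kappa*(f - f_bar)**2 w.r.t. f (f_bar held fixed)."""
    return [(yi - fi) + kappa * (fi - fbi) for yi, fi, fbi in zip(y, f, f_bar)]

def fit_line(x, r):
    """Hypothetical base learner: ordinary least-squares line r ~ a + b*x."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx if sxx else 0.0
    a = mr - b * mx
    return lambda xi: a + b * xi

def boosting_step(x, y, f, f_bar, kappa, nu):
    """One gradient step: fit the base learner to the pseudo-residuals, then move by step size nu."""
    r = diversity_pseudo_residuals(y, f, f_bar, kappa)
    h = fit_line(x, r)
    return [fi + nu * h(xi) for fi, xi in zip(f, x)]

# Hypothetical example with one feature and a starting learner equal to zero.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
f0 = [0.0] * 4
f_bar = [0.5, 0.5, 0.5, 0.5]
print(boosting_step(x, y, f0, f_bar, kappa=0.0, nu=1.0))
print(boosting_step(x, y, f0, f_bar, kappa=0.3, nu=0.5))
```

With `kappa = 0` the step reduces to ordinary least-squares gradient boosting; a positive `kappa` adds a push away from the current ensemble average, which is the diversity effect described above.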
31. A Review of Electric Vehicle Load Open Data and Models
- Author
Amara-Ouali, Yvenn, Goude, Yannig, Massart, Pascal, Poggi, Jean-Michel, and Yan, Hui
- Published
- 2021
32. Random Forests with R
- Author
Genuer, Robin and Poggi, Jean-Michel
- Subjects
Statistical Theory and Methods, [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], 021105 building & construction, 0211 other engineering and technologies, 02 engineering and technology, 010501 environmental sciences, 01 natural sciences, 0105 earth and related environmental sciences
- Abstract
This book offers an application-oriented guide to random forests: a statistical learning method extensively used in many fields of application, thanks to its excellent predictive performance but also to its flexibility, which places few restrictions on the nature of the data used. Indeed, random forests can be adapted to both supervised classification problems and regression problems. In addition, they allow qualitative and quantitative explanatory variables to be considered together, without pre-processing. Moreover, they can be used to process standard data, for which the number of observations is higher than the number of variables, while also performing very well in the high-dimensional case, where the number of variables is quite large in comparison to the number of observations. Consequently, they are now among the preferred methods in the toolbox of statisticians and data scientists. The book is primarily intended for students in academic fields such as statistical education, but also for practitioners in statistics and machine learning. A scientific undergraduate degree is quite sufficient to take full advantage of the concepts, methods, and tools discussed. In terms of computer science skills, little background knowledge is required, though an introduction to the R language is recommended.
Random forests are part of the family of tree-based methods; accordingly, after an introductory chapter, Chapter 2 presents CART trees. The next three chapters are devoted to random forests. They focus on their presentation (Chapter 3), on the variable importance tool (Chapter 4), and on the variable selection problem (Chapter 5), respectively. After discussing the concepts and methods, we illustrate their implementation on a running example. Then, various complements are provided before examining additional examples. Throughout the book, each result is given together with the code (in R) that can be used to reproduce it.
Thus, the book offers readers essential information and concepts, together with examples and the software tools needed to analyse data using random forests.
- Published
- 2020
33. Aggregation of Multi-Scale Experts for Bottom-Up Load Forecasting
- Author
Goehry, Benjamin, Goude, Yannig, Massart, Pascal, and Poggi, Jean-Michel
- Published
- 2020
34. Les forêts aléatoires avec R (Random Forests with R)
- Author
Genuer, Robin and Poggi, Jean-Michel
- Subjects
[MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], ComputingMilieux_MISCELLANEOUS
- Published
- 2019
35. Self‐similarity analysis of vehicle driver's electrodermal activity
- Author
El Haouij, Neska, Ghozi, Raja, Poggi, Jean‐Michel, Sevestre‐Ghalila, Sylvie, and Jaïdane, Mériem
- Published
- 2019
36. Clustering electricity consumers using high‐dimensional regression mixture models
- Author
Devijver, Emilie, Goude, Yannig, and Poggi, Jean‐Michel
- Published
- 2019
37. Electricity Demand Forecasting
- Author
Cugliari, Jairo and Poggi, Jean‐Michel
- Published
- 2018
38. The ENBIS-17 Quality and Reliability Engineering International Special Issue
- Author
Krebs, Kristina and Poggi, Jean-Michel
- Published
- 2018
39. Arbres CART et Forêts aléatoires, Importance et sélection de variables (CART Trees and Random Forests, Variable Importance and Selection)
- Author
Genuer, Robin and Poggi, Jean-Michel
- Subjects
Big data, CART, Forêts aléatoires (random forests), Sélection de variables (variable selection), Importance des variables (variable importance), [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST]
- Abstract
This article covers two algorithms proposed by Leo Breiman: CART (Classification And Regression Trees), introduced in the first half of the 1980s, and random forests, which emerged in the early 2000s. For each topic, the goal is to provide a presentation, a theoretical guarantee, an example, and some variants and extensions. After a preamble, the introduction recalls the objectives of classification and regression problems before retracing some predecessors of random forests. A section is then devoted to CART trees, after which random forests are presented. Next, a variable selection procedure based on permutation variable importance is proposed. Finally, the adaptation of random forests to the Big Data context is sketched.
- Published
- 2017
40. Random Forest-Based Approach for Physiological Functional Variable Selection: Towards Driver's Stress Level Classification
- Author
El Haouij, Neska, Poggi, Jean-Michel, Ghozi, Raja, Sevestre-Ghalila, Sylvie, and Jaïdane, Mériem
- Subjects
Recursive feature elimination, Grouped variable importance, Random forests, Wavelets, Physiological signals, Functional data, [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing
- Abstract
This paper is devoted to the statistical selection of physiological functional variables for driver's stress level classification using random forests. The study focuses on human physiological changes produced when driving on different urban routes, captured using portable sensors. Specifically, it uses electrodermal activity measured at two different locations (hand and foot), electromyogram, heart rate, and respiration from ten driving experiments on three types of routes (rest area, city, and highway driving), taken from the drivedb database available online on the PhysioNet website. Several studies have addressed driver's stress level recognition using physiological signals. Classically, researchers extract expert-based features from physiological signals and select the most relevant ones for stress level recognition. This work provides a random forest-based method for selecting physiological functional variables in order to classify the driver's stress level. On the methodological side, the contributions of this work are to consider physiological signals as functional variables, decomposed on a wavelet basis, and to offer a variable selection procedure. On the applied side, the proposed method provides a "blind" procedure for driver's stress level classification that performs as well as the expert-based study in terms of misclassification rate. Moreover, it offers a ranking of physiological variables according to their importance in stress level classification. The results obtained suggest that the electromyogram and heart rate signals are not very relevant compared with the electrodermal and respiration signals.
- Published
- 2017
41. Clustering electricity consumers using high‐dimensional regression mixture models.
- Author
-
Devijver, Emilie, Goude, Yannig, and Poggi, Jean‐Michel
- Subjects
REGRESSION analysis ,TIME series analysis ,SMART meters ,ELECTRICITY ,LOAD forecasting (Electric power systems) ,CONSUMERS - Abstract
A massive amount of data about individual electricity consumption is now provided by new metering technologies and smart grids. These new data are especially useful for load profiling and load modeling at different scales of the electrical network. A new methodology based on a mixture of high‐dimensional regression models is used to cluster individual customers. It uncovers clusters corresponding to different regression models. Temporal information is incorporated in order to prepare the next step, the fit of a forecasting model in each cluster. Only the electrical signal is used: it is sliced into consecutive curves so that it can be treated as a discrete time series of curves. Interpretation of the models is given on a real smart meter dataset of Irish customers. [ABSTRACT FROM AUTHOR]
- Published
- 2020
42. Scalable Clustering of Individual Electrical Curves for Profiling and Bottom-Up Forecasting
- Author
-
Auder, Benjamin, primary, Cugliari, Jairo, additional, Goude, Yannig, additional, and Poggi, Jean-Michel, additional
- Published
- 2018
43. AffectiveROAD system and database to assess driver's attention
- Author
-
Haouij, Neska El, primary, Poggi, Jean-Michel, additional, Sevestre-Ghalila, Sylvie, additional, Ghozi, Raja, additional, and Jaïdane, Mériem, additional
- Published
- 2018
44. Random forest-based approach for physiological functional variable selection for driver’s stress level classification
- Author
-
El Haouij, Neska, primary, Poggi, Jean-Michel, additional, Ghozi, Raja, additional, Sevestre-Ghalila, Sylvie, additional, and Jaïdane, Mériem, additional
- Published
- 2018
45. A diploma of university (DU) 'Big Data Analyst' in lifelong training, at level L3
- Author
-
Poggi, Jean-Michel, Bouveyron, Charles, Hebrail, Georges, Jollois, François-Xavier, Université Paris Descartes - Paris 5 (UPD5), Laboratoire de Mathématiques d'Orsay (LMO), Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11), Mathématiques Appliquées Paris 5 (MAP5 - UMR 8145), Université Paris Descartes - Paris 5 (UPD5)-Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS), EDF R&D (EDF R&D), EDF (EDF), Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), and Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Big Data ,lifelong training ,bachelor ,[SHS.EDU]Humanities and Social Sciences/Education ,[STAT.OT]Statistics [stat]/Other Statistics [stat.ML] ,[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST] - Abstract
We present the diploma of university (DU) "Big Data Analyst", launched this year and delivered by the STID department of IUT Paris Descartes. This 150-hour diploma is open to learners in short lifelong-training programmes with at least an undergraduate level (L3 in France). It offers an innovative way to certify essential skills in the emerging domain of Big Data. The diploma comprises 5 modules: two dedicated to computing methods, two focused on statistical techniques, with a strong emphasis on open data and social network mining, and one addressing the crucial issues of data quality and privacy. Another distinctive feature of this diploma is its strong focus on implementation tools: at least half of the instructors come from industry, working alongside an academic team of statisticians and computer scientists.
- Published
- 2016
46. Random forests and big data
- Author
-
Genuer, Robin, Poggi, Jean-Michel, Tuleau-Malot, Christine, Villa-Vialaneix, Nathalie, Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria), Bordeaux population health (BPH), Université de Bordeaux (UB), Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED), Institut National de la Santé et de la Recherche Médicale (INSERM), Université Bordeaux Segalen - Bordeaux 2, Epidémiologie et Biostatistique [Bordeaux], Laboratoire de Mathématiques d'Orsay (LM-Orsay), Université Paris-Sud - Paris 11 (UP11), Centre National de la Recherche Scientifique (CNRS), Laboratoire Jean Alexandre Dieudonné (LJAD), Université Nice Sophia Antipolis (1965 - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA), Université Côte d'Azur (UCA), Unité de Mathématiques et Informatique Appliquées de Toulouse (MIAT INRA), Institut National de la Recherche Agronomique (INRA), Université Paris Descartes - Paris 5 (UPD5), and Société Française de Statistique (SFdS)
- Subjects
Big Data ,Random forests ,Data streams ,Statistics (Mathematics) ,[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST] - Abstract
Big Data is one of the major challenges of statistical science and has numerous consequences from both algorithmic and theoretical viewpoints. Big Data always involves massive data, but it also often includes data streams and data heterogeneity. Recently, some statistical methods have been adapted to process Big Data, such as linear regression models, clustering methods, and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests, introduced by Breiman in 2001, are a powerful nonparametric statistical method that handles, in a single versatile framework, regression problems as well as two-class and multi-class classification problems. This paper reviews the available proposals for random forests in parallel environments and for online random forests. We then formulate various remarks and sketch some alternative directions for random forests in the Big Data context.
- Published
- 2015
47. Electricity Forecasting Using Multi-Stage Estimators of Nonlinear Additive Models
- Author
-
Thouvenot, Vincent, primary, Pichavant, Audrey, additional, Goude, Yannig, additional, Antoniadis, Anestis, additional, and Poggi, Jean-Michel, additional
- Published
- 2016
48. Variable selection in additive models with multi-step estimators
- Author
-
Antoniadis, Anestis, Goude, Yannig, Poggi, Jean-Michel, Thouvenot, Vincent, Statistique Apprentissage Machine (SAM), Laboratoire Jean Kuntzmann (LJK), Université Pierre Mendès France - Grenoble 2 (UPMF)-Université Joseph Fourier - Grenoble 1 (UJF)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Centre National de la Recherche Scientifique (CNRS), EDF R&D (EDF R&D), EDF (EDF), Laboratoire de Mathématiques d'Orsay (LM-Orsay), Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11), Université d'Orsay, Université Joseph Fourier, University of Cape Town, and Université Paris Descartes
- Subjects
Group LASSO ,Variable selection ,Sparse additive model ,OLS ,[STAT.TH]Statistics [stat]/Statistics Theory [stat.TH] ,Multi-step estimator ,[STAT]Statistics [stat] ,P-Spline ,B-Splines approximation ,GCV ,AIC ,BIC ,Consistency ,[MATH.MATH-NA]Mathematics [math]/Numerical Analysis [math.NA] - Abstract
In this document, we present multi-step nonparametric estimators for additive models whose components are approximated by their expansions in B-spline bases. We work in an asymptotic setting in which the number of observations tends to infinity; the number of candidate covariates may be larger than the number of observations, but the number of influential covariates is assumed to be smaller than the number of observations. In our work, a significant effect of a covariate does not require the norm of the corresponding component to be bounded below by a positive constant, as is usually assumed in this context: we only require the norms of the significant components to be bounded below by a sequence, depending on the number of observations, that may decrease to zero at an appropriate rate. We focus on the selection and estimation of sparse additive models in this asymptotic context. Our multi-step estimators combine ordinary least squares (OLS) or P-spline estimators with the Group LASSO. We discuss several model selection criteria (AIC, GCV, or BIC), establish the selection and estimation consistency of one of our estimators, and illustrate the behaviour of the resulting estimators via simulations.
- Published
- 2015
49. Disaggregated electricity forecasting using wavelet-based clustering of individual consumers
- Author
-
Cugliari, Jairo, primary, Goude, Yannig, additional, and Poggi, Jean-Michel, additional
- Published
- 2016
50. Joint estimation and variable selection for mean and dispersion in proper dispersion models
- Author
-
Antoniadis, Anestis, primary, Gijbels, Irène, additional, Lambert-Lacroix, Sophie, additional, and Poggi, Jean-Michel, additional
- Published
- 2016