106 results on '"POGGI, JEAN-MICHEL"'
Search Results
2. Spatial CART classification trees
- Author
-
Bar-Hen, Avner, Gey, Servane, and Poggi, Jean-Michel
- Published
- 2021
3. Random forests for global sensitivity analysis: A selective review
- Author
-
Antoniadis, Anestis, Lambert-Lacroix, Sophie, and Poggi, Jean-Michel
- Published
- 2021
4. An analytic journey in an industrial classification problem: How to use models to sharpen your questions.
- Author
-
Kenett, Ron S., Gotwalt, Chris, and Poggi, Jean‐Michel
- Subjects
RANDOM forest algorithms, ELECTRONIC systems, PARSIMONIOUS models, TEST systems, DATA analysis
- Abstract
The mathematician and bio‐scientist Sam Karlin is quoted stating that "The purpose of models is not to fit the data but to sharpen the question". In this paper, we describe a journey between questions, models and data analysis to reach specific goals. This journey is typical in industrial, engineering, biology and social science applications. It contrasts regulated clinical research where a statistical analysis plan is declared before data collection. We consider random forests, ridge regression, lasso and elastic nets. To make our point, we use a case study of 63 sensors collected in the testing of an electronic system. The paper lists a sequence of questions and how they were tackled by statistical analysis to meet the analysis goal. Eventually, we were able to provide a robust parsimonious and effective model for predicting the system condition using a subset of the 63 sensors. In handling this problem, we develop and apply several innovative methods and insights that can prove useful in other contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Boosting diversity in regression ensembles.
- Author
-
Bourel, Mathias, Cugliari, Jairo, Goude, Yannig, and Poggi, Jean‐Michel
- Subjects
BOOSTING algorithms, ELECTRIC power consumption, REGRESSION trees, RANDOM forest algorithms, ECONOMETRICS
- Abstract
Ensemble methods, such as Bagging, Boosting, or Random Forests, often enhance the prediction performance of single learners on both classification and regression tasks. In the context of regression, we propose a gradient boosting‐based algorithm incorporating a diversity term with the aim of constructing different learners that enrich the ensemble while achieving a trade‐off of some individual optimality for global enhancement. Verifying the hypotheses of Biau and Cadre's theorem (2021, Advances in contemporary statistics and econometrics—Festschrift in honour of Christine Thomas‐Agnan, Springer), we present a convergence result ensuring that the associated optimization strategy reaches the global optimum. In the experiments, we consider a variety of different base learners with increasing complexity: stumps, regression trees, Purely Random Forests, and Breiman's Random Forests. Finally, we consider simulated and benchmark datasets and a real‐world electricity demand dataset to show, by means of numerical experiments, the suitability of our procedure by examining the behavior not only of the final or the aggregated predictor but also of the whole generated sequence. [ABSTRACT FROM AUTHOR]
- Published
- 2024
6. Random forest-based approach for physiological functional variable selection for driver’s stress level classification
- Author
-
El Haouij, Neska, Poggi, Jean-Michel, Ghozi, Raja, Sevestre-Ghalila, Sylvie, and Jaïdane, Mériem
- Published
- 2019
7. Random Forests for Big Data
- Author
-
Genuer, Robin, Poggi, Jean-Michel, Tuleau-Malot, Christine, and Villa-Vialaneix, Nathalie
- Published
- 2017
8. Sequential aggregation of heterogeneous experts for PM10 forecasting
- Author
-
Auder, Benjamin, Bobbia, Michel, Poggi, Jean-Michel, and Portier, Bruno
- Published
- 2016
9. A prediction interval for a function-valued forecast model: Application to load forecasting
- Author
-
Antoniadis, Anestis, Brossat, Xavier, Cugliari, Jairo, and Poggi, Jean-Michel
- Published
- 2016
10. Influence measures and stability for graphical models
- Author
-
Bar-Hen, Avner and Poggi, Jean-Michel
- Published
- 2016
11. Deployment of low-cost air quality sensors in Rouen: A dataset of one year of hourly concentrations of gas pollutants
- Author
-
Thulliez, Emma, Portier, Bruno, Bobbia, Michel, and Poggi, Jean-Michel
- Published
- 2024
12. Discussion of “Analysis of spatio-temporal mobile phone data: a case study in the metropolitan area of Milan”
- Author
-
Antoniadis, Anestis and Poggi, Jean-Michel
- Published
- 2015
13. Influence Measures for CART Classification Trees
- Author
-
Bar-Hen, Avner, Gey, Servane, and Poggi, Jean-Michel
- Published
- 2015
14. Smoothing non-equispaced heavy noisy data with wavelets
- Author
-
Antoniadis, Anestis, Gijbels, Irène, and Poggi, Jean-Michel
- Published
- 2009
15. Spatial correction of low‐cost sensors observations for fusion of air quality measurements.
- Author
-
Bobbia, Michel, Poggi, Jean‐Michel, and Portier, Bruno
- Subjects
AIR quality, AIR quality monitoring, DETECTORS
- Abstract
The context for this article is the statistical fusion of several pollutant measurement networks: a reference one of fixed sensors of high quality and others of fixed or mobile micro‐sensors of heterogeneous quality. The challenge is to use the measurements of such different networks together to obtain a better air quality map. Since pollution maps are often obtained from the correction of numerical model outputs by the measurements provided by the monitoring stations of air quality networks, the quality of the reconstructed map may be improved by increasing the density of sensors by adding low‐cost micro‐sensors. A geostatistical approach is very often used for the fusion of measurements. But the first step is to correct the micro‐sensor measurements using those given by the reference sensors. Usually, this preprocessing is performed during an offline preliminary study for which reference sensors and micro‐sensors are located at the same position, which does not allow quick adaptation to changes or coping with time‐related nonstationarities. We propose in this article to complement these approaches with a simple online spatial correction of micro‐sensors. The principle is to use the reference measurements to correct the network of micro‐sensors. More precisely, by kriging only the measurements from micro‐sensors, the reference measurements are estimated; this allows a correction to be calculated by kriging the differences, which is finally applied to the micro‐sensors. One can then iterate this fundamental sequence of steps. Numerical experiments exploring the proposed algorithm by simulation and an application to a real‐world dataset are provided. [ABSTRACT FROM AUTHOR]
- Published
- 2022
16. Air quality low-cost sensors and monitoring stations NO2 raw dataset in Rouen (France)
- Author
-
Thulliez, Emma, Portier, Bruno, Bobbia, Michel, and Poggi, Jean-Michel
- Published
- 2023
17. Boosting Diversity in Regression Ensembles
- Author
-
Bourel, Mathias, Cugliari, Jairo, Goude, Yannig, and Poggi, Jean-Michel
- Affiliations
Entrepôts, Représentation et Ingénierie des Connaissances (ERIC), Université Lumière - Lyon 2 (UL2)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon
- Subjects
Diversity, [STAT.ME] Statistics [stat]/Methodology [stat.ME], Ensemble, Regression, Boosting, Trees
- Abstract
The practical interest of ensemble methods has been highlighted in several works. Aggregating predictors very often improves on the performance of a single one. A fruitful recipe is to generate several predictors from a single one by perturbing the learning set and, instead of selecting the best one, to aggregate them. Bagging, boosting and random forests are examples of such strategies, useful for both classification and regression problems. A key ingredient for properly analysing the improvement in prediction performance is the diversity of the ensemble of predictors. In the regression case, aggregation is mainly concerned with how to generate individual predictors to improve quadratic prediction performance. We seek to enhance these methods by using the concept of diversity (also known as negative correlation learning). We propose an algorithm to enrich the set of original individual predictors using a gradient boosting-based method, incorporating a diversity term to guide the gradient boosting iterations. The idea is to progressively generate predictors by boosting diversity; this modification induces some suboptimality of the individual learners but improves the ensemble. Then, we establish a convergence result ensuring that the associated optimisation strategy converges to a global optimum. Finally, we show by means of numerical experiments the appropriateness of our procedure and examine not only the final predictor or the aggregated one but also the generated sequence. First, on a simulated dataset, we illustrate and study the method with respect to the family of predictors as well as the parameters to be tuned (diversity weight and gradient step). Second, real-world electricity demand datasets are considered, opening the application of such ideas to the forecasting context.
- Published
- 2020
18. Partial and Recombined Estimators for Nonlinear Additive Models
- Author
-
Chèze, Nathalie, Poggi, Jean-Michel, and Portier, Bruno
- Published
- 2003
19. Multivariate denoising using wavelets and principal component analysis
- Author
-
Aminghafari, Mina, Chèze, Nathalie, and Poggi, Jean-Michel
- Published
- 2006
20. Boosting and instability for regression trees
- Author
-
Gey, Servane and Poggi, Jean-Michel
- Published
- 2006
21. Random Forest-Based Approach for Physiological Functional Variable Selection: Towards Driver's Stress Level Classification
- Author
-
El Haouij, Neska, Poggi, Jean-Michel, Ghozi, Raja, Sevestre-Ghalila, Sylvie, and Jaïdane, Mériem
- Affiliations
CEA-LinkLab, Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Telnet Innovation Labs; Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Université Paris-Saclay; Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria); Unité de recherche Signaux et Systèmes [Tunis] (UR-U2S-ENIT), Ecole Nationale d'Ingénieurs de Tunis (ENIT), Université de Tunis El Manar (UTM), 2092, Tunisia; Mathématiques Appliquées à Paris 5 (MAP5 - UMR 8145), Université Paris Descartes - Paris 5 (UPD5)-Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
Recursive feature elimination, Grouped variable importance, Random forests, Wavelets, Physiological signals, Functional data, [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing
- Abstract
This paper is devoted to statistical selection of physiological functional variables for driver's stress level classification using random forests. The study focuses on human physiological changes produced when driving on different urban routes and captured using portable sensors. Specifically, the data comprise the electrodermal activity measured at two different locations (hand and foot), electromyogram, heart rate and respiration from ten driving experiments on three types of routes (rest area, city, and highway driving), taken from the drivedb database, available online on the PhysioNet website. Several studies have addressed driver's stress level recognition using physiological signals. Classically, researchers extract expert-based features from physiological signals and select the most relevant ones for stress level recognition. This work provides a random forest-based method for the selection of physiological functional variables in order to classify the driver's stress level. On the methodological side, the contributions of this work are to consider physiological signals as functional variables, decomposed on a wavelet basis, and to offer a variable selection procedure. On the applied side, the proposed method provides a "blind" procedure for driver's stress level classification that performs as well as the expert-based study in terms of misclassification rate. It moreover offers a ranking of physiological variables according to their importance in stress level classification. The obtained results suggest that electromyogram and heart rate signals are not very relevant when compared to the electrodermal and respiration signals.
- Published
- 2017
22. Aggregation of Multi-Scale Experts for Bottom-Up Load Forecasting.
- Author
-
Goehry, Benjamin, Goude, Yannig, Massart, Pascal, and Poggi, Jean-Michel
- Abstract
The development of smart grids and new advanced metering infrastructures creates new opportunities and challenges for utilities. Exploiting smart meter information for forecasting is a key point for energy providers, who have to deal with a time-varying portfolio of customers, as well as for grid managers, who need to improve the accuracy of local forecasts to cope with the development of distributed renewable energy generation. We propose a new machine learning approach to forecast the system load of a group of customers, exploiting individual load measurements in real time and/or exogenous information such as weather and survey data. Our approach consists of building experts using random forests trained on subsets of customers, then normalising their predictions and aggregating them with a convex expert aggregation algorithm to forecast the system load. We propose new aggregation methods and compare two strategies for building subsets of customers: 1) hierarchical clustering based on survey data and/or load features and 2) a random clustering strategy. These approaches are evaluated on a real data set of residential Irish customers' load at a half-hourly resolution. We show that our approaches achieve a significant gain in short-term load forecasting accuracy of around 25 percent of RMSE. [ABSTRACT FROM AUTHOR]
- Published
- 2020
23. Clustering electricity consumers using high‐dimensional regression mixture models.
- Author
-
Devijver, Emilie, Goude, Yannig, and Poggi, Jean‐Michel
- Subjects
REGRESSION analysis, TIME series analysis, SMART meters, ELECTRICITY, LOAD forecasting (Electric power systems), CONSUMERS
- Abstract
A massive amount of data about individual electrical consumptions are now provided with new metering technologies and smart grids. These new data are especially useful for load profiling and load modeling at different scales of the electrical network. A new methodology based on mixture of high‐dimensional regression models is used to perform clustering of individual customers. It leads to uncovering clusters corresponding to different regression models. Temporal information is incorporated in order to prepare the next step, the fit of a forecasting model in each cluster. Only the electrical signal is involved, slicing the electrical signal into consecutive curves to consider it as a discrete time series of curves. Interpretation of the models is given on a real smart meter dataset of Irish customers. [ABSTRACT FROM AUTHOR]
- Published
- 2020
24. Clustering electricity consumers using high-dimensional regression mixture models
- Author
-
Devijver, Emilie, Goude, Yannig, and Poggi, Jean-Michel
- Affiliations
Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS); Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria); EDF R&D (EDF R&D), EDF (EDF)
- Subjects
FOS: Computer and information sciences, [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], Applications (stat.AP), Statistics - Applications
- Abstract
Massive amounts of information about individual (household, small and medium enterprise) consumption are now provided by new metering technologies and the smart grid. Two major uses of these data are load profiling and forecasting at different scales on the grid. Customer segmentation based on load classification is a natural approach for these purposes. We propose here a new methodology based on a mixture of high-dimensional regression models. The novelty of our approach is that we focus on uncovering classes or clusters corresponding to different regression models. As a consequence, these classes could then be exploited for profiling as well as for forecasting in each class or for bottom-up forecasts in a unified view. To demonstrate the feasibility of our approach, we consider a real dataset of Irish individual consumers comprising 4,225 meters, each with 48 half-hourly meter reads per day over 1 year, from 1st January 2010 to 31st December 2010.
- Published
- 2015
25. Mixture of linear regression models for short term PM10 forecasting in Haute Normandie (France)
- Author
-
Misiti, Michel, Misiti, Yves, Poggi, Jean-Michel, and Portier, Bruno
- Affiliations
Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Université Paris-Saclay; Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria); Institut Universitaire de Technologie Paris Descartes (IUT - Paris Descartes), Université Paris Descartes - Paris 5 (UPD5); Laboratoire de Mathématiques de l'INSA de Rouen Normandie (LMI), Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)
- Subjects
PM10, Mixture of linear models, 62M10, 62P12, 62H30, Air quality, Particulate matter, Forecasting, [STAT.TH] Statistics [stat]/Statistics Theory [stat.TH], [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST]
- Abstract
A mixture of linear regression models is used for the short-term statistical forecasting of the daily mean PM10 concentration. Hourly concentrations of PM10 have been measured in three cities in Haute-Normandie (France): Rouen, Le Havre and Dieppe. The Haute-Normandie region is located northwest of Paris, on the southern shore of the English Channel, and is heavily industrialized. We consider six monitoring stations reflecting the diversity of situations: urban background, traffic, rural and industrial stations. We have focused our attention on recent data from 2007 to 2011. We forecast the daily mean PM10 concentration by modeling it as a mixture of linear regression models involving meteorological predictors and the average concentration measured on the previous day. The values of observed meteorological variables are used for fitting the models but the corresponding predictions are considered for the test data, leading to realistic evaluations of forecasting performances, which are calculated through a leave-one-out scheme on the four years. We discuss in this paper several methodological issues including estimation schemes, the introduction of the deterministic predictions of meteorological models and how to handle forecasting at various horizons from a few hours to one day ahead.
- Published
- 2013
26. Random forests and big data
- Author
-
Genuer, Robin, Poggi, Jean-Michel, Tuleau-Malot, Christine, and Villa-Vialaneix, Nathalie
- Affiliations
Statistics In System biology and Translational Medicine (SISTM), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria); Bordeaux population health (BPH), Université de Bordeaux (UB)-Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)-Institut National de la Santé et de la Recherche Médicale (INSERM); Epidémiologie et Biostatistique [Bordeaux], Université Bordeaux Segalen - Bordeaux 2; Laboratoire de Mathématiques d'Orsay (LM-Orsay), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS); Université Paris Descartes - Paris 5 (UPD5); Laboratoire Jean Alexandre Dieudonné (LJAD), Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS)-Université Côte d'Azur (UCA); Unité de Mathématiques et Informatique Appliquées de Toulouse (MIAT INRA), Institut National de la Recherche Agronomique (INRA); Société Française de Statistique (SFdS)
- Subjects
Big Data, Random forests, Data streams, Statistics (Mathematics), [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST]
- Abstract
Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involves massive data but it also often includes data streams and data heterogeneity. Recently some statistical methods have been adapted to process Big Data, like linear regression models, clustering methods and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests, introduced by Breiman in 2001, are a powerful nonparametric statistical method that handles, in a single and versatile framework, regression problems as well as two-class or multi-class classification problems. This paper reviews available proposals about random forests in parallel environments as well as about online random forests. Then, we formulate various remarks and sketch some alternative directions for random forests in the Big Data context.
- Published
- 2015
27. Self‐similarity analysis of vehicle driver's electrodermal activity.
- Author
-
El Haouij, Neska, Ghozi, Raja, Poggi, Jean‐Michel, Sevestre‐Ghalila, Sylvie, and Jaïdane, Mériem
- Subjects
FOREST measurement, BROWNIAN motion, PUBLIC spaces
- Abstract
This paper characterizes stress levels via a self‐similarity analysis of the electrodermal activity (EDA) collected in a real‐world driving context. To characterize the EDA richness over scales, the fractional Brownian motion (FBM) process and its corresponding exponent H, estimated via a wavelet‐based approach, are used. Specifically, an automatic scale range selection is proposed in order to detect the linearity in a log scale diagram. The procedure is applied to the EDA signals, from the open database drivedb, originally captured on the foot and the hand of the drivers during a real‐world driving experiment, designed to evoke different levels of arousal and stress. The estimated Hurst exponent H offers a distinction in stress levels when driving in highway versus city, with a reference to restful state of minimal stress level. Specifically, the estimated H values tend to decrease when the driving environmental complexity increases. In addition, the estimated H values on the foot EDA signals allow a better characterization of the driving task than that of hand EDA. The self‐similarity analysis was applied to various physiological signals in literature but not to the EDA so far, a signal which was found to correlate most with human affect. The proposed analysis could be useful in real‐time monitoring of stress levels in urban driving spaces, among other applications. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
28. Influence functions for CART
- Author
-
Bar Hen, Avner, Gey, Servane, Poggi, Jean-Michel, Mathématiques Appliquées à Paris 5 (MAP5 - UMR 8145), Université Paris Descartes - Paris 5 (UPD5)-Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS), Laboratoire de Mathématiques d'Orsay (LM-Orsay), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire de Mathématiques d'Orsay (LMO), and Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)
- Subjects
[MATH.MATH-ST] Mathematics [math]/Statistics [math.ST] ,[STAT.TH] Statistics [stat]/Statistics Theory [stat.TH] - Abstract
Preprint HAL; This paper deals with measuring the influence of observations on the results obtained with CART classification trees. To define the influence of individuals on the analysis, we use influence functions to propose some general criteria to measure the sensitivity of the CART analysis and its robustness. The proposals, based on jackknife trees, are organized around two lines: influence on predictions and influence on partitions. In addition, the analysis is extended to the pruned sequences of CART trees to produce a CART-specific notion of influence. A numerical example, the well-known spam dataset, is presented to illustrate the notions developed throughout the paper. A real dataset relating the administrative classification of cities surrounding Paris, France, to the characteristics of their tax revenue distribution is finally analyzed using the new influence-based tools.
- Published
- 2014
29. Non parametric forecasting of a function-valued non stationary processes. Application to the electricity demand
- Author
-
Antoniadis, Anestis, Brossat, Xavier, Cugliari, Jairo, Poggi, Jean-Michel, Université Joseph Fourier - Grenoble 1 (UJF), EDF (EDF), Laboratoire de Mathématiques d'Orsay (LM-Orsay), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), and Le Pennec, Erwan
- Subjects
[STAT.TH] Statistics [stat]/Statistics Theory [stat.TH] ,[MATH.MATH-ST] Mathematics [math]/Statistics [math.ST] ,ComputingMilieux_MISCELLANEOUS - Abstract
International audience; no abstract
- Published
- 2012
30. Functional Clustering using Wavelets
- Author
-
Antoniadis, Anestis, Brossat, Xavier, Cugliari, Jairo, Poggi, Jean-Michel, Université Joseph Fourier - Grenoble 1 (UJF), EDF R&D (EDF R&D), EDF (EDF), Model selection in statistical learning (SELECT), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire de Mathématiques d'Orsay (LM-Orsay), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), and Le Pennec, Erwan
- Subjects
ComputingMilieux_MISCELLANEOUS - Abstract
International audience
- Published
- 2012
31. Multistep Forecasting Non-Stationary Time Series using Wavelets and Kernel Smoothing
- Author
-
Aminghafari, Mina, Poggi, Jean-Michel, Amirkabir University of Technology (AUT), Laboratoire de Mathématiques d'Orsay (LM-Orsay), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, and Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
- Subjects
Time series ,MathematicsofComputing_NUMERICALANALYSIS ,Kernel smoothing ,[INFO]Computer Science [cs] ,Wavelets ,Forecasting ,Nonstationary - Abstract
International audience; The authors deal with forecasting nonstationary time series using wavelets and kernel smoothing. Starting from a basic forecasting procedure based on the regression of the process on the nondecimated Haar wavelet coefficients of the past, the procedure was extended in various directions, including the use of an arbitrary wavelet or polynomial fitting for extrapolating low-frequency components. The authors study a further generalization of the prediction procedure dealing with multistep forecasting and combining kernel smoothing and wavelets. They finally illustrate the proposed procedure on nonstationary simulated and real data and then compare it to well-known competitors.
- Published
- 2012
32. PM10 forecasting using mixture linear regression models
- Author
-
Misiti, Michel, Misiti, Yves, Poggi, Jean-Michel, Portier, Bruno, Laboratoire de Mathématiques d'Orsay (LM-Orsay), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Laboratoire de Mathématiques de l'INSA de Rouen Normandie (LMI), Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Normandie Université (NU), Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), and Le Pennec, Erwan
- Subjects
[STAT.TH] Statistics [stat]/Statistics Theory [stat.TH] ,[MATH.MATH-ST] Mathematics [math]/Statistics [math.ST] ,ComputingMilieux_MISCELLANEOUS - Abstract
International audience; no abstract
- Published
- 2012
33. Forecasting time series through reconstructed multiple seasonal patterns using Empirical Mode Decomposition
- Author
-
Mhamdi, Farouk, Jaidane, Mériem, Poggi, Jean-Michel, Ecole Nationale d'Ingénieurs de Tunis (ENIT), Université de Tunis El Manar (UTM), Unité Signaux et Systèmes, Université de Tunis El Manar (UTM)-Université de Tunis El Manar (UTM), Laboratoire de Mathématiques d'Orsay (LM-Orsay), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Le Pennec, Erwan, Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de Mathématiques d'Orsay (LMO), and Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
[STAT.TH] Statistics [stat]/Statistics Theory [stat.TH] ,[MATH.MATH-ST] Mathematics [math]/Statistics [math.ST] ,ComputingMilieux_MISCELLANEOUS - Abstract
International audience; no abstract
- Published
- 2012
34. Detecting Influent Observations using CART Classification Trees. Application to the classification of the cities of Paris area
- Author
-
Bar Hen, Avner, Gey, Servane, Poggi, Jean-Michel, Mathématiques Appliquées Paris 5 (MAP5 - UMR 8145), Université Paris Descartes - Paris 5 (UPD5)-Institut National des Sciences Mathématiques et de leurs Interactions (INSMI)-Centre National de la Recherche Scientifique (CNRS), Laboratoire de Mathématiques d'Orsay (LM-Orsay), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Laboratoire de Mathématiques d'Orsay (LMO), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), and Le Pennec, Erwan
- Subjects
[STAT.TH] Statistics [stat]/Statistics Theory [stat.TH] ,[MATH.MATH-ST] Mathematics [math]/Statistics [math.ST] ,ComputingMilieux_MISCELLANEOUS - Abstract
International audience; no abstract
- Published
- 2011
35. Quantifying local and background contributions to PM10 concentrations in Haute-Normandie, using random forests
- Author
-
Bobbia, Michel, Jollois, François-Xavier, Poggi, Jean-Michel, Portier, Bruno, Université Paris Descartes - Paris 5 (UPD5), Laboratoire de Mathématiques d'Orsay (LM-Orsay), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Laboratoire de Mathématiques d'Orsay (LMO), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire de Mathématiques de l'INSA de Rouen Normandie (LMI), Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU), and Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)
- Subjects
[STAT.AP] Statistics [stat]/Applications [stat.AP] ,ComputingMilieux_MISCELLANEOUS - Abstract
International audience
- Published
- 2011
36. Optimized Clusters for Disaggregated Electricity Load Forecasting
- Author
-
Misiti, Michel, Misiti, Yves, Oppenheim, Georges, Poggi, Jean-Michel, Laboratoire de Mathématiques d'Orsay (LM-Orsay), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), and Le Pennec, Erwan
- Subjects
[STAT.TH] Statistics [stat]/Statistics Theory [stat.TH] ,[MATH.MATH-ST] Mathematics [math]/Statistics [math.ST] ,ComputingMilieux_MISCELLANEOUS - Abstract
To account for the variation in the portfolio of EDF (the French electricity company) following the liberalization of the electricity market, it is essential to disaggregate the global load curve. The idea is to disaggregate the global signal in such a way that the sum of disaggregated forecasts significantly improves the prediction of the whole global signal. The strategy is to optimize a preliminary clustering of individual load curves with respect to a predictability index. The optimized clustering procedure is controlled by forecasting performance via a cross-prediction dissimilarity index. It can be viewed as a discrete gradient-type algorithm. REVSTAT-Statistical Journal, Vol. 8 No. 2 (2010)
- Published
- 2010
37. Empirical Mode Decomposition for Trend Extraction. Application to Electrical Data
- Author
-
Mhamdi, Farouk, Jaidane, Meriem, Poggi, Jean-Michel, Unité Signaux et Systèmes, Ecole Nationale d'Ingénieurs de Tunis (ENIT), Université de Tunis El Manar (UTM)-Université de Tunis El Manar (UTM), Laboratoire de Mathématiques d'Orsay (LM-Orsay), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, and Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
- Subjects
[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST] ,[STAT.TH]Statistics [stat]/Statistics Theory [stat.TH] ,ComputingMilieux_MISCELLANEOUS - Abstract
International audience; no abstract
- Published
- 2010
38. Random Forests: some methodological insights
- Author
-
Genuer, Robin, Poggi, Jean-Michel, Tuleau, Christine, Laboratoire de Mathématiques d'Orsay (LM-Orsay), Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11), Model selection in statistical learning (SELECT), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire Jean Alexandre Dieudonné (JAD), Université Nice Sophia Antipolis (... - 2019) (UNS), COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)-Centre National de la Recherche Scientifique (CNRS), INRIA, Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-Centre National de la Recherche Scientifique (CNRS), and Université Nice Sophia Antipolis (1965 - 2019) (UNS)
- Subjects
Random Forests ,Variable Importance ,[STAT.ML]Statistics [stat]/Machine Learning [stat.ML] ,[MATH.MATH-ST]Mathematics [math]/Statistics [math.ST] ,Variable Selection ,[STAT.TH]Statistics [stat]/Statistics Theory [stat.TH] ,Classification ,Regression - Abstract
This paper examines, from an experimental perspective, random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001. It first aims at confirming known but scattered advice for using random forests and at proposing some complementary remarks for both standard problems and high-dimensional ones, for which the number of variables hugely exceeds the sample size. But the main contribution of this paper is twofold: to provide some insights about the behavior of the variable importance index based on random forests and, in addition, to investigate two classical issues of variable selection. The first is to find important variables for interpretation; the second is more restrictive and tries to design a good prediction model. The strategy involves a ranking of explanatory variables using the random forests score of importance and a stepwise ascending variable introduction strategy.
- Published
- 2008
39. Disaggregated electricity forecasting using wavelet-based clustering of individual consumers.
- Author
-
Cugliari, Jairo, Goude, Yannig, and Poggi, Jean-Michel
- Published
- 2016
- Full Text
- View/download PDF
40. Automatic Component Selection in Additive Modeling of French National Electricity Load Forecasting.
- Author
-
Antoniadis, Anestis, Brossat, Xavier, Goude, Yannig, Poggi, Jean-Michel, and Thouvenot, Vincent
- Published
- 2016
- Full Text
- View/download PDF
41. Variable selection using random forests
- Author
-
Genuer, Robin, Poggi, Jean-Michel, and Tuleau-Malot, Christine
- Published
- 2010
- Full Text
- View/download PDF
42. The ENBIS‐17 Quality and Reliability Engineering International Special Issue.
- Author
-
Krebs, Kristina and Poggi, Jean‐Michel
- Subjects
*BAYES' estimation , *K-means clustering , *CONTINGENCY tables - Abstract
An introduction is presented in which the editor discusses various articles within the issue on topics including Bayesian estimation approach for degradation process model; problem of clustering data streams using the k‐means algorithm; and detection of outlying rows in a contingency table.
- Published
- 2018
- Full Text
- View/download PDF
43. Electricity Forecasting Using Multi-Stage Estimators of Nonlinear Additive Models.
- Author
-
Thouvenot, Vincent, Pichavant, Audrey, Goude, Yannig, Antoniadis, Anestis, and Poggi, Jean-Michel
- Subjects
LOAD forecasting (Electric power systems) ,ELECTRIC utilities ,NONLINEAR statistical models ,ENERGY economics ,TIME series analysis
- Published
- 2016
- Full Text
- View/download PDF
44. VSURF: An R Package for Variable Selection Using Random Forests.
- Author
-
Genuer, Robin, Poggi, Jean-Michel, and Tuleau-Malot, Christine
- Subjects
*PROGRAMMING languages , *REGRESSION analysis , *SUBSET selection - Abstract
This paper describes the R package VSURF. Based on random forests, and for both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables, including some redundancy, which can be relevant for interpretation; the second is a smaller subset corresponding to a model that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy is based on a preliminary ranking of the explanatory variables using the random forests permutation-based score of importance and proceeds using a stepwise forward strategy for variable introduction. The two proposals can be obtained automatically using data-driven default values, good enough to provide interesting results, but the strategy can also be tuned by the user. The algorithm is illustrated on a simulated example and its applications to real datasets are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
45. Mixture of linear regression models for short term PM10 forecasting in Haute Normandie (France).
- Author
-
Misiti, Michel, Misiti, Yves, Poggi, Jean-Michel, and Portier, Bruno
- Subjects
REGRESSION analysis ,PARTICULATE matter ,MATHEMATICAL models of forecasting - Abstract
Forecasting PM10 concentrations accurately will allow for improved early warning procedures, useful for safety reasons, and opens, for example, the possibility of restricting traffic or offering free public transportation. The need for a statistical tool for forecasting particulate matter pollution is therefore an important issue for public authorities. Hourly concentrations of PM10 have been measured in three cities of Haute-Normandie (France): Rouen, Le Havre and Dieppe. The Haute-Normandie region is located northwest of Paris, on the southern shore of the English Channel, and is heavily industrialized. We consider six monitoring stations reflecting the diversity of situations. We have focused our attention on recent data from 2007 to 2011. We forecast the daily mean PM10 concentration by modeling it as a mixture of linear regression models involving meteorological predictors and the average concentration measured on the previous day. The values of observed meteorological variables are used for fitting the models while the corresponding predictions are considered for the test data, leading to realistic evaluations of forecasting performances, which are calculated through a leave-one-out scheme on the four years. We discuss in this paper several methodological issues including estimation schemes, introduction of the deterministic predictions of meteorological models and how to handle the forecasting at various horizons from some hours to one day ahead. [ABSTRACT FROM AUTHOR]
- Published
- 2015
46. A Guided Tour.
- Author
-
Misiti, Michel, Misiti, Yves, Oppenheim, Georges, and Poggi, Jean-Michel
- Published
- 2007
- Full Text
- View/download PDF
47. From Wavelet Bases to the Fast Algorithm.
- Author
-
Misiti, Michel, Misiti, Yves, Oppenheim, Georges, and Poggi, Jean-Michel
- Published
- 2007
- Full Text
- View/download PDF
48. The EZW Algorithm.
- Author
-
Misiti, Michel, Misiti, Yves, Oppenheim, Georges, and Poggi, Jean-Michel
- Published
- 2007
- Full Text
- View/download PDF
49. Image Processing with Wavelets.
- Author
-
Misiti, Michel, Misiti, Yves, Oppenheim, Georges, and Poggi, Jean-Michel
- Published
- 2007
- Full Text
- View/download PDF
50. Signal Denoising and Compression.
- Author
-
Misiti, Michel, Misiti, Yves, Oppenheim, Georges, and Poggi, Jean-Michel
- Published
- 2007
- Full Text
- View/download PDF