Author: "POGGI, JEAN-MICHEL" / Language: english - Searchworks@Jio Institute Digital Library Search Results

106 results on '"POGGI, JEAN-MICHEL"'

1. Random Forests for Time Series

Author: Goehry, Benjamin, Yan, Hui, Goude, Yannig, Massart, Pascal, and Poggi, Jean-Michel
Published: 2023

2. Spatial CART classification trees

Author: Bar-Hen, Avner, Gey, Servane, and Poggi, Jean-Michel
Published: 2021

3. Random forests for global sensitivity analysis: A selective review

Author: Antoniadis, Anestis, Lambert-Lacroix, Sophie, and Poggi, Jean-Michel
Published: 2021

4. An analytic journey in an industrial classification problem: How to use models to sharpen your questions.

Author: Kenett, Ron S., Gotwalt, Chris, and Poggi, Jean‐Michel
Subjects: RANDOM forest algorithms, ELECTRONIC systems, PARSIMONIOUS models, TEST systems, DATA analysis
Abstract: The mathematician and bio‐scientist Sam Karlin is quoted stating that "The purpose of models is not to fit the data but to sharpen the question". In this paper, we describe a journey between questions, models and data analysis to reach specific goals. This journey is typical in industrial, engineering, biology and social science applications. It contrasts regulated clinical research where a statistical analysis plan is declared before data collection. We consider random forests, ridge regression, lasso and elastic nets. To make our point, we use a case study of 63 sensors collected in the testing of an electronic system. The paper lists a sequence of questions and how they were tackled by statistical analysis to meet the analysis goal. Eventually, we were able to provide a robust parsimonious and effective model for predicting the system condition using a subset of the 63 sensors. In handling this problem, we develop and apply several innovative methods and insights that can prove useful in other contexts. [ABSTRACT FROM AUTHOR]
Published: 2024

5. Boosting diversity in regression ensembles.

Author: Bourel, Mathias, Cugliari, Jairo, Goude, Yannig, and Poggi, Jean‐Michel
Subjects: BOOSTING algorithms, ELECTRIC power consumption, REGRESSION trees, RANDOM forest algorithms, ECONOMETRICS
Abstract: Ensemble methods, such as Bagging, Boosting, or Random Forests, often enhance the prediction performance of single learners on both classification and regression tasks. In the context of regression, we propose a gradient boosting‐based algorithm incorporating a diversity term with the aim of constructing different learners that enrich the ensemble while achieving a trade‐off of some individual optimality for global enhancement. Verifying the hypotheses of Biau and Cadre's theorem (2021, Advances in contemporary statistics and econometrics—Festschrift in honour of Christine Thomas‐Agnan, Springer), we present a convergence result ensuring that the associated optimization strategy reaches the global optimum. In the experiments, we consider a variety of different base learners with increasing complexity: stumps, regression trees, Purely Random Forests, and Breiman's Random Forests. Finally, we consider simulated and benchmark datasets and a real‐world electricity demand dataset to show, by means of numerical experiments, the suitability of our procedure by examining the behavior not only of the final or the aggregated predictor but also of the whole generated sequence. [ABSTRACT FROM AUTHOR]
Published: 2024

6. Random forest-based approach for physiological functional variable selection for driver’s stress level classification

Author: El Haouij, Neska, Poggi, Jean-Michel, Ghozi, Raja, Sevestre-Ghalila, Sylvie, and Jaïdane, Mériem
Published: 2019

7. Random Forests for Big Data

Author: Genuer, Robin, Poggi, Jean-Michel, Tuleau-Malot, Christine, and Villa-Vialaneix, Nathalie
Published: 2017

8. Sequential aggregation of heterogeneous experts for PM10 forecasting

Author: Auder, Benjamin, Bobbia, Michel, Poggi, Jean-Michel, and Portier, Bruno
Published: 2016

9. A prediction interval for a function-valued forecast model: Application to load forecasting

Author: Antoniadis, Anestis, Brossat, Xavier, Cugliari, Jairo, and Poggi, Jean-Michel
Published: 2016

10. Influence measures and stability for graphical models

Author: Bar-Hen, Avner and Poggi, Jean-Michel
Published: 2016

11. Deployment of low-cost air quality sensors in Rouen: A dataset of one year of hourly concentrations of gas pollutants

Author: Thulliez, Emma, Portier, Bruno, Bobbia, Michel, and Poggi, Jean-Michel
Published: 2024

12. Discussion of “Analysis of spatio-temporal mobile phone data: a case study in the metropolitan area of Milan”

Author: Antoniadis, Anestis and Poggi, Jean-Michel
Published: 2015

13. Influence Measures for CART Classification Trees

Author: Bar-Hen, Avner, Gey, Servane, and Poggi, Jean-Michel
Published: 2015

14. Smoothing non-equispaced heavy noisy data with wavelets

Author: Antoniadis, Anestis, Gijbels, Irène, and Poggi, Jean-Michel
Published: 2009

15. Spatial correction of low‐cost sensors observations for fusion of air quality measurements.

Author: Bobbia, Michel, Poggi, Jean‐Michel, and Portier, Bruno
Subjects: AIR quality, AIR quality monitoring, DETECTORS
Abstract: The context for this article is the statistical fusion of several pollutant measurement networks: a reference network of fixed, high‐quality sensors and others of fixed or mobile micro‐sensors of heterogeneous quality. The challenge is to use the measurements of such different networks together to obtain a better air quality map. Since pollution maps are often obtained by correcting numerical model outputs with the measurements provided by the monitoring stations of air quality networks, the quality of the reconstructed map may be improved by increasing the density of sensors through the addition of low‐cost micro‐sensors. A geostatistical approach is very often used for the fusion of measurements, but the first step is to correct the micro‐sensor measurements using those given by the reference sensors. Usually, this preprocessing is performed during an offline preliminary study in which reference sensors and micro‐sensors are located at the same position, which makes it impossible to adapt quickly to changes and to cope with time‐related nonstationarities. We propose in this article to complement these approaches with a simple online spatial correction of micro‐sensors. The principle is to use the reference measurements to correct the network of micro‐sensors. More precisely, the reference measurements are estimated by kriging only the micro‐sensor measurements; a correction is then calculated by kriging the differences and is finally applied to the micro‐sensors. One can then iterate this fundamental sequence of steps. Numerical experiments exploring the proposed algorithm by simulation and an application to a real‐world dataset are provided. [ABSTRACT FROM AUTHOR]
Published: 2022

16. Air quality low-cost sensors and monitoring stations NO2 raw dataset in Rouen (France)

Author: Thulliez, Emma, Portier, Bruno, Bobbia, Michel, and Poggi, Jean-Michel
Published: 2023

17. Boosting Diversity in Regression Ensembles

Author: Bourel, Mathias, Cugliari, Jairo, Goude, Yannig, and Poggi, Jean-Michel
Subjects: Diversity, Ensemble, Regression, Boosting, Trees, [STAT.ME] Statistics [stat]/Methodology [stat.ME]
Abstract: The practical interest of ensemble methods has been highlighted in several works. Aggregating predictors very often improves on the performance of a single one. A fruitful recipe is to generate several predictors from a single one by perturbing the learning set and, instead of selecting the best one, to aggregate them. Bagging, boosting and random forests are examples of such strategies, useful for both classification and regression problems. A key ingredient for properly analysing the improvement in prediction performance is the diversity of the ensemble of predictors. In the regression case, aggregation is mainly concerned with how to generate individual predictors that improve quadratic prediction performance. We seek to enhance these methods by using the concept of diversity (also known as negative correlation learning). We propose an algorithm that enriches the set of original individual predictors using a gradient boosting-based method, incorporating a diversity term to guide the gradient boosting iterations. The idea is to progressively generate predictors by boosting diversity; this modification induces some suboptimality in the individual learners but improves the ensemble. We then establish a convergence result ensuring that the associated optimisation strategy converges to a global optimum. Finally, we show by means of numerical experiments the appropriateness of our procedure and examine not only the final or aggregated predictor but also the generated sequence. First, on a simulated dataset, we illustrate and study the method with respect to the family of predictors as well as the parameters to be tuned (diversity weight and gradient step). Second, real-world electricity demand datasets are considered, opening the application of such ideas to the forecasting context.
Published: 2020

18. Partial and Recombined Estimators for Nonlinear Additive Models

Author: Chèze, Nathalie, Poggi, Jean-Michel, and Portier, Bruno
Published: 2003

19. Multivariate denoising using wavelets and principal component analysis

Author: Aminghafari, Mina, Chèze, Nathalie, and Poggi, Jean-Michel
Published: 2006

20. Boosting and instability for regression trees

Author: Gey, Servane and Poggi, Jean-Michel
Published: 2006

21. Random Forest-Based Approach for Physiological Functional Variable Selection: Towards Driver's Stress Level Classification

Author: El Haouij, Neska, Poggi, Jean-Michel, Ghozi, Raja, Sevestre-Ghalila, Sylvie, and Jaïdane, Mériem
Subjects: Recursive feature elimination, Grouped variable importance, Random forests, Wavelets, Physiological signals, Functional data, [STAT.AP] Statistics [stat]/Applications [stat.AP], [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing
Abstract: This paper is devoted to the statistical selection of physiological functional variables for driver's stress level classification using random forests. The study focuses on human physiological changes, produced when driving on different urban routes and captured using portable sensors. Specifically, it considers the electrodermal activity measured at two different locations (hand and foot), electromyogram, heart rate and respiration from ten driving experiments on three types of routes (rest area, city, and highway driving), taken from the drivedb database, available online on the PhysioNet website. Several studies have addressed driver's stress level recognition using physiological signals. Classically, researchers extract expert-based features from physiological signals and select the most relevant ones for stress level recognition. This work provides a random forest-based method for the selection of physiological functional variables in order to classify the driver's stress level. On the methodological side, the contributions of this work are to consider physiological signals as functional variables, decomposed on a wavelet basis, and to offer a variable selection procedure. On the applied side, the proposed method provides a "blind" procedure for driver's stress level classification that performs as well as the expert-based study in terms of misclassification rate. It moreover offers a ranking of physiological variables according to their importance in stress level classification. The obtained results suggest that the electromyogram and heart rate signals are not very relevant when compared to the electrodermal and respiration signals.
Published: 2017

22. Aggregation of Multi-Scale Experts for Bottom-Up Load Forecasting.

Author: Goehry, Benjamin, Goude, Yannig, Massart, Pascal, and Poggi, Jean-Michel
Abstract: The development of smart grids and new advanced metering infrastructures creates new opportunities and challenges for utilities. Exploiting smart meter information for forecasting is a key issue for energy providers, who have to deal with a time-varying portfolio of customers, as well as for grid managers, who need to improve the accuracy of local forecasts to cope with the development of distributed renewable energy generation. We propose a new machine learning approach to forecast the system load of a group of customers, exploiting individual load measurements in real time and/or exogenous information such as weather and survey data. Our approach consists in building experts using random forests trained on subsets of customers, then normalising their predictions and aggregating them with a convex expert aggregation algorithm to forecast the system load. We propose new aggregation methods and compare two strategies for building subsets of customers: 1) hierarchical clustering based on survey data and/or load features and 2) a random clustering strategy. These approaches are evaluated on a real dataset of residential Irish customers' load at a half-hourly resolution. We show that our approaches achieve a significant gain in short-term load forecasting accuracy of around 25 percent of RMSE. [ABSTRACT FROM AUTHOR]
Published: 2020

23. Clustering electricity consumers using high‐dimensional regression mixture models.

Author: Devijver, Emilie, Goude, Yannig, and Poggi, Jean‐Michel
Subjects: REGRESSION analysis, TIME series analysis, SMART meters, ELECTRICITY, LOAD forecasting (Electric power systems), CONSUMERS
Abstract: A massive amount of data about individual electrical consumptions are now provided with new metering technologies and smart grids. These new data are especially useful for load profiling and load modeling at different scales of the electrical network. A new methodology based on mixture of high‐dimensional regression models is used to perform clustering of individual customers. It leads to uncovering clusters corresponding to different regression models. Temporal information is incorporated in order to prepare the next step, the fit of a forecasting model in each cluster. Only the electrical signal is involved, slicing the electrical signal into consecutive curves to consider it as a discrete time series of curves. Interpretation of the models is given on a real smart meter dataset of Irish customers. [ABSTRACT FROM AUTHOR]
Published: 2020

24. Clustering electricity consumers using high-dimensional regression mixture models

Author: Devijver, Emilie, Goude, Yannig, and Poggi, Jean-Michel
Subjects: FOS: Computer and information sciences, [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], Applications (stat.AP), Statistics - Applications
Abstract: Massive amounts of information about individual (household, small and medium enterprise) consumption are now provided by new metering technologies and the smart grid. Two major uses of these data are load profiling and forecasting at different scales on the grid. Customer segmentation based on load classification is a natural approach for these purposes. We propose here a new methodology based on a mixture of high-dimensional regression models. The novelty of our approach is that we focus on uncovering classes or clusters corresponding to different regression models. As a consequence, these classes could then be exploited for profiling as well as for forecasting in each class, or for bottom-up forecasts in a unified view. To demonstrate the feasibility of our approach, we consider a real dataset of 4,225 meters of Irish individual consumers, each with 48 half-hourly meter reads per day over one year, from 1 January 2010 to 31 December 2010.
Published: 2015

25. Mixture of linear regression models for short term PM10 forecasting in Haute Normandie (France)

Author: Misiti, Michel, Misiti, Yves, Poggi, Jean-Michel, and Portier, Bruno
Subjects: PM10, Particulate matter, Air quality, Forecasting, Mixture of linear models, 62M10, 62P12, 62H30, [STAT.TH] Statistics [stat]/Statistics Theory [stat.TH], [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST]
Abstract: A mixture of linear regression models is used for the short-term statistical forecasting of the daily mean PM10 concentration. Hourly concentrations of PM10 have been measured in three cities in Haute-Normandie (France): Rouen, Le Havre and Dieppe. The Haute-Normandie region is located northwest of Paris, on the southern side of the English Channel, and is heavily industrialized. We consider six monitoring stations reflecting the diversity of situations: urban background, traffic, rural and industrial stations. We have focused our attention on recent data from 2007 to 2011. We forecast the daily mean PM10 concentration by modeling it as a mixture of linear regression models involving meteorological predictors and the average concentration measured on the previous day. The observed values of the meteorological variables are used for fitting the models, but the corresponding predictions are used for the test data, leading to realistic evaluations of forecasting performance, which are calculated through a leave-one-out scheme on the four years. We discuss in this paper several methodological issues, including estimation schemes, the introduction of the deterministic predictions of meteorological models, and how to handle forecasting at various horizons from a few hours to one day ahead.
Published: 2013

26. Random forests and big data

Author: Genuer, Robin, Poggi, Jean-Michel, Tuleau-Malot, Christine, and Villa-Vialaneix, Nathalie
Subjects: Big Data, Random forests, Data streams, Statistics (Mathematics), [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST]
Abstract: Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involves massive data, but it also often includes data streams and data heterogeneity. Recently, some statistical methods have been adapted to process Big Data, such as linear regression models, clustering methods and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests, introduced by Breiman in 2001, are a powerful nonparametric statistical method that handles, in a single and versatile framework, regression problems as well as two-class or multi-class classification problems. This paper reviews available proposals about random forests in parallel environments as well as about online random forests. Then, we formulate various remarks and sketch some alternative directions for random forests in the Big Data context.
Published: 2015

27. Self‐similarity analysis of vehicle driver's electrodermal activity.

Author: El Haouij, Neska, Ghozi, Raja, Poggi, Jean‐Michel, Sevestre‐Ghalila, Sylvie, and Jaïdane, Mériem
Subjects: FOREST measurement, BROWNIAN motion, PUBLIC spaces
Abstract: This paper characterizes stress levels via a self‐similarity analysis of the electrodermal activity (EDA) collected in a real‐world driving context. To characterize the EDA richness over scales, the fractional Brownian motion (FBM) process and its corresponding exponent H, estimated via a wavelet‐based approach, are used. Specifically, an automatic scale range selection is proposed in order to detect the linearity in a log scale diagram. The procedure is applied to the EDA signals, from the open database drivedb, originally captured on the foot and the hand of the drivers during a real‐world driving experiment, designed to evoke different levels of arousal and stress. The estimated Hurst exponent H offers a distinction in stress levels when driving in highway versus city, with a reference to restful state of minimal stress level. Specifically, the estimated H values tend to decrease when the driving environmental complexity increases. In addition, the estimated H values on the foot EDA signals allow a better characterization of the driving task than that of hand EDA. The self‐similarity analysis was applied to various physiological signals in literature but not to the EDA so far, a signal which was found to correlate most with human affect. The proposed analysis could be useful in real‐time monitoring of stress levels in urban driving spaces, among other applications. [ABSTRACT FROM AUTHOR]
Published: 2019
Full Text: View/download PDF

28. Influence functions for CART

Author: Bar Hen, Avner, Gey, Servane, Poggi, Jean-Michel, Mathématiques Appliquées Paris 5 (MAP5 - UMR 8145), Université Paris Descartes - Paris 5 (UPD5), Institut National des Sciences Mathématiques et de leurs Interactions (INSMI), Centre National de la Recherche Scientifique (CNRS), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), and Université Paris-Saclay
Subjects: [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], [STAT.TH] Statistics [stat]/Statistics Theory [stat.TH]
Abstract: This paper deals with measuring the influence of observations on the results obtained with CART classification trees. To define the influence of individuals on the analysis, we use influence functions to propose some general criteria to measure the sensitivity of the CART analysis and its robustness. The proposals, based on jackknife trees, are organized around two lines: influence on predictions and influence on partitions. In addition, the analysis is extended to the pruned sequences of CART trees to produce a CART-specific notion of influence. A numerical example, the well-known spam dataset, is presented to illustrate the notions developed throughout the paper. A real dataset relating the administrative classification of cities surrounding Paris, France, to the characteristics of their tax revenue distribution is finally analyzed using the new influence-based tools.
Published: 2014

29. Non parametric forecasting of a function-valued non stationary processes. Application to the electricity demand

Author: Antoniadis, Anestis, Brossat, Xavier, Cugliari, Jairo, Poggi, Jean-Michel, Université Joseph Fourier - Grenoble 1 (UJF), EDF (EDF), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11), Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), and Le Pennec, Erwan
Subjects: [STAT.TH] Statistics [stat]/Statistics Theory [stat.TH], [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], ComputingMilieux_MISCELLANEOUS
Published: 2012

30. Functional Clustering using Wavelets

Author: Antoniadis, Anestis, Brossat, Xavier, Cugliari, Jairo, Poggi, Jean-Michel, Université Joseph Fourier - Grenoble 1 (UJF), EDF R&D (EDF R&D), EDF (EDF), Model selection in statistical learning (SELECT), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11), Centre National de la Recherche Scientifique (CNRS), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), and Le Pennec, Erwan
Subjects: ComputingMilieux_MISCELLANEOUS
Published: 2012

31. Multistep Forecasting Non-Stationary Time Series using Wavelets and Kernel Smoothing

Author: Aminghafari, Mina, Poggi, Jean-Michel, Amirkabir University of Technology (AUT), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11), Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, and Institut National de Recherche en Informatique et en Automatique (Inria)
Subjects: Time series, MathematicsofComputing_NUMERICALANALYSIS, Kernel smoothing, [INFO] Computer Science [cs], Wavelets, Forecasting, Nonstationary
Abstract: The authors deal with forecasting nonstationary time series using wavelets and kernel smoothing. Starting from a basic forecasting procedure based on the regression of the process on the nondecimated Haar wavelet coefficients of the past, the procedure was extended in various directions, including the use of an arbitrary wavelet or polynomial fitting for extrapolating low-frequency components. The authors study a further generalization of the prediction procedure dealing with multistep forecasting and combining kernel smoothing and wavelets. They finally illustrate the proposed procedure on nonstationary simulated and real data and then compare it to well-known competitors.
Published: 2012

32. PM10 forecasting using mixture linear regression models

Author: Misiti, Michel, Misiti, Yves, Poggi, Jean-Michel, Portier, Bruno, Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11), Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire de Mathématiques de l'INSA de Rouen Normandie (LMI), Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA), Normandie Université (NU), and Le Pennec, Erwan
Subjects: [STAT.TH] Statistics [stat]/Statistics Theory [stat.TH], [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], ComputingMilieux_MISCELLANEOUS
Published: 2012

33. Forecasting time series through reconstructed multiple seasonal patterns using Empirical Mode Decomposition

Author: Mhamdi, Farouk, Jaidane, Mériem, Poggi, Jean-Michel, Ecole Nationale d'Ingénieurs de Tunis (ENIT), Université de Tunis El Manar (UTM), Unité Signaux et Systèmes, Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11), Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), and Le Pennec, Erwan
Subjects: [STAT.TH] Statistics [stat]/Statistics Theory [stat.TH], [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], ComputingMilieux_MISCELLANEOUS
Published: 2012

34. Detecting Influent Observations using CART Classification Trees. Application to the classification of the cities of Paris area

Author: Bar Hen, Avner, Gey, Servane, Poggi, Jean-Michel, Mathématiques Appliquées Paris 5 (MAP5 - UMR 8145), Université Paris Descartes - Paris 5 (UPD5), Institut National des Sciences Mathématiques et de leurs Interactions (INSMI), Centre National de la Recherche Scientifique (CNRS), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), Université Paris-Saclay, and Le Pennec, Erwan
Subjects: [STAT.TH] Statistics [stat]/Statistics Theory [stat.TH], [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], ComputingMilieux_MISCELLANEOUS
Published: 2011

35. Quantifying local and background contributions to PM10 concentrations in Haute-Normandie, using random forests

Author: Bobbia, Michel, Jollois, François-Xavier, Poggi, Jean-Michel, Portier, Bruno, Université Paris Descartes - Paris 5 (UPD5), Laboratoire de Mathématiques d'Orsay (LMO), Centre National de la Recherche Scientifique (CNRS), Université Paris-Sud - Paris 11 (UP11), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire de Mathématiques de l'INSA de Rouen Normandie (LMI), Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA), Normandie Université (NU), and Université Paris-Saclay
Subjects: [STAT.AP] Statistics [stat]/Applications [stat.AP], ComputingMilieux_MISCELLANEOUS
Published: 2011

36. Optimized Clusters for Disaggregated Electricity Load Forecasting

Author: Misiti, Michel, Misiti, Yves, Oppenheim, Georges, Poggi, Jean-Michel, Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11), Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), and Le Pennec, Erwan
Subjects: [STAT.TH] Statistics [stat]/Statistics Theory [stat.TH], [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], ComputingMilieux_MISCELLANEOUS
Abstract: To account for the variation of the portfolio of EDF (the French electrical company) following the liberalization of the electrical market, it is essential to disaggregate the global load curve. The idea is to disaggregate the global signal in such a way that the sum of disaggregated forecasts significantly improves the prediction of the whole global signal. The strategy is to optimize a preliminary clustering of individual load curves with respect to a predictability index. The optimized clustering procedure is controlled by forecasting performance via a cross-prediction dissimilarity index. It can be viewed as a discrete gradient-type algorithm. (REVSTAT-Statistical Journal, Vol. 8 No. 2, 2010)
Published: 2010

37. Empirical Mode Decomposition for Trend Extraction. Application to Electrical Data

Author: Mhamdi, Farouk, Jaidane, Meriem, Poggi, Jean-Michel, Unité Signaux et Systèmes, Ecole Nationale d'Ingénieurs de Tunis (ENIT), Université de Tunis El Manar (UTM), Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Sud - Paris 11 (UP11), Centre National de la Recherche Scientifique (CNRS), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, and Institut National de Recherche en Informatique et en Automatique (Inria)
Subjects: [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], [STAT.TH] Statistics [stat]/Statistics Theory [stat.TH], ComputingMilieux_MISCELLANEOUS
Published: 2010

38. Random Forests: some methodological insights

Author: Genuer, Robin, Poggi, Jean-Michel, Tuleau, Christine, Laboratoire de Mathématiques d'Orsay (LMO), Centre National de la Recherche Scientifique (CNRS), Université Paris-Sud - Paris 11 (UP11), Model selection in statistical learning (SELECT), Inria Saclay - Ile de France, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire Jean Alexandre Dieudonné (JAD), Université Nice Sophia Antipolis (1965 - 2019) (UNS), and COMUE Université Côte d'Azur (2015-2019) (COMUE UCA)
Subjects: Random Forests, Variable Importance, [STAT.ML] Statistics [stat]/Machine Learning [stat.ML], [MATH.MATH-ST] Mathematics [math]/Statistics [math.ST], Variable Selection, [STAT.TH] Statistics [stat]/Statistics Theory [stat.TH], Classification, Regression
Abstract: This paper examines from an experimental perspective random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001. It first aims at confirming known but sparse advice for using random forests and at proposing some complementary remarks for both standard problems and high-dimensional ones, for which the number of variables hugely exceeds the sample size. But the main contribution of this paper is twofold: to provide some insights about the behavior of the variable importance index based on random forests and, in addition, to investigate two classical issues of variable selection. The first one is to find important variables for interpretation; the second one is more restrictive and tries to design a good prediction model. The strategy involves a ranking of explanatory variables using the random forests score of importance and a stepwise ascending variable introduction strategy.
Published: 2008

39. Disaggregated electricity forecasting using wavelet-based clustering of individual consumers.

Author: Cugliari, Jairo, Goude, Yannig, and Poggi, Jean-Michel
Published: 2016
Full Text: View/download PDF

40. Automatic Component Selection in Additive Modeling of French National Electricity Load Forecasting.

Author: Antoniadis, Anestis, Brossat, Xavier, Goude, Yannig, Poggi, Jean-Michel, and Thouvenot, Vincent
Published: 2016
Full Text: View/download PDF

41. Variable selection using random forests

Author: Genuer, Robin, Poggi, Jean-Michel, and Tuleau-Malot, Christine
Published: 2010
Full Text: View/download PDF

42. The ENBIS‐17 Quality and Reliability Engineering International Special Issue.

Author: Krebs, Kristina and Poggi, Jean‐Michel
Subjects: *BAYES' estimation, *K-means clustering, *CONTINGENCY tables
Abstract: An introduction is presented in which the editor discusses various articles within the issue on topics including Bayesian estimation approach for degradation process model; problem of clustering data streams using the k‐means algorithm; and detection of outlying rows in a contingency table.
Published: 2018
Full Text: View/download PDF

43. Electricity Forecasting Using Multi-Stage Estimators of Nonlinear Additive Models.

Author: Thouvenot, Vincent, Pichavant, Audrey, Goude, Yannig, Antoniadis, Anestis, and Poggi, Jean-Michel
Subjects: LOAD forecasting (Electric power systems), ELECTRIC utilities, NONLINEAR statistical models, ENERGY economics, TIME series analysis
Published: 2016
Full Text: View/download PDF

44. VSURF: An R Package for Variable Selection Using Random Forests.

Author: Genuer, Robin, Poggi, Jean-Michel, and Tuleau-Malot, Christine
Subjects: *PROGRAMMING languages, *REGRESSION analysis, *SUBSET selection
Abstract: This paper describes the R package VSURF. Based on random forests, and for both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables including some redundancy, which can be relevant for interpretation, and the second is a smaller subset corresponding to a model that tries to avoid redundancy and focuses more closely on the prediction objective. The two-stage strategy is based on a preliminary ranking of the explanatory variables using the random forests permutation-based score of importance and proceeds using a stepwise forward strategy for variable introduction. The two proposals can be obtained automatically using data-driven default values, good enough to provide interesting results, but the strategy can also be tuned by the user. The algorithm is illustrated on a simulated example and its applications to real datasets are presented. [ABSTRACT FROM AUTHOR]
Published: 2015
Full Text: View/download PDF

45. Mixture of linear regression models for short term PM10 forecasting in Haute Normandie (France).

Author: Misiti, Michel, Misiti, Yves, Poggi, Jean-Michel, and Portier, Bruno
Subjects: REGRESSION analysis, PARTICULATE matter, MATHEMATICAL models of forecasting
Abstract: Forecasting PM10 concentrations accurately will allow for improved early warning procedures, which are useful for safety reasons and open, for example, the possibility of restricting traffic or offering free public transportation. The need for a statistical tool for forecasting particulate matter pollution is therefore an important issue for the public authorities. Hourly concentrations of PM10 have been measured in three cities of Haute-Normandie (France): Rouen, Le Havre and Dieppe. The Haute-Normandie region is located northwest of Paris, on the south side of the English Channel, and is heavily industrialized. We consider six monitoring stations reflecting the diversity of situations. We have focused our attention on recent data from 2007 to 2011. We forecast the daily mean PM10 concentration by modeling it as a mixture of linear regression models involving meteorological predictors and the average concentration measured on the previous day. The values of observed meteorological variables are used for fitting the models while the corresponding predictions are considered for the test data, leading to realistic evaluations of forecasting performances, which are calculated through a leave-one-out scheme on the four years. We discuss in this paper several methodological issues including estimation schemes, the introduction of the deterministic predictions of meteorological models, and how to handle forecasting at various horizons from some hours to one day ahead. [ABSTRACT FROM AUTHOR]
Published: 2015

46. A Guided Tour.

Author: Misiti, Michel, Misiti, Yves, Oppenheim, Georges, and Poggi, Jean-Michel
Published: 2007
Full Text: View/download PDF

47. From Wavelet Bases to the Fast Algorithm.

Author: Misiti, Michel, Misiti, Yves, Oppenheim, Georges, and Poggi, Jean-Michel
Published: 2007
Full Text: View/download PDF

48. The EZW Algorithm.

Author: Misiti, Michel, Misiti, Yves, Oppenheim, Georges, and Poggi, Jean-Michel
Published: 2007
Full Text: View/download PDF

49. Image Processing with Wavelets.

Author: Misiti, Michel, Misiti, Yves, Oppenheim, Georges, and Poggi, Jean-Michel
Published: 2007
Full Text: View/download PDF

50. Signal Denoising and Compression.

Author: Misiti, Michel, Misiti, Yves, Oppenheim, Georges, and Poggi, Jean-Michel
Published: 2007
Full Text: View/download PDF
