7 results on '"Petitjean, François"'
Search Results
2. Analyzing concept drift and shift from sample data.
- Author
- Webb, Geoffrey I., Lee, Loong Kuan, Goethals, Bart, and Petitjean, François
- Subjects
- MACHINE learning, MARGINAL distributions, AIRLINE schedules, PRODUCTION scheduling, DATA mining
- Abstract
Concept drift and shift are major issues that greatly affect the accuracy and reliability of many real-world applications of machine learning. We propose a new data mining task, concept drift mapping—the description and analysis of instances of concept drift or shift. We argue that concept drift mapping is an essential prerequisite for tackling concept drift and shift. We propose tools for this purpose, arguing for the importance of quantitative descriptions of drift and shift in marginal distributions. We present quantitative concept drift mapping techniques, along with methods for visualizing their results. We illustrate their effectiveness for real-world applications across energy-pricing, vegetation monitoring and airline scheduling. [ABSTRACT FROM AUTHOR]
- Published
- 2018
3. Optimizing dynamic time warping’s window width for time series data mining applications.
- Author
- Dau, Hoang Anh, Silva, Diego Furtado, Petitjean, François, Forestier, Germain, Bagnall, Anthony, Mueen, Abdullah, and Keogh, Eamonn
- Subjects
- DATA mining, TIME series analysis, SUPERVISED learning, DATA structures, ALGORITHMS
- Abstract
Dynamic Time Warping (DTW) is a highly competitive distance measure for most time series data mining problems. Obtaining the best performance from DTW requires setting its only parameter, the maximum amount of warping (w). In the supervised case with ample data, w is typically set by cross-validation in the training stage. However, this method is likely to yield suboptimal results for small training sets. For the unsupervised case, learning via cross-validation is not possible because we do not have access to labeled data. Many practitioners have thus resorted to assuming that “the larger the better”, and they use the largest value of w permitted by the computational resources. However, as we will show, in most circumstances, this is a naïve approach that produces inferior clusterings. Moreover, the best warping window width is generally non-transferable between the two tasks, i.e., for a single dataset, practitioners cannot simply apply the best w learned for classification on clustering or vice versa. In addition, we will demonstrate that the appropriate amount of warping not only depends on the data structure, but also on the dataset size. Thus, even if a practitioner knows the best setting for a given dataset, they will likely be at a loss if they apply that setting on a bigger size version of that data. All these issues seem largely unknown or at least unappreciated in the community. In this work, we demonstrate the importance of setting DTW’s warping window width correctly, and we also propose novel methods to learn this parameter in both supervised and unsupervised settings. The algorithms we propose to learn w can produce significant improvements in classification accuracy and clustering quality. We demonstrate the correctness of our novel observations and the utility of our ideas by testing them with more than one hundred publicly available datasets.
Our forceful results allow us to make a perhaps unexpected claim; an underappreciated “low hanging fruit” in optimizing DTW’s performance can produce improvements that make it an even stronger baseline, closing most or all the improvement gap of the more sophisticated methods proposed in recent years. [ABSTRACT FROM AUTHOR]
- Published
- 2018
4. Faster and more accurate classification of time series by exploiting a novel dynamic time warping averaging algorithm.
- Author
- Petitjean, François, Forestier, Germain, Webb, Geoffrey, Nicholson, Ann, Chen, Yanping, and Keogh, Eamonn
- Subjects
- TIME series analysis, DATA mining, MATHEMATICAL models of time-varying systems, ALGORITHM research, CENTROID
- Abstract
A concerted research effort over the past two decades has heralded significant improvements in both the efficiency and effectiveness of time series classification. The consensus that has emerged in the community is that the best solution is a surprisingly simple one. In virtually all domains, the most accurate classifier is the nearest neighbor algorithm with dynamic time warping as the distance measure. The time complexity of dynamic time warping means that successful deployments on resource-constrained devices remain elusive. Moreover, the recent explosion of interest in wearable computing devices, which typically have limited computational resources, has greatly increased the need for very efficient classification algorithms. A classic technique to obtain the benefits of the nearest neighbor algorithm, without inheriting its undesirable time and space complexity, is to use the nearest centroid algorithm. Unfortunately, the unique properties of (most) time series data mean that the centroid typically does not resemble any of the instances, an unintuitive and underappreciated fact. In this paper we demonstrate that we can exploit a recent result by Petitjean et al. to allow meaningful averaging of 'warped' time series, which then allows us to create super-efficient nearest 'centroid' classifiers that are at least as accurate as their more computationally challenged nearest neighbor relatives. We demonstrate empirically the utility of our approach by comparing it to all the appropriate strawmen algorithms on the ubiquitous UCR Benchmarks and with a case study in supporting insect classification on resource-constrained sensors. [ABSTRACT FROM AUTHOR]
- Published
- 2016
5. Spatio-temporal reasoning for the classification of satellite image time series
- Author
- Petitjean, François, Kurtz, Camille, Passat, Nicolas, and Gançarski, Pierre
- Subjects
- *REMOTE-sensing images, *IMAGE analysis, *TIME series analysis, *CLASSIFICATION, *SPATIAL analysis (Statistics), *PIXELS, *SPATIOTEMPORAL processes
- Abstract
Satellite image time series (SITS) analysis is an important domain with various applications in land study. In the coming years, both high temporal and high spatial resolution SITS will become available. In the classical methodologies, SITS are studied by analyzing the radiometric evolution of the pixels with time. When dealing with high spatial resolution images, object-based approaches are generally used in order to exploit the spatial relationships of the data. However, these approaches require a segmentation step to provide contextual information about the pixels. Even if the segmentation of single images is widely studied, its generalization to series of images remains an open issue. This article aims at providing both temporal and spatial analysis of SITS. We propose first segmenting each image of the series, and then using these segmentations in order to characterize each pixel of the data with a spatial dimension (i.e., with contextual information). Providing spatially characterized pixels, pixel-based temporal analysis can be performed. Experiments carried out with this methodology show the relevance of this approach and the significance of the resulting extracted patterns in the context of the analysis of SITS. [Copyright © Elsevier]
- Published
- 2012
6. Discovering significant evolution patterns from satellite image time series.
- Author
- Petitjean, François, Masseglia, Florent, Gançarski, Pierre, and Forestier, Germain
- Subjects
- *TIME series analysis, *REMOTE-sensing images, *DATA mining, *SEQUENTIAL pattern mining, *LAND cover, *ALGORITHMS, *EXPERIMENTS
- Abstract
Satellite Image Time Series (SITS) provide us with precious information on land cover evolution. By studying these series of images we can both understand the changes of specific areas and discover global phenomena that spread over larger areas. Changes that can occur throughout the sensing time can spread over very long periods and may have different start time and end time depending on the location, which complicates the mining and the analysis of series of images. This work focuses on frequent sequential pattern mining (FSPM) methods, since this family of methods fits the above-mentioned issues. This family of methods consists of finding the most frequent evolution behaviors, and is actually able to extract long-term changes as well as short term ones, whenever the change may start and end. However, applying FSPM methods to SITS implies confronting two main challenges, related to the characteristics of SITS and the domain's constraints. First, satellite images associate multiple measures with a single pixel (the radiometric levels of different wavelengths corresponding to infra-red, red, etc.), which makes the search space multi-dimensional and thus requires specific mining algorithms. Furthermore, the non evolving regions, which are the vast majority and overwhelm the evolving ones, challenge the discovery of these patterns. We propose a SITS mining framework that enables discovery of these patterns despite these constraints and characteristics. Our proposal is inspired from FSPM and provides a relevant visualization principle. Experiments carried out on 35 images sensed over 20 years show the proposed approach makes it possible to extract relevant evolution behaviors. [ABSTRACT FROM AUTHOR]
- Published
- 2011
7. A global averaging method for dynamic time warping, with applications to clustering
- Author
- Petitjean, François, Ketterlin, Alain, and Gançarski, Pierre
- Subjects
- *AVERAGING method (Differential equations), *GLOBAL analysis (Mathematics), *CLUSTER analysis (Statistics), *DATA mining, *SEQUENTIAL analysis, *ALGORITHMS, *ITERATIVE methods (Mathematics), *PATTERN perception
- Abstract
Mining sequential data is an old topic that has been revived in the last decade, due to the increasing availability of sequential datasets. Most works in this field are centred on the definition and use of a distance (or, at least, a similarity measure) between sequences of elements. A measure called dynamic time warping (DTW) seems to be currently the most relevant for a large panel of applications. This article is about the use of DTW in data mining algorithms, and focuses on the computation of an average of a set of sequences. Averaging is an essential tool for the analysis of data. For example, the K-means clustering algorithm repeatedly computes such an average, and needs to provide a description of the clusters it forms. Averaging is here a crucial step, which must be sound in order to make algorithms work accurately. When dealing with sequences, especially when sequences are compared with DTW, averaging is not a trivial task. Starting with existing techniques developed around DTW, the article suggests an analysis framework to classify averaging techniques. It then proceeds to study the two major questions lifted by the framework. First, we develop a global technique for averaging a set of sequences. This technique is original in that it avoids using iterative pairwise averaging. It is thus insensitive to ordering effects. Second, we describe a new strategy to reduce the length of the resulting average sequence. This has a favourable impact on performance, but also on the relevance of the results. Both aspects are evaluated on standard datasets, and the evaluation shows that they compare favourably with existing methods. The article ends by describing the use of averaging in clustering. The last section also introduces a new application domain, namely the analysis of satellite image time series, where data mining techniques provide an original approach. [ABSTRACT FROM AUTHOR]
- Published
- 2011
Discovery Service for Jio Institute Digital Library