32 results for "Mozharovskyi, Pavlo"
Search Results
2. Approximate computation of projection depths
- Author
-
Dyckerhoff, Rainer, Mozharovskyi, Pavlo, and Nagy, Stanislav
- Published
- 2021
- Full Text
- View/download PDF
3. On Exact Computation of Tukey Depth Central Regions.
- Author
-
Fojtík, Vít, Laketa, Petra, Mozharovskyi, Pavlo, and Nagy, Stanislav
- Subjects
POINT set theory, ALGORITHMS, COMPUTATIONAL geometry, C++, QUANTILES, K-means clustering
- Abstract
The Tukey (or halfspace) depth extends nonparametric methods toward multivariate data. The multivariate analogues of the quantiles are the central regions of the Tukey depth, defined as sets of points in the d-dimensional space whose Tukey depth exceeds given thresholds k. We address the problem of fast and exact computation of those central regions. First, we analyze an efficient Algorithm (A) from Liu, Mosler, and Mozharovskyi, and prove that it yields exact results in dimension d = 2, or for a low threshold k in arbitrary dimension. We provide examples where Algorithm (A) fails to recover the exact Tukey depth region for d > 2, and propose a modification that is guaranteed to be exact. We express the problem of computing the exact central region in its dual formulation, and use that viewpoint to demonstrate that further substantial improvements to our algorithm are unlikely. An efficient C++ implementation of our exact algorithm is freely available in the R package TukeyRegion. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
4. Statistical Process Monitoring of Artificial Neural Networks.
- Author
-
Malinovskaya, Anna, Mozharovskyi, Pavlo, and Otto, Philipp
- Subjects
ARTIFICIAL neural networks, QUALITY control charts, ARTIFICIAL intelligence, SUPERVISED learning, MACHINE learning
- Abstract
The rapid advancement of models based on artificial intelligence demands innovative monitoring techniques that can operate in real time with low computational costs. In machine learning, especially if we consider artificial neural networks (ANNs), the models are often trained in a supervised manner. Consequently, the learned relationship between the input and the output must remain valid during the model's deployment. If this stationarity assumption holds, we can conclude that the ANN provides accurate predictions. Otherwise, the retraining or rebuilding of the model is required. We propose considering the latent feature representation of the data (called "embedding") generated by the ANN to determine the time when the data stream starts being nonstationary. In particular, we monitor embeddings by applying multivariate control charts based on data depth calculation and normalized ranks. The performance of the introduced method is compared with benchmark approaches for various ANN architectures and different underlying data formats. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
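The monitoring scheme summarized in the abstract above can be sketched generically. This is a minimal illustration, not the authors' implementation: it assumes Mahalanobis depth as a stand-in depth notion, a 0.05 control limit chosen for the example, and function names that are ours.

```python
import numpy as np

def mahalanobis_depth(points, reference):
    """Depth D(x) = 1 / (1 + squared Mahalanobis distance of x
    to the mean/covariance of the reference sample)."""
    mu = reference.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(reference, rowvar=False))
    diff = np.atleast_2d(points) - mu
    return 1.0 / (1.0 + np.einsum('ij,jk,ik->i', diff, S_inv, diff))

def normalized_ranks(new_points, reference):
    """Normalized rank of each new embedding's depth among the
    reference depths; values near 0 flag outlying embeddings."""
    ref_depths = mahalanobis_depth(reference, reference)
    new_depths = mahalanobis_depth(new_points, reference)
    m = len(ref_depths)
    return np.array([(ref_depths <= d).sum() / (m + 1) for d in new_depths])

# In-control reference embeddings vs. a shifted (nonstationary) batch.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 4))
shifted = rng.normal(3.0, 1.0, size=(10, 4))
ranks = normalized_ranks(shifted, reference)
alarms = ranks < 0.05          # points falling below the control limit
```

A control chart would plot the ranks of incoming embeddings over time and signal once they fall below the limit; any properly defined depth function can replace the Mahalanobis one here.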
5. Exact computation of the halfspace depth
- Author
-
Dyckerhoff, Rainer and Mozharovskyi, Pavlo
- Published
- 2016
- Full Text
- View/download PDF
6. On exact computation of Tukey depth central regions
- Author
-
Fojtík, Vít, Laketa, Petra, Mozharovskyi, Pavlo, and Nagy, Stanislav
- Subjects
FOS: Computer and information sciences, 62-08, 62H12, 62G05, Statistics - Computation, Computation (stat.CO)
- Abstract
The Tukey (or halfspace) depth extends nonparametric methods toward multivariate data. The multivariate analogues of the quantiles are the central regions of the Tukey depth, defined as sets of points in the $d$-dimensional space whose Tukey depth exceeds given thresholds $k$. We address the problem of fast and exact computation of those central regions. First, we analyse an efficient Algorithm A from Liu et al. (2019), and prove that it yields exact results in dimension $d=2$, or for a low threshold $k$ in arbitrary dimension. We provide examples where Algorithm A fails to recover the exact Tukey depth region for $d>2$, and propose a modification that is guaranteed to be exact. We express the problem of computing the exact central region in its dual formulation, and use that viewpoint to demonstrate that further substantial improvements to our algorithm are unlikely. An efficient C++ implementation of our exact algorithm is freely available in the R package TukeyRegion.
- Published
- 2022
7. Classifying real-world data with the DDα-procedure
- Author
-
Mozharovskyi, Pavlo, Mosler, Karl, and Lange, Tatjana
- Published
- 2015
- Full Text
- View/download PDF
8. Statistical Depth Functions for Ranking Distributions: Definitions, Statistical Learning and Applications
- Author
-
Goibert, Morgane, Clémençon, Stéphan, Irurozki, Ekhine, and Mozharovskyi, Pavlo
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
The concept of median/consensus has been widely investigated in order to provide a statistical summary of ranking data, i.e. realizations of a random permutation $\Sigma$ of a finite set, $\{1,\; \ldots,\; n\}$ with $n\geq 1$, say. As it sheds light onto only one aspect of $\Sigma$'s distribution $P$, it may neglect other informative features. It is the purpose of this paper to define analogs of quantiles, ranks and statistical procedures based on such quantities for the analysis of ranking data by means of a metric-based notion of depth function on the symmetric group. Overcoming the absence of vector space structure on $\mathfrak{S}_n$, the latter defines a center-outward ordering of the permutations in the support of $P$ and extends the classic metric-based formulation of consensus ranking (medians corresponding then to the deepest permutations). The axiomatic properties that ranking depths should ideally possess are listed, while computational and generalization issues are studied at length. Beyond the theoretical analysis carried out, the relevance of the novel concepts and methods introduced for a wide variety of statistical tasks is also supported by numerous numerical experiments.
- Published
- 2022
9. A Framework to Learn with Interpretation
- Author
-
Parekh, Jayneel, Mozharovskyi, Pavlo, and d'Alché-Buc, Florence (Signal, Statistique et Apprentissage (S2A), Laboratoire Traitement et Communication de l'Information (LTCI), Département Images, Données, Signal (IDS), Télécom Paris, Institut Polytechnique de Paris (IP Paris); funded by the DSAIDIS chair and ANR-20-CE23-0028 LIMPID (AAPG2020))
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), [INFO]Computer Science [cs], Machine Learning (cs.LG)
- Abstract
To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability about the predictive model in terms of human-understandable high-level attribute functions, with minimal loss of accuracy. This is achieved by a dedicated architecture and well-chosen regularization penalties. We seek a small-size dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers and whose outputs feed a linear classifier. We impose strong conciseness on the activation of attributes with an entropy-based criterion while enforcing fidelity to both inputs and outputs of the predictive model. A detailed pipeline to visualize the learnt features is also developed. Moreover, besides generating interpretable models by design, our approach can be specialized to provide post-hoc interpretations for a pre-trained neural network. We validate our approach against several state-of-the-art methods on multiple datasets and show its efficacy on both kinds of tasks.
- Published
- 2021
10. Fast nonparametric classification based on data depth
- Author
-
Lange, Tatjana, Mosler, Karl, and Mozharovskyi, Pavlo
- Published
- 2014
- Full Text
- View/download PDF
11. Affine-Invariant Integrated Rank-Weighted Depth: Definition, Properties and Finite Sample Analysis
- Author
-
Staerman, Guillaume, Mozharovskyi, Pavlo, and Clémençon, Stéphan
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), Machine Learning (cs.LG)
- Abstract
Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d\geq 2$, the concept of statistical depth makes it possible to define quantiles and ranks for multivariate data and use them for various statistical tasks (e.g. inference, hypothesis testing). Whereas many depth functions have been proposed ad hoc in the literature since the seminal contribution of Tukey (1975), not all of them possess the properties desirable to emulate the notion of quantile function for univariate probability distributions. In this paper, we propose an extension of the integrated rank-weighted statistical depth (IRW depth in abbreviated form) originally introduced in earlier work, modified in order to satisfy the property of affine invariance, thus fulfilling all four key axioms listed in the nomenclature elaborated by Zuo and Serfling (2000). The variant we propose, referred to as the Affine-Invariant IRW depth (AI-IRW in short), involves the covariance/precision matrices of the (supposedly square integrable) $d$-dimensional random vector $X$ under study, in order to take into account the directions along which $X$ is most variable to assign a depth value to any point $x\in \mathbb{R}^d$. The accuracy of the sampling version of the AI-IRW depth is investigated from a nonasymptotic perspective. Namely, a concentration result for the statistical counterpart of the AI-IRW depth is proved. Beyond the theoretical analysis carried out, applications to anomaly detection are considered and numerical results are displayed, providing strong empirical evidence of the relevance of the depth function we propose here.
- Published
- 2021
12. Youthful and age-related matreotypes predict drugs promoting longevity
- Author
-
Statzer, Cyril, Jongsma, Elisabeth, Liu, Sean X., Dakhovnik, Alexander, Wandrey, Franziska, Mozharovskyi, Pavlo, Zülli, Fred, and Ewald, Collin Y.
- Subjects
Pharmacology, Matrisome, Aging, CMap, Longevity, Drug repurposing, Collagen, Extracellular matrix, GTEx, Geroprotector
- Abstract
The identification and validation of drugs that promote health during aging (‘geroprotectors’) is key to the retardation or prevention of chronic age-related diseases. Here we found that most of the established pro-longevity compounds shown to extend lifespan in model organisms also alter extracellular matrix gene expression (i.e., matrisome) in human cell lines. To harness this novel observation, we used age-stratified human transcriptomes to define the age-related matreotype, which represents the matrisome gene expression pattern associated with age. Using a ‘youthful’ matreotype, we screened in silico for geroprotective drug candidates. To validate drug candidates, we developed a novel tool using prolonged collagen expression as a non-invasive and in-vivo surrogate marker for C. elegans longevity. With this reporter, we were able to eliminate false positive drug candidates and determine the appropriate dose for extending the lifespan of C. elegans. We improved drug uptake for one of our predicted compounds, genistein, and reconciled previous contradictory reports of its effects on longevity. We identified and validated new compounds, tretinoin, chondroitin sulfate, and hyaluronic acid, for their ability to restore age-related decline of collagen homeostasis and increase lifespan. Thus, our innovative drug screening approach - employing extracellular matrix homeostasis - facilitates the discovery of pharmacological interventions promoting healthy aging. (bioRxiv preprint)
- Published
- 2021
- Full Text
- View/download PDF
13. When OT meets MoM: Robust estimation of Wasserstein Distance
- Author
-
Staerman, Guillaume, Laforgue, Pierre, Mozharovskyi, Pavlo, and d'Alché-Buc, Florence (Département Images, Données, Signal (IDS), Signal, Statistique et Apprentissage (S2A), Laboratoire Traitement et Communication de l'Information (LTCI), Télécom Paris, Institut Mines-Télécom [Paris] (IMT), Centre National de la Recherche Scientifique (CNRS))
- Subjects
FOS: Computer and information sciences, [STAT]Statistics [stat], Computer Science - Machine Learning, Statistics - Machine Learning, Machine Learning (stat.ML), [INFO]Computer Science [cs], [MATH]Mathematics [math], Machine Learning (cs.LG)
- Abstract
Rooted in optimal transport, the Wasserstein distance has gained importance in machine learning due to its appealing geometrical properties and the increasing availability of efficient approximations. In this work, we consider the problem of estimating the Wasserstein distance between two probability distributions when observations are polluted by outliers. To that end, we investigate how to leverage Medians of Means (MoM) estimators to robustify the estimation of the Wasserstein distance. Exploiting the dual Kantorovich formulation of the Wasserstein distance, we introduce and discuss novel MoM-based robust estimators whose consistency is studied under a data contamination model and for which convergence rates are provided. These MoM estimators make Wasserstein Generative Adversarial Networks (WGANs) robust to outliers, as witnessed by an empirical study on two benchmarks, CIFAR10 and Fashion MNIST. Finally, we discuss how to combine MoM with the entropy-regularized approximation of the Wasserstein distance and propose a simple MoM-based re-weighting scheme that could be used in conjunction with the Sinkhorn algorithm.
- Published
- 2020
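The Medians-of-Means building block that the abstract above uses to robustify Wasserstein estimation can be sketched on its own, in the simplest case of a univariate mean. This is a minimal illustration; the block count, names, and random block assignment are our choices, not the paper's estimator.

```python
import numpy as np

def median_of_means(x, n_blocks, seed=0):
    """Split the sample into n_blocks random blocks, average each
    block, and return the median of the block means. The median
    discards blocks contaminated by gross outliers."""
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(x, dtype=float))
    blocks = np.array_split(x, n_blocks)
    return float(np.median([b.mean() for b in blocks]))

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=1000)
polluted = np.concatenate([clean, np.full(20, 1e6)])   # ~2% gross outliers
naive = polluted.mean()                            # dragged far from 0
robust = median_of_means(polluted, n_blocks=100)   # stays near 0
```

With 100 blocks, the 20 outliers can contaminate at most 20 block means, so the median is still computed over majority-clean blocks; this is the mechanism the paper transfers to the dual formulation of the Wasserstein distance.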
14. The Area of the Convex Hull of Sampled Curves: a Robust Functional Statistical Depth Measure
- Author
-
Staerman, Guillaume, Mozharovskyi, Pavlo, and Clémençon, Stéphan (Signal, Statistique et Apprentissage (S2A), Laboratoire Traitement et Communication de l'Information (LTCI), Département Images, Données, Signal (IDS), Télécom Paris, Institut Polytechnique de Paris (IP Paris))
- Subjects
[STAT]Statistics [stat], [INFO]Computer Science [cs], [MATH]Mathematics [math], ComputingMilieux_MISCELLANEOUS
- Published
- 2020
15. Choosing among notions of multivariate depth statistics
- Author
-
Mosler, Karl and Mozharovskyi, Pavlo
- Subjects
Methodology (stat.ME), FOS: Computer and information sciences, Statistics and Probability, General Mathematics, Primary 62H05, 62H30, secondary 62-07, Statistics, Probability and Uncertainty, Statistics - Methodology
- Abstract
Classical multivariate statistics measures the outlyingness of a point by its Mahalanobis distance from the mean, which is based on the mean and the covariance matrix of the data. A multivariate depth function is a function which, given a point and a distribution in d-space, measures centrality by a number between 0 and 1, while satisfying certain postulates regarding invariance, monotonicity, convexity and continuity. Accordingly, numerous notions of multivariate depth have been proposed in the literature, some of which are also robust against extremely outlying data. The departure from classical Mahalanobis distance does not come without cost. There is a trade-off between invariance, robustness and computational feasibility. In the last few years, efficient exact algorithms as well as approximate ones have been constructed and made available in R-packages. Consequently, in practical applications the choice of a depth statistic is no longer restricted to one or two notions due to computational limits; rather, often several notions are feasible, among which the researcher has to decide. The article debates theoretical and practical aspects of this choice, including invariance and uniqueness, robustness and computational feasibility. Complexity and speed of exact algorithms are compared. The accuracy of approximate approaches like the random Tukey depth is discussed, as well as the application to large and high-dimensional data. Extensions to local and functional depths and connections to regression depth are briefly addressed.
- Published
- 2020
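The "random Tukey depth" approximation mentioned in the abstract above admits a very short sketch: project the sample onto random directions and take the smallest one-sided fraction. This is an illustrative Monte Carlo version (function name and defaults are ours); by construction it upper-bounds the exact halfspace depth and improves as the number of directions grows.

```python
import numpy as np

def random_tukey_depth(z, X, n_dir=1000, seed=0):
    """Approximate halfspace (Tukey) depth of point z w.r.t. sample X:
    the minimum, over random unit directions u, of the fraction of
    observations whose projection on u lies on one side of z's
    projection."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n_dir, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # unit directions
    proj_X = X @ U.T                # (n, n_dir) projected sample
    proj_z = U @ z                  # (n_dir,) projections of z
    upper = (proj_X >= proj_z).mean(axis=0)   # fraction on one side
    lower = (proj_X <= proj_z).mean(axis=0)   # ... and on the other
    return float(min(upper.min(), lower.min()))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
deep = random_tukey_depth(X.mean(axis=0), X)    # central point: near 1/2
shallow = random_tukey_depth(np.array([10.0, 0.0, 0.0]), X)  # outlier: near 0
```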
16. Functional Isolation Forest
- Author
-
Staerman, Guillaume, Mozharovskyi, Pavlo, Clémençon, Stephan, and d'Alché-Buc, Florence (Signal, Statistique et Apprentissage (S2A), Laboratoire Traitement et Communication de l'Information (LTCI), Département Images, Données, Signal (IDS), Télécom Paris, Institut Polytechnique de Paris (IP Paris))
- Subjects
FOS: Computer and information sciences, Computer Science - Machine Learning, [STAT.ML]Statistics [stat]/Machine Learning [stat.ML], Statistics - Machine Learning, isolation forest, Machine Learning (stat.ML), Anomaly detection, unsupervised learning, Machine Learning (cs.LG), functional data analysis
- Abstract
For the purpose of monitoring the behavior of complex infrastructures (e.g. aircraft, transport or energy networks), high-rate sensors are deployed to capture multivariate data, generally unlabeled, in quasi-continuous time, so as to quickly detect anomalies that may jeopardize the smooth operation of the system of interest. The statistical analysis of such massive data of functional nature raises many challenging methodological questions. The primary goal of this paper is to extend the popular Isolation Forest (IF) approach to anomaly detection, originally dedicated to finite-dimensional observations, to functional data. The major difficulty lies in the wide variety of topological structures that may equip a space of functions and the great variety of patterns that may characterize abnormal curves. We address the issue of (randomly) splitting the functional space in a flexible manner in order to isolate progressively any trajectory from the others, a key ingredient to the efficiency of the algorithm. Beyond a detailed description of the algorithm, computational complexity and stability issues are investigated at length. From the scoring function measuring the degree of abnormality of an observation provided by the proposed variant of the IF algorithm, a functional statistical depth function is defined and discussed, as well as a multivariate functional extension. Numerical experiments provide strong empirical evidence of the accuracy of the extension proposed.
- Published
- 2019
17. Depth for Curve Data and Applications.
- Author
-
de Micheaux, Pierre Lafaye, Mozharovskyi, Pavlo, and Vimond, Myriam
- Subjects
DIFFUSION tensor imaging, STATISTICS, PROBABILITY measures, COMPUTER-assisted image analysis (Medicine), HANDWRITING recognition (Computer science), NONPARAMETRIC statistics
- Abstract
In 1975, John W. Tukey defined statistical data depth as a function that determines the centrality of an arbitrary point with respect to a data cloud or to a probability measure. During the last decades, this seminal idea of data depth evolved into a powerful tool that has proved useful in various fields of science. Recently, extending the notion of data depth to the functional setting attracted a lot of attention among theoretical and applied statisticians. We go further and suggest a notion of data depth suitable for data represented as curves, or trajectories, which is independent of the parameterization. We show that our curve depth satisfies theoretical requirements of general depth functions that are meaningful for trajectories. We apply our methodology to diffusion tensor brain images and also to pattern recognition of handwritten digits and letters. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
18. Composite marginal likelihood estimation of spatial autoregressive probit models feasible in very large samples
- Author
-
Mozharovskyi, Pavlo and Vogler, Jan
- Published
- 2016
- Full Text
- View/download PDF
19. Nonparametric Imputation by Data Depth.
- Author
-
Mozharovskyi, Pavlo, Josse, Julie, and Husson, François
- Subjects
DATA distribution, FORECASTING, DATA
- Abstract
We present a single imputation method for missing values which borrows the idea of data depth—a measure of centrality defined for an arbitrary point of a space with respect to a probability distribution or data cloud. The method consists in the iterative maximization of the depth of each observation with missing values, and can be employed with any properly defined statistical depth function. For each single iteration, imputation reverts to optimization of quadratic, linear, or quasiconcave functions that are solved analytically by linear programming or the Nelder–Mead method. As it accounts for the underlying data topology, the procedure is distribution free, allows imputation close to the data geometry, can make prediction in situations where local imputation (k-nearest neighbors, random forest) cannot, and has attractive robustness and asymptotic properties under elliptical symmetry. It is shown that a special case—when using the Mahalanobis depth—has a direct connection to well-known methods for the multivariate normal model, such as iterated regression and regularized PCA. The methodology is extended to multiple imputation for data stemming from an elliptically symmetric distribution. Simulation and real data studies show good results compared with existing popular alternatives. The method has been implemented as an R-package. Supplementary materials for the article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
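The Mahalanobis-depth special case mentioned in the abstract above reduces to iterated (conditional-mean) regression imputation, which can be sketched as follows. A minimal illustration of that special case only; names and the fixed iteration count are ours, and this is not the R-package implementation.

```python
import numpy as np

def impute_mahalanobis(X, n_iter=50):
    """Fill NaNs with the conditional mean of the missing coordinates
    given the observed ones, under the running mean/covariance
    estimate; for the Mahalanobis depth this conditional mean is the
    depth-maximizing imputation."""
    X = np.asarray(X, dtype=float).copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.nonzero(miss)[1])   # mean start
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        S = np.cov(X, rowvar=False)
        for i in np.nonzero(miss.any(axis=1))[0]:
            m, o = miss[i], ~miss[i]
            reg = S[np.ix_(m, o)] @ np.linalg.inv(S[np.ix_(o, o)])
            X[i, m] = mu[m] + reg @ (X[i, o] - mu[o])   # conditional mean
    return X

# Strongly correlated columns: the hidden entry is well predicted.
rng = np.random.default_rng(2)
z = rng.normal(size=(300, 1))
data = np.hstack([z, z + 0.1 * rng.normal(size=(300, 1)),
                  rng.normal(size=(300, 1))])
truth = data[0, 1]
data[0, 1] = np.nan
imputed = impute_mahalanobis(data)
```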
20. Statistical inference for the Russell measure of technical efficiency.
- Author
-
Badunenko, Oleg and Mozharovskyi, Pavlo
- Subjects
INFERENTIAL statistics, DATA envelopment analysis
- Abstract
Data envelopment analysis (DEA) has become a popular approach to nonparametric efficiency measurement. Statistical inference using bootstrap methods is readily available for the radial DEA estimator; however, it is missing for the Russell measure, the nonradial DEA estimator. We propose a bootstrap-based procedure for making statistical inference about the individual Russell measures of technical efficiency. We perform simulations to examine finite sample properties of the proposed estimator. Finally, we present an empirical study using the proposed bootstrap procedure. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
21. Tukey depth: linear programming and applications
- Author
-
Mozharovskyi, Pavlo (Institut de Recherche Mathématique de Rennes (IRMAR), Université de Rennes, CNRS; Laboratoire de Mathématiques Appliquées Agrocampus (LMA2), AGROCAMPUS OUEST)
- Subjects
FOS: Computer and information sciences, Breadth-first search algorithm, [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST], cone segmentation, exact computation, Tukey depth, linear programming, simplex algorithm, Statistics - Computation, Computation (stat.CO)
- Abstract
Determining the representativeness of a point within a data cloud has recently become a desirable task in multivariate analysis. The concept of a statistical depth function, which reflects centrality of an arbitrary point, appears to be useful and has been studied intensively during the last decades. Here the issue of exact computation of the classical Tukey data depth is addressed. The paper suggests an algorithm that exploits the connection between the Tukey depth and linear separability and is based on iterative application of linear programming. The algorithm further develops the idea of the cone segmentation of the Euclidean space and allows for efficient implementation due to the special search structure. The presentation is complemented by relationships to similar concepts and examples of application.
- Published
- 2016
22. Classifying real-world data with the DDalpha-procedure
- Author
-
Mozharovskyi, Pavlo, Mosler, Karl, Lange, Tatjana, Universität zu Köln, and Hochschule Merseburg
- Subjects
[STAT.AP]Statistics [stat]/Applications [stat.AP], ComputingMilieux_MISCELLANEOUS
- Published
- 2015
- Full Text
- View/download PDF
23. Fast Computation of Tukey Trimmed Regions and Median in Dimension p > 2.
- Author
-
Liu, Xiaohui, Mosler, Karl, and Mozharovskyi, Pavlo
- Subjects
DIMENSIONS, COMPUTATIONAL geometry, MULTIVARIATE analysis, CENTROID, POINT set theory, ALGORITHMS
- Abstract
Given data in ℝ^p, a Tukey κ-trimmed region is the set of all points that have at least Tukey depth κ w.r.t. the data. As they are visual, affine equivariant and robust, Tukey regions are useful tools in nonparametric multivariate analysis. While these regions are easily defined and interpreted, their practical use in applications has been impeded so far by the lack of efficient computational procedures in dimension p > 2. We construct two novel algorithms to compute a Tukey κ-trimmed region, a naïve one and a more sophisticated one that is much faster than known algorithms. Further, a strict bound on the number of facets of a Tukey region is derived. In a large simulation study the novel fast algorithm is compared with the naïve one, which is slower and by construction exact, yielding in every case the same correct results. Finally, the approach is extended to an algorithm that calculates the innermost Tukey region and its barycenter, the Tukey median. Supplementary materials for this article are available online. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
24. Fast computation of Tukey trimmed regions and median in dimension $p>2$
- Author
-
Liu, Xiaohui, Mosler, Karl, and Mozharovskyi, Pavlo
- Subjects
Statistics::Theory, Mathematics::Logic, 62F10, 62F35, Statistics::Methodology, Mathematics::General Topology, Statistics - Computation, Statistics::Computation
- Abstract
Given data in $\mathbb{R}^{p}$, a Tukey $\kappa$-trimmed region is the set of all points that have at least Tukey depth $\kappa$ w.r.t. the data. As they are visual, affine equivariant and robust, Tukey regions are useful tools in nonparametric multivariate analysis. While these regions are easily defined and interpreted, their practical use in applications has been impeded so far by the lack of efficient computational procedures in dimension $p > 2$. We construct two novel algorithms to compute a Tukey $\kappa$-trimmed region, a naïve one and a more sophisticated one that is much faster than known algorithms. Further, a strict bound on the number of facets of a Tukey region is derived. In a large simulation study the novel fast algorithm is compared with the naïve one, which is slower and by construction exact, yielding in every case the same correct results. Finally, the approach is extended to an algorithm that calculates the innermost Tukey region and its barycenter, the Tukey median.
- Published
- 2014
25. Classifying real-world data with the $DD\alpha$-procedure
- Author
-
Mozharovskyi, Pavlo, Mosler, Karl, and Lange, Tatjana
- Subjects
Statistics - Applications ,Statistics - Methodology - Abstract
The $DD\alpha$-classifier, a fast and very robust nonparametric procedure, is described and applied to fifty classification problems regarding a broad spectrum of real-world data. The procedure first transforms the data from their original property space into a depth space, which is a low-dimensional unit cube, and then separates them by a projective invariant procedure, called $\alpha$-procedure. To each data point the transformation assigns its depth values with respect to the given classes. Several alternative depth notions (spatial depth, Mahalanobis depth, projection depth, and Tukey depth, the latter two being approximated by univariate projections) are used in the procedure, and compared regarding their average error rates. With the Tukey depth, which fits the distributions' shape best and is most robust, 'outsiders', that is data points having zero depth in all classes, need an additional treatment for classification. Evidence is also given about the dimension of the extended feature space needed for linear separation. The $DD\alpha$-procedure is available as an R-package.
- Published
- 2014
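The depth-space transformation at the heart of the $DD\alpha$-procedure can be sketched generically: map each observation to its vector of depths with respect to the training classes, then classify in the unit cube. The sketch below substitutes Mahalanobis depth and a simple maximum-depth rule for the $\alpha$-procedure, so it illustrates the DD-plot idea rather than the actual classifier; all names are ours.

```python
import numpy as np

def mahalanobis_depth(points, reference):
    """D(x) = 1 / (1 + squared Mahalanobis distance to `reference`)."""
    mu = reference.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(reference, rowvar=False))
    diff = np.atleast_2d(points) - mu
    return 1.0 / (1.0 + np.einsum('ij,jk,ik->i', diff, S_inv, diff))

def dd_transform(points, classes):
    """Map points from the property space into the depth space
    [0, 1]^q, one depth coordinate per training class."""
    return np.column_stack([mahalanobis_depth(points, c) for c in classes])

# Two Gaussian classes; a maximum-depth rule stands in for the
# alpha-procedure's separating rule in depth space.
rng = np.random.default_rng(0)
class0 = rng.normal(0.0, 1.0, size=(200, 2))
class1 = rng.normal(4.0, 1.0, size=(200, 2))
depths = dd_transform(np.array([[0.2, -0.1], [3.9, 4.2]]), [class0, class1])
labels = depths.argmax(axis=1)
```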
26. Fast nonparametric classification based on data depth
- Author
-
Lange, Tatjana, Mosler, Karl, and Mozharovskyi, Pavlo
- Subjects
Nonparametric method, Alpha-procedure, pattern recognition, ddc:330, DD-plot, misclassification rate, zonoid depth, cluster analysis, supervised learning, theory
- Abstract
A new procedure, called DDα-procedure, is developed to solve the problem of classifying d-dimensional objects into q ≥ 2 classes. The procedure is completely nonparametric; it uses q-dimensional depth plots and a very efficient algorithm for discrimination analysis in the depth space [0, 1]^q. Specifically, the depth is the zonoid depth, and the algorithm is the α-procedure. In case of more than two classes several binary classifications are performed and a majority rule is applied. Special treatments are discussed for outsiders, that is, data having zero depth vector. The DDα-classifier is applied to simulated as well as real data, and the results are compared with those of similar procedures that have been recently proposed. In most cases the new procedure has comparable error rates, but is much faster than other classification approaches, including the SVM.
- Published
- 2012
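The majority rule mentioned in the abstract above can be sketched briefly. Under a simplifying assumption, the binary classifier here is a stand-in (nearest class mean) rather than the zonoid-depth α-procedure of the paper; what the sketch shows is only the voting scheme: run all pairwise binary classifications and assign the label that wins the most duels.

```python
# Sketch of the majority rule for q > 2 classes: every pair of classes
# votes via a binary classifier; the most-voted label wins. The binary
# rule (nearest class mean) is a placeholder, not the paper's method.
from collections import Counter
from itertools import combinations
import numpy as np

def binary_vote(x, mean_i, mean_j, label_i, label_j):
    """Placeholder binary classifier: pick the closer class mean."""
    if np.linalg.norm(x - mean_i) <= np.linalg.norm(x - mean_j):
        return label_i
    return label_j

def majority_classify(x, class_means):
    """Aggregate all pairwise binary decisions by majority vote."""
    votes = Counter()
    for (li, mi), (lj, mj) in combinations(class_means.items(), 2):
        votes[binary_vote(x, mi, mj, li, lj)] += 1
    return votes.most_common(1)[0][0]

means = {"a": np.array([0.0, 0.0]),
         "b": np.array([4.0, 0.0]),
         "c": np.array([0.0, 4.0])}
label = majority_classify(np.array([0.5, 0.2]), means)   # nearest to "a"
```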
27. Fast DD-classification of functional data.
- Author
-
Mosler, Karl and Mozharovskyi, Pavlo
- Subjects
SUPERVISED learning ,BLOWING up (Algebraic geometry) ,NONPARAMETRIC statistics ,SIMULATION methods & models ,BAYES' estimation ,MATHEMATICAL models - Abstract
A fast nonparametric procedure for classifying functional data is introduced. It consists of a two-step transformation of the original data plus a classifier operating on a low-dimensional space. The functional data are first mapped into a finite-dimensional location-slope space and then transformed by a multivariate depth function into the DD-plot, which is a subset of the unit square. This transformation yields a new notion of depth for functional data. Three alternative depth functions are employed for this, as well as two rules for the final classification in $$[0,1]^2$$. The resulting classifier has to be cross-validated over a small range of parameters only, which is restricted by a Vapnik-Chervonenkis bound. The entire methodology does not involve smoothing techniques, is completely nonparametric, and achieves Bayes optimality under standard distributional settings. It is robust, efficiently computable, and has been implemented in an R environment. Applicability of the new approach is demonstrated by simulations as well as by a benchmark study. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
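The location-slope mapping in the abstract above can be illustrated in its coarsest form. As a simplifying assumption, each (discretised) curve is reduced to just two numbers, its average level and its average slope; the paper's construction is richer, using integrals of the curve and its derivative over several subintervals.

```python
# Coarsest version of a location-slope map: a sampled curve x(t) on a
# uniform grid is summarised by (mean level, mean slope). Illustrative
# simplification of the finite-dimensional embedding described above.
import numpy as np

def location_slope(t, x):
    """Map a curve sampled on a uniform grid to (mean level, mean slope)."""
    location = float(np.mean(x))                      # average value of the curve
    slope = float((x[-1] - x[0]) / (t[-1] - t[0]))    # average derivative
    return location, slope

t = np.linspace(0.0, 1.0, 101)
loc, slo = location_slope(t, 2.0 * t + 1.0)           # the line x(t) = 2t + 1
# For this line the average level over [0, 1] is 2 and the slope is 2.
```

Once every curve is reduced to such a point, any multivariate depth function (as in the multivariate DDα-procedure) can be applied to the embedded sample, which is exactly the two-step construction the abstract describes.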
28. Nonparametric frontier analysis using Stata.
- Author
-
Badunenko, Oleg and Mozharovskyi, Pavlo
- Subjects
- *
STOCHASTIC frontier analysis , *INFERENTIAL statistics , *LINEAR programming - Abstract
In this article, we describe five new Stata commands that fit and provide statistical inference in nonparametric frontier models. The tenonradial and teradial commands fit data envelopment models in which nonradial and radial technical efficiency measures are computed (Färe, 1998, Fundamentals of Production Theory; Färe and Lovell, 1978, Journal of Economic Theory 19: 150-162; Färe, Grosskopf, and Lovell, 1994a, Production Frontiers). Technical efficiency measures are obtained by solving linear programming problems. The teradialbc, nptestind, and nptestrts commands provide tools for making statistical inference regarding radial technical efficiency measures (Simar and Wilson, 1998, Management Science 44: 49-61; 2000, Journal of Applied Statistics 27: 779-802; 2002, European Journal of Operational Research 139: 115-132). We provide a brief overview of nonparametric efficiency measurement, and we describe the syntax and options of the new commands. Additionally, we provide an example showing the capabilities of the new commands. Finally, we perform a small empirical study of productivity growth. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
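The linear programming problems behind radial efficiency measures, mentioned in the abstract above, can be sketched outside Stata. This is an illustrative textbook formulation, not the tenonradial/teradial commands: the input-oriented radial (Farrell) efficiency of one unit under constant returns to scale is the optimal value θ of min θ subject to Yλ ≥ y₀, Xλ ≤ θx₀, λ ≥ 0.

```python
# Textbook DEA sketch: radial, input-oriented technical efficiency of a
# single unit under constant returns to scale, solved as a small LP.
# Illustrative only; not the Stata commands described in the article.
import numpy as np
from scipy.optimize import linprog

def radial_efficiency(X, Y, x0, y0):
    """X: (n_units, n_inputs), Y: (n_units, n_outputs). Returns theta."""
    n = X.shape[0]
    c = np.r_[1.0, np.zeros(n)]                      # minimise theta
    # output constraints  -Y^T lam <= -y0   (i.e. Y^T lam >= y0)
    A_out = np.hstack([np.zeros((Y.shape[1], 1)), -Y.T])
    # input constraints   X^T lam - theta * x0 <= 0
    A_in = np.hstack([-x0.reshape(-1, 1), X.T])
    res = linprog(c,
                  A_ub=np.vstack([A_out, A_in]),
                  b_ub=np.r_[-y0, np.zeros(x0.size)],
                  bounds=[(None, None)] + [(0.0, None)] * n)
    return res.x[0]

X = np.array([[2.0], [4.0], [8.0]])                  # one input, three units
Y = np.array([[1.0], [2.0], [3.0]])                  # one output
theta = radial_efficiency(X, Y, X[2], Y[2])          # evaluate the third unit
# The CRS frontier is y = 0.5 x, so unit 3 (input 8, output 3) could
# produce its output with input 6; its radial efficiency is 6/8 = 0.75.
```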
29. DDα-Classification of Asymmetric and Fat-Tailed Data.
- Author
-
Lange, Tatjana, Mosler, Karl, and Mozharovskyi, Pavlo
- Published
- 2014
- Full Text
- View/download PDF
30. The Alpha-Procedure: A Nonparametric Invariant Method for Automatic Classification of Multi-Dimensional Objects.
- Author
-
Lange, Tatjana and Mozharovskyi, Pavlo
- Published
- 2014
- Full Text
- View/download PDF
31. Classifying real-world data with the $DD\alpha$-procedure.
- Author
-
Mozharovskyi, Pavlo, Mosler, Karl, and Lange, Tatjana
- Abstract
The $DD\alpha$-classifier, a fast and very robust nonparametric procedure, is described and applied to fifty classification problems covering a broad spectrum of real-world data. The procedure first transforms the data from their original property space into a depth space, a low-dimensional unit cube, and then separates them by a projective invariant procedure called the $\alpha$-procedure. The transformation assigns to each data point its depth values with respect to the given classes. Several alternative depth notions (spatial depth, Mahalanobis depth, projection depth, and Tukey depth, the latter two approximated by univariate projections) are used in the procedure and compared with regard to their average error rates. With the Tukey depth, which fits the distributions' shape best and is most robust, 'outsiders', that is, data points having zero depth in all classes, appear; they need an additional treatment for classification. Evidence is also given about the dimension of the extended feature space needed for linear separation. The $DD\alpha$-procedure is available as an R package. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
32. The alpha-procedure: a nonparametric invariant method for automatic classification of multi-dimensional objects
- Author
-
Tatjana Lange, Pavlo Mozharovskyi, Hochschule Merseburg, Institut Polytechnique de Paris (IP Paris), Département Images, Données, Signal (IDS), Télécom ParisTech, Signal, Statistique et Apprentissage (S2A), Laboratoire Traitement et Communication de l'Information (LTCI), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris-Institut Mines-Télécom [Paris] (IMT)-Télécom Paris, and Mozharovskyi, Pavlo
- Subjects
Multivariate statistics ,021103 operations research ,[STAT.ME] Statistics [stat]/Methodology [stat.ME] ,Computer science ,business.industry ,Feature vector ,0211 other engineering and technologies ,Nonparametric statistics ,Pattern recognition ,02 engineering and technology ,Linear discriminant analysis ,01 natural sciences ,Linear subspace ,010104 statistics & probability ,Hyperplane ,Multi dimensional ,Artificial intelligence ,0101 mathematics ,Invariant (mathematics) ,business ,[STAT.ME]Statistics [stat]/Methodology [stat.ME] ,ComputingMilieux_MISCELLANEOUS - Abstract
A procedure, called the α-procedure, for the efficient automatic classification of multivariate data is described. It is based on a geometric representation of two learning classes in a proper multi-dimensional rectifying feature space and the stepwise construction of a separating hyperplane in that space. The dimension of the space, i.e. the number of features necessary for a successful classification, is determined step by step using two-dimensional repères (linear subspaces). In each step, a repère and a feature are constructed in such a way that they yield maximal discriminating power. Throughout the procedure the invariant, which is the object's affiliation with a class, is preserved.
- Published
- 2012
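The stepwise idea in the abstract above can be given a loose sketch. This is not the published algorithm; as an assumption, each step simply pairs the current discriminating feature with one original coordinate in a two-dimensional plane (a repère) and searches over angles for the projection whose midpoint-threshold rule misclassifies the fewest training points.

```python
# Loose sketch of stepwise construction in two-dimensional reperes:
# combine the current feature with one coordinate and pick the angle
# whose 1-D projection best separates two classes. Illustrative only.
import numpy as np

def best_direction(f, g, labels, n_angles=180):
    """Return the projection of the 2-D repere (f, g) with fewest errors."""
    best, best_err = f, np.inf
    for phi in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        proj = np.cos(phi) * f + np.sin(phi) * g
        # threshold at the midpoint of the two class means on this axis
        thr = (proj[labels == 0].mean() + proj[labels == 1].mean()) / 2.0
        pred = (proj > thr).astype(int)
        # min(.) handles the arbitrary orientation of the threshold rule
        err = min(np.mean(pred != labels), np.mean(pred == labels))
        if err < best_err:
            best, best_err = proj, err
    return best, best_err

rng = np.random.default_rng(1)
A = rng.normal([0.0, 0.0], 0.5, size=(100, 2))       # class 0
B = rng.normal([2.0, 2.0], 0.5, size=(100, 2))       # class 1
X = np.vstack([A, B])
y = np.r_[np.zeros(100, int), np.ones(100, int)]

# step 1: start from the first coordinate; step 2: refine with the second
feat, _ = best_direction(X[:, 0], X[:, 1], y)
_, err = best_direction(feat, X[:, 1], y)            # training error after two steps
```

Each step here preserves the invariant the abstract mentions: points keep their class labels while only the discriminating feature is rebuilt.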