Author: "Opitz P" / Journal: selected contributions in data analysis & classification - Searchworks@Jio Institute Digital Library Search Results

1. PCR and PLS for Clusterwise Regression on Functional Data.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Clusterwise regression is applied to functional data, using PCR and PLS as regularization methods for the functional linear regression model. We compare these two approaches on simulated data as well as on stock-exchange data. [ABSTRACT FROM AUTHOR]
Published: 2007

2. Dynamic Features Extraction in Soybean Futures Market of China.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: By applying Symbolic Data Analysis (SDA), this paper investigates the dynamic features of the soybean futures market of the Dalian Commodity Exchange (DCE) of China from 2002 to 2004. First, interval data are created by classifying mass futures contracts by year and by maturity date; the DIV clustering method is then applied to these interval data, which produces further simplified three-way interval symbolic data and greatly reduces the sample size. Based on that, factor analysis of interval data is adopted to extract dynamic principal characteristics of soybean futures, which reduces the dimension of the variable space. The results of the empirical research, which agree closely with market realities, verify the application value of SDA in analyzing mass, dynamic and complex data. [ABSTRACT FROM AUTHOR]
Published: 2007

3. About Relational Correlations.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Using particular Euclidean geometries called relational, one can go deeper into the usual concepts and methods of Data Analysis, and even generalize them or propose new ones. Inner products in these particular Euclidean spaces are built using correlations between principal components of observed sets of variables. A summary of the main topics of an essay in progress is given. [ABSTRACT FROM AUTHOR]
Published: 2007

4. Which Bootstrap for Principal Axes Methods?

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: This paper deals with validation techniques in the context of exploratory techniques involving singular value decomposition, namely: Principal Components Analysis, Simple and Multiple Correspondence Analysis. We briefly show that, according to the purpose of the analysis, at least five types of resampling techniques could be carried out to assess the quality of the obtained visualisations: a) Partial bootstrap, which considers the replications as supplementary data, without diagonalization of the replicated moment-product matrices. b) Total bootstrap type 1, which performs a new diagonalization for each replicate, with corrections limited to possible changes of signs of the axes. c) Total bootstrap type 2, which adds to the preceding one a correction for the possible exchanges of axes. d) Total bootstrap type 3, which implies Procrustean transformations of all the replicates, striving to take into account both rotations and exchanges of axes. e) Specific bootstrap, implying a resampling at a different level (case of a hierarchy of statistical units). An example is presented for each type of resampling. [ABSTRACT FROM AUTHOR]
Published: 2007

5. Prediction with Confidence.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: The paper outlines an efficient way to complement predictions, produced by new and traditional machine-learning methods, with measures of their accuracy and reliability. These measures are not only valid and informative, but they also take full account of the special features of the object to be predicted. They are based on computable approximations of Kolmogorov's algorithmic notion of randomness. Using these measures, it becomes possible to control the number of erroneous predictions by selecting a suitable confidence level. Further development of these ideas can lead to establishing useful links with Diday's Symbolic Data Analysis. [ABSTRACT FROM AUTHOR]
Published: 2007

6. Divided Switzerland.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: On the 6th of December, 1992, the Swiss population voted against the "Adhesion of Switzerland to the European Economic Area". Swiss German cantons, except Basel-Stadt and Basel-Land, voted against, and all French speaking cantons voted in favour of adhesion. Shocked by this outcome, the media, the politicians, and the population itself took this date as the beginning of the divided Switzerland. The purpose of this article is to show that what happened on that day was not a new phenomenon but was in line with more than a century of votations. [ABSTRACT FROM AUTHOR]
Published: 2007

7. A New Method for Ranking n Statistical Units.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: In many research problems it is useful to summarize some indices or indicators to express a synthetic, indirect measure of a concept which is revealed by p variables observed in each statistical unit. This is because the p variables are considered to be indirect measures of a complex (perhaps indefinable) concept. Within this context and for ranking the n statistical units the author suggests the index: $$ R_i = (\operatorname{sgn} c_{i1}) \left( \sum_{r} c_{ir}^2 \right)^{1/2} $$ where the $$ c_{ir} $$ (i = 1, 2,...,n; r = 1, 2,..., p) represent the values of the p principal components connected with the i-th statistical unit. This index is applied for ranking the 20 Italian Regions for quality of life for the years 2000-2002. The results are compared with those that are furnished by the single source method. [ABSTRACT FROM AUTHOR]
Published: 2007
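The index in the abstract of entry 7 can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes that the principal component scores come from a singular value decomposition of the column-centered data and that the first axis is oriented so its loadings have a non-negative sum. Neither choice is stated in the abstract, and the function name `ranking_index` is hypothetical.

```python
import numpy as np

def ranking_index(X):
    """Signed norm of the principal component scores for each row of X.

    Assumptions (not from the abstract): the columns are centered but
    not rescaled, and the first axis is oriented so that its loadings
    sum to a non-negative value.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    if Vt[0].sum() < 0:                          # fix the arbitrary sign of axis 1
        Vt[0] *= -1
        U[:, 0] *= -1
    scores = U * S                               # c_ir: score of unit i on component r
    norms = np.sqrt((scores ** 2).sum(axis=1))   # (sum_r c_ir^2)^(1/2)
    return np.sign(scores[:, 0]) * norms         # sgn(c_i1) times the norm

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                      # 5 units, 3 variables (toy data)
R = ranking_index(X)
order = np.argsort(-R)                           # units from highest to lowest index
```

Because all p components are retained, the absolute value of each index equals the Euclidean distance of the unit from the centroid. The sign of the first component alone decides on which side of the ranking a unit falls, so the orientation convention matters.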

8. Sanskrit Manuscript Comparison for Critical Edition and Classification.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: A critical edition takes into account all the different known versions of the same text in order to show the differences between any two distinct versions. The construction of a critical edition is long and sometimes tedious work. To make it easier, software tools that help the philologist are nowadays available for European languages. Because of the complex graphical characteristics of Sanskrit, which call for computationally expensive solutions to problems occurring in text comparisons, such software does not yet exist for the Sanskrit language. This paper describes the Sanskrit characteristics that make text comparisons different, presents computationally feasible solutions for the elaboration of the computer-assisted critical edition of Sanskrit texts, and provides, as a byproduct, a distance between two versions of the edited text. [ABSTRACT FROM AUTHOR]
Published: 2007

9. Locally Linear Regression and the Calibration Problem for Micro-Array Analysis.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: We review the concept of locally linear regression and its relationship to Diday's Nuées Dynamiques and to tree-structured linear regression. We describe the calibration problem in microarray analysis and propose a Bayesian approach based on tree-structured linear regression. Using the proposed approach, we analyze a subset of a large data set from an Affymetrix microarray calibration experiment. In this example, a tree-structured regression model outperforms a multiple regression model. We calculated 95% Credible Intervals for a sample of the data, obtaining reasonably good results. Future research will consider and compare several other approaches to locally linear regression. [ABSTRACT FROM AUTHOR]
Published: 2007

10. Indirect Blockmodeling of 3-Way Networks.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: An approach to the indirect blockmodeling of 3-way network data is presented for structural equivalence. This equivalence type is defined formally and expressed in terms of an interchangeability condition that is used to construct a compatible dissimilarity. Using Ward's method, the three dimensional partitioning is obtained via hierarchical clustering and represented diagrammatically. Artificial and real data are used to illustrate these methods. [ABSTRACT FROM AUTHOR]
Published: 2007

11. Beyond the Pyramids: Rigid Clustering Systems.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: This paper is devoted to more or less new extensions of the notion of pyramid introduced by Diday (1984, 1986) and Fichet (1984, 1986). It is related to the notion of "rigid clustering system" or "rigid hypergraph" (topics related to combinatorial theory). Pyramids are representations of clustering systems whose classes are connected subgraphs of a path (or, in other words, intervals of some linear order). More generally, we shall consider clustering systems whose classes are connected components of some graph. After reviewing some classical results, we shall emphasize relations between rigidity and minimal spanning trees. [ABSTRACT FROM AUTHOR]
Published: 2007

12. Robinson Cubes.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: A square similarity matrix is called a Robinson matrix if the highest entries within each row and column are on the main diagonal and if, when moving away from this diagonal, the entries never increase. This paper formulates Robinson cubes as three-way generalizations of Robinson matrices. The first definition involves only those entries that are in a row, column or tube with an entry of the main diagonal. A stronger definition, called a regular Robinson cube, involves all entries. Several examples of the definitions are presented. [ABSTRACT FROM AUTHOR]
Published: 2007
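The definition of a Robinson matrix given in the abstract of entry 12 can be checked mechanically. The following Python sketch tests only the two-way (matrix) property as worded there and does not attempt the three-way cube definitions. The function name `is_robinson` is hypothetical.

```python
import numpy as np

def is_robinson(S):
    """Return True if the square similarity matrix S is a Robinson matrix.

    In every row and every column, entries must never increase when
    moving away from the main diagonal, which also places the largest
    entry of each row and column on the diagonal.
    """
    S = np.asarray(S, dtype=float)
    if S.ndim != 2 or S.shape[0] != S.shape[1]:
        return False
    n = S.shape[0]
    for i in range(n):
        for line in (S[i, :], S[:, i]):          # row i, then column i
            before = line[: i + 1]               # entries up to the diagonal
            after = line[i:]                     # entries from the diagonal on
            if np.any(np.diff(before) < 0):      # must not decrease toward the diagonal
                return False
            if np.any(np.diff(after) > 0):       # must not increase away from it
                return False
    return True

robinson = [[3, 2, 1],
            [2, 3, 2],
            [1, 2, 3]]
not_robinson = [[3, 1, 2],
                [1, 3, 2],
                [2, 2, 3]]
```

Here `is_robinson(robinson)` holds, while `not_robinson` fails because its first row rises from 1 to 2 when moving away from the diagonal.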

13. Classification and Generalized Principal Component Analysis.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: In previous papers, we proposed a generalized principal component analysis (GPCA) aimed at displaying salient features of a multidimensional data set, in particular the existence of clusters. Through an example, this article shows how GPCA and clustering methods are complementary. The projections provided by GPCA and the sequence of eigenvalues give useful indications of the number and the type of clusters to be expected; submitting GPCA principal components to a clustering algorithm instead of the raw data can improve the classification. The use of a convenient robustification of GPCA is also mentioned. [ABSTRACT FROM AUTHOR]
Published: 2007

14. Relative and Absolute Contributions to Aid Strata Interpretation.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Strata generalisation by symbolic objects is presented when there is a class variable to be explained simultaneously in all strata. This is attained by a generalised recursive tree-building algorithm for populations partitioned into strata and described by symbolic data, that is, more complex data structures than classical data. Symbolic objects describe decisional nodes and strata. This paper presents some measures to interpret strata and nodes. The method is integrated into the SODAS Software (Symbolic Official Data Analysis System), partially supported by ESPRIT-20821 SODAS and IST-25161 ASSO. [ABSTRACT FROM AUTHOR]
Published: 2007

15. Density-Based Distances: a New Approach for Evaluating Proximities Between Objects. Applications in Clustering and Discriminant Analysis.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: The aim of this paper is twofold. First, it is shown that taking densities between objects into account to define proximities between them is an intuitively sound way to proceed. Second, some new distances based on density estimates are defined and some properties are presented. Many algorithms in clustering or discriminant analysis require the choice of a dissimilarity: two applications are presented, one in clustering and the other in discriminant analysis, which illustrate the benefits of using these new distances. [ABSTRACT FROM AUTHOR]
Published: 2007

16. Adaptive Dissimilarity Index for Gene Expression Profiles Classification.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: DNA microarray technology makes it possible to monitor simultaneously the expression levels of thousands of genes during important biological processes and across collections of related experiments. Clustering and classification techniques have proved helpful for understanding gene function, gene regulation, and cellular processes. However, the conventional proximity measures between gene expression data, used for clustering or classification purposes, do not fit gene expression specifications, as they are based on the closeness of the expression magnitudes regardless of the overall gene expression profile (shape). We propose in this paper an adaptive dissimilarity index which covers both proximity in values and proximity in behavior. The effectiveness of the adaptive dissimilarity index is illustrated through a classification process for identifying the cell cycle phases of genes. [ABSTRACT FROM AUTHOR]
Published: 2007

17. Lower (Anti-)Robinson Rank Representations for Symmetric Proximity Matrices.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Edwin Diday, some two decades ago, was among the first few individuals to recognize the importance of the (anti-)Robinson form for representing a proximity matrix, and was the leader in suggesting how such matrices might be depicted graphically (as pyramids). We characterize the notions of an anti-Robinson (AR) and strongly anti-Robinson (SAR) matrix, and provide open-source M-files within a MATLAB environment to effect additive decompositions of a given proximity matrix into sums of AR (or SAR) matrices. We briefly introduce how the AR (or SAR) rank of a matrix might be specified. [ABSTRACT FROM AUTHOR]
Published: 2007

18. One-to-One Correspondence Between Indexed Cluster Structures and Weakly Indexed Closed Cluster Structures.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: We place ourselves in a setting where singletons are not all required to be clusters, and we show that the resulting cluster structures and their corresponding closure under finite nonempty intersections still have the same minimal members. Moreover, we show that indexed cluster structures and weakly indexed closed cluster structures correspond in a one-to-one way. [ABSTRACT FROM AUTHOR]
Published: 2007

19. A Note on Three-Way Dissimilarities and Their Relationship with Two-Way Dissimilarities.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: This note is devoted to three-way dissimilarities defined on unordered triples. Some of them are derived from two-way dissimilarities via an Lp-transformation (1 ≤ p ≤ ∞). For p < ∞, a six-point condition of Menger type is established. Based on the definitions of Joly-Le Calvé and Heiser-Bennani Dosse, the concepts of three-way distances are also discussed. Particular attention is paid to three-way ultrametrics and three-way tree distances. [ABSTRACT FROM AUTHOR]
Published: 2007
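The abstract of entry 19 does not spell out the Lp-transformation. One common reading, assumed here and not confirmed by the abstract, takes the Lp norm of the three pairwise dissimilarities of a triple, with the maximum as the p = ∞ case. Under that assumption, a small Python sketch (the function name `three_way_from_two_way` is hypothetical) is:

```python
import math
from itertools import combinations

def three_way_from_two_way(d, p=1.0):
    """Derive a three-way dissimilarity on unordered triples from a
    symmetric two-way dissimilarity matrix d.

    Assumed form (not taken from the abstract):
        t(i, j, k) = (d_ij^p + d_ik^p + d_jk^p)^(1/p)  for 1 <= p < inf
        t(i, j, k) = max(d_ij, d_ik, d_jk)             for p = inf
    """
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= infinity")
    n = len(d)
    t = {}
    for i, j, k in combinations(range(n), 3):    # unordered triples, i < j < k
        sides = (d[i][j], d[i][k], d[j][k])
        if math.isinf(p):
            t[(i, j, k)] = max(sides)
        else:
            t[(i, j, k)] = sum(s ** p for s in sides) ** (1.0 / p)
    return t

d = [[0, 3, 4],
     [3, 0, 5],
     [4, 5, 0]]
perimeter = three_way_from_two_way(d, p=1)          # sum of the three sides
largest = three_way_from_two_way(d, p=math.inf)     # longest side
```

With p = 1 the value for a triple is the perimeter of the triangle it spans. With p = ∞ it is the longest side.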

20. On Lower-Maximal Paired-Ultrametrics.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: The weakly indexed paired-hierarchies (shortly, p-hierarchies) provide a clustering model that allows overlapping clusters and extends the hierarchical model. There exists a bijection between weakly indexed p-hierarchies and the so-called paired-ultrametrics (shortly, p-ultrametrics), this correspondence being a restriction of the bijection between weakly indexed pyramids and Robinsonian dissimilarities. This paper proposes a generalization of the well-known HAC clustering method to compute a weakly indexed p-hierarchy from a given dissimilarity d. Moreover, the p-ultrametric associated to such a weakly indexed p-hierarchy is proved to be lower-maximal for d and larger than the sub-dominant ultrametric of d. [ABSTRACT FROM AUTHOR]
Published: 2007

21. Group Average Representations in Euclidean Distance Cones.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: The set of Euclidean distance matrices has a well-known representation as a convex cone. The problems of representing the group averages of K distance matrices are discussed, but not fully resolved, in the context of SMACOF, Generalized Orthogonal Procrustes Analysis and Individual Differences Scaling. The polar (or dual) cone representation, corresponding to inner-products around a centroid, is also discussed. Some new characterisations of distance cones in terms of circumhyperspheres are presented. [ABSTRACT FROM AUTHOR]
Published: 2007

22. Clustering of Molecules: Influence of the Similarity Measures.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: In this paper, we present the results of an experimental study to analyze the effect of various similarity (or distance) measures on the clustering quality of a set of molecules. We mainly focused on clustering approaches that can deal directly with the 2D representation of the molecules (i.e., graphs). In this context, we found that an approach based on asymmetrical measures of similarity appears relevant. Our experiments are carried out on a dataset from the High Throughput Screening (HTS) domain. [ABSTRACT FROM AUTHOR]
Published: 2007

23. Induction Graphs for Data Mining.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Induction graphs, which are a generalization of decision trees, have a special place among the methods of Data Mining. Indeed, they generate lattice graphs instead of trees. They perform well, are capable of handling data in large volumes, are relatively easy for a non-specialist to interpret, and are applicable without restriction to data of any type (qualitative, quantitative). The explosion of software packages based on the paradigm of decision trees, and more generally induction graphs, is rather strong evidence of their success. In this article, we present a complete method of induction graphs: the SIPINA method. [ABSTRACT FROM AUTHOR]
Published: 2007

24. Association Rules for Categorical and Tree Data.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: The association rule mining problem is among the most popular data mining techniques. Association rules, whose significance is measured via quality indices, have been intensively studied for binary data. In this paper, we deal with association rules in the framework of categorical or tree-like-valued attributes. [ABSTRACT FROM AUTHOR]
Published: 2007

25. Mining Biological Data Using Pyramids.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: This paper is a review of promising applications of pyramidal classification to biological data. We show that overlapping and ordering properties can give new insights that cannot be achieved using more classical methods. We exemplify our point using three applications: (i) a genome scale sequence analysis, (ii) a new progressive multiple sequence alignment method, (iii) a cluster analysis of transcriptomic data. [ABSTRACT FROM AUTHOR]
Published: 2007

26. Finding Rules in Data.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: In the first year of preparing my doctoral thesis at INRIA in Edwin's group, I worked on the construction of an inference engine and a knowledge base, consulting various group members, to build an expert system guiding the group's data analysis package SICLA. One day, Edwin asked me whether one can automatically generate rules for expert systems from data, and I started my new research direction. Since that time, my main work has been machine learning, especially finding rules in data. This paper briefly presents some learning methods we have developed. [ABSTRACT FROM AUTHOR]
Published: 2007

27. Mining Personal Banking Data to Detect Fraud.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Fraud detection in the retail banking sector poses some novel and challenging statistical problems. For example, the data sets are large, and yet each transaction must be examined and decisions must be made in real time; the transactions are often heterogeneous, differing substantially even within an individual account; and the data sets are typically very unbalanced, with only a tiny proportion of transactions belonging to the fraud class. We review the problem, its magnitude, and the various kinds of statistical tools that have been developed for this application. The area is particularly unusual because the patterns to be detected change in response to the detection strategies which are developed: the very success of the statistical models leads to the need for new ones to be developed. [ABSTRACT FROM AUTHOR]
Published: 2007

28. Reduction of Redundant Rules in Statistical Implicative Analysis.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Quasi-implications, also called association rules in data mining, have become the major concept to represent implicative trends between itemset patterns. To make their interpretation easier, two problems have become crucial: filtering the most interesting rules and structuring them to highlight their relationships. In this paper, we put ourselves in the Statistical Implicative Analysis framework, and we propose a new methodology for reducing rule sets by detecting redundant rules. We define two new measures based on Shannon's entropy and Gini's coefficient. [ABSTRACT FROM AUTHOR]
Published: 2007

29. Data Analysis and Operations Research.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Data Analysis and Operations Research are two overlapping sciences as there are, e.g., many data problems in which optimization techniques from Operations Research have to be applied to detect best fitting structures (under suitable constraints) in the underlying data. On the other hand, Operations Research is often based on model formulations for which some model parameters might be unknown or even unobservable. In such cases Operations Research problems consist of a data collection and analysis part and an optimization part in which solutions dependent on model parameters (derived from available information via Data Analysis techniques) are calculated. We give typical examples for research directions where Data Analysis and Operations Research overlap, start with the topic of pyramidal clustering as one of the fields of interest of Edwin Diday, and present methodology how selected problems can be tackled via a combination of techniques from both scientific areas. [ABSTRACT FROM AUTHOR]
Published: 2007

30. Unsupervised Learning Informational Limit in Case of Sparsely Described Examples.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: This paper presents a model characterizing unsupervised learning from an information theoretic point of view. Under some hypotheses, it defines a theoretical quality criterion, which corresponds to the informational limit that bounds the learning ability of any clustering algorithm. This quality criterion depends on the information content of the learning set. It is relevant when examples are sparsely described, i.e. when most of the descriptors are missing. This theoretical limit of any unsupervised learning algorithm is then compared to the actual learning quality of different clustering algorithms (EM, COBWEB and PRESS). This empirical comparison is based on the use of artificial data sets, which are randomly degraded. Finally, the paper shows that the results of PRESS, an algorithm specifically designed to learn from sparsely described examples, are very close to the theoretical upper bound on quality. [ABSTRACT FROM AUTHOR]
Published: 2007

31. Knowledge Management in Environmental Sciences with $$ \mathcal{IKBS} $$: Application to Systematics of Corals of the Mascarene Archipelago.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Systematics, the scientific discipline that deals with listing, describing, naming, classifying and identifying living organisms, is a central point in environmental sciences. Expertise is becoming rare, and for future biodiversity studies relying on species identification, environmental technicians will only be left with monographic descriptions and collections in museums. With the emergence of knowledge management, it is possible to enhance the use of systematicians' expertise by providing them with collaborative tools to widely manage, share and transmit their knowledge. Knowledge engineering in Systematics means revising taxa and descriptions of specimens. We have designed an Iterative Knowledge Base System, $$ \mathcal{IKBS} $$, for achieving these goals. It applies the scientific method in biology (conjecture and test) with a natural process of knowledge management. The product of such a tool is a collaborative knowledge base of a domain that can evolve (by updating the knowledge) and be connected to distributed databases (bibliographic, photographic, geographic, taxonomic, etc.) that will yield information on species after the identification process of a new specimen. This paper presents an overview of the methodology, the methods (identification tree and case-based reasoning) and the validation process used to build knowledge bases in Systematics. An application to corals of the Mascarene Archipelago is given as a case study. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

32. Consensus of Star Tree Hypergraphs.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Popular methods for forming the consensus of several hypergraphs of a given type (e.g., hierarchies, weak hierarchies) place a cluster in the output if it appears sufficiently often among the input hypergraphs. The simplest type of tree hypergraph is one whose clusters are subtrees of a star. This paper considers the possibility of forming consensus by simply counting the frequency of occurrences of clusters for star hypergraphs. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

33. Consensus from Frequent Groupings.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Let $$ \mathcal{D}^* = \left( \mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_k \right) $$ be a profile of classifications of a given set X. We aim to aggregate $$ \mathcal{D}^* $$ into a unique consensus classification $$ \mathcal{D} $$. Classifications considered here are sets of classes, none of which is included in another. To any integer p between 1 and k (both included), we associate a frequent grouping consensus function $$ F_p $$ which returns the maximal subsets of X included in elements of at least p of the $$ \mathcal{D}_i $$. We give some properties and three characterizations of such consensus rules. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

34. Average Consensus and Infinite Norm Consensus: Two Methods for Ultrametric Trees.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Bertrand, Patrice
Abstract: Consensus methods are widely used to combine hierarchies defined on a common set of n objects. Many methods have been proposed during the last decade to combine hierarchies. One of these, the average consensus method, allows one to obtain a consensus solution that is representative of the initial profile of trees by minimizing the sum of the squared distances between this profile and the consensus solution. This problem is known to be NP-complete and one has to rely on heuristics to obtain a consensus result in such cases. As a consequence, the uniqueness and optimality of the solution are not guaranteed. The L∞-consensus, which yields a universal solution in a maximum of n² steps, is an alternative to the average consensus procedure. The two methods will be presented and compared on a numerical example. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

35. Symbolic Dynamics in Text: Application to Automated Construction of Concept Hierarchies.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Following a symbolic encoding of selected terms used in text, we determine symmetries that are furnished by local hierarchical structure. We develop this study so that hierarchical fragments can be used in a concept hierarchy, or ontology. By "letting the data speak" in this way, we avoid the arbitrariness of approximately fitting a model to the data. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

36. Representation of Concept Description by Multivalued Taxonomic Preordonance Variables.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Mathematical representation of complex data knowledge is one of the most important problems in Classification and Data Mining. In this contribution we present an original and very general formalization of various types of knowledge. The specific data are endowed with biological descriptions of phlebotomine sandfly species. Relative to a descriptive categorical variable, subsets of category values have to be distinguished. On the other hand, hierarchical dependencies between the descriptive variables, associated with the mother → daughter relation, have to be taken into account. Additionally, an ordinal similarity function is defined on the modality set of each categorical variable. The knowledge description is formalized by means of a new type of descriptor that we call "Taxonomic preordonance variable with multiple choice". A probabilistic similarity index between concepts described by such variables can be built. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

37. Recent Advances in Conceptual Clustering: CLUSTER3.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Conceptual clustering is a form of unsupervised learning that seeks clusters in data that represent simple and understandable concepts, rather than groupings of entities with high intra-cluster and low inter-cluster similarity, as in conventional clustering. Another difference from conventional clustering is that conceptual clustering produces not only clusters but also their generalized descriptions, and that the descriptions are used for cluster evaluation, interpretation, and classification of new, previously unseen entities. The basic methodology of conceptual clustering and the program CLUSTER3, which implements recent advances, are briefly described. One important novelty in CLUSTER3 is the ability to generate clusters according to the viewpoint from which clustering is to be performed. This is achieved through the view-relevant attribute subsetting (VAS) method. CLUSTER3's performance is illustrated by its application to clustering a database of automobile fatality accidents. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

38. Mining Description Logics Concepts with Relational Concept Analysis.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Symbolic objects were originally intended to bring both more structure in data and more intelligibility in final results to statistical data analysis. We present here a framework of similar motivation, i.e., combining a data analysis method, formal concept analysis (FCA), with a knowledge description language inspired by the description logic (DL) formalism. The focus is hence on proper handling of relations between individuals in the construction of formal concepts. We illustrate the relational concept analysis (RCA) framework, which complements standard FCA with a dedicated data format, a set of scaling operators, an iterative process for lattice construction, and translations to and from a DL language. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

39. Concepts of a Discrete Random Variable.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: A formal concept is defined in the literature as a pair (extent, intent) with respect to a context which is usually empirical, as for example a sample of transactions. This is somewhat unsatisfying since concepts, though born from experiences, should not depend on them. In this paper we consider the above concepts as 'empirical concepts' and we define the notion of concept, in a context-free framework, as a limit intent, by proving, via the law of large numbers, the following: given a random variable χ taking its values in a countable σ-semilattice, the random intents of empirical concepts, with respect to a sample of χ, converge almost everywhere to a fixed deterministic limit, called a concept, whose identification shows that it only depends on the distribution Pχ of χ. Moreover, the set of such concepts is the σ-semilattice generated by the support of χ and even has the structure of a σ-lattice: the concept lattice of a random variable. We also compute the mean number of concepts and frequent itemsets for a hierarchical Bernoulli mixtures model. Last, we propose an algorithm to find maximal frequent itemsets by using minimal winning coalitions of Pχ. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

40. Partitioning by Particle Swarm Optimization.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: We propose a clustering algorithm using particle swarm optimization (PSO) for partitioning a set of objects into K clusters, by defining a family of agent-partitions: each agent is defined by K centroids in a p-dimensional space, and each centroid has an associated cluster, which is defined by the allocation of the objects to the nearest centroid. The agents move in the space according to PSO principles, that is, they move with random intensity in the direction of a vector called velocity, which results from the random sum of the best past position of this agent, the best overall agent, and the last direction. We compare the performance of the method with other heuristics also proposed by the authors, and with two classical methods. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

41. Hybrid k-Means: Combining Regression-Wise and Centroid-Based Criteria for QSAR.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: This paper further extends the 'kernel'-based approach to clustering proposed by E. Diday in the early 1970s. According to this approach, a cluster's centroid can be represented by the parameters of any analytical model, such as a linear regression equation, built over the cluster. We address the problem of producing regression-wise clusters to be separated in the input variable space by building a hybrid clustering criterion that combines the regression-wise clustering criterion with the conventional centroid-based one. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

42. Cluster Analysis Based on Posets.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: When dissimilarities are measured in a space other than the reals, it is argued that previous models for cluster analysis are not adequate. Possible new models will be explored. It is also shown that formal concept analysis may be viewed as a special case of a Boolean dissimilarity coefficient. A persistent underlying theme involves generalized notions of adjoints of order preserving mappings between posets. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

43. Block Bernoulli Parsimonious Clustering Models.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: When the data consists of a set of objects described by a set of binary variables, we have embedded the block clustering problem of binary table in the mixture approach. In using a Bernoulli model and adopting the classification maximum likelihood principle we perform an adapted version of the block CEM algorithm. In this paper, we propose different parsimonious models by imposing constraints on the Bernoulli parameter. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

44. Looking for High Density Zones in a Graph.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: The aim of this paper is to introduce new methods to build dense classes of vertices in a graph. These classes correspond to connected parts having a proportion of inner edges which is higher than the average on the whole graph. They are progressively built; a kernel of each class is first established, then they are extended to connected elements and finally to a partition. Several density functions are compared. A Monte-Carlo validation of this method is made from random graphs fulfilling some density conditions. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

45. Species Clustering via Classical and Interval Data Representation.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Consider a data table where n objects are described by p numerical variables and a qualitative variable with m categories. Interval data representation and interval data clustering methods are useful for clustering the m categories. We study in this paper a data set of fish contaminated with mercury. We will see how classical or interval data representation can be used for clustering the species of fish and not the fishes themselves. We will compare the results obtained with the two approaches (classical or interval) in the particular case of this application in Ecotoxicology. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

46. Overlapping Clustering in a Graph Using k-Means and Application to Protein Interactions Networks.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: In this article, we design an overlapping clustering method in a graph in order to deal with a biological issue: protein annotation. Given an unweighted and undirected graph G, we search for subgraphs of G that are dense in edges. The method consists of three steps. First we determine some initial kernels of the classes by means of a local density function; then we improve these kernels using a k-means process; last the kernels are extended to overlapping classes. The method is tested on random graphs and finally applied to a protein interactions network. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

47. Clustering Methods: A History of k-Means Algorithms.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: This paper surveys some historical issues related to the well-known k-means algorithm in cluster analysis. It shows to which authors the different versions of this algorithm can be traced back, and which were the underlying applications. We sketch various generalizations (with references also to Diday's work) and thereby underline the usefulness of the k-means approach in data analysis. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

48. Dynamic Clustering of Histogram Data: Using the Right Metric.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: In this paper we present a review of some metrics to be proposed as allocation functions in the Dynamic Clustering Algorithm (DCA) when data are distributions or histograms of values. The choice of the most suitable distance plays a central role in the DCA because it is related to the criterion function that is optimized. Moreover, it has to be consistent with the prototype which represents the cluster. Accordingly, for each proposed metric, we identify the corresponding prototype according to the minimization of the criterion function and then to the best fitting between the partition and the best representation of the clusters. Finally, we focus our attention on a Wasserstein-based distance, showing its optimality in partitioning a set of histogram data with respect to a representation of the clusters by means of their barycenter expressed in terms of distributions. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

49. Symbolic Markov Chains.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Stochastic processes have long had wide applications in quite different domains. The standard theory considers a discrete or continuous state space. We consider here the concept of Stochastic Process associated with all the cases of symbolic variables: quantitative, categorical single and multiple, interval, modal. More particularly, we adapt the definition of Markov Chain and give the equivalent of the Chapman-Kolmogorov theorem in all cases. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF

50. Quality Issues in Symbolic Data Analysis.

Author: Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Brito, Paula, and Cucumel, Guy
Abstract: Symbolic Data Analysis is an extension of Classical Data Analysis to more complex data types and tables through the application of certain conditions, where underlying concepts are vital for their further processing. Therefore, the assessment of the quality of Symbolic Data depends extensively on the quality of the collected classical data. However, even though various criteria and indicators have been established to assess quality in classical statistics, the specificities of Symbolic Data construction challenge the efficacy of the classical quality assessment components. In this paper we initially refer to the quality dimensions that can be considered for the classical data and then emphasize the extent to which these can be applied to symbolic data, taking into account the peculiarities of the symbolic approach. [ABSTRACT FROM AUTHOR]
Published: 2007
Full Text: View/download PDF


59 results on '"Opitz P"'
