37 results on '"Arabie, Ph."'
Search Results
2. A New Efficient Method for Assessing Missing Nucleotides in DNA Sequences in the Framework of a Generic Evolutionary Model.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
The problem of phylogenetic inference from datasets including incomplete characters is among the most relevant issues in systematic biology. In this paper, we propose a new probabilistic method for estimating unknown nucleotides before computing evolutionary distances. It is developed in the framework of the Tamura-Nei evolutionary model (Tamura and Nei (1993)). The proposed strategy is compared, through simulations, to existing methods "Ignoring Missing Sites" (IMS) and "Proportional Distribution of Missing and Ambiguous Bases" (PDMAB) included in the PAUP package (Swofford (2001)). [ABSTRACT FROM AUTHOR]
- Published
- 2006
3. Improving the Performance of Principal Components for Classification of Gene Expression Data Through Feature Selection.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Gene expression data are characterized by a very large number of features relative to the number of observations. The direct use of traditional statistical techniques for supervised classification can give poor results on gene expression data. Therefore, dimension reduction is advisable before applying a classifier. In this work, we propose a method that combines two types of dimension reduction techniques: feature selection and feature extraction. First, one of the following feature selection procedures is applied to the dataset: a univariate ranking based on the Kruskal-Wallis test statistic, Relief, or recursive feature elimination (RFE). After that, principal components are formed from the selected features. Experiments carried out on eight gene expression datasets using three classifiers (logistic regression, k-nn and rpart) gave good results for the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2006
4. Local Models in Register Classification by Timbre.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Investigating a data set containing different sounds of several instruments suggests that local modelling may be a promising approach to take into account different timbre characteristics of different instruments. For this reason, some basic ideas towards such a local modelling are realized in this paper yielding a framework for further studies. [ABSTRACT FROM AUTHOR]
- Published
- 2006
5. Using MCMC as a Stochastic Optimization Procedure for Musical Time Series.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Based on a model of Davy and Godsill (2002), we describe a general model for time series from monophonic musical sound to estimate the pitch. The model is a hierarchical Bayes model that is estimated with MCMC methods. All the parameters and their prior distributions are motivated individually. For parameter estimation, an MCMC-based stochastic optimization is introduced. A simulation study is used to look for the best implementation of the optimization procedure. [ABSTRACT FROM AUTHOR]
- Published
- 2006
6. Evaluating Different Approaches to Measuring the Similarity of Melodies.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
This paper describes an empirical approach to evaluating similarity measures for the comparison of two note sequences or melodies. In the first sections the experimental approach and the empirical results of previous studies on melodic similarity are reported. In the discussion section several questions are raised that concern the nature of similarity or distance measures for melodies and musical material in general. The approach taken here is based on an empirical comparison of a variety of similarity measures with experimentally gathered rating data from human music experts. An optimal measure is constructed on the basis of a linear model. [ABSTRACT FROM AUTHOR]
- Published
- 2006
7. Generalized N-gram Measures for Melodic Similarity.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
In this paper we propose three generalizations of well-known N-gram approaches for measuring similarity of single-line melodies. In a former paper we compared around 50 similarity measures for melodies with empirical data from music psychological experiments. Similarity measures based on edit distances and N-grams always showed the best results for different contexts. This paper aims at a generalization of N-gram measures that can combine N-gram and other similarity measures in a fairly general way. [ABSTRACT FROM AUTHOR]
- Published
- 2006
8. Patterns of Associations in Finite Sets of Items.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Mining association rules is well established in the quantitative business research literature and is an up-and-coming topic in marketing practice. However, reducing the analysis to the assessment and interpretation of a few selected rules does not provide a complete picture of the data structure revealed by the rules. This paper introduces a new approach to visualizing relations between items by assigning them to a rectangular grid with respect to their mutual association. The visualization task leads to a quadratic assignment problem and is tackled by means of a genetic algorithm. The methodology is demonstrated by evaluating a set of rules describing marketing practices in Russia. [ABSTRACT FROM AUTHOR]
- Published
- 2006
9. Empirical Analysis of Attribute-Aware Recommendation Algorithms with Variable Synthetic Data.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Recommender Systems (RS) have helped achieve success in E-commerce. Developing better RS algorithms has been an ongoing research topic. However, it has always been difficult to find adequate datasets to help evaluate RS algorithms. Public data suitable for this kind of evaluation is limited, especially data containing content information (attributes). Previous research has shown that the performance of RS relies on the characteristics and quality of datasets. Although a few others have conducted studies on synthetically generated data to mimic user-product datasets, datasets containing attribute information are rarely investigated. In this paper, we review synthetic datasets used in RS and present our synthetic data generator that considers attributes. Moreover, we conduct empirical evaluations of existing hybrid recommendation algorithms and other state-of-the-art algorithms using these synthetic data and observe the sensitivity of the algorithms when varying qualities of attribute data are applied to them. [ABSTRACT FROM AUTHOR]
- Published
- 2006
10. Mining Association Rules in Folksonomies.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Social bookmark tools are rapidly emerging on the Web. In such systems users are setting up lightweight conceptual structures called folksonomies. These systems currently provide relatively little structure. In this paper we discuss how association rule mining can be adopted to analyze and structure folksonomies, and how the results can be used for ontology learning and supporting emergent semantics. We demonstrate our approach on a large-scale dataset stemming from an online system. [ABSTRACT FROM AUTHOR]
- Published
- 2006
11. kNN Versus SVM in the Collaborative Filtering Framework.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
We present experimental results comparing the k-Nearest Neighbor (kNN) algorithm with the Support Vector Machine (SVM) in the collaborative filtering framework using datasets with different properties. While k-Nearest Neighbor is usually used for collaborative filtering tasks, the Support Vector Machine is considered a state-of-the-art classification algorithm. Since collaborative filtering can also be interpreted as a classification/regression task, virtually any supervised learning algorithm (such as SVM) can also be applied. Experiments were performed on two standard, publicly available datasets and on a real-life corporate dataset that does not fit the profile of ideal data for collaborative filtering. We conclude that the quality of collaborative filtering recommendations is highly dependent on the quality of the data. Furthermore, we can see that kNN is dominant over SVM on the two standard datasets. On the real-life corporate dataset with a high level of sparsity, kNN fails as it is unable to form reliable neighborhoods. In this case SVM outperforms kNN. [ABSTRACT FROM AUTHOR]
- Published
- 2006
12. Comparison of Two Methods for Detecting and Correcting Systematic Error in High-throughput Screening Data.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
High-throughput screening (HTS) is an efficient technological tool for drug discovery in the modern pharmaceutical industry. It consists of testing thousands of chemical compounds per day to select active ones. This process has many drawbacks that may result in missing a potential drug candidate or in selecting inactive compounds. We describe and compare two statistical methods for correcting systematic errors that may occur during HTS experiments. Namely, the collected HTS measurements and the hit selection procedure are corrected. [ABSTRACT FROM AUTHOR]
- Published
- 2006
13. Revised Boxplot Based Discretization as the Kernel of Automatic Interpretation of Classes Using Numerical Variables.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
This paper presents the impact of improving Boxplot based discretization (BbD) on the methodology of Boxplot based induction rules (BbIR), which is oriented to the automatic generation of conceptual descriptions of classifications that can support later decision-making. [ABSTRACT FROM AUTHOR]
- Published
- 2006
14. Sub-species of Homopus Areolatus? Biplots and Small Class Inference with Analysis of Distance.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
A canonical variance analysis (CVA) biplot can visually portray a one-way MANOVA. Both techniques are subject to the assumption of equal class covariance matrices. In the application considered, very small sample sizes resulted in some singular class covariance matrix estimates, and furthermore it seemed unlikely that the assumption of homogeneity of covariance matrices would hold. Analysis of distance (AOD) is employed as a nonparametric inference tool. In particular, AOD biplots are introduced for a visual display of samples and variables, analogous to the CVA biplot. [ABSTRACT FROM AUTHOR]
- Published
- 2006
15. Iterated Boosting for Outlier Detection.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
A procedure for detecting outliers in regression problems based on information provided by boosting trees is proposed. Boosting is meant for dealing with observations that are hard to predict, by giving them extra weights. In the present paper, such observations are considered to be possible outliers, and a procedure is proposed that uses the boosting results to diagnose which observations could be outliers. The key idea is to select the most frequently resampled observation along the boosting iterations and reiterate boosting after removing it. Many well-known benchmark data sets are considered, and a comparative study against two classical competitors shows the value of the method. [ABSTRACT FROM AUTHOR]
- Published
- 2006
16. A Dynamic Clustering Method for Mixed Feature-Type Symbolic Data.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
A dynamic clustering method for mixed feature-type symbolic data is presented. The proposed method needs a pre-processing step to transform Boolean symbolic data into modal symbolic data. The presented dynamic clustering method then takes as input a set of vectors of modal symbolic data and furnishes a partition and a prototype for each class by optimizing an adequacy criterion based on a suitable squared Euclidean distance. To show the usefulness of this method, examples with symbolic data sets are considered. [ABSTRACT FROM AUTHOR]
- Published
- 2006
17. Symbolic Clustering of Large Datasets.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
We present an approach to clustering large datasets that integrates the Kohonen Self Organizing Maps (SOM) with a dynamic clustering algorithm for symbolic data (SCLUST). A preliminary data reduction using the SOM algorithm is performed. As a result, the individual measurements are replaced by micro-clusters. These micro-clusters are then grouped into a few clusters which are modeled by symbolic objects. By computing the extension of these symbolic objects, the symbolic clustering algorithm allows the natural classes to be discovered. An application to a real data set shows the usefulness of this methodology. [ABSTRACT FROM AUTHOR]
- Published
- 2006
18. A New Wasserstein Based Distance for the Hierarchical Clustering of Histogram Symbolic Data.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Symbolic Data Analysis (SDA) aims to describe and analyze complex and structured data extracted, for example, from large databases. Such data, which can be expressed as concepts, are modeled by symbolic objects described by multivalued variables. In the present paper we present a new distance, based on the Wasserstein metric, in order to cluster a set of data described by distributions with finite continuous support, or, as called in SDA, by "histograms". The proposed distance permits us to define a measure of inertia of data with respect to a barycenter that satisfies the Huygens theorem of decomposition of inertia. We propose to use this measure for an agglomerative hierarchical clustering of histogram data based on the Ward criterion. An application to real data validates the procedure. [ABSTRACT FROM AUTHOR]
- Published
- 2006
19. Dependence and Interdependence Analysis for Interval-Valued Variables.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Data analysis is often affected by different types of errors, such as measurement errors, computation errors, and imprecision related to the method adopted for estimating the data. The methods which have been proposed for treating errors in the data may also be applied to different kinds of data that in real life are of interval type. The uncertainty in the data, which is strictly connected to the above errors, may be treated by considering, rather than a single value for each datum, the interval of values in which it may fall: the interval data. The purpose of the present paper is to introduce methods for analyzing the interdependence and dependence among interval-valued variables. Statistical units described by interval-valued variables can be regarded as a special case of Symbolic Object (SO). In Symbolic Data Analysis (SDA), these data are represented as boxes. Accordingly, the purpose of the present work is the extension of Principal Component Analysis to obtain a visualization of such boxes in a lower-dimensional space. Furthermore, a new method for fitting an interval simple linear regression equation is developed. In contrast to other approaches proposed in the literature, which work on a scalar recoding of the intervals using classical tools of analysis, we make extensive use of interval algebra tools combined with some optimization techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2006
20. Multidimensional Scaling of Histogram Dissimilarities.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Multidimensional scaling aims at reconstructing dissimilarities between pairs of objects by distances in a low dimensional space. However, in some cases the dissimilarity itself is unknown, but the range, or a histogram of the dissimilarities, is given. This type of data falls in the wider class of symbolic data (see Bock and Diday (2000)). We model a histogram of dissimilarities by a histogram of the distances defined as the minimum and maximum distance between two sets of embedded rectangles representing the objects. In this paper, we provide a new algorithm called Hist-Scal using iterative majorization, which is based on an algorithm, I-Scal, developed for the case where the dissimilarities are given by a range of values, i.e., an interval (see Groenen et al. (in press)). The advantage of iterative majorization is that each iteration is guaranteed to improve the solution until no improvement is possible. We present the results on an empirical data set on synthetic musical tones. [ABSTRACT FROM AUTHOR]
- Published
- 2006
21. Identifying and Classifying Social Groups: A Machine Learning Approach.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
The identification of social groups remains one of the main analytical themes in the analysis of social networks and, in more general terms, in the study of social organization. Traditional network approaches to group identification encounter a variety of problems when the data to be analyzed involve two-mode networks, i.e., relations between two distinct sets of objects with no reflexive relation allowed within each set. In this paper we propose a relatively novel approach to the recognition and identification of social groups in data generated by network-based processes in the context of two-mode networks. Our approach is based on a family of learning algorithms called Support Vector Machines (SVM). The analytical framework provided by SVM offers a flexible statistical environment to solve classification tasks, and to reframe regression and density estimation problems. We explore the relative merits of our approach to the analysis of social networks in the context of the well-known "Southern women" (SW) data set collected by Davis, Gardner and Gardner. We compare our results with those that have been produced by different analytical approaches. We show that our method, which acts as a data-independent preprocessing step, is able to reduce the complexity of the clustering problem, enabling the application of simpler configurations of common algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2006
22. Analyzing the Structure of U.S. Patents Network.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
The U.S. patents network is a network of almost 3.8 million patents (network vertices) from the year 1963 to 1999 (Hall et al. (2001)) and more than 16.5 million citations (network arcs). It is an example of a very large citation network. We analyzed the U.S. patents network with the tools of network analysis in order to get insight into the structure of the network as an initial step to the study of innovations and technical changes based on patents citation network data. In our approach the SPC (Search Path Count) weights, proposed by Hummon and Doreian (1989), for vertices and arcs are calculated first. Based on these weights, vertex and line islands (Batagelj and Zaveršnik (2004)) are determined to identify the main themes of the U.S. patents network. All analyses were done with Pajek, a program for analysis and visualization of large networks. The main U.S. patent topics obtained from the analysis are presented. [ABSTRACT FROM AUTHOR]
- Published
- 2006
23. Spectral Clustering and Multidimensional Scaling: A Unified View.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Spectral clustering is a procedure aimed at partitioning a weighted graph into minimally interacting components. The resulting eigen-structure is determined by a reversible Markov chain, or equivalently by a symmetric transition matrix F. On the other hand, multidimensional scaling procedures (and factorial correspondence analysis in particular) consist in the spectral decomposition of a kernel matrix K. This paper shows how F and K can be related to each other through a linear or even non-linear transformation leaving the eigen-vectors invariant. As illustrated by examples, this makes it possible to define a transition matrix from a similarity matrix between n objects, to define Euclidean distances between the vertices of a weighted graph, and to elucidate the "flow-induced" nature of spatial auto-covariances. [ABSTRACT FROM AUTHOR]
- Published
- 2006
24. Some Open Problem Sets for Generalized Blockmodeling.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
This paper provides an introduction to the blockmodeling problem of how to cluster networks, based solely on the structural information contained in the relational ties, and a brief overview of generalized blockmodeling as an approach for solving this problem. Following a formal statement of the core of generalized blockmodeling, a listing of the advantages of adopting this approach to partitioning networks is provided. These advantages, together with some of the disadvantages of this approach, in its current state, form the basis for proposing some open problem sets for generalized blockmodeling. Providing solutions to these problem sets will transform generalized blockmodeling into an even more powerful approach for clustering networks of relations. [ABSTRACT FROM AUTHOR]
- Published
- 2006
25. Comparing Optimal Individual and Collective Assessment Procedures.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
This paper focuses on the comparison between the optimal cutoff points set on single and multiple tests in predictor-based assessment, that is, assessing applicants as either suitable or unsuitable for a job. Our main result specifies the condition that determines the number of predictor tests, the collective assessment rule (aggregation procedure of predictor tests' recommendations) and the function relating the tests' assessment skills to the predictor cutoff points. [ABSTRACT FROM AUTHOR]
- Published
- 2006
26. Finding Meaningful and Stable Clusters Using Local Cluster Analysis.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Let us consider the problem of finding clusters in a heterogeneous, high-dimensional setting. Usually a (global) cluster analysis model is applied to reach this aim. As a result, often ten or more clusters are detected in a heterogeneous data set. The idea of this paper is to perform subsequent local cluster analyses. Here the following two main questions arise. Is it possible to improve the stability of some of the clusters? Are there new clusters that are not yet detected by global clustering? The paper presents a methodology for such an iterative clustering that can be a useful tool in discovering stable and meaningful clusters. The proposed methodology is used successfully in the field of archaeometry. Here, without loss of generality, it is applied to hierarchical cluster analysis. The improvements of local cluster analysis will be illustrated by means of multivariate graphics. [ABSTRACT FROM AUTHOR]
- Published
- 2006
27. Model Selection for the Binary Latent Class Model: A Monte Carlo Simulation.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
This paper addresses model selection using information criteria for binary latent class (LC) models. A Monte Carlo study sets an experimental design to compare the performance of different information criteria for this model, some compared for the first time. Furthermore, the level of separation of latent classes is controlled using a new procedure. The results show that AIC3 (Akaike information criterion with 3 as penalizing factor) has a balanced performance for binary LC models. [ABSTRACT FROM AUTHOR]
- Published
- 2006
28. Empirical Comparison of a Monothetic Divisive Clustering Method with the Ward and the k-means Clustering Methods.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
DIVCLUS-T is a descendant hierarchical clustering method based on the same monothetic approach as classification and regression trees but from an unsupervised point of view. The aim is not to predict a continuous variable (regression) or a categorical variable (classification) but to construct a hierarchy. The dendrogram of the hierarchy is easy to interpret and can be read as a decision tree. An example of this new type of dendrogram is given on a small categorical dataset. DIVCLUS-T is then compared empirically with two polythetic clustering methods: the Ward ascendant hierarchical clustering method and the k-means partitional method. The three algorithms are applied and compared on six databases of the UCI Machine Learning repository. [ABSTRACT FROM AUTHOR]
- Published
- 2006
29. Crisp Partitions Induced by a Fuzzy Set.
- Author
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
The relationship between fuzzy sets and crisp partitions defined on the same finite set of objects X is studied. The granular structure of a fuzzy set is described by rough fuzzy sets, and the quality of approximation of a fuzzy set by a crisp partition is evaluated. A measure of rough dissimilarity between clusters from a crisp partition of X with respect to a fuzzy set A defined on X is introduced. Properties of this measure are explored and some applications are provided. Classification of membership grades of A into linguistic categories is discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
30. Evaluation of Allocation Rules Under Some Cost Constraints.
- Author
-
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Allocation of individuals or objects to labels or classes is a central problem in statistics, particularly in supervised classification methods such as Linear and Quadratic Discriminant Analysis, Logistic Discrimination, Neural Networks, Support Vector Machines, and so on. Misallocations occur when the allocation class and the origin class differ. These errors can arise from various sources, such as the quality of the data, the definition of the explained categorical variable, or the choice of the learning sample. Generally, the cost is not uniform but depends on the type of error; consequently, using only the percentage of correctly classified objects is not sufficiently informative. In this paper we deal with the evaluation of allocation rules taking into account the error cost. We use a statistical index which generalizes the percentage of correctly classified objects. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
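Note on record 30: the abstract does not define its index, so the following is only a minimal sketch of the general idea of weighting misallocations by cost. The function name `mean_allocation_cost`, the confusion-matrix layout, and the cost values are assumptions made for illustration, not the authors' index. With a 0/1 cost matrix the quantity reduces to the error rate, that is, one minus the proportion of correctly classified objects.

```python
def mean_allocation_cost(confusion, cost):
    """Average cost per object (illustrative, not the paper's index).

    confusion[i][j]: number of objects of origin class i allocated to class j.
    cost[i][j]:      cost of allocating an object of class i to class j.
    """
    total_objects = sum(sum(row) for row in confusion)
    if total_objects == 0:
        raise ValueError("confusion matrix is empty")
    total_cost = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return total_cost / total_objects


# Hypothetical two-class example: 20 objects, 3 of them misallocated.
confusion = [[8, 2],
             [1, 9]]
zero_one_cost = [[0, 1],
                 [1, 0]]      # every error costs the same
asymmetric_cost = [[0, 5],
                   [1, 0]]    # misallocating class 0 is five times as costly

error_rate = mean_allocation_cost(confusion, zero_one_cost)
weighted = mean_allocation_cost(confusion, asymmetric_cost)
```

Two rules with the same error rate can differ widely under the asymmetric costs, which is the situation the abstract describes.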
31. Hierarchical Clustering for Boxplot Variables.
- Author
-
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Boxplots are well-known exploratory charts used to extract meaningful information from batches of data at a glance. Their strength lies in their ability to summarize data while retaining the key information, which is also a desirable property of symbolic variables. In this paper, boxplots are presented as a new kind of symbolic variable. In addition, two different approaches to measuring distances between boxplot variables are proposed. The usefulness of these distances is illustrated by means of a hierarchical clustering of boxplot data. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
32. Properties and Performance of Shape Similarity Measures.
- Author
-
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
This paper gives an overview of shape dissimilarity measure properties, such as metric and robustness properties, and of retrieval performance measures. Fifteen shape similarity measures are briefly described and compared. Their retrieval results on the MPEG-7 Core Experiment CE-Shape-1 test set, as reported in the literature and as obtained by a reimplementation, are compared and discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
33. Dissimilarities for Web Usage Mining.
- Author
-
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
Obtaining a set of homogeneous classes of pages according to the browsing patterns identified in web server log files can be very useful for analyzing the organization of the site and its adequacy to user needs. Such a set of homogeneous classes is often obtained from a dissimilarity measure between the visited pages, defined via the visits extracted from the logs. There are, however, many possibilities for defining such a measure. This paper presents an analysis of different dissimilarity measures based on the comparison between the semantic structure of the site identified by experts and the clustering constructed with standard algorithms applied to the dissimilarity matrices generated by the chosen measures. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
34. Design of Dissimilarity Measures: A New Dissimilarity Between Species Distribution Areas.
- Author
-
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
We give some guidelines for the choice and design of dissimilarity measures and illustrate some of them by the construction of a new dissimilarity measure between species distribution areas in biogeography. Species distribution data can be digitized as presences and absences in certain geographic units. As opposed to all measures already present in the literature, the geco coefficient introduced in the present paper takes the geographic distance between the units into account. The advantages of the new measure are illustrated by a study of its sensitivity to incomplete sampling and to changes in the definition of the geographic units in two real data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
35. Comparison of Distance Indices Between Partitions.
- Author
-
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
In this paper, we compare five classical distance indices on P_n, the set of partitions on n elements. First, we recall the definition of the transfer distance between partitions and an algorithm to evaluate it. Then, we build sets P_k(P) of partitions at k transfers from an initial partition P. Finally, we compare the distributions of the five index values between P and the elements of P_k(P). [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
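Note on record 35: the transfer distance is usually described as the minimum number of elements that must be moved from one class to another (possibly new) class to turn one partition into the other. A common characterization is n minus the weight of the best one-to-one matching between the classes of the two partitions, where the weight of a pair is the size of its intersection. The sketch below follows that characterization with a brute-force search over matchings, so it is only suitable for partitions with a handful of classes; it is not the algorithm recalled in the paper, and the function name `transfer_distance` is an assumption.

```python
from itertools import permutations


def transfer_distance(p, q):
    """Transfer distance between two partitions of the same finite set.

    p, q: iterables of disjoint collections (the classes).
    Brute force over class matchings: factorial in the number of classes.
    """
    p = [set(c) for c in p]
    q = [set(c) for c in q]
    if set().union(*p) != set().union(*q):
        raise ValueError("partitions must cover the same elements")
    n = sum(len(c) for c in p)

    # Pad the shorter partition with empty classes so matchings are square.
    k = max(len(p), len(q))
    p += [set()] * (k - len(p))
    q += [set()] * (k - len(q))

    # Best matching = largest number of elements that can stay in place.
    best = max(
        sum(len(p[i] & q[j]) for i, j in enumerate(perm))
        for perm in permutations(range(k))
    )
    return n - best


d = transfer_distance([{1, 2, 3}, {4, 5}], [{1, 2}, {3, 4, 5}])
```

Here `d` is 1, since moving element 3 into the second class is enough. For larger partitions the maximization is an assignment problem and would normally be solved with a polynomial-time method instead of enumeration.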
36. Improved Fréchet Distance for Time Series.
- Author
-
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
This paper focuses on the Fréchet distance introduced by Maurice Fréchet in 1906 to account for the proximity between curves (Fréchet (1906)). The major limitation of this proximity measure is that it is based on the closeness of the values independently of the local trends. To alleviate this setback, we propose a dissimilarity index extending the above estimates to include the information of dependency between local trends. A synthetic dataset is generated to reproduce and show the limiting conditions for the Fréchet distance. The proposed dissimilarity index is then compared with the Fréchet estimate and results illustrating its efficiency are reported. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
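Note on record 36: for sampled time series, the Fréchet distance is commonly computed in its discrete form (the coupling distance) by dynamic programming. The sketch below shows that standard discrete variant for one-dimensional series, using the absolute difference between values as the pointwise distance. It is not the improved index proposed in the paper, and the function name `discrete_frechet` is an assumption.

```python
def discrete_frechet(a, b):
    """Discrete Fréchet (coupling) distance between two numeric sequences."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    # ca[i][j]: coupling distance between the prefixes a[:i+1] and b[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(a[i] - b[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance along a, along b, or along both; keep the best option.
                ca[i][j] = max(
                    min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d
                )
    return ca[n - 1][m - 1]


shifted = discrete_frechet([0, 0, 0], [1, 1, 1])
resampled = discrete_frechet([0, 2], [0, 1, 2])
```

Both calls return 1. The value depends only on how close the matched values are, not on whether the series rise or fall together, which is the limitation the abstract refers to.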
37. A Tree-Based Similarity for Evaluating Concept Proximities in an Ontology.
- Author
-
Bock, H. -H., Gaul, W., Vichi, M., Arabie, Ph., Baier, D., Critchley, F., Decker, R., Diday, E., Greenacre, M., Lauro, C., Meulman, J., Monari, P., Nishisato, S., Ohsumi, N., Opitz, O., Ritter, G., Schader, M., Weihs, C., Batagelj, Vladimir, and Bock, Hans-Hermann
- Abstract
The problem of evaluating semantic similarity in a network structure has seen a noticeable renewal of interest, linked to the importance of ontologies in the Semantic Web. Different semantic measures have been proposed in the literature to evaluate the strength of the semantic link between two concepts or two groups of concepts within either two different ontologies or the same ontology. This paper presents a synthesis of a theoretical study of some semantic measures based on an ontology restricted to subsumption links. We outline some limitations of these measures and introduce a new one: the Proportion of Shared Specificity. This measure, which does not depend on an external corpus, takes into account the density of links in the graph between two concepts. A numerical comparison of the different measures has been made on different large samples from WordNet. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF